Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-27386

WOPI pub/sub crashes in cluster with Redis

    XMLWordPrintable

    Details

      Description

      Seen in production site in cluster.
      Was not visible preproduction while it was not in cluster.

      redis.clients.jedis.exceptions.JedisDataException: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context
      	at redis.clients.jedis.Protocol.processError(Protocol.java:127) ~[jedis-2.9.0.jar:?]
      	at redis.clients.jedis.Protocol.process(Protocol.java:161) ~[jedis-2.9.0.jar:?]
      	at redis.clients.jedis.Protocol.read(Protocol.java:215) ~[jedis-2.9.0.jar:?]
      	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340) ~[jedis-2.9.0.jar:?]
      	at redis.clients.jedis.Connection.getBinaryBulkReply(Connection.java:259) ~[jedis-2.9.0.jar:?]
      	at redis.clients.jedis.BinaryJedis.get(BinaryJedis.java:244) ~[jedis-2.9.0.jar:?]
      	at org.nuxeo.ecm.core.redis.contribs.RedisKeyValueStore.lambda$get$5(RedisKeyValueStore.java:163) ~[nuxeo-core-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisPoolExecutor.execute(RedisPoolExecutor.java:49) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor$1.retry(RedisFailoverExecutor.java:62) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.retry.Retry.retry(Retry.java:64) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor.executeWithRetryPolicy(RedisFailoverExecutor.java:57) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor.execute(RedisFailoverExecutor.java:43) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.contribs.RedisKeyValueStore.get(RedisKeyValueStore.java:163) ~[nuxeo-core-redis-10.10.jar:?]
      	at org.nuxeo.wopi.WOPIServiceImpl.getDiscovery(WOPIServiceImpl.java:299) ~[nuxeo-wopi-10.10-HF01.jar:?]
      	at org.nuxeo.wopi.WOPIServiceImpl$WOPIDiscoveryInvalidator.receivedMessage(WOPIServiceImpl.java:330) ~[nuxeo-wopi-10.10-HF01.jar:?]
      	at org.nuxeo.wopi.WOPIServiceImpl$WOPIDiscoveryInvalidator.receivedMessage(WOPIServiceImpl.java:320) ~[nuxeo-wopi-10.10-HF01.jar:?]
      	at org.nuxeo.runtime.pubsub.AbstractPubSubBroker.subscriber(AbstractPubSubBroker.java:140) ~[nuxeo-runtime-pubsub-10.10.jar:?]
      	at org.nuxeo.runtime.pubsub.AbstractPubSubProvider.localPublish(AbstractPubSubProvider.java:63) ~[nuxeo-runtime-pubsub-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.onMessage(RedisPubSubProvider.java:168) ~[nuxeo-core-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.onPMessage(RedisPubSubProvider.java:172) ~[nuxeo-core-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.processBinary(RedisPubSubProvider.java:220) ~[nuxeo-core-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.proceedWithPatterns(RedisPubSubProvider.java:186) ~[nuxeo-core-redis-10.10.jar:?]
      	at redis.clients.jedis.Jedis.psubscribe(Jedis.java:2697) ~[jedis-2.9.0.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisExecutor.lambda$psubscribe$1(RedisExecutor.java:87) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisPoolExecutor.execute(RedisPoolExecutor.java:63) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor$1.retry(RedisFailoverExecutor.java:62) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.retry.Retry.retry(Retry.java:64) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor.executeWithRetryPolicy(RedisFailoverExecutor.java:57) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor.execute(RedisFailoverExecutor.java:43) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.RedisExecutor.psubscribe(RedisExecutor.java:86) ~[nuxeo-runtime-redis-10.10.jar:?]
      	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.run(RedisPubSubProvider.java:140) ~[nuxeo-core-redis-10.10.jar:?]
      	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
      

      Slack discussion:

      [antoine]
      seems like this error happens when the same Redis connection is used for publishing and subsribing

      [bdelbosc]
      yes this could happen if you access redis from the subscriber thread, this is what shows the stack I think, the subscriber thread is accessing the kv store (edited)

      [bdelbosc]
      and it is possible that the pb is random if the jedis pool returns a different connection then it works (edited)

      [Florent Guillaume]
      Yeah but I don't understand how this could be new, this code hasn't changed in years

      [Florent Guillaume]
      The connection we use for the subscribe() shouldn't be reused at all, it's kept open

      [thomas]

      the subscriber thread is accessing the kv store

      In the WOPIServiceImpl#start we are registering the WOPIDiscoveryInvalidator
      https://github.com/nuxeo/nuxeo/blob/master/addons/nuxeo-wopi/nuxeo-wopi/src/main/java/org/nuxeo/wopi/WOPIServiceImpl.java#L98
      and then doing a call on the kv (on the same thread so…)
      https://github.com/nuxeo/nuxeo/blob/master/addons/nuxeo-wopi/nuxeo-wopi/src/main/java/org/nuxeo/wopi/WOPIServiceImpl.java#L284
      Could that explain the issue here?

      [Florent Guillaume]
      Yes that's it, RedisPoolExecutor has a thread-local and will reuse the same connection in the same thread...

        Attachments

          Activity

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 1 hour
                1h