[NXP-23016] Fix infinite cross-instance cache invalidations in cluster mode - Nuxeo Issue Tracker

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 9.3-SNAPSHOT
Fix Version/s: 9.3
Component/s: Cache, Redis

Sprint: nxcore 9.3.5
Story Points: 2

Description

In cluster mode, receiving an invalidation triggers a re-send of this invalidation to other nodes, creating an infinite loop:

2017-09-04 11:34:03,805 ERROR [Nuxeo-PubSub-Redis] [org.nuxeo.ecm.core.pubsub.AbstractPubSubProvider] Exception in subscriber for topic: cacheinval
redis.clients.jedis.exceptions.JedisDataException: ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / QUIT allowed in this context
	at redis.clients.jedis.Protocol.processError(Protocol.java:127)
	at redis.clients.jedis.Protocol.process(Protocol.java:161)
	at redis.clients.jedis.Protocol.read(Protocol.java:215)
	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)
	at redis.clients.jedis.Connection.getIntegerReply(Connection.java:265)
	at redis.clients.jedis.BinaryJedis.publish(BinaryJedis.java:3064)
	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider.lambda$publish$1(RedisPubSubProvider.java:255)
	at org.nuxeo.ecm.core.redis.RedisPoolExecutor.execute(RedisPoolExecutor.java:49)
	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor$1.retry(RedisFailoverExecutor.java:62)
	at org.nuxeo.ecm.core.redis.retry.Retry.retry(Retry.java:61)
	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor.executeWithRetryPolicy(RedisFailoverExecutor.java:57)
	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor.execute(RedisFailoverExecutor.java:43)
	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider.publish(RedisPubSubProvider.java:255)
	at org.nuxeo.ecm.core.pubsub.PubSubServiceImpl.publish(PubSubServiceImpl.java:137)
	at org.nuxeo.ecm.core.pubsub.AbstractPubSubBroker.sendMessage(AbstractPubSubBroker.java:113)
	at org.nuxeo.ecm.core.cache.CacheServiceImpl$CachePubSubInvalidator.sendInvalidationsAll(CacheServiceImpl.java:190)
	at org.nuxeo.ecm.core.cache.CacheInvalidator.invalidateAll(CacheInvalidator.java:54)
	at org.nuxeo.ecm.core.cache.CacheServiceImpl$CachePubSubInvalidator.receivedMessage(CacheServiceImpl.java:199)
	at org.nuxeo.ecm.core.cache.CacheServiceImpl$CachePubSubInvalidator.receivedMessage(CacheServiceImpl.java:176)
	at org.nuxeo.ecm.core.pubsub.AbstractPubSubBroker.subscriber(AbstractPubSubBroker.java:140)
	at org.nuxeo.ecm.core.pubsub.AbstractPubSubProvider.localPublish(AbstractPubSubProvider.java:63)
	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.onMessage(RedisPubSubProvider.java:170)
	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.onPMessage(RedisPubSubProvider.java:174)
	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.processBinary(RedisPubSubProvider.java:222)
	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.proceedWithPatterns(RedisPubSubProvider.java:188)
	at redis.clients.jedis.Jedis.psubscribe(Jedis.java:2697)
	at org.nuxeo.ecm.core.redis.RedisExecutor.lambda$psubscribe$1(RedisExecutor.java:87)
	at org.nuxeo.ecm.core.redis.RedisPoolExecutor.execute(RedisPoolExecutor.java:63)
	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor$1.retry(RedisFailoverExecutor.java:62)
	at org.nuxeo.ecm.core.redis.retry.Retry.retry(Retry.java:61)
	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor.executeWithRetryPolicy(RedisFailoverExecutor.java:57)
	at org.nuxeo.ecm.core.redis.RedisFailoverExecutor.execute(RedisFailoverExecutor.java:43)
	at org.nuxeo.ecm.core.redis.RedisExecutor.psubscribe(RedisExecutor.java:86)
	at org.nuxeo.ecm.core.redis.contribs.RedisPubSubProvider$Dispatcher.run(RedisPubSubProvider.java:142)

The error seen here occurs because, while handling a received pub/sub message (receivedMessage), we re-use the same Jedis connection (in the same thread) to send the next (and incorrect) publish. That connection is in subscriber mode and reserved for receiving pub/sub messages, so Redis rejects the publish.
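To illustrate the loop visible in the stack trace (receivedMessage calls invalidateAll, which calls sendInvalidationsAll, which publishes again), here is a minimal in-memory sketch. It is not the actual Nuxeo code or the actual fix: the Bus and Node classes, the resendOnReceive flag and the publish cap are hypothetical, and only the method names invalidateAll and receivedMessage are borrowed from the trace. It assumes the remedy is for a received invalidation to be applied locally only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InvalidationLoopSketch {

    /** Hypothetical in-memory stand-in for the pub/sub topic; counts publishes. */
    static class Bus {
        final List<Node> nodes = new ArrayList<>();
        final int maxPublishes; // safety cap so the buggy variant terminates
        int published = 0;

        Bus(int maxPublishes) {
            this.maxPublishes = maxPublishes;
        }

        /** Delivers an "invalidate all" message to every node except the sender. */
        void publish(Node sender) {
            if (published >= maxPublishes) {
                return; // cap reached: drop the message instead of looping forever
            }
            published++;
            for (Node node : new ArrayList<>(nodes)) {
                if (node != sender) {
                    node.receivedMessage();
                }
            }
        }
    }

    /** Hypothetical cluster node holding one local cache. */
    static class Node {
        final Bus bus;
        final boolean resendOnReceive; // true reproduces the bug
        final Set<String> cache = new HashSet<>();

        Node(Bus bus, boolean resendOnReceive) {
            this.bus = bus;
            this.resendOnReceive = resendOnReceive;
            bus.nodes.add(this);
        }

        /** Invalidation requested locally: clear the cache and notify other nodes. */
        void invalidateAll() {
            invalidateLocalAll();
            bus.publish(this);
        }

        /** Clears the local cache without notifying anyone. */
        void invalidateLocalAll() {
            cache.clear();
        }

        /** Invalidation received from another node. */
        void receivedMessage() {
            if (resendOnReceive) {
                invalidateAll(); // buggy path: re-publishes what was just received
            } else {
                invalidateLocalAll(); // assumed remedy: apply locally only
            }
        }
    }

    /** Runs one invalidation on a two-node cluster and returns the publish count. */
    static int run(boolean resendOnReceive, int maxPublishes) {
        Bus bus = new Bus(maxPublishes);
        Node a = new Node(bus, resendOnReceive);
        Node b = new Node(bus, resendOnReceive);
        b.cache.add("key");
        a.invalidateAll();
        return bus.published;
    }

    public static void main(String[] args) {
        System.out.println("re-send on receive: " + run(true, 50) + " publishes");
        System.out.println("local-only on receive: " + run(false, 50) + " publishes");
    }
}
```

With two nodes, the re-sending variant keeps publishing until the cap is hit, while the local-only variant publishes exactly once. In the real setup the loop surfaces as the Jedis error above rather than as unbounded recursion, because the second publish is attempted on the subscriber connection.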

Issue Links

depends on: NXP-22786 CacheService using local caching with distributed invalidations (Resolved)
is related to: NXBT-2245 Investigate on REST read response time regression since 17w37 (Resolved)

People

Assignee: Florent Guillaume
Reporter: Florent Guillaume
Participants: Florent Guillaume, Jenkins

Dates

Created: 2017-09-04 16:21
Updated: 2018-08-16 15:59
Resolved: 2017-09-04 23:29