Affects Version/s: 10.10
Component/s: Core DBS
When updating an ACL on a folderish document (with more than 500 children) in a Nuxeo cluster,
we may propagate an incorrect read ACL to the children, depending on the state of the folderish document in the worker nodes' caches.
I've been able to identify a case where a work starts before all invalidations have been processed.
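The race can be sketched as follows. This is purely illustrative Java with made-up names, not Nuxeo code: a shared store stands in for MongoDB, a local map for a node's document cache, and a queue for the invalidation stream. If the work reads the folder before the invalidation is consumed, it sees the stale ACL.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class StaleAclRace {
    // shared backing store (stands in for MongoDB)
    static final Map<String, String> store = new ConcurrentHashMap<>();
    // worker node's local cache and its pending invalidations (stands in for Kafka)
    static final Map<String, String> cacheB = new ConcurrentHashMap<>();
    static final Queue<String> invalidationsB = new ConcurrentLinkedQueue<>();

    // the worker serves from its cache if present, else loads from the store
    static String readOnWorker(String id) {
        return cacheB.computeIfAbsent(id, store::get);
    }

    // returns {ACL seen by the work, ACL after invalidations are applied}
    static String[] runScenario() {
        store.put("folder", "acl:[admin]");
        readOnWorker("folder"); // worker has the folder cached

        // the portal node updates the ACL and emits an invalidation
        store.put("folder", "acl:[admin,thierry]");
        invalidationsB.add("folder");

        // the work starts on the worker BEFORE the invalidation is consumed
        String aclSeenByWork = readOnWorker("folder"); // stale

        // the invalidation is processed afterwards, too late for the work
        String id;
        while ((id = invalidationsB.poll()) != null) {
            cacheB.remove(id);
        }
        String aclAfterInvalidation = readOnWorker("folder"); // fresh
        return new String[] { aclSeenByWork, aclAfterInvalidation };
    }

    public static void main(String[] args) {
        String[] r = runScenario();
        System.out.println("work saw: " + r[0]);
        System.out.println("after invalidation: " + r[1]);
    }
}
```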
Setup: 2 Nuxeo nodes (1 portal and 1 worker) with MongoDB and Kafka where nuxeo.work.queue.common.enabled is set to true only on the worker node
Steps to observe the problem:
- on the portal node, update the permissions on a container which has more than 500 children, ideally with a tree structure. In the example below, I grant the READ permission to "thierry"
- on the worker node, the read ACLs are propagated to all children by FindReadAclsWork/UpdateReadAclsWork
- the following sequence is observed in the logs: one of the first documents updated with the propagated read ACLs does not contain "thierry", whereas the next document has the expected read ACL value
These logs correspond to messages I've added to track the problem.
The full name of the thread is "defaultPool-01,in:7,inCheckpoint:7,out:0,lastRead:1646859595463,lastTimer:0,wm:215857180896133121,loop:24196,checkpoint:238056104979578.1268777355"
Is it possible to guarantee that all invalidations are processed (or at least those covering the documents involved in the work) before the works are executed?
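One way such a guarantee could be approached, purely as a sketch with hypothetical names (this is not an existing Nuxeo API): record an invalidation-log offset when the work is scheduled, and have the worker block until its local invalidation consumer has caught up to that offset before the work reads any document.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical barrier between the invalidation consumer and the work pool.
public class InvalidationBarrier {
    private final AtomicLong processedOffset = new AtomicLong(0);

    // called by the invalidation consumer after applying a batch
    public synchronized void markProcessed(long offset) {
        processedOffset.accumulateAndGet(offset, Math::max);
        notifyAll();
    }

    // called by a work before it starts; scheduledOffset was captured
    // on the node that scheduled the work
    public synchronized void awaitCaughtUp(long scheduledOffset, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (processedOffset.get() < scheduledOffset) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new IllegalStateException("invalidations not caught up");
            }
            wait(remaining);
        }
    }
}
```

A weaker variant would only wait for invalidations covering the document ids the work touches, which matches the "or at least the ones including the documents involved" part of the question.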