On DBS, when an ACL is updated on a doc, a FindReadAclsWork Work is scheduled.
The FindReadAclsWork scrolls through all the children of the doc, using a batch size of 500 and a scroll timeout of 1 min:
- for each batch, it schedules an UpdateReadAclsWork with the 500 doc ids
- the transaction is committed and a new one is started
The UpdateReadAclsWork updates the ACLs of its docs in batches of 50, that is, 10 transactions per Work.
All works are scheduled on the common queue with a retry of 1.
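To get a feel for the volume this produces, the batching described above can be modelled in a few lines. This is only a sketch of the arithmetic, not the Nuxeo implementation: the class name, the `plan` and `fakeIds` helpers and the 1,200-child example are assumptions made for illustration; only the batch sizes of 500 and 50 come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the batching only; no Nuxeo API is used here.
class AclBatchSketch {
    static final int SCROLL_BATCH_SIZE = 500; // ids per UpdateReadAclsWork
    static final int UPDATE_BATCH_SIZE = 50;  // docs updated per transaction

    // Splits a list into consecutive sublists of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Returns {number of UpdateReadAclsWork, number of update transactions}.
    static int[] plan(List<String> childIds) {
        int works = 0;
        int transactions = 0;
        for (List<String> scrollBatch : partition(childIds, SCROLL_BATCH_SIZE)) {
            works++; // one UpdateReadAclsWork per scroll batch
            transactions += partition(scrollBatch, UPDATE_BATCH_SIZE).size();
        }
        return new int[] {works, transactions};
    }

    // Generates placeholder doc ids for the example.
    static List<String> fakeIds(int count) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add("doc-" + i);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] plan = plan(fakeIds(1200));
        System.out.println("UpdateReadAclsWork scheduled: " + plan[0]); // 3
        System.out.println("update transactions: " + plan[1]);          // 24
    }
}
```

With 1,200 children the scroll yields batches of 500, 500 and 200 ids, so 3 Works and 10 + 10 + 4 = 24 transactions. The same ratio applied to millions of docs gives an idea of the load put on the common queue.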
Problems with a large repository:
- FindReadAclsWork can potentially match the entire repository and take hours to complete, which blocks other Works in the common queue.
- FindReadAclsWork can fail during the scroll because of a MongoDBSocketTimeout, or because the leader has changed, which interrupts the query with a MongoQueryException. In this case, the retry starts the process again from the beginning and submits duplicate UpdateReadAclsWork.
- There is no way to understand what is going on during this massive processing other than introspecting the common Work queue and taking thread dumps.
- There is no status for the action: any of the Works involved can end up in the DLQ after a retry, resulting in a partial ACL propagation.
- The ACL update could trigger other listeners (not confirmed on a local test with a stock Nuxeo, but probably the case on prod).
- This generates lots of cache invalidations (one every 50 docs), which loads the pub-sub topic.
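The duplicate submission caused by a failed scroll can be illustrated with a small simulation. It is a sketch under stated assumptions, not Nuxeo code: the class and method names are hypothetical, the first attempt is assumed to fail when it reaches a given batch, and each retry is assumed to restart the scroll from the first batch, as described in the list above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical simulation of a FindReadAclsWork that fails mid-scroll and is retried.
class RetryDuplicationSketch {

    // Returns the batch indexes for which an UpdateReadAclsWork was scheduled, in order.
    // The first attempt fails when it reaches failAtBatch (0-based); retries succeed.
    // A negative failAtBatch means the first attempt does not fail.
    static List<Integer> scheduledBatches(int totalBatches, int failAtBatch, int retries) {
        List<Integer> scheduled = new ArrayList<>();
        for (int attempt = 0; attempt <= retries; attempt++) {
            boolean failing = attempt == 0 && failAtBatch >= 0 && failAtBatch < totalBatches;
            int end = failing ? failAtBatch : totalBatches;
            // Every attempt scrolls again from the first batch.
            for (int batch = 0; batch < end; batch++) {
                scheduled.add(batch);
            }
            if (!failing) {
                break; // the scroll completed, no further retry needed
            }
        }
        return scheduled;
    }

    // Number of scheduled Works that repeat a batch already scheduled earlier.
    static int duplicates(List<Integer> scheduled) {
        return scheduled.size() - new HashSet<>(scheduled).size();
    }

    public static void main(String[] args) {
        // 10 scroll batches, failure when reaching the 7th one, retry of 1.
        List<Integer> scheduled = scheduledBatches(10, 6, 1);
        System.out.println("Works scheduled: " + scheduled.size());    // 16
        System.out.println("duplicate Works: " + duplicates(scheduled)); // 6
    }
}
```

In this example the first attempt schedules 6 Works before failing and the retry schedules all 10 again, so 6 batches are processed twice. With a retry of 0 the same failure would leave 4 batches unprocessed, which corresponds to the partial propagation case.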
Also, even if tuned, changing the root ACL means touching and reindexing all docs: *this will always be a heavy process that should be avoided on large repos*.
The project should be designed from the beginning with role groups (like MemberRead, MemberWrite, ...) set at the root level, so that access is given to a user or group by manipulating only the group/user directories, without having to update ACLs.
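The role group approach can be sketched with a toy model. This is not the Nuxeo security API: the class, its methods and the `aclUpdates` counter are hypothetical, and the permission name "Read" is only used as a label. The point is that the root ACL is written once, and later access changes only touch group membership.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: root ACL entries target role groups, users are added to groups.
class RoleGroupSketch {
    // ACL set once at the root: group name -> granted permission.
    private final Map<String, String> rootAcl = new HashMap<>();
    // Group directory: group name -> members.
    private final Map<String, Set<String>> groupMembers = new HashMap<>();
    // Counts ACL writes, each of which would trigger a propagation on the repository.
    int aclUpdates = 0;

    void setRootAce(String group, String permission) {
        rootAcl.put(group, permission);
        aclUpdates++;
    }

    // Directory-only change: no ACL is touched.
    void addMember(String group, String user) {
        groupMembers.computeIfAbsent(group, g -> new HashSet<>()).add(user);
    }

    // Directory-only change: no ACL is touched.
    void removeMember(String group, String user) {
        Set<String> members = groupMembers.get(group);
        if (members != null) {
            members.remove(user);
        }
    }

    boolean hasPermission(String user, String permission) {
        for (Map.Entry<String, String> ace : rootAcl.entrySet()) {
            Set<String> members =
                    groupMembers.getOrDefault(ace.getKey(), Collections.<String>emptySet());
            if (ace.getValue().equals(permission) && members.contains(user)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RoleGroupSketch repo = new RoleGroupSketch();
        repo.setRootAce("MemberRead", "Read"); // done once, at project setup
        repo.addMember("MemberRead", "alice"); // later access change
        System.out.println("alice can read: " + repo.hasPermission("alice", "Read"));
        System.out.println("ACL updates: " + repo.aclUpdates);
    }
}
```

Granting or revoking access for a user then costs one directory update, regardless of the number of docs under the root.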
To be checked: the indexing could be duplicated. The change at the root level triggers an indexing scroller that updates all children, and it seems that children are also reindexed when their ACL is updated.