Nuxeo Platform / NXP-30928

DBS ReadACL propagation might be corrupted when distributed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.10
    • Fix Version/s: 10.10-HF61, 2021.20
    • Component/s: Core DBS
    • Release Notes Summary: DBS ReadACL propagation is more robust.
    • Backlog priority: 900
    • Sprint: nxplatform #58, nxplatform #59, nxplatform #60
    • Story Points: 3

      Description

      When an ACL is updated on a folderish document (with more than 500 docs) in a Nuxeo cluster,
      the read ACLs propagated to its children may be wrong, depending on the state of the folderish document in the worker nodes' caches.

       


      I've been able to identify a case where a Work starts on the worker node before all invalidations have been processed.

      Setup: 2 Nuxeo nodes (1 portal and 1 worker) with MongoDB and Kafka where nuxeo.work.queue.common.enabled is set to true only on the worker node

      Steps to observe the problem:

      1. On the portal node, update the permissions on a container that has more than 500 children, ideally arranged as a tree. In the example below, I grant the READ permission to "thierry".
      2. On the worker node, the read ACLs are propagated to all children by FindReadAclsWork/UpdateReadAclsWork.
      3. The logs show the following sequence: one of the first documents updated with the propagated read ACLs does not contain "thierry", whereas the next document has the expected read ACLs:
        22:01:10,940 INFO [defaultPool-01] [DBSTransactionState] updateDocumentReadAclsNoCache: 50063a35-ad8f-46e6-b790-3083ffee2a93
        22:01:10,940 INFO [defaultPool-01] [DBSTransactionState]    -> getReadACL = [Administrator]
        22:01:10,941 INFO [defaultPool-01] [DBSTransactionState] updateDocumentReadAclsNoCache: 6ba7333f-0fc1-41a0-84dc-393e06d748fd
        22:01:10,942 INFO [defaultPool-01] [DBSTransactionState]    -> getReadACL = [Administrator, thierry]
        

         

      These logs correspond to messages I've added to track the problem.

      The full name of the thread is "defaultPool-01,in:7,inCheckpoint:7,out:0,lastRead:1646859595463,lastTimer:0,wm:215857180896133121,loop:24196,checkpoint:238056104979578.1268777355"

      Is it possible to be sure all invalidations are processed (or at least the ones including the documents involved in the process) before processing the works?
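
To make the suspected race concrete, here is a minimal toy model in Java. It is only a sketch under stated assumptions: the class ReadAclRaceSketch, the Node class, the pendingInvalidations queue and the propagate method are all hypothetical names invented for illustration, and none of them are Nuxeo APIs. It assumes the worker keeps a local cache of the folder's read ACL and applies invalidations from a queue; whether DBS behaves exactly this way is what this issue is asking about.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of the race; none of these names are Nuxeo APIs. */
class ReadAclRaceSketch {

    /** Shared store (stands in for MongoDB): document id -> read ACL. */
    static final Map<String, List<String>> store = new HashMap<>();

    /** A node with a local cache and the invalidations it has not applied yet. */
    static class Node {
        final Map<String, List<String>> cache = new HashMap<>();
        final Queue<String> pendingInvalidations = new ArrayDeque<>();

        /** Reads through the cache; a cached entry is returned even if stale. */
        List<String> getReadAcl(String id) {
            return cache.computeIfAbsent(id, store::get);
        }

        /** Applies every queued invalidation by evicting the cached entry. */
        void processInvalidations() {
            String id;
            while ((id = pendingInvalidations.poll()) != null) {
                cache.remove(id);
            }
        }
    }

    /**
     * Simulates the propagation work on the worker node and returns the read ACL
     * written to the child. If drainFirst is true, pending invalidations are
     * applied before the work reads the parent's read ACL.
     */
    static List<String> propagate(boolean drainFirst) {
        store.clear();
        store.put("folder", Arrays.asList("Administrator"));

        Node worker = new Node();
        worker.getReadAcl("folder"); // the worker caches the folder before the change

        // The portal grants READ to "thierry"; the invalidation is queued for the worker.
        store.put("folder", Arrays.asList("Administrator", "thierry"));
        worker.pendingInvalidations.add("folder");

        if (drainFirst) {
            worker.processInvalidations();
        }

        // The work copies the parent's read ACL to the child.
        List<String> childAcl = new ArrayList<>(worker.getReadAcl("folder"));
        store.put("child", childAcl);
        return childAcl;
    }

    public static void main(String[] args) {
        System.out.println("work before invalidations: " + propagate(false));
        System.out.println("work after invalidations:  " + propagate(true));
    }
}
```

In this model, propagate(false) writes [Administrator] to the child, which matches the first log entry above, while propagate(true) writes [Administrator, thierry]. The only difference is whether the queued invalidation is applied before the work reads the parent, which is the ordering guarantee asked about above.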
