- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2021
- Component/s: Core DBS
- Release Notes Summary: Moving large folders is now more scalable and asynchronous
- Tags:
- Backlog priority: 800
- Sprint: nxplatform #89, nxplatform #90
- Story Points: 8
When a folder is moved, the ecm:ancestorIds field must be updated for all descendants. Today this is done atomically and synchronously, before the new read ACL is materialized (updating ecm:racl on all descendants), which is done asynchronously for sub-folder descendants.
Because read ACL computation depends on ancestors, the two updates cannot run concurrently.
It has also been observed that the current implementation is limited by the number of documents being moved (around 800k). The query filter tries to ignore the document ids manipulated in the current transaction, and because the current implementation loads all descendants, the MongoDB query filter grows larger than the 16MB limit:
org.nuxeo.ecm.automation.OperationException: Failed to invoke operation Document.Move
BsonBinaryWriter.java#validateSize (bson-4.7.2.jar)
org.bson.BsonMaximumSizeExceededException: Document size of 18500467 is larger than maximum of 16793600.
MongoDBConnection#stream:759 (nuxeo-core-storage-mongodb-2021.39.2-PR-1214-BUILD-3.jar)
DBSSession#getVersionsIds
DBSTransactionState#updateTreeReadAcls
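To give a rough feel for why a filter listing every touched id cannot scale, the sketch below computes the encoded size of a BSON array of string ids, following the BSON element layout (type byte, index key as a C string, int32 length, string bytes, terminator). It assumes ids are 36-character UUID strings, which is an assumption and not something stated in this ticket. The real filter contains more than this array, and the trace does not say exactly what it held, so this is only an order-of-magnitude illustration.

```python
MAX_BSON_SIZE = 16793600  # limit quoted in the exception above


def bson_string_array_size(count, id_length=36):
    """Byte size of a BSON array holding `count` ASCII strings of `id_length` chars.

    id_length=36 assumes UUID-style ids (an assumption, not from the ticket).
    """
    total = 4 + 1  # int32 document length prefix + trailing 0x00
    for index in range(count):
        key_len = len(str(index)) + 1   # array keys are decimal indexes as C strings
        value_len = 4 + id_length + 1   # int32 length + bytes + 0x00 terminator
        total += 1 + key_len + value_len  # 1 byte for the element type
    return total


def fits_in_one_filter(count, id_length=36):
    """True if an id array of this size alone stays under the BSON limit."""
    return bson_string_array_size(count, id_length) <= MAX_BSON_SIZE
```

Under these assumptions, an array of 800k ids is well past the limit on its own, which is why any design that puts all descendant ids into one query filter has a hard ceiling.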
The materialized fields (ancestors and read ACL) could be merged into a single update performed asynchronously for sub-folders, fixing both the synchronous latency and the scaling limitation.
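A minimal sketch of the merged update idea, using an in-memory dict in place of the repository. All names here (`recompute`, `move`, `process_batch`, the `acl` field) are hypothetical and not Nuxeo APIs, and the read ACL rule (own entries plus the parent's) is a simplified stand-in for the real computation. The point it illustrates is that when each descendant is processed after its parent, ancestors and read ACL can be refreshed in the same pass, in small batches, without holding the full list of descendants.

```python
from collections import deque


def recompute(docs, doc_id):
    """Refresh both materialized fields of one document from its parent."""
    doc = docs[doc_id]
    parent = docs.get(doc["parent"])
    if parent is None:
        doc["ancestorIds"] = []
        inherited = []
    else:
        doc["ancestorIds"] = parent["ancestorIds"] + [parent["id"]]
        inherited = parent["racl"]
    # Simplified read ACL: own entries plus everything inherited.
    doc["racl"] = sorted(set(doc["acl"]) | set(inherited))


def children_of(docs, parent_id):
    return [d["id"] for d in docs.values() if d["parent"] == parent_id]


def move(docs, doc_id, new_parent_id):
    """Synchronous part: re-parent and fix the moved folder only.

    Returns the queue of descendants left for asynchronous processing.
    """
    docs[doc_id]["parent"] = new_parent_id
    recompute(docs, doc_id)
    return deque(children_of(docs, doc_id))


def process_batch(docs, pending, batch_size=2):
    """Asynchronous part: update one batch, parents before children.

    Returns the number of documents still pending.
    """
    for _ in range(min(batch_size, len(pending))):
        doc_id = pending.popleft()
        recompute(docs, doc_id)
        pending.extend(children_of(docs, doc_id))
    return len(pending)
```

A caller would invoke `move` inside the user's transaction and then call `process_batch` repeatedly from a background worker until it returns 0. Until then, descendants keep their old values, which is the trade-off of making the update asynchronous.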
The move operation should be part of the continuous integration benchmark.
----------
A client reported slow processing when moving or copying folderish documents containing more than 100k objects. In a support team discussion it was noted that there is an asynchronous process for updating ACLs. We expect it would be feasible to use a similar technique for updating document ancestors.