Review the architecture of how document processing is performed, so that the platform becomes more robust to repository actions regardless of the number of documents involved, and so that such actions can be designed asynchronously. Some of the challenges are:
- Ability to provide a completion status
- Manage the “eventually consistent” aspect of such a design, in particular to be able to alert a user that a given document may be affected by an in-flight asynchronous processing action
- Handle errors and error reporting
One constraint is that we want all this to happen at the repository level, i.e. we want to make sure that anything happening on any document correctly fires a related repository event; we do not want things happening silently at the database level.
One of the expected gains is the ability to parallelize computation for these bulk changes and, longer term, to provide the elasticity required to reach a goal, depending on how many documents are concerned.
Typical processing that would benefit from such architectural change:
- ACL Updates
- Lifecycle changes
- Path changes (Move)
- Deletion of a large number of documents (>100k docs)
- Applying an automation chain (or a single operation) robustly across a large set of documents, e.g. setting a property (inheritance)
User stories mostly focus on REST API interaction, but the Java API should of course provide the relevant signatures to manage this in all related services.
- Create the set of documents to put in a stream. You get a key, and you know when the set is built. It is then also possible to know the offsets of the records that need to be processed.
- Then you define the action (so that it is known whether it can run concurrently)
- Running a processor: a set of computations handling batching, reading documents, creating exports, etc. Done as a stream processor, it can handle failover, redistribute work to other nodes, and run concurrent processors in parallel.
- Provide computation statuses
- When the job is done, it can provide a specific status, under a specific ID stored in a cluster-wide key-value store, and this status is persistent.
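The status lifecycle described above might be sketched as follows. This is a minimal, hypothetical model (the class, state names, and fields are assumptions, not the final API): a command id is returned on submission, and a persistent, cluster-visible status record is updated as the document set is built and then processed. A `ConcurrentHashMap` stands in for the cluster-wide key-value store.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class BulkStatusSketch {
    enum State { SCHEDULED, BUILDING_SET, RUNNING, COMPLETED }

    // stand-in for the cluster-wide, persistent key-value store
    static final Map<String, BulkStatusSketch> store = new ConcurrentHashMap<>();

    final String commandId = UUID.randomUUID().toString();
    State state = State.SCHEDULED;
    long processed = 0;
    long total = -1; // unknown until the document set is fully built

    static String submit() {
        BulkStatusSketch s = new BulkStatusSketch();
        store.put(s.commandId, s);
        return s.commandId; // the id handed back to the caller for polling
    }

    static BulkStatusSketch status(String id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        String id = submit();
        BulkStatusSketch s = status(id);
        s.state = State.BUILDING_SET;
        s.total = 100_000;             // set is built: record offsets are now known
        s.state = State.RUNNING;
        s.processed = 100_000;
        s.state = State.COMPLETED;
        System.out.println(s.state + " " + s.processed + "/" + s.total);
    }
}
```

The key design point carried over from the list above: the status record outlives the computation, so a client can query completeness at any time, including after the job is done.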
Part 1, BAF 1: Generate a set of documents for bulk processing
As a user, I would like to be able to create a document set so that I can run a bulk action on it.
Definition of Done:
- Creation of BulkService
- Creation of REST API on top of service
- Service can consume NXQL queries
- A bulkActionId can be retrieved on bulk action (instance) creation
- The document set creation status can be checked (obtained from the bulkActionId)
- I can make a REST request to create/initiate my DocumentSet
- I can make a REST request to check my DocumentSet initialisation status
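An in-memory sketch of what the BulkService surface for BAF 1 might look like, covering the Definition of Done above: submit an NXQL query, get a bulkActionId back immediately, then poll the initialisation status. All names and the synchronous set-building are assumptions; a real implementation would materialize the set asynchronously into a stream.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class BulkServiceSketch {
    // built=false would correspond to a set still being materialized
    record DocumentSet(String bulkActionId, List<String> docIds, boolean built) {}

    private final Map<String, DocumentSet> sets = new ConcurrentHashMap<>();

    // would back e.g. POST /bulk with {"query": "..."} -> {"bulkActionId": "..."}
    public String createDocumentSet(String nxqlQuery) {
        String id = UUID.randomUUID().toString();
        // a real service would evaluate the NXQL query asynchronously and
        // report BUILDING until done; here the set is built synchronously
        sets.put(id, new DocumentSet(id, List.of("doc-1", "doc-2"), true));
        return id;
    }

    // would back e.g. GET /bulk/{bulkActionId}/status
    public String setStatus(String bulkActionId) {
        DocumentSet s = sets.get(bulkActionId);
        if (s == null) return "UNKNOWN";
        return s.built() ? "BUILT(" + s.docIds().size() + " docs)" : "BUILDING";
    }

    public static void main(String[] args) {
        BulkServiceSketch svc = new BulkServiceSketch();
        String id = svc.createDocumentSet("SELECT * FROM Document WHERE ecm:isVersion = 0");
        System.out.println(svc.setStatus(id));
    }
}
```

The REST layer in the Definition of Done would be a thin wrapper over these two calls: one request to create/initiate the set, one to poll its initialisation status by bulkActionId.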
Part 2, BAF 2: Execute a Bulk action on a document set
As a user, I would like to be able to execute a Bulk action on a document set.
Definition of Done:
- Bulk service consumes a BulkCommand containing the information necessary to build the document set and the action to run
- The Bulk action status and progression can be checked
- I can make a REST request to run a Bulk action
- I can make a REST request to check a Bulk action status and progression
- I can make a REST request to pause or resume an existing/running Bulk action.
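A hypothetical sketch of the BAF 2 contract (names and fields are illustrative, not the final API): a BulkCommand carries both the document-set definition (an NXQL query) and the action to run, and the resulting execution can be queried for progression, paused, and resumed.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class BulkExecutionSketch {
    // query defines the document set; action names the operation to apply
    record BulkCommand(String query, String action) {}

    enum State { RUNNING, PAUSED, COMPLETED }

    static class Execution {
        final String id = UUID.randomUUID().toString();
        State state = State.RUNNING;
        long processed = 0;
        final long total;
        Execution(long total) { this.total = total; }
        String progression() { return processed + "/" + total + " (" + state + ")"; }
    }

    private final Map<String, Execution> executions = new ConcurrentHashMap<>();

    // would back POST /bulk/{action}; a real service would dispatch cmd.action()
    // to a stream processor instead of just recording the execution
    public String submit(BulkCommand cmd, long setSize) {
        Execution e = new Execution(setSize);
        executions.put(e.id, e);
        return e.id;
    }

    public void pause(String id)    { executions.get(id).state = State.PAUSED; }
    public void resume(String id)   { executions.get(id).state = State.RUNNING; }
    public String status(String id) { return executions.get(id).progression(); }

    public static void main(String[] args) {
        BulkExecutionSketch svc = new BulkExecutionSketch();
        String id = svc.submit(new BulkCommand("SELECT * FROM Document", "setProperty"), 1000);
        svc.pause(id);
        System.out.println(svc.status(id)); // 0/1000 (PAUSED)
        svc.resume(id);
    }
}
```

Each of the three REST requests in the Definition of Done maps to one call here: run (submit), check status/progression (status), and pause/resume.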