  1. Nuxeo Platform
  2. NXP-27388

Record management - Turn off deduplication for records



    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: BlobManager, Retention



      SEC-17a-4 (17 CFR § 240.17a-4 - Records to be preserved by certain exchange members, brokers and dealers.) is a US regulation on records preservation.

      Its main areas are secured storage, retention management, change and deletion prevention, legal hold, and audit trail.



      To store record documents, we will use an Amazon S3 bucket with the following parameters:

      • Versioning turned on
      • Object Lock in compliance mode turned on
      • No default retention on the bucket (or a default retention of 0)

      cf. https://github.com/awsdocs/amazon-s3-developer-guide/blob/master/doc_source/object-lock-overview.md

      cf. https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lock.html


      User stories

      • As a broker-dealer, I want to guarantee that a record is deleted once an authorized user has requested its deletion, so that I comply with the regulation



      Once a document becomes a record, we propose that Nuxeo no longer handles deduplication for it, for the following reasons:

      • Preventing the same document from being recorded several times is the customer's responsibility.
      • Handling deduplication would require tracking the longest retention period (among the different documents referring to the same content) and automatically updating the retain-until date on S3 accordingly.
      • With this logic, we could not guarantee the deletion of a record when several documents refer to the same content with different expiration times, which would be complex to explain for the certification and, later on, to our prospects and customers.
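
      To illustrate the last two reasons, here is a minimal Java sketch of what deduplicated records would imply. The class and method names are hypothetical and are not part of Nuxeo; the sketch only assumes that a shared blob must stay locked until the latest retain-until date among the documents that reference it.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only (not a Nuxeo API).
final class SharedBlobRetention {

    /** Latest retain-until date among all documents referencing the same blob. */
    static Instant effectiveRetainUntil(List<Instant> retainUntilDates) {
        return retainUntilDates.stream()
                .max(Comparator.naturalOrder())
                .orElseThrow(() -> new IllegalArgumentException("no referencing document"));
    }

    /** A shared blob can only be removed once every referencing record has expired. */
    static boolean canDeleteBlob(List<Instant> retainUntilDates, Instant now) {
        return !now.isBefore(effectiveRetainUntil(retainUntilDates));
    }
}
```

      With two records expiring in 2025 and 2030, a deletion requested in 2026 for the first record could not remove the stored content, because the second record still holds it. This is the situation the proposal avoids by storing one blob per record.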



      • Generate a UID for the blob based on the MD5 digest, the version ID (based on the version series), and a timestamp
      • Add a configuration option to turn deduplication on or off
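
      The two points above could be sketched as follows. This is only an illustration under stated assumptions: the class name, the `dedupEnabled` flag, and the `md5-versionSeriesId-timestamp` key layout are hypothetical, since the issue does not specify the actual key format or the name of the configuration property.

```java
import java.time.Instant;

// Hypothetical sketch, not the actual Nuxeo implementation.
final class RecordBlobKeys {

    /**
     * Builds the storage key for a blob.
     * dedupEnabled stands in for the proposed configuration switch (assumed name).
     */
    static String blobKey(String md5, String versionSeriesId, Instant timestamp, boolean dedupEnabled) {
        if (dedupEnabled) {
            // Content-addressed key: identical files share a single stored blob.
            return md5;
        }
        // Assumed layout: digest, version series id, and creation time in epoch milliseconds.
        return md5 + "-" + versionSeriesId + "-" + timestamp.toEpochMilli();
    }
}
```

      Because the version series differs between two documents, their keys differ even when the content digest and the timestamp are identical, so each record gets its own object on S3.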


      Acceptance criteria

      • When I create 2 documents with exactly the same content file (same MD5 digest), 2 different blobs are stored on S3




              • Assignee:
                jaubenque Julien Aubenque


                • Created: