  1. Nuxeo Platform
  2. NXP-32338

Full blob GC in dryRun should trace samples of blobs to remove


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2021.50, 2023.8
    • Component/s: BlobManager
    • Release Notes Summary:
      A new "nuxeo.bulk.action.garbageCollectOrphanBlobs.sample.modulo" property to trace samples of blobs to remove
    • Release Notes Description:

      While running a Full Blob GC in "dryRun" mode, with the following property:

      nuxeo.bulk.action.garbageCollectOrphanBlobs.sample.modulo=1000
      

      every 1000th blob that would be deleted is logged at WARN level with a message such as:

      dryRun sample: GC would have deleted blob: b967e77cce9e0582af118bfb467fbab9 of size 12 bytes.
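
      When collecting samples from a server log, the blob key and size can be pulled out of each line before checking them in MongoDB. The snippet below is a minimal sketch: `parseDryRunSample` is a hypothetical helper, not part of Nuxeo, and its pattern assumes the exact message wording shown above and a blob key without whitespace.

```javascript
// Hypothetical helper, not part of Nuxeo: extracts the blob key and size
// from a dryRun sample log line. The pattern assumes the message wording
// shown in the release note and a key that contains no whitespace.
const SAMPLE_PATTERN =
  /dryRun sample: GC would have deleted blob: (\S+) of size (\d+) bytes\./;

function parseDryRunSample(logLine) {
  const match = SAMPLE_PATTERN.exec(logLine);
  if (match === null) {
    return null; // not a dryRun sample line
  }
  return { key: match[1], size: Number(match[2]) };
}

const sampleLine =
  "dryRun sample: GC would have deleted blob: " +
  "b967e77cce9e0582af118bfb467fbab9 of size 12 bytes.";
console.log(parseDryRunSample(sampleLine));
```

      Lines that do not match return null, so the helper can be applied to a whole log file and the non-matching lines filtered out.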
      

      To confirm in MongoDB that the blob can safely be deleted, the following query:

      db.default.count({"ecm:blobKeys": "b967e77cce9e0582af118bfb467fbab9"})
      

      should return 0.
      Assuming the blob was referenced by the "file:content" document property, the following MongoDB query

      db.default.count({"content.data": "b967e77cce9e0582af118bfb467fbab9"})
      

      should return 0.

      If "file:content" is used as the default document content and you want to confirm that the "ecm:blobKeys" migration completed without errors, the MongoDB query:

      db.default.count({"ecm:blobKeys": {$exists: false}, "content.data": {$exists: true}})
      

      should return 0.
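
      The three checks above can be grouped so they are easy to repeat for each sampled blob. This is only a sketch: the function names are hypothetical, and the "content.data" field assumes the blob is held by the "file:content" property, as in the queries above.

```javascript
// Hypothetical helpers that build the filters used in the queries above.
// "content.data" assumes the blob is referenced by "file:content".

// Filters that should each match 0 documents for a blob the GC may delete.
function unreferencedBlobFilters(blobKey) {
  return [
    { "ecm:blobKeys": blobKey },
    { "content.data": blobKey },
  ];
}

// Filter that should match 0 documents once "ecm:blobKeys" is fully migrated.
function unmigratedDocumentsFilter() {
  return {
    "ecm:blobKeys": { $exists: false },
    "content.data": { $exists: true },
  };
}

console.log(unreferencedBlobFilters("b967e77cce9e0582af118bfb467fbab9"));
console.log(unmigratedDocumentsFilter());
```

      In the mongo shell, each filter can then be passed to db.default.count(...) exactly as in the queries above; every count is expected to be 0.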

    • Team:
      PLATFORM
    • Sprint:
      nxplatform #107
    • Story Points:
      2

      Description

      Today, the result of a Full blob GC in dryRun mode is just an aggregation of blobs and sizes.

      It's difficult to blindly decide to run the GC for real if the number of deletions is very high.

      The dryRun mode could output a few identifiers of blobs that would be deleted, so we can check them manually and make sure the result is correct.

      A possible implementation is to log one blob id every 1000 items at WARN level instead of DEBUG in GarbageCollectOrphanBlobsComputation#compute.
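
      The sampling idea can be illustrated outside of Nuxeo with a simple counter. This is a sketch of the technique only, not the code of GarbageCollectOrphanBlobsComputation: `createSampler` and its parameters are hypothetical, and treating a modulo of 0 or less as "sampling disabled" is an assumption.

```javascript
// Hypothetical sketch of modulo-based sampling; not the Nuxeo implementation.
// `warn` is any function that accepts a message (for example console.warn).
// Assumption: a modulo of 0 or less disables sampling.
function createSampler(modulo, warn) {
  let count = 0;
  return function onBlobToDelete(key, size) {
    count += 1;
    // Emit a trace only for every `modulo`-th blob seen.
    if (modulo > 0 && count % modulo === 0) {
      warn(`sample: blob ${key} of size ${size} bytes would be deleted`);
    }
  };
}

// Usage: with a modulo of 1000, only the 1000th and 2000th blobs are traced.
const sample = createSampler(1000, console.warn);
for (let i = 1; i <= 2500; i++) {
  sample(`blob-${i}`, 12);
}
```

      Keeping the counter inside the closure means the caller only has to report each candidate blob; the sampling rate stays a single configurable number.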

      Also, in this ticket we should provide some MongoDB queries to assert that everything is correct:

      • make sure blob keys are up to date:
        // should not return anything
        db.default.findOne({"ecm:blobKeys": {$exists: false}, "content.data": {$exists: true}})
        
      • make sure a blob traced for deletion is not referenced:
        ...
