Nuxeo Platform / NXP-28840

MongoDB ecm:id duplicate check

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 11.1, 2021.0
    • Component/s: Core MongoDB
    • Sprint: nxFG 11.1.13
    • Story Points: 3

      Description

      For Nuxeo MongoDB databases created before NXP-27654 it's been observed that some documents were created with identical document ids (ecm:id in MongoDB).

      We need diagnostic scripts to detect and resolve these duplicates, and to add the unique index on ecm:id once that is possible.

      Run the first script, mongodb-create-unique-index.js, before anything else. It reports one of three outcomes:

      1. The unique index on ecm:id is already present, so nothing needs to be done.
      2. There was no unique index, and the script created it successfully.
      3. There are duplicate ids, which must be resolved with the second script before the index can be created.
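
      The three outcomes can be modeled as a small decision function. The following is a minimal sketch in plain JavaScript over an in-memory array, not the actual mongodb-create-unique-index.js. The names firstDuplicateId and checkUniqueIndex, the hasUniqueIndex flag, and the outcome labels are hypothetical and only illustrate the logic.

```javascript
// Illustrative model only: `docs` stands in for the MongoDB collection and
// `hasUniqueIndex` for the current index state. All names are hypothetical.

// Return the first ecm:id that occurs more than once, or null if none does.
function firstDuplicateId(docs) {
  const seen = new Set();
  for (const doc of docs) {
    const id = doc["ecm:id"];
    if (seen.has(id)) {
      return id;
    }
    seen.add(id);
  }
  return null;
}

// Map the collection state to one of the three outcomes described above.
function checkUniqueIndex(docs, hasUniqueIndex) {
  if (hasUniqueIndex) {
    return { outcome: "already-present" };
  }
  const duplicate = firstDuplicateId(docs);
  if (duplicate !== null) {
    // A unique index cannot be built while duplicates exist.
    return { outcome: "duplicates-found", firstDuplicate: duplicate };
  }
  // In the real script this is the point where the index would be created.
  return { outcome: "index-created" };
}
```

      Here the duplicate scan stops at the first repeated id, mirroring the example output that reports only "the first one".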

      The second script, mongodb-find-duplicate-keys.js, finds duplicate keys and removes the duplicates that are safe to remove (documents that are otherwise strictly identical). For the remaining duplicate keys it displays, by default, the full documents involved, so that you can choose the relevant one and remove the others manually. If this output is too verbose, it can be disabled by editing the script to set var SHOW_UNRESOLVED = false.

      Note that this second script runs in "dry run" mode by default, which is useful for diagnostics. To make actual changes, edit it to set var DRY_RUN = false.
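
      The resolution logic of the second script can be illustrated with a minimal sketch in plain JavaScript over an in-memory array. This is not the actual mongodb-find-duplicate-keys.js. The names sameExceptMongoId and resolveDuplicates are hypothetical, and the dryRun parameter stands in for the script's DRY_RUN variable. The sketch assumes that "strictly identical" means identical apart from MongoDB's own _id field, that exact copies are removable even when the same id also has differing documents, and that the _id values are plain strings.

```javascript
// Illustrative model only; all function names here are hypothetical.

// Compare two documents while ignoring the _id field (assumption).
// JSON.stringify is sensitive to key order, so fields must appear in the
// same order for two documents to count as identical.
function sameExceptMongoId(a, b) {
  const strip = ({ _id, ...rest }) => rest;
  return JSON.stringify(strip(a)) === JSON.stringify(strip(b));
}

function resolveDuplicates(docs, dryRun) {
  // Group documents by ecm:id.
  const groups = new Map();
  for (const doc of docs) {
    const id = doc["ecm:id"];
    if (!groups.has(id)) {
      groups.set(id, []);
    }
    groups.get(id).push(doc);
  }

  const removable = [];  // exact copies that are safe to delete
  const unresolved = []; // ids whose documents differ and need manual review
  let duplicateIds = 0;

  for (const [id, group] of groups) {
    if (group.length < 2) {
      continue;
    }
    duplicateIds++;
    // Keep one representative per distinct content; extra copies are removable.
    const distinct = [];
    for (const doc of group) {
      if (distinct.some((kept) => sameExceptMongoId(kept, doc))) {
        removable.push(doc);
      } else {
        distinct.push(doc);
      }
    }
    if (distinct.length > 1) {
      unresolved.push(id);
    }
  }

  // In dry-run mode nothing is modified; the report is still produced.
  const remaining = dryRun ? docs : docs.filter((d) => !removable.includes(d));
  return { duplicateIds, removable, unresolved, remaining };
}
```

      Passing dryRun = true returns the same report while leaving the input untouched, which matches the diagnostic use of the default mode.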

      Example output for the first script, one block for each of the three outcomes:

      Using nuxeo.default
      Unique index on ecm:id already present
      
      Using nuxeo.default
      Starting scan for duplicate ids...
      Dropping previous index on ecm:id...
      Done
      Creating unique index on ecm:id...
      Done
      
      Using nuxeo.default
      Starting scan for duplicate ids...
      Collection has duplicates, the first one is ecm:id = 76fc611b-120b-454c-91f4-a3aaed5189b2
      Unique index not created
      

      Example output for the second script. Note that only documents with unresolved duplicate ids are shown, not all the duplicate ids:

      DRY RUN no modifications will be done
      Using nuxeo.default
      Collection has 10 documents
      Starting scan for duplicate ids...
      Showing unresolved duplicate ids
      
      ecm:id = 76fc611b-120b-454c-91f4-a3aaed5189b2
      {
        "_id" : ObjectId("5e83a12e6ff3932209b7a086"),
        "ecm:id" : "76fc611b-120b-454c-91f4-a3aaed5189b2",
        "foo" : {
          "bar" : "baz"
        }
      }
      {
        "_id" : ObjectId("5e83a12e6ff3932209b7a089"),
        "ecm:id" : "76fc611b-120b-454c-91f4-a3aaed5189b2",
        "foo" : {
          "bar" : "moo"
        }
      }
      Collection has 3 duplicate ids
      Collection has 10 documents impacted by duplicates
      Collection has 6 identical documents that were removed
      Collection has 2 resolved duplicate ids
      Collection has 1 unresolved duplicate ids
      

      Once the second script reports "Collection has 0 unresolved duplicate ids", the database is in a state where the first script can be run again and will be able to create the unique index.

    Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0m
    • Time Spent: 4h