  1. Nuxeo Platform
  2. NXP-28840

MongoDB ecm:id duplicate check



    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 11.1, 2021.0
    • Component/s: Core MongoDB
    • Sprint:
      nxFG 11.1.13
    • Story Points:


      For Nuxeo MongoDB databases created before NXP-27654, it has been observed that some documents were created with identical document ids (ecm:id in MongoDB).

      We need diagnostic scripts to detect and resolve these duplicates, and to add the unique index on ecm:id when possible.

      Run the first script, mongodb-create-unique-index.js, before anything else. It reports one of three outcomes:

      1. everything is OK: the unique index is already present.
      2. there was no unique index, and the script created it successfully.
      3. there are duplicate ids, which must first be resolved with the second script.
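
The three outcomes above can be sketched as a single decision function. This is a minimal illustration, not the actual mongodb-create-unique-index.js: the function name ensureUniqueIndex and the returned strings are hypothetical, and coll is assumed to be a mongo shell collection handle such as db.getCollection("default").

```javascript
// Sketch only: `coll` is assumed to be a mongo shell collection handle.
// The function name and the returned strings are illustrative.
function ensureUniqueIndex(coll) {
  var KEY = "ecm:id";

  // Look for an existing single-field index on ecm:id.
  var existing = coll.getIndexes().filter(function (ix) {
    var fields = Object.keys(ix.key);
    return fields.length === 1 && fields[0] === KEY;
  })[0];
  if (existing && existing.unique) {
    return "already-unique"; // outcome 1
  }

  // Find at most one ecm:id value shared by several documents.
  var dup = coll.aggregate([
    { $group: { _id: "$" + KEY, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } },
    { $limit: 1 }
  ], { allowDiskUse: true }).toArray();
  if (dup.length > 0) {
    return "duplicates-found: " + dup[0]._id; // outcome 3
  }

  // No duplicates: replace any non-unique index with a unique one.
  if (existing) {
    coll.dropIndex(existing.name);
  }
  var keys = {};
  keys[KEY] = 1;
  coll.createIndex(keys, { unique: true });
  return "index-created"; // outcome 2
}
```

The duplicate scan runs before any index is dropped, so a collection that still contains duplicates is left untouched.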

      The second script, mongodb-find-duplicate-keys.js, finds duplicate keys and removes the documents that are safe to remove (those that are otherwise strictly identical). For the remaining duplicate keys it displays, by default, the full documents involved, so that you can choose the relevant one and remove the others manually. If this output is too verbose, it can be disabled by editing the script to set var SHOW_UNRESOLVED = false.
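
The classification described above can be illustrated with plain JavaScript over an in-memory array of documents. This is only a sketch under assumptions, not the actual script. The names planDuplicateCleanup and withoutId are hypothetical. "Strictly identical" is approximated by comparing JSON.stringify output with _id excluded, whereas a real script would need a BSON-aware comparison. The sketch also marks identical copies as removable even inside a group that stays unresolved, which is a choice of the sketch rather than a documented behaviour.

```javascript
// Sketch only: `docs` is a plain array of documents with JSON-compatible values.

// Serialize a document without its _id, so that copies differing
// only by _id produce the same string.
function withoutId(doc) {
  var copy = {};
  Object.keys(doc).forEach(function (k) {
    if (k !== "_id") { copy[k] = doc[k]; }
  });
  return JSON.stringify(copy);
}

function planDuplicateCleanup(docs) {
  // Group documents by their ecm:id.
  var groups = new Map();
  docs.forEach(function (doc) {
    var id = doc["ecm:id"];
    if (!groups.has(id)) { groups.set(id, []); }
    groups.get(id).push(doc);
  });

  var plan = { duplicateIds: 0, removable: [], resolved: [], unresolved: [] };
  groups.forEach(function (group, id) {
    if (group.length < 2) { return; } // not a duplicate
    plan.duplicateIds++;
    // Keep the first document of each distinct content; later copies are removable.
    var seen = new Set();
    group.forEach(function (doc) {
      var body = withoutId(doc);
      if (seen.has(body)) {
        plan.removable.push(doc._id);
      } else {
        seen.add(body);
      }
    });
    // One distinct content left means the id is resolved by the removals.
    (seen.size === 1 ? plan.resolved : plan.unresolved).push(id);
  });
  return plan;
}
```

An id ends up in unresolved when at least two documents with different content share it. That is the case that still needs a manual choice.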

      Note that this second script runs in "dry run" mode by default, which can be used for diagnostics. To make actual changes, edit the script to set var DRY_RUN = false.
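
A dry-run flag of this kind typically gates every write. The following sketch shows one way this could look. The function removeDocuments, its parameters and its return shape are hypothetical and not taken from the actual script. coll is assumed to be a mongo shell collection handle, and removableIds a list of _id values already judged safe to delete.

```javascript
var DRY_RUN = true; // mirrors the documented default; set to false to really delete

// Sketch only: deletes the given _id values unless dryRun is true.
// Returns how many documents were removed and how many were only reported.
function removeDocuments(coll, removableIds, dryRun) {
  var result = { removed: 0, wouldRemove: 0 };
  removableIds.forEach(function (id) {
    if (dryRun) {
      result.wouldRemove++; // diagnostics only, the collection is not touched
      return;
    }
    result.removed += coll.deleteOne({ _id: id }).deletedCount;
  });
  return result;
}
```

Calling removeDocuments(coll, ids, DRY_RUN) with the default flag only counts the candidates, which matches the diagnostic use described above.
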

      Example output for the first script, one block per possible outcome:

      Using nuxeo.default
      Unique index on ecm:id already present

      Using nuxeo.default
      Starting scan for duplicate ids...
      Dropping previous index on ecm:id...
      Creating unique index on ecm:id...

      Using nuxeo.default
      Starting scan for duplicate ids...
      Collection has duplicates, the first one is ecm:id = 76fc611b-120b-454c-91f4-a3aaed5189b2
      Unique index not created

      Example output for the second script. Note that only documents with unresolved duplicate ids are shown, not all the duplicate ids:

      DRY RUN no modifications will be done
      Using nuxeo.default
      Collection has 10 documents
      Starting scan for duplicate ids...
      Showing unresolved duplicate ids
      ecm:id = 76fc611b-120b-454c-91f4-a3aaed5189b2
      {
       "_id" : ObjectId("5e83a12e6ff3932209b7a086"),
       "ecm:id" : "76fc611b-120b-454c-91f4-a3aaed5189b2",
       "foo" : {
        "bar" : "baz"
       }
      }
      {
       "_id" : ObjectId("5e83a12e6ff3932209b7a089"),
       "ecm:id" : "76fc611b-120b-454c-91f4-a3aaed5189b2",
       "foo" : {
        "bar" : "moo"
       }
      }
      Collection has 3 duplicate ids
      Collection has 10 documents impacted by duplicates
      Collection has 6 identical documents that were removed
      Collection has 2 resolved duplicate ids
      Collection has 1 unresolved duplicate ids

      Once the second script reports "Collection has 0 unresolved duplicate ids", the database is in a state where the first script can be run again and will be able to create the unique index.


                  Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0 minutes
                  Time Spent - 4 hours