Nuxeo Platform / NXP-25496

MongoDB replication is missing some documents


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 9.10-HF13
    • Fix Version/s: 9.10-HF14, 10.3
    • Component/s: Replication

      Description

      During a DR (disaster recovery) test with 9k+ documents, the secondary site receives only 1.5k documents.
      Since the Elasticsearch replication received all 9k+ documents, the problem appears to be on the source side: documents are being missed when the MongoDB oplog is read.

      Here is the explanation of the problem:

      1. We are inside a processTimer call, consuming the MongoDB oplog.
      2. A Kafka rebalance happens.
      3. MongoDBComputation#init() is called.
      4. The start timestamp of the query is updated to the last timestamp already committed to Kafka.
      5. The running processTimer call ends after consuming more oplog entries.
      6. A new processTimer call starts with the updated query (TS > lastCommitedTimestamp) *but* at the n-th page of that query instead of the first one.
      7. The first n pages of the updated query are never read, so their documents are missed.
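      The sequence above can be reproduced with a small, simplified model. This is a sketch under stated assumptions, not the actual MongoDBComputation code: the oplog is modeled as an in-memory list of long timestamps, each read is assumed to be "ts > startTimestamp" with a skip of page × batch size, and the class OplogPagingSketch, its fields (startTimestamp, page, resetPageOnInit), and the methods readPage() and countMissed() are all hypothetical names. Step 5 is collapsed: two pages are read and committed, then init() moves the start timestamp while the page counter is either kept (the bug) or reset to 0 (the fix).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the paging bug (not the real Nuxeo code).
 * Each read is "ts > startTimestamp" with skip(page * BATCH_SIZE).limit(BATCH_SIZE).
 */
final class OplogPagingSketch {

    static final int BATCH_SIZE = 500; // same batch size as in the server.log excerpt

    private final List<Long> oplog;        // sorted oplog timestamps (assumed model)
    private final boolean resetPageOnInit; // true = behaviour after the fix
    private long startTimestamp;           // exclusive lower bound of the query
    private int page;                      // page index used to compute the skip

    OplogPagingSketch(List<Long> oplog, boolean resetPageOnInit) {
        this.oplog = oplog;
        this.resetPageOnInit = resetPageOnInit;
    }

    /** Stand-in for MongoDBComputation#init(): restart after the last committed timestamp. */
    void init(long lastCommittedTimestamp) {
        startTimestamp = lastCommittedTimestamp;
        if (resetPageOnInit) {
            page = 0; // a query with a new start timestamp must begin at its first page
        }
    }

    /** Stand-in for one page read inside processTimer(); advances the page counter. */
    List<Long> readPage() {
        List<Long> matching = new ArrayList<>();
        for (long ts : oplog) {
            if (ts > startTimestamp) {
                matching.add(ts);
            }
        }
        int from = Math.min(page * BATCH_SIZE, matching.size());
        int to = Math.min(from + BATCH_SIZE, matching.size());
        page++;
        return new ArrayList<>(matching.subList(from, to));
    }

    /** Replays the scenario and returns how many oplog entries were never read. */
    static int countMissed(boolean resetPageOnInit) {
        List<Long> oplog = new ArrayList<>();
        for (long ts = 1; ts <= 3000; ts++) {
            oplog.add(ts);
        }
        OplogPagingSketch computation = new OplogPagingSketch(oplog, resetPageOnInit);
        List<Long> read = new ArrayList<>();

        computation.init(0);                            // nothing committed yet
        read.addAll(computation.readPage());            // page 0: ts 1..500
        read.addAll(computation.readPage());            // page 1: ts 501..1000
        long lastCommitted = read.get(read.size() - 1); // checkpoint at ts 1000

        computation.init(lastCommitted);                // Kafka rebalance triggers init()
        while (true) {                                  // read until the query is exhausted
            List<Long> batch = computation.readPage();
            if (batch.isEmpty()) {
                break;
            }
            read.addAll(batch);
        }
        return oplog.size() - read.size();
    }

    public static void main(String[] args) {
        System.out.println("missed without reset: " + countMissed(false));
        System.out.println("missed with reset:    " + countMissed(true));
    }
}
```

      In this model, keeping the page counter at 2 after init() skips ts 1001..2000, so main prints 1000 missed entries without the reset and 0 with it. The scenario is deliberately set up so that two pages are read before the rebalance, mirroring the "2 pages of batchSize" annotation in the log excerpt.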

      Here is the relevant excerpt from server.log:

      2018-07-29 22:29:12,264 DEBUG [MongoDBComputation] ProduceRecord at : Timestamp{seconds=1532903352, inc=12}
      2018-07-29 22:29:12,267 DEBUG [MongoDBComputation] ProduceRecord at : Timestamp{seconds=1532903352, inc=13}
      2018-07-29 22:29:12,267 DEBUG [MongoDBComputation] Committing after 500 documents to replicate, page: 1
      2018-07-29 22:29:12,267 DEBUG [MongoDBComputation] CheckPoint at : Timestamp{seconds=1532903352, inc=13}
      -------------------- From Kafka -------------------------------------------------------------------------
      2018-07-29 22:29:13,406 INFO [GroupCoordinator 0]: Preparing to rebalance group nuxeo-mongodb-oplog with old generation 2 (__consumer_offsets-42) (kafka.coordinator.group.GroupCoordinator)
      ---------------------------------------------------------------------------------------------------------
      2018-07-29 22:29:16,411 INFO  [MongoDBComputation] Initializing MongoDBComputation
      2018-07-29 22:29:16,411 INFO  [MongoDBComputation] Fetching MonogDB start timestamp
      2018-07-29 22:29:17,629 INFO  [MongoDBComputation] Fetched MonogDB start timestamp: Timestamp{seconds=1532903352, inc=13}
      
                  ^
                  |  1000 lost records = 2 pages of batchSize
                  v
      
      
      2018-07-29 22:29:43,722 DEBUG [MongoDBComputation] ProduceRecord at : Timestamp{seconds=1532903383, inc=97}
      2018-07-29 22:29:43,743 DEBUG [MongoDBComputation] ProduceRecord at : Timestamp{seconds=1532903383, inc=98}
      2018-07-29 22:29:43,811 DEBUG [MongoDBComputation] ProduceRecord at : Timestamp{seconds=1532903383, inc=99}
      
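      The annotation in the excerpt can be checked against the commit line: the log shows a commit every 500 documents, so batchSize = 500, and restarting the updated query at page n = 2 instead of page 0 skips (this n is consistent with the last commit being logged at "page: 1", assuming pages are numbered from 0):

```latex
\text{lost records} = n \times \text{batchSize} = 2 \times 500 = 1000
```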

      The fix is to reset the page number when the computation is initialized, in MongoDBComputation#init().
