  1. Nuxeo Platform
  2. NXP-31606

Fix cold storage over-propagating restore

    Details

    • Tags:
    • Team:
      PLATFORM
    • Sprint:
      nxplatform #79
    • Story Points:
      1

      Description

      The PropagateRestoreFromColdStorageContentAction BAF (Bulk Action Framework) action may throw a NullPointerException under certain race conditions:

      [2023-01-12T10:59:44.266Z] java.lang.NullPointerException: null
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.coldstorage.service.ColdStorageServiceImpl.getContentBlobKey(ColdStorageServiceImpl.java:564) ~[classes/:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.coldstorage.service.ColdStorageServiceImpl.proceedRestoreMainContent(ColdStorageServiceImpl.java:435) ~[classes/:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.coldstorage.action.PropagateRestoreFromColdStorageContentAction$PropagateRestoreFromColdStorageContentComputation.compute(PropagateRestoreFromColdStorageContentAction.java:85) ~[classes/:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation.lambda$processBatchOfDocuments$3(AbstractBulkComputation.java:147) ~[nuxeo-core-bulk-2021.21.5.jar:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.runtime.transaction.TransactionHelper.lambda$runInTransaction$4(TransactionHelper.java:642) ~[nuxeo-runtime-jtajca-2021.21.5.jar:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:675) ~[nuxeo-runtime-jtajca-2021.21.5.jar:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.runtime.transaction.TransactionHelper.runInTransaction(TransactionHelper.java:642) ~[nuxeo-runtime-jtajca-2021.21.5.jar:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation.processBatchOfDocuments(AbstractBulkComputation.java:141) ~[nuxeo-core-bulk-2021.21.5.jar:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation.processRecord(AbstractBulkComputation.java:102) ~[nuxeo-core-bulk-2021.21.5.jar:?]
      [2023-01-12T10:59:44.266Z] 	at org.nuxeo.lib.stream.computation.log.ComputationRunner.lambda$processRecordWithRetry$10(ComputationRunner.java:507) ~[nuxeo-stream-2021.21.5.jar:?]
      [2023-01-12T10:59:44.267Z] 	at net.jodah.failsafe.Functions$10.call(Functions.java:252) ~[failsafe-1.1.0.jar:1.1.0]
      [2023-01-12T10:59:44.267Z] 	at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145) [failsafe-1.1.0.jar:1.1.0]
      [2023-01-12T10:59:44.267Z] 	at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:81) [failsafe-1.1.0.jar:1.1.0]
      [2023-01-12T10:59:44.267Z] 	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecordWithRetry(ComputationRunner.java:507) [nuxeo-stream-2021.21.5.jar:?]
      [2023-01-12T10:59:44.267Z] 	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecordWithTracing(ComputationRunner.java:458) [nuxeo-stream-2021.21.5.jar:?]
      [2023-01-12T10:59:44.267Z] 	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecord(ComputationRunner.java:450) [nuxeo-stream-2021.21.5.jar:?]
      [2023-01-12T10:59:44.267Z] 	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processLoop(ComputationRunner.java:308) [nuxeo-stream-2021.21.5.jar:?]
      [2023-01-12T10:59:44.267Z] 	at org.nuxeo.lib.stream.computation.log.ComputationRunner.runOnce(ComputationRunner.java:252) [nuxeo-stream-2021.21.5.jar:?]
      [2023-01-12T10:59:44.267Z] 	at org.nuxeo.lib.stream.computation.log.ComputationRunner.run(ComputationRunner.java:225) [nuxeo-stream-2021.21.5.jar:?]
      [2023-01-12T10:59:44.267Z] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
      [2023-01-12T10:59:44.267Z] 	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      [2023-01-12T10:59:44.267Z] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
      [2023-01-12T10:59:44.267Z] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
      [2023-01-12T10:59:44.267Z] 	at java.lang.Thread.run(Thread.java:829) [?:?]
      

      This causes the TestDummyColdStorageService#shouldMoveManyDocsWithSameBlobToColdStorage test to fail with:

      [ERROR] shouldMoveManyDocsWithSameBlobToColdStorage(org.nuxeo.coldstorage.service.TestDummyColdStorageService)  Time elapsed: 8.335 s  <<< FAILURE!
      [2023-01-12T11:00:08.249Z] java.lang.AssertionError
      [2023-01-12T11:00:08.249Z] 	at org.junit.Assert.fail(Assert.java:87)
      [2023-01-12T11:00:08.249Z] 	at org.junit.Assert.assertTrue(Assert.java:42)
      [2023-01-12T11:00:08.249Z] 	at org.junit.Assert.assertTrue(Assert.java:53)
      [2023-01-12T11:00:08.249Z] 	at org.nuxeo.coldstorage.service.TestDummyColdStorageService.shouldMoveManyDocsWithSameBlobToColdStorage(TestDummyColdStorageService.java:207)
      

      The cause is that the BAF action propagating the restore (to documents referencing the blob that has just been restored) needlessly re-propagates the restore and over-populates the associated stream. In rare race conditions, this results in the NPE above.
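
The ticket does not show the actual fix, so the following is only a minimal sketch of the kind of guard that could avoid both symptoms. It assumes a simplified model: `Doc`, `blobKey`, `inColdStorage`, and `propagateRestore` are hypothetical names for illustration and are not part of the Nuxeo API. The idea is to skip documents whose content is missing (avoiding the null dereference) and documents that are already restored (avoiding a useless re-propagation).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Nuxeo classes.
class RestorePropagationSketch {

    /** Stand-in for a document: an id, the key of its main blob, and a cold-storage flag. */
    static final class Doc {
        final String id;
        final String blobKey;   // null when the document has no main content
        boolean inColdStorage;  // false once the document has been restored

        Doc(String id, String blobKey, boolean inColdStorage) {
            this.id = id;
            this.blobKey = blobKey;
            this.inColdStorage = inColdStorage;
        }
    }

    /**
     * Marks as restored the documents that reference the restored blob
     * and returns the ids of the documents that were actually changed.
     */
    static List<String> propagateRestore(List<Doc> docs, String restoredBlobKey) {
        List<String> restoredIds = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.blobKey == null) {
                continue; // no content to inspect: skip instead of dereferencing null
            }
            if (!doc.blobKey.equals(restoredBlobKey)) {
                continue; // references a different blob
            }
            if (!doc.inColdStorage) {
                continue; // already restored: do not propagate again
            }
            doc.inColdStorage = false;
            restoredIds.add(doc.id);
        }
        return restoredIds;
    }
}
```

With these guards, a second call for the same blob key changes nothing and returns an empty list, so running the propagation again generates no further work.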
