Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-31730

Ensure file existence when using batch upload in cluster mode and Cloud storage

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2023.0, 2021.35
    • Component/s: File Upload
    • Release Notes Summary:
      CachingBlobStore now asserts that a blob exists in the wrapped blob store on wirte
    • Backlog priority:
      900
    • Sprint:
      nxplatform #83
    • Story Points:
      5

      Description

      This is the best scenario I was able to identify:

      1. create a batch batch-01 on node1
      2. upload fileA via batch-01 to transient store from node2
        => fileA is uploaded to BatchManagerCache store in GCS and stored on node2 in the local nxbin cache
      3. use batch-01 to create a File with fileA on node1
        => fileA is read from BatchManagerCache store in GCS, then stored on node1 in the local nxbin cache and in the final storage on GCS
      4. create a new batch batch-02 on node2
      5. TransientStore GC is run on node2 (this is random : the scheduler triggers an event which is caught by one of the nodes but here we force the GC on node2)
        => fileA is removed from BatchManagerCache store on GCS and in the local nxbin cache on node2
      6. upload fileA via batch-02 to transient store from node1
        => fileA is read in local nxbin cache of node2 (but missing in BatchManagerCache store on GCS
      7. use batch-02 to create a File with fileA on node2
        => the file is not in the local cache and not found in BatchManagerCache store on GCS (because of the CG)
        => java.lang.NullPointerException: null at org.nuxeo.ecm.core.blob.binary.CachingBinaryManager.getBinary(CachingBinaryManager.java:141) ~[nuxeo-core-api-2021.33.9.jar:?]

      Traces:

      • node 1
        2023-02-24T19:59:25,899 WARN  [http-nio-0.0.0.0-8080-exec-9] [org.nuxeo.ecm.core.blob.binary.CachingBinaryManager] Getting file bf81f48be077ac51b1e2bc024d9397a2
        2023-02-24T19:59:26,017 WARN  [http-nio-0.0.0.0-8080-exec-9] [org.nuxeo.ecm.core.blob.binary.CachingBinaryManager] => File bf81f48be077ac51b1e2bc024d9397a2 retrieved from fileStorage org.nuxeo.ecm.core.storage.gcp.GoogleStorageBinaryManager$GCPFileStorage : /opt/nuxeo-server-tomcat-2021.1-GCS/tmp/nxbincache.10716387506675255588/bf81f48be077ac51b1e2bc024d9397a2
        2023-02-24T19:59:26,018 DEBUG [http-nio-0.0.0.0-8080-exec-9] [org.nuxeo.ecm.core.storage.gcp.GoogleStorageBinaryManager] Storing blob with digest: bf81f48be077ac51b1e2bc024d9397a2 to GCS
        2023-02-24T19:59:26,241 DEBUG [http-nio-0.0.0.0-8080-exec-9] [org.nuxeo.ecm.core.storage.gcp.GoogleStorageBinaryManager] Stored blob with digest: bf81f48be077ac51b1e2bc024d9397a2 to GCS in 222ms
        2023-02-24T19:59:26,332 WARN  [Nuxeo-Work-updateThumbListener-2:29459507860953.1879359689] [org.nuxeo.ecm.core.blob.binary.CachingBinaryManager] Getting file bf81f48be077ac51b1e2bc024d9397a2
        2023-02-24T19:59:26,332 WARN  [Nuxeo-Work-updateThumbListener-2:29459507860953.1879359689] [org.nuxeo.ecm.core.blob.binary.CachingBinaryManager] => File bf81f48be077ac51b1e2bc024d9397a2 retrieved from cache : /opt/nuxeo-server-tomcat-2021.1-GCS/tmp/nxbincache.8733906533136329591/bf81f48be077ac51b1e2bc024d9397a2
        
        2023-02-24T20:00:08,869 WARN  [http-nio-0.0.0.0-8080-exec-1] [org.nuxeo.ecm.core.blob.binary.CachingBinaryManager] Getting file bf81f48be077ac51b1e2bc024d9397a2
        2023-02-24T20:00:08,869 WARN  [http-nio-0.0.0.0-8080-exec-1] [org.nuxeo.ecm.core.blob.binary.CachingBinaryManager] => File bf81f48be077ac51b1e2bc024d9397a2 retrieved from cache : /opt/nuxeo-server-tomcat-2021.1-GCS/tmp/nxbincache.8733906533136329591/bf81f48be077ac51b1e2bc024d9397a2
        
      • node 2
        2023-02-24T19:56:17,290 DEBUG [http-nio-0.0.0.0-9080-exec-1] [org.nuxeo.ecm.core.storage.gcp.GoogleStorageBinaryManager] Storing blob with digest: bf81f48be077ac51b1e2bc024d9397a2 to GCS
        2023-02-24T19:56:17,572 DEBUG [http-nio-0.0.0.0-9080-exec-1] [org.nuxeo.ecm.core.storage.gcp.GoogleStorageBinaryManager] Stored blob with digest: bf81f48be077ac51b1e2bc024d9397a2 to GCS in 281ms
        
        
        2023-02-24T20:00:00,297 DEBUG [Nuxeo-Work-transientStorageGC-1:BatchManagerCache] [org.nuxeo.ecm.core.transientstore.keyvalueblob.KeyValueBlobTransientStore] GC completed for BatchManagerCache: BinaryManagerStatus [gcDuration=88, numBinaries=1, sizeBinaries=171868, numBinariesGC=2, sizeBinariesGC=107063]
        
        2023-02-24T20:03:07,126 WARN  [http-nio-0.0.0.0-9080-exec-3] [org.nuxeo.ecm.core.blob.binary.CachingBinaryManager] Getting file bf81f48be077ac51b1e2bc024d9397a2
        2023-02-24T20:03:07,151 WARN  [http-nio-0.0.0.0-9080-exec-3] [org.nuxeo.ecm.core.blob.binary.CachingBinaryManager] => File bf81f48be077ac51b1e2bc024d9397a2 not in storage
        2023-02-24T20:03:07,152 ERROR [WebEngineExceptionMapper] java.lang.NullPointerException
        java.lang.NullPointerException: null
        

        Attachments

          Issue Links

            Activity

            Hide
            tmartins Thierry Martins added a comment -

            Patch I used to identify the issue and verify that the missing file in the transient store was the root cause of the NPE: NXP-31730.diff

            Show
            tmartins Thierry Martins added a comment - Patch I used to identify the issue and verify that the missing file in the transient store was the root cause of the NPE: NXP-31730.diff
            Hide
            bdelbosc Benoit Delbosc added a comment - - edited

            Ok to check if blob exist in target storage while uploading the file in the transient store.
            It needs to be done on CachingBlobStore, check also the new exists method may not work with different key strategy.
            We will take care of merging the PR.

            Show
            bdelbosc Benoit Delbosc added a comment - - edited Ok to check if blob exist in target storage while uploading the file in the transient store. It needs to be done on CachingBlobStore, check also the new exists method may not work with different key strategy. We will take care of merging the PR.
            Hide
            grenard Guillaume Renard added a comment - - edited

            We also suspect that NXP-31594 highlights a deficient cache invalidation in cluster mode for org.nuxeo.ecm.core.blob.CachingBlobStore.

            Consider a deployment with a worker node and a front node:

            • a user creates a document with a given blob
            • the given blob is cached in the front node
            • the user deletes the document, the given blob is removed from the s3 blob store by the worker node with NXP-31594
            • the user creates another document with the same given blob
            • the front node still has the given blob in cache and believes it is also stored in the s3 blob store and therefore the document is created without any error
            • the cache eventually expires
            • we end up with a document in which the attached blob does not exist and cannot be accessed (it is even more obvious when the direct download feature is enabled in Web UI, no need to wait for the cache to expire)
            Show
            grenard Guillaume Renard added a comment - - edited We also suspect that NXP-31594 highlights a deficient cache invalidation in cluster mode for org.nuxeo.ecm.core.blob.CachingBlobStore. Consider a deployment with a worker node and a front node: a user creates a document with a given blob the given blob is cached in the front node the user deletes the document, the given blob is removed from the s3 blob store by the worker node with NXP-31594 the user creates another document with the same given blob the front node still has the given blob in cache and believes it is also stored in the s3 blob store and therefore the document is created without any error the cache eventually expires we end up with a document in which the attached blob does not exist and cannot be accessed (it is even more obvious when the direct download feature is enabled in Web UI, no need to wait for the cache to expire)
            Hide
            hudson Jenkins added a comment -

            SUCCESS: Integrated in nuxeo » lts » nuxeo » 2023 #351
            NXP-31730: Make sure blob exists on write with CachingBlobStore wrapper (troger: 11400f58684e9d0b8e1fb62b21eb74ce39719716)
            NXP-31730: Make sure blob exists on write with CachingBinaryManager wrapper (troger: 1e46bbea9476d174dfe79c3c349b50d44ecae61b)

            Show
            hudson Jenkins added a comment - SUCCESS: Integrated in nuxeo » lts » nuxeo » 2023 #351 NXP-31730 : Make sure blob exists on write with CachingBlobStore wrapper (troger: 11400f58684e9d0b8e1fb62b21eb74ce39719716 ) NXP-31730 : Make sure blob exists on write with CachingBinaryManager wrapper (troger: 1e46bbea9476d174dfe79c3c349b50d44ecae61b )
            Hide
            hudson Jenkins added a comment -

            FAILURE: Integrated in nuxeo » lts » nuxeo » 2021 #612
            NXP-31730: Make sure blob exists on write with CachingBinaryManager wrapper (guirenard: e24d0ae9c75fe6197dd314bdeda0ec0c2c9061b5)
            NXP-31730: Make sure blob exists on write with CachingBlobStore wrapper (guirenard: 6b2d833fd46f5c6a1dfe925686256a5c24da9b1c)

            Show
            hudson Jenkins added a comment - FAILURE: Integrated in nuxeo » lts » nuxeo » 2021 #612 NXP-31730 : Make sure blob exists on write with CachingBinaryManager wrapper (guirenard: e24d0ae9c75fe6197dd314bdeda0ec0c2c9061b5 ) NXP-31730 : Make sure blob exists on write with CachingBlobStore wrapper (guirenard: 6b2d833fd46f5c6a1dfe925686256a5c24da9b1c )

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - 0 minutes
                  0m
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 1 hour
                  1h