Nuxeo Platform / NXP-24994

Don't crash Elasticsearch indexing when blob is missing

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 9.10
    • Fix Version/s: 9.10-HF11, 10.2
    • Component/s: Elasticsearch
    • Release Notes Summary:
      Missing blobs are now ignored during Elasticsearch indexing instead of stopping it.
    • Tags:
    • Backlog priority:
      800
    • Sprint:
      nxAI Sprint 10.2.6, nxAI Sprint 10.2.7
    • Story Points:
      3

      Description

      The most common indexing failure is caused by a missing blob, for example:

      2018-04-19 16:52:55,756 ERROR [Nuxeo-Work-elasticSearchIndexing-9:1403013440772714.559777516] [org.nuxeo.ecm.core.work.AbstractWork] Exception during work: BucketIndexingWorker(333a3d49-719c-4ce9-8f6e-42c5e594d2a8..., /elasticSearchIndexing:1403012779396454.110412313, Progress(?%, ?/0), null)
      org.nuxeo.ecm.core.api.PropertyException: Cannot get blob info for: ee008588d6d4e088ee4ce541d89fea7a6
      	at org.nuxeo.ecm.core.storage.BaseDocument.getValueBlob(BaseDocument.java:484)
      	at org.nuxeo.ecm.core.storage.BaseDocument.readComplexProperty(BaseDocument.java:666)
      	at org.nuxeo.ecm.core.storage.BaseDocument.readComplexProperty(BaseDocument.java:681)
      	at org.nuxeo.ecm.core.storage.sql.coremodel.SQLDocumentLive.readDocumentPart(SQLDocumentLive.java:172)
      	at org.nuxeo.ecm.core.api.DocumentModelFactory.createDataModel(DocumentModelFactory.java:209)
      	at org.nuxeo.ecm.core.api.AbstractSession.getDataModel(AbstractSession.java:2007)
      	at org.nuxeo.ecm.core.api.impl.DocumentModelImpl.loadDataModel(DocumentModelImpl.java:438)
      	at org.nuxeo.ecm.core.api.impl.DocumentModelImpl.getDataModel(DocumentModelImpl.java:448)
      	at org.nuxeo.ecm.core.api.impl.DocumentModelImpl.getPart(DocumentModelImpl.java:1211)
      	at org.nuxeo.ecm.core.api.impl.DocumentModelImpl.getPropertyObjects(DocumentModelImpl.java:1237)
      	at org.nuxeo.ecm.automation.jaxrs.io.documents.JsonESDocumentWriter.writeProperties(JsonESDocumentWriter.java:241)
      	at org.nuxeo.ecm.automation.jaxrs.io.documents.JsonESDocumentWriter.writeSchemas(JsonESDocumentWriter.java:213)
      	at org.nuxeo.ecm.automation.jaxrs.io.documents.JsonESDocumentWriter.writeDoc(JsonESDocumentWriter.java:109)
      	at org.nuxeo.ecm.automation.jaxrs.io.documents.JsonESDocumentWriter.writeESDocument(JsonESDocumentWriter.java:236)
      	at org.nuxeo.elasticsearch.core.ElasticSearchIndexingImpl.buildEsIndexingRequest(ElasticSearchIndexingImpl.java:411)
      	at org.nuxeo.elasticsearch.core.ElasticSearchIndexingImpl.processBulkIndexCommands(ElasticSearchIndexingImpl.java:176)
      	at org.nuxeo.elasticsearch.core.ElasticSearchIndexingImpl.indexNonRecursive(ElasticSearchIndexingImpl.java:145)
      	at org.nuxeo.elasticsearch.ElasticSearchComponent.indexNonRecursive(ElasticSearchComponent.java:405)
      	at org.nuxeo.elasticsearch.work.BucketIndexingWorker.doWork(BucketIndexingWorker.java:78)
      	at org.nuxeo.elasticsearch.work.BaseIndexingWorker.work(BaseIndexingWorker.java:48)
      	at org.nuxeo.ecm.core.work.AbstractWork.runWorkWithTransaction(AbstractWork.java:435)
      	at org.nuxeo.ecm.core.work.AbstractWork.run(AbstractWork.java:355)
      	at org.nuxeo.ecm.core.work.WorkHolder.run(WorkHolder.java:57)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: Unknown binary: ee008588d6d4e088ee4ce541d89fea7a6
      	at org.nuxeo.ecm.core.blob.binary.BinaryBlobProvider.readBlob(BinaryBlobProvider.java:100)
      	at org.nuxeo.ecm.core.blob.DocumentBlobManagerComponent.readBlob(DocumentBlobManagerComponent.java:132)
      	at org.nuxeo.ecm.core.storage.BaseDocument.getValueBlob(BaseDocument.java:482)
      	... 25 more
      

      This error will stop the indexing.

      Even though it is a symptom of inconsistent data, some users would rather have the indexing continue with the remaining documents and finish properly.

      Therefore it should be possible to:
      1) log the current failure and its cause
      2) continue the indexing
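
      The two points above can be sketched as a per-document try/catch around the step that serializes each document. This is only an illustration, not the actual fix: the class SkipFailingDocuments, the DocumentSerializer interface and the buildRequests method are hypothetical names and do not exist in Nuxeo, and the simulated failure merely mimics a runtime exception caused by an IOException, as in the stack trace above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SkipFailingDocuments {

    private static final Logger log = Logger.getLogger(SkipFailingDocuments.class.getName());

    /** Hypothetical stand-in for the step that turns one document into an indexing request. */
    public interface DocumentSerializer {
        String toJson(String docId);
    }

    /**
     * Builds one request per document. A document whose serialization throws
     * is logged with its cause and skipped; the other documents are kept.
     */
    public static List<String> buildRequests(List<String> docIds, DocumentSerializer serializer) {
        List<String> requests = new ArrayList<>();
        for (String docId : docIds) {
            try {
                requests.add(serializer.toJson(docId));
            } catch (RuntimeException e) {
                // 1) log the current failure and its cause
                log.log(Level.SEVERE, "Skipping indexing of document " + docId, e);
                // 2) continue the indexing with the next document
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        // Simulated serializer: "doc-2" references a blob that cannot be read.
        DocumentSerializer serializer = docId -> {
            if (docId.equals("doc-2")) {
                throw new IllegalStateException("Cannot get blob info for: " + docId,
                        new IOException("Unknown binary"));
            }
            return "{\"ecm:uuid\":\"" + docId + "\"}";
        };
        List<String> requests = buildRequests(Arrays.asList("doc-1", "doc-2", "doc-3"), serializer);
        System.out.println(requests.size() + " of 3 documents were serialized");
    }
}
```

      Catching the exception per document rather than per bucket means one bad blob costs a single document, while the log entry still records which document was skipped and why.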

      As a side note, some failures are already handled:

      • missing document
      • incorrect indexing command

      Time Tracking

      • Original Estimate: 0m
      • Remaining Estimate: 0m
      • Time Spent: 4h