Nuxeo Platform / NXP-31248

Skip and log bad records in Elasticsearch.BulkIndex

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 10.10
    • Fix Version/s: None
    • Component/s: Bulk, Elasticsearch
    • Backlog priority: 800
    • Sprint: nxplatform #70
    • Story Points: 5

      Description

      Currently, a single Document with corrupted data is enough to halt the Elasticsearch.BulkIndex operation, requiring manual recovery steps before indexing can proceed. Bad records should be skipped (with some basic information / UUID logged for reference) so that the rest of the Documents can be re-indexed properly.
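
      A minimal sketch of the requested behavior, assuming a per-document indexing hook; the indexAll method, DocumentIndexer interface, and docIds parameter are illustrative placeholders, not actual Nuxeo APIs:

      // Sketch only: DocumentIndexer stands in for whatever call builds and
      // submits the Elasticsearch index request for a single document.
      import java.util.List;
      import java.util.logging.Logger;

      public class SkipAndLogExample {

          private static final Logger log = Logger.getLogger(SkipAndLogExample.class.getName());

          /** Indexes each document individually, skipping and logging the ones that fail. */
          public static void indexAll(List<String> docIds, DocumentIndexer indexer) {
              int skipped = 0;
              for (String docId : docIds) {
                  try {
                      indexer.index(docId);
                  } catch (RuntimeException e) {
                      // Skip the bad record instead of failing the whole bulk command,
                      // keeping the UUID in the log for follow-up troubleshooting.
                      skipped++;
                      log.warning("Skipping document " + docId + " during bulk indexing: " + e.getMessage());
                  }
              }
              log.info("Bulk indexing done, skipped " + skipped + " bad record(s)");
          }

          /** Hypothetical per-document indexing hook. */
          public interface DocumentIndexer {
              void index(String docId);
          }
      }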

      Steps to Reproduce:

      1. Set up a Nuxeo instance with a MongoDB backend, with several Documents created and indexed (e.g. using the nuxeo-showcase-content addon).
      2. In MongoDB, corrupt a schema property of a Document, for example by changing the value of dc:modified to a String (see the driver sketch after these steps).
      3. Attempt repository re-indexing using the Elasticsearch.BulkIndex operation.
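
      As noted in step 2, one way to corrupt dc:modified is via the MongoDB Java driver. The database name ("nuxeo"), collection name ("default"), and field names ("ecm:id", "dc:modified") are assumptions about a default Nuxeo/MongoDB setup and may differ per deployment; the same change can also be made directly from the mongo shell.

      import com.mongodb.client.MongoClient;
      import com.mongodb.client.MongoClients;
      import com.mongodb.client.MongoCollection;
      import org.bson.Document;

      import static com.mongodb.client.model.Filters.eq;
      import static com.mongodb.client.model.Updates.set;

      public class CorruptDcModified {

          public static void main(String[] args) {
              String docUuid = "replace-with-a-real-document-uuid";
              try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                  MongoCollection<Document> repo = client.getDatabase("nuxeo").getCollection("default");
                  // dc:modified normally holds a Date; writing a String makes the record
                  // fail when Elasticsearch.BulkIndex tries to convert it for indexing.
                  repo.updateOne(eq("ecm:id", docUuid), set("dc:modified", "not-a-date"));
              }
          }
      }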

      Expected behavior: Documents with bad data are skipped during indexing, with basic info / UUID logged for follow-up troubleshooting, allowing the rest of the repository to be re-indexed.

      Actual behavior: computation failures from bad records prevent the rest of the operation from proceeding, resulting in large numbers of unindexed Documents and requiring manual recovery.

              People

              • Votes: 0
              • Watchers: 3
