  1. Nuxeo Platform
  2. NXP-31826

Bulk Index Action should terminate properly on malformed input

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2023.0, 2021.37
    • Component/s: Bulk, Elasticsearch

      Description

      When the bulk index action hits a mapping error caused by malformed input (for example an invalid date, number or geolocation), the failure is systematic, but the bulk/bulkIndex computation terminates inconsistently.
      A stream failure is expected in such a case. Nevertheless, the bulk command might show a completed status without error, which is not acceptable. It means that some records are checkpointed even though they contain the cause of the error; these records should not be marked as processed.

      The reason is that the errors occur during BulkIndexComputation#processTimer, through the asynchronous Elasticsearch bulkProcessor, and not during #processRecord. Therefore the errors are not reported in the bulk command status. An abort mechanism is also triggered that skips further processing of records, but this can come too late: the bulk command might complete without error (or lag), especially when there are only a few records, because they are processed before the first timer is triggered.

      Here is an example where the mapping for the field `ecm:pos` is set incorrectly (the mapping declares it as a boolean instead of an integer):

      2023-04-05T13:35:23,504 WARN  [BulkIndexComputation] Failure in bulk indexing: OpenSearchException[OpenSearch exception [type=mapper_parsing_exception, reason=failed to parse field [ecm:pos] of type [boolean] in document with id '42a55246-d7b0-4c52-8269-504cbc08b43c'. Preview of field's value: '5']]; nested: OpenSearchException[OpenSearch exception [type=x_content_parse_exception, reason=[1:831] Current token (VALUE_NUMBER_INT) not of boolean type
       at [Source: (byte[]) ...
      ...
      2023-04-05T13:35:23,576 ERROR [BulkIndexComputation] Elasticsearch bulk 1 returns with failures: failure in bulk execution:
      [0]: index [nuxeo], type [_doc], id [42a55246-d7b0-4c52-8269-504cbc08b43c], ...
      ...
      
      2023-04-05T13:35:29,667 WARN  [AbstractComputation] Computation: bulk/bulkIndex fails last record: bulk-bulkIndex-06:+0, retrying ...
      org.nuxeo.ecm.core.api.NuxeoException: Terminate computation due to previous error
              at org.nuxeo.elasticsearch.bulk.BulkIndexComputation.processTimer(BulkIndexComputation.java:112) ~[nuxeo-elasticsearch-core-2023.0-SNAPSHOT.jar:?]
      
      ... 20 retries ...
      
      2023-04-05T13:50:31,700 ERROR [ComputationRunner] bulk/bulkIndex: Terminate computation due to previous failure
      
      

      Proposition

      In case of malformed input or unexpected indexing error, the computation should terminate and the record containing the error should not be checkpointed.
      The bulk command status should stay in a running state because it cannot complete.

      The malformed input must be resolved:

      • The recommended way is to update the mapping to ignore malformed input for the reported field, as described here:
        https://www.elastic.co/guide/en/elasticsearch/reference/7.17/ignore-malformed.html
        Once the mapping is updated, restarting the worker nodes should be enough to complete the bulk command without error.
        Note that if the mapping is updated directly in Elasticsearch, it should also be updated in the application to avoid the same problem on the next full repository re-indexing.
      • Another way is to fix the input at the UI or database level. In this case the indexing bulk command will never complete, and the consumer position needs to be moved to the end of the streams before restarting the worker nodes.
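
      As a rough illustration of the recommended option, the fragment below sketches the body of an Elasticsearch update-mapping request (sent as PUT nuxeo/_mapping, using the index name `nuxeo` that appears in the log above). The field name `myschema:deadline` and its `date` type are hypothetical and only stand in for whichever field is reported in the failure. The linked page lists the field types that accept this setting (numeric, date and geo types among them), so this is a sketch under those assumptions, not the exact mapping used by the application.

```json
{
  "properties": {
    "myschema:deadline": {
      "type": "date",
      "ignore_malformed": true
    }
  }
}
```

      With such a setting, a document carrying a malformed value for that field should still be indexed, with the malformed field left out of the index rather than the whole document being rejected. The existing field type has to be repeated in the request alongside the new parameter.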
