[NXP-31248] Skip and log bad records in Elasticsearch.BulkIndex - Nuxeo Issue Tracker

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 10.10
Fix Version/s: None
Component/s: Bulk, Elasticsearch

Tags: SupCom, nxplatform
Backlog priority: 800
Sprint: nxplatform #70
Story Points: 5

Description

Currently, a single Document with corrupted data is enough to halt the Elasticsearch.BulkIndex operation, and manual recovery steps are required to proceed. Bad records should instead be skipped, with basic identifying information (such as the UUID) logged for reference, so that the majority of Documents can still be re-indexed properly.

Steps to Reproduce:

Set up a Nuxeo instance with a MongoDB backend, with several Documents created and indexed (e.g. using the nuxeo-showcase-content addon).
In MongoDB, corrupt a schema property of a Document; for example, change the value of dc:modified to a String.
Attempt to re-index the repository using the Elasticsearch.BulkIndex operation.

Expected behavior: Documents with bad data are skipped, with basic info / UUID logged for follow-up troubleshooting, so that the rest of the repository can be re-indexed.

Actual behavior: computation failures caused by bad records prevent the rest of the operation from proceeding, leaving large numbers of Documents unindexed and requiring manual recovery.
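
The expected behavior can be illustrated with a minimal, language-neutral sketch in Python. This is not Nuxeo's implementation (which is Java-based): the functions `to_index_payload` and `bulk_index`, the in-memory `index` dict standing in for Elasticsearch, and the use of `ecm:id` as the identifier field are all assumptions made for illustration. It only shows the general pattern of catching a per-record conversion failure, logging the id, and continuing with the remaining records.

```python
import logging
from datetime import datetime

log = logging.getLogger("bulk_index_sketch")


def to_index_payload(doc):
    """Convert a stored record to an index payload; raise on corrupted data."""
    modified = doc["dc:modified"]
    if not isinstance(modified, datetime):
        raise TypeError(
            f"dc:modified must be a datetime, got {type(modified).__name__}"
        )
    return {"id": doc["ecm:id"], "dc:modified": modified.isoformat()}


def bulk_index(docs, index):
    """Index every convertible record; skip and log the rest.

    Returns the list of ids that were skipped, for follow-up troubleshooting.
    """
    skipped = []
    for doc in docs:
        doc_id = doc.get("ecm:id", "<unknown>")
        try:
            payload = to_index_payload(doc)
        except (KeyError, TypeError, ValueError) as exc:
            # One bad record must not abort the whole run: log it and move on.
            log.warning("Skipping document %s: %s", doc_id, exc)
            skipped.append(doc_id)
            continue
        index[doc_id] = payload
    return skipped


# Hypothetical sample data: doc-2 mimics the corruption from the steps above.
docs = [
    {"ecm:id": "doc-1", "dc:modified": datetime(2022, 9, 1, 12, 0)},
    {"ecm:id": "doc-2", "dc:modified": "not-a-date"},  # String instead of a date
    {"ecm:id": "doc-3", "dc:modified": datetime(2022, 9, 2, 8, 30)},
]
index = {}  # stand-in for the Elasticsearch index
skipped = bulk_index(docs, index)
```

In this sketch, `skipped` holds the id of the corrupted record while the two valid records still reach the index, which is the outcome the ticket asks for.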

Issue Links

is related to:
- NXP-31267 Make sure DocumentNotFound exception message always includes the doc id (Resolved)
- NXP-31268 Index bulk action should not terminate on corrupted documents (Resolved)

People

Assignee:

Benoit Delbosc

Reporter:

Henry Miskaryan

Participants:

Benoit Delbosc, Henry Miskaryan

Votes:

0

Watchers:

3

Dates

Created:

2022-09-02 23:13

Updated:

2022-09-15 15:33

Resolved:

2022-09-15 15:02