Currently, a single Document with corrupted data is enough to halt the Elasticsearch.BulkIndex operation, requiring manual recovery steps before indexing can proceed. Bad records should instead be skipped (with basic information and the Document UUID logged for reference) so that the majority of Documents can be re-indexed properly.
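A minimal sketch of the desired skip-and-log behavior, in Python for illustration only. The `bulk_index` and `index_one` names, and the document shapes, are hypothetical stand-ins, not the actual Nuxeo or Elasticsearch API:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bulk-index")

def bulk_index(docs, index_one):
    """Index each document, skipping (and logging) any that fail.

    `docs` yields (uuid, data) pairs and `index_one` is whatever call
    performs the actual indexing -- both hypothetical stand-ins.
    Returns the UUIDs of skipped documents for follow-up troubleshooting.
    """
    skipped = []
    for uuid, data in docs:
        try:
            index_one(uuid, data)
        except Exception as exc:  # bad record: log the UUID and keep going
            log.warning("Skipping document %s: %s", uuid, exc)
            skipped.append(uuid)
    return skipped

# Example: a stand-in indexer that rejects a corrupted dc:modified value.
def fake_index(uuid, data):
    if not isinstance(data.get("dc:modified"), float):
        raise ValueError("dc:modified is not a timestamp")

docs = [
    ("a", {"dc:modified": 1.0}),
    ("b", {"dc:modified": "oops"}),  # corrupted record
    ("c", {"dc:modified": 2.0}),
]
```

With this shape, one bad record ("b") is logged and skipped while "a" and "c" are still indexed, rather than the whole operation halting.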
- Set up a Nuxeo instance with a MongoDB backend, with several Documents created and indexed (e.g. using the nuxeo-showcase-content addon).
- In MongoDB, corrupt a schema property of a Document. For example, change the value of dc:modified to a String-typed value.
- Attempt repository re-indexing using the Elasticsearch.BulkIndex operation.
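The corruption in step 2 can be sketched as a MongoDB update. The `ecm:id` lookup field, the `nuxeo`/`default` database and collection names, and the placeholder value are assumptions for illustration, not verified against the stored schema:

```python
def corruption_update(doc_uuid):
    """Build a MongoDB filter/update pair that replaces dc:modified
    (normally a Date) with a String, simulating a corrupted record.

    The `ecm:id` field name is an assumption about how Documents are
    keyed in the repository collection.
    """
    filt = {"ecm:id": doc_uuid}
    update = {"$set": {"dc:modified": "not-a-date"}}
    return filt, update

# With pymongo against a running instance (not executed here), e.g.:
# client["nuxeo"]["default"].update_one(*corruption_update("<uuid>"))
```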
Expected result: Documents with bad data are skipped during indexing, with basic info and the UUID logged for follow-up troubleshooting, allowing the rest of the repository to be re-indexed.
Actual result: computation failures from bad records prevent the rest of the operation from proceeding, resulting in large numbers of unindexed Documents and requiring manual recovery.