Currently, the bulk indexing command sent to ES can only be limited by a number of documents (elasticsearch.reindex.bucketWriteSize).
However, the optimal payload size is 5-15 MB.
The bulk command should be sent as soon as either:
- the number of documents reaches elasticsearch.reindex.bucketWriteSize
- or the bulk size reaches the elasticsearch.index.bulkMaxSize threshold
This will prevent sending bulk indexing commands that are too large and overwhelm ES.
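
The two flush conditions above can be sketched as follows. This is only an illustration in Python, not the actual implementation: the BulkBuffer class, its send callback and the max_docs / max_bytes parameters are hypothetical names standing in for elasticsearch.reindex.bucketWriteSize and elasticsearch.index.bulkMaxSize.

```python
import json
from typing import Callable, List


class BulkBuffer:
    """Accumulates bulk index actions and flushes when either threshold is reached."""

    def __init__(self, send: Callable[[str], None], max_docs: int, max_bytes: int) -> None:
        self.send = send            # hypothetical callback that posts the payload to ES
        self.max_docs = max_docs    # stands in for elasticsearch.reindex.bucketWriteSize
        self.max_bytes = max_bytes  # stands in for elasticsearch.index.bulkMaxSize
        self.entries: List[str] = []
        self.byte_count = 0

    def add(self, doc_id: str, source: dict) -> None:
        # One bulk entry is an action line followed by the document source line.
        entry = json.dumps({"index": {"_id": doc_id}}) + "\n" + json.dumps(source) + "\n"
        self.entries.append(entry)
        self.byte_count += len(entry.encode("utf-8"))
        # Flush as soon as the document count OR the payload size reaches its limit.
        if len(self.entries) >= self.max_docs or self.byte_count >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered (no-op when empty) and reset the counters.
        if not self.entries:
            return
        self.send("".join(self.entries))
        self.entries = []
        self.byte_count = 0


if __name__ == "__main__":
    payloads: List[str] = []
    buffer = BulkBuffer(payloads.append, max_docs=3, max_bytes=10_000)
    for i in range(7):
        buffer.add(str(i), {"title": "doc %d" % i})
    buffer.flush()  # send the remaining partial bulk
    print(len(payloads), "bulk commands sent")
```

In this sketch the size check runs after a document is appended, so a payload can exceed max_bytes by at most one document. Checking before appending would keep it strictly under the limit, at the cost of handling a single document that is larger than the limit on its own.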
Other improvements (not covered in this ticket) could be:
- sending the bulk command when building it takes longer than a timeout, to prevent long-running transactions
- rescheduling a new job after this timeout, to avoid blocking the indexing chain.
A concrete case where ES indexing is a bottleneck is needed before implementing these last improvements.