[NXP-26691] StreamWorkManager workaround for large work - Nuxeo Issue Tracker

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 9.10, 10.10
Fix Version/s: 9.10-HF30, 10.10-HF05, 11.1, 2021.0
Component/s: Events / Works

Release Notes Summary:
StreamWorkManager can manage large works.
Release Notes Description:
It is now possible to use the StreamWorkManager implementation with large Works that exceed 1MB when serialized. The value is stored outside of the stream, in an external storage. For now, the possible storages are the KeyValue store and the Transient store.

Here are the nuxeo.conf options to use to activate this feature for the StreamWorkManager:

# Filter big work to be stored outside of the stream
nuxeo.stream.work.computation.filter.enabled=true
# Above this threshold in bytes the record value is stored outside of the stream
nuxeo.stream.work.computation.filter.thresholdSize=1000000
nuxeo.stream.work.computation.filter.class=org.nuxeo.ecm.core.transientstore.computation.TransientStoreOverflowRecordFilter
nuxeo.stream.work.computation.filter.storeName=default
nuxeo.stream.work.computation.filter.storeKeyPrefix=bigRecord:
# An alternative storage using the KeyValue store
#nuxeo.stream.work.computation.filter.class=org.nuxeo.ecm.core.work.KeyValueStoreOverflowRecordFilter
# TTL is only taken in account with the KV impl, for TS impl you need to configure TS garbage collector
#nuxeo.stream.work.computation.filter.storeTTL=4d

When using the TransientStore, its TTL (firstLevelTTL) needs to be adapted so that the record value is not garbage collected before the work has been processed.

Note that in Nuxeo 9.10 the nuxeo.stream.work.computation.filter.storeTTL option which is used by the KeyValue store implementation needs to be expressed in number of seconds, while in Nuxeo 10.10 and above it can be expressed using a duration string like "48h" or "4d".
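For a 9.10 instance, a duration such as "4d" has to be converted to seconds by hand. The following is a minimal sketch of that arithmetic only; the class and method names are hypothetical and not part of Nuxeo, and it assumes a single-unit value (d, h, m or s), which may be narrower than what Nuxeo 10.10 actually accepts.

```java
import java.time.Duration;

/** Hypothetical helper (not a Nuxeo class): converts "4d" or "48h" style values to seconds. */
final class StoreTtlSeconds {

    private StoreTtlSeconds() {
    }

    /** Returns the number of seconds for a single-unit duration, or the value itself if it is a plain number. */
    static long toSeconds(String value) {
        String v = value.trim().toLowerCase();
        if (v.isEmpty()) {
            throw new IllegalArgumentException("Empty duration");
        }
        char unit = v.charAt(v.length() - 1);
        if (Character.isDigit(unit)) {
            // No unit suffix: assume the value is already expressed in seconds
            return Long.parseLong(v);
        }
        long amount = Long.parseLong(v.substring(0, v.length() - 1));
        switch (unit) {
            case 'd':
                return Duration.ofDays(amount).getSeconds();
            case 'h':
                return Duration.ofHours(amount).getSeconds();
            case 'm':
                return Duration.ofMinutes(amount).getSeconds();
            case 's':
                return amount;
            default:
                throw new IllegalArgumentException("Unsupported unit in: " + value);
        }
    }
}
```

With this conversion, "4d" corresponds to 345600 seconds and "48h" to 172800 seconds, which are the values to put in nuxeo.stream.work.computation.filter.storeTTL on 9.10.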

Note also that this ability to use an external storage for large record values is not tied to the StreamWorkManager and can be used in any StreamProcessor.
Tags:
- SupCom
- hfr
- nxcore
Backlog priority:
1,000
Upgrade notes:

The wrongly named class LogConfigDescriptor.StreamDescriptor has been renamed to LogConfigDescriptor.LogDescriptor.

Sprint:
nxcore 11.1.2, nxcore 11.1.3, nxcore 11.1.4, nxcore 11.1.5, nxcore 11.1.6, nxcore 11.1.7
Story Points:
5

Description

When scheduling a Work with the StreamWorkManager the Work is serialized and sent into a Nuxeo stream Record.
The record has a maximum size limit that depends on the Nuxeo Stream implementation (1MB for Kafka, same for Chronicle Queue by default).
Even if this limit can be tuned at the backend level, it is not recommended to allow very big records (bigger than ~10MB) for performance reasons.
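To check whether a custom Work is at risk, its serialized size can be measured before scheduling. The sketch below assumes plain Java serialization and a 1,000,000 byte threshold (the value used for thresholdSize in the release notes); the class and method names are hypothetical, and the serializer actually used by the StreamWorkManager may produce a different size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper (not a Nuxeo class): measures the Java-serialized size of an object. */
final class WorkSizeCheck {

    /** Assumed threshold in bytes, matching the thresholdSize value shown in the release notes. */
    static final int THRESHOLD_BYTES = 1_000_000;

    private WorkSizeCheck() {
    }

    /** Serializes the object in memory and returns the number of bytes produced. */
    static int serializedSize(Serializable work) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(work);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Returns true when the serialized form is larger than the assumed threshold. */
    static boolean exceedsThreshold(Serializable work) {
        return serializedSize(work) > THRESHOLD_BYTES;
    }
}
```

A Work that carries a blob or a large event bundle in its fields will show up here as a large size, which is usually a sign that it should pass references instead of data.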

This means that we want to make sure that Works are always serialized with a limited size.
Some changes have already been made in ~~NXP-25716~~, but overflow is still possible in some cases:

- When the JSF UI Bulk File Import action is used with a large number of files: the AsyncEventExecutor attempts to create an instance of AsyncEventExecutor.ListenerWork with an event bundle containing a large number of events, and the size of the work exceeds the default Kafka record size when importing 900 files:

[Transaction] Unexpected exception from afterCompletion; continuing
java.lang.RuntimeException: Unable to send record: ProducerRecord ...
    at org.nuxeo.lib.stream.log.kafka.KafkaLogAppender.append(KafkaLogAppender.java:130)
    at org.nuxeo.lib.stream.log.kafka.KafkaLogAppender.append(KafkaLogAppender.java:110)
    at org.nuxeo.ecm.core.work.StreamWorkManager.schedule(StreamWorkManager.java:155)
    at org.nuxeo.ecm.core.work.WorkManagerImpl.schedule(WorkManagerImpl.java:717)
    at org.nuxeo.ecm.core.event.impl.AsyncEventExecutor.scheduleListeners(AsyncEventExecutor.java:124)
    at org.nuxeo.ecm.core.event.impl.AsyncEventExecutor.run(AsyncEventExecutor.java:92)
    at org.nuxeo.ecm.core.event.impl.EventServiceImpl.fireEventBundle(EventServiceImpl.java:361)
    at org.nuxeo.ecm.core.event.impl.EventServiceImpl.handleTxCommited(EventServiceImpl.java:531)
    at org.nuxeo.ecm.core.event.impl.EventServiceImpl.afterCompletion(EventServiceImpl.java:512)
    at org.apache.geronimo.transaction.manager.TransactionImpl.afterCompletion(TransactionImpl.java:540)
    ...
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 3060227 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration
    at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:1124)
    at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:823)
    ...

- Any existing custom (non Nuxeo) Work may exceed this limit.

Because we want backward compatibility, we should add a workaround to support such large Works.
In the case of record overflow:

- we should warn that the serialized Work size is too big: either the Work is serializing too many things by mistake, or it should be refactored so that it does not pass blobs or large amounts of data
- we should fall back to a KV storage or transient store for the record value.
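The fallback described above can be sketched as follows. This is an illustration of the idea only, not the Nuxeo implementation: the record type, the class and its method names are hypothetical, an in-memory map stands in for the KeyValue or Transient store, and record keys are assumed to be unique (the duplicate key case is the subject of the linked NXP-27303).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of an overflow filter; an in-memory map stands in for the external store. */
final class OverflowSketch {

    /** Minimal stand-in for a stream record: a key, an inline value, and an "stored externally" flag. */
    static final class Rec {
        final String key;
        final byte[] value;
        final boolean external;

        Rec(String key, byte[] value, boolean external) {
            this.key = key;
            this.value = value;
            this.external = external;
        }
    }

    private final Map<String, byte[]> store = new ConcurrentHashMap<>();
    private final int thresholdSize;
    private final String keyPrefix;

    OverflowSketch(int thresholdSize, String keyPrefix) {
        this.thresholdSize = thresholdSize;
        this.keyPrefix = keyPrefix;
    }

    /** Called before writing to the stream: moves an oversized value to the external store. */
    Rec beforeAppend(Rec rec) {
        if (rec.value == null || rec.value.length <= thresholdSize) {
            return rec; // small enough, keep the value inline
        }
        System.err.println("WARN: record " + rec.key + " value is " + rec.value.length
                + " bytes, storing it outside of the stream");
        store.put(keyPrefix + rec.key, rec.value);
        return new Rec(rec.key, null, true);
    }

    /** Called after reading from the stream: restores a value that was stored externally. */
    Rec afterRead(Rec rec) {
        if (!rec.external) {
            return rec;
        }
        // The entry is not deleted here so the record can be read again; expiry is left to the store TTL
        byte[] value = store.get(keyPrefix + rec.key);
        if (value == null) {
            throw new IllegalStateException("Value for record " + rec.key + " is missing or expired");
        }
        return new Rec(rec.key, value, false);
    }
}
```

The missing value case in afterRead is why the store TTL has to cover the time a record can wait in the stream before being processed.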

Again, this is a workaround until Works are fixed, because storing the record value outside of the record has consequences:

- the log is no longer the single source of truth
- failover based on Log replication will depend on the replication of an additional storage
- retention of the additional storage must match the Log retention
- constant throughput will be impaired by mixing very different record sizes and access patterns

Attachments

Issue Links

causes

NXP-27303 Overflow record filter should handle records with duplicate key

Resolved

NXP-27269 Fix configuration variables' names in StreamWorkManager

Resolved

depends on

NXP-27176 Backport improvement in Configuration Service

Resolved

is duplicated by

NXP-26987 Allow to configure stream producer

Resolved

is related to

NXP-30796 Avoid Record overflow during csvExport containing huge metadata

Resolved

NXP-32246 Enable large Work to be serialized by default

Resolved

Activity

People

Assignee:

Benoit Delbosc

Reporter:

Vincent Dutat

Participants:

Benoit Delbosc, Jenkins, Vincent Dutat

Votes:

0

Watchers:

4

Dates

Created:

2019-01-21 20:28

Updated:

2024-01-23 12:49

Resolved:

2019-04-15 10:08

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

1w 1d