[NXP-25400] Chronicle Queue retention conflict with offset tracker - Nuxeo Issue Tracker

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 10.2
Fix Version/s: 10.10
Component/s: Streams

Tags: nxcore
Story Points: 3

Description

It appears (possibly since the CQ upgrade, NXP-25231) that if a consumer never commits its position, a conflict can occur on startup once the Chronicle Queue (CQ) retention has purged some data.

When a consumer is created, a tailer is created and it searches for the last committed position by reading the offset log backward.
Because there is no committed position, it reads all the records, and if the purge has deleted the oldest cq4 file, an error is raised.
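The backward search described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `OffsetLogModel`, `addCycle`, `purgeOlderThan` and `readLastCommitted` are hypothetical names invented for illustration, not the Nuxeo or Chronicle Queue API, and the purge is modeled as a plain file deletion that the reader does not know about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.TreeMap;

/** Toy model of an offset log stored as one file per cycle (hypothetical, for illustration only). */
class OffsetLogModel {
    // cycle number -> committed offsets recorded during that cycle, oldest first
    private final TreeMap<Integer, List<Long>> files = new TreeMap<>();

    /** Adds a cycle file; pass no offsets to model a cycle without any commit. */
    void addCycle(int cycle, long... committedOffsets) {
        List<Long> records = files.computeIfAbsent(cycle, c -> new ArrayList<>());
        for (long offset : committedOffsets) {
            records.add(offset);
        }
    }

    /** Models the retention purge: deletes every cycle file older than oldestKept. */
    void purgeOlderThan(int oldestKept) {
        files.headMap(oldestKept, false).clear();
    }

    /**
     * Scans from newestCycle down to oldestCycle and returns the most recent committed offset.
     * The scan stops at the first cycle that holds a commit; without any commit it has to
     * visit every cycle in the range, including one the purge may have deleted.
     */
    OptionalLong readLastCommitted(int oldestCycle, int newestCycle) {
        for (int cycle = newestCycle; cycle >= oldestCycle; cycle--) {
            List<Long> records = files.get(cycle);
            if (records == null) {
                throw new IllegalStateException("Expected file to exist for cycle: " + cycle);
            }
            if (!records.isEmpty()) {
                return OptionalLong.of(records.get(records.size() - 1));
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        // No commit at all: the scan reaches the purged cycle and fails.
        OffsetLogModel neverCommitted = new OffsetLogModel();
        for (int cycle = 17720; cycle <= 17724; cycle++) {
            neverCommitted.addCycle(cycle);
        }
        neverCommitted.purgeOlderThan(17721);
        try {
            neverCommitted.readLastCommitted(17720, 17724);
        } catch (IllegalStateException e) {
            System.out.println("never committed: " + e.getMessage());
        }

        // One commit in a recent cycle: the scan stops before the purged cycle.
        OffsetLogModel committed = new OffsetLogModel();
        for (int cycle = 17720; cycle <= 17724; cycle++) {
            committed.addCycle(cycle);
        }
        committed.addCycle(17723, 42L);
        committed.purgeOlderThan(17721);
        System.out.println("committed: " + committed.readLastCommitted(17720, 17724));
    }
}
```

In this model, a consumer that has committed at least once in a retained cycle never reaches the purged file, which matches the observation that only consumers without a committed position are affected.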

Example traceback:

2018-07-12 10:30:36,961 ERROR [localhost-startStop-1] [org.nuxeo.osgi.OSGiAdapter] Error during Framework Listener execution : class org.nuxeo.runtime.osgi.OSGiRuntimeService
java.lang.IllegalStateException: Expected file to exist for cycle: 17720, file: /var/lib/nuxeo/stream/bulk/counter/offset-bulkCounter/20180708.cq4.
minCycle: 17721, maxCycle: 17724
Available files: [20180711.cq4, 20180710.cq4, 20180709.cq4, 20180712.cq4]
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue$StoreSupplier.nextCycle(SingleChronicleQueue.java:935)
    at net.openhft.chronicle.queue.impl.WireStorePool.nextCycle(WireStorePool.java:107)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.nextCycle(SingleChronicleQueue.java:432)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.nextIndexWithNextAvailableCycle0(SingleChronicleQueueExcerpts.java:1278)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.nextIndexWithNextAvailableCycle(SingleChronicleQueueExcerpts.java:1234)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.beyondStartOfCycleBackward(SingleChronicleQueueExcerpts.java:1110)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.beyondStartOfCycle(SingleChronicleQueueExcerpts.java:1068)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.next0(SingleChronicleQueueExcerpts.java:1033)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readingDocument(SingleChronicleQueueExcerpts.java:956)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readingDocument(SingleChronicleQueueExcerpts.java:891)
    at net.openhft.chronicle.wire.MarshallableIn.readBytes(MarshallableIn.java:63)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogOffsetTracker.readLastCommittedOffset(ChronicleLogOffsetTracker.java:128)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogOffsetTracker.getLastCommittedOffset(ChronicleLogOffsetTracker.java:109)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogTailer.toLastCommitted(ChronicleLogTailer.java:171)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogTailer.<init>(ChronicleLogTailer.java:83)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogAppender.createTailer(ChronicleLogAppender.java:207)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogManager.lambda$doCreateTailer$3(ChronicleLogManager.java:208)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogManager.doCreateTailer(ChronicleLogManager.java:207)
    at org.nuxeo.lib.stream.log.internals.AbstractLogManager.createTailer(AbstractLogManager.java:96)
    at org.nuxeo.lib.stream.computation.log.ComputationRunner.<init>(ComputationRunner.java:117)
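
To read the cycle numbers in the error message, it helps to map them to file names. The sketch below assumes a daily roll cycle where the cycle number is the number of days since 1970-01-01 (UTC), which is consistent with the values in the message above (minCycle 17721 and maxCycle 17724 against files 20180709.cq4 to 20180712.cq4). `CycleFiles` and `fileNameForCycle` are hypothetical helper names, not part of Chronicle Queue.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

/** Maps cycle numbers to cq4 file names, assuming cycle == days since the epoch (an assumption). */
class CycleFiles {
    static String fileNameForCycle(long cycle) {
        // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd.
        return LocalDate.ofEpochDay(cycle).format(DateTimeFormatter.BASIC_ISO_DATE) + ".cq4";
    }

    public static void main(String[] args) {
        // File list copied from the error message.
        List<String> available = Arrays.asList(
                "20180711.cq4", "20180710.cq4", "20180709.cq4", "20180712.cq4");
        for (long cycle = 17720; cycle <= 17724; cycle++) {
            String name = fileNameForCycle(cycle);
            String status = available.contains(name) ? "present" : "missing";
            System.out.println(cycle + " -> " + name + " (" + status + ")");
        }
    }
}
```

Under that assumption, cycle 17720 corresponds to 20180708.cq4, the file removed by the retention purge, while every cycle between minCycle and maxCycle still has its file.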

Restarting the computation (Nuxeo) fixes the problem because the purge has already been done.

However, restarting the next day raises the same problem.

Note that so far this case does not occur in Nuxeo.
The problem was visible in 10.2-SNAP because of an incomplete implementation of BAF, which is now deactivated in 10.2.

Issue Links

is related to: NXP-25388 Disable setProperties action in 10.2 (Resolved)

People

Assignee: Unassigned
Reporter: Benoit Delbosc
Participants: Benoit Delbosc
Votes: 0
Watchers: 1

Dates

Created: 2018-07-13 08:54
Updated: 2018-12-03 11:05
Resolved: 2018-12-03 11:04

Time Tracking

Estimated: Not Specified