It appears (possibly since the Chronicle Queue (CQ) upgrade in NXP-25231) that when a consumer never commits its position, a conflict can occur on startup if the CQ retention purges some data.
When a consumer is created, a tailer is created and looks up the last committed position by reading an offset log backward.
Because there is no committed position, it reads all the records, and if the purge has deleted the oldest cq4 file, an error is raised.
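The backward scan described above can be modelled with a small sketch. This is a simplified illustration, not the Chronicle Queue or Nuxeo implementation: the classes OffsetLog and BackwardReader, the method scanAfterPurge and the offset value 42 are hypothetical, and the sketch assumes the reader keeps the cycle range it saw when it was opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class OffsetLogPurgeSketch {

    /** Hypothetical offset log: one "file" per daily cycle, each holding committed offsets. */
    static class OffsetLog {
        final TreeMap<Integer, List<Long>> files = new TreeMap<>();

        void createCycle(int cycle) {
            files.putIfAbsent(cycle, new ArrayList<>());
        }

        void commit(int cycle, long offset) {
            createCycle(cycle);
            files.get(cycle).add(offset);
        }

        /** Retention purge: drop the oldest cycle file. */
        void purgeOldest() {
            files.pollFirstEntry();
        }
    }

    /** Hypothetical reader that remembers the cycle range it saw when it was opened (an assumption). */
    static class BackwardReader {
        final OffsetLog log;
        final int minCycle;
        final int maxCycle;

        BackwardReader(OffsetLog log) {
            this.log = log;
            this.minCycle = log.files.firstKey();
            this.maxCycle = log.files.lastKey();
        }

        /** Scans from the newest to the oldest cycle; returns the last committed offset, or -1 if none. */
        long readLastCommitted() {
            for (int cycle = maxCycle; cycle >= minCycle; cycle--) {
                List<Long> records = log.files.get(cycle);
                if (records == null) {
                    // The file was purged after the reader was opened.
                    throw new IllegalStateException("Expected file to exist for cycle: " + cycle);
                }
                if (!records.isEmpty()) {
                    // A committed offset stops the scan before it reaches older files.
                    return records.get(records.size() - 1);
                }
            }
            return -1L;
        }
    }

    /** Opens a reader on five cycles, purges the oldest one, then scans backward. */
    static long scanAfterPurge(boolean withCommit) {
        OffsetLog log = new OffsetLog();
        for (int cycle = 17720; cycle <= 17724; cycle++) {
            log.createCycle(cycle);
        }
        if (withCommit) {
            log.commit(17724, 42L);
        }
        BackwardReader reader = new BackwardReader(log);
        log.purgeOldest();
        return reader.readLastCommitted();
    }

    public static void main(String[] args) {
        System.out.println("with commit: " + scanAfterPurge(true));
        try {
            scanAfterPurge(false);
        } catch (IllegalStateException e) {
            System.out.println("without commit: " + e.getMessage());
        }
    }
}
```

In this model, a consumer that has committed at least once stops at the newest record and never touches the purged file, whereas a consumer without any commit walks back to the oldest cycle and fails. A reader opened after the purge sees only the remaining files, which would be consistent with a restart working once the purge has run.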
Example traceback:
2018-07-12 10:30:36,961 ERROR [localhost-startStop-1] [org.nuxeo.osgi.OSGiAdapter] Error during Framework Listener execution : class org.nuxeo.runtime.osgi.OSGiRuntimeService
java.lang.IllegalStateException: Expected file to exist for cycle: 17720, file: /var/lib/nuxeo/stream/bulk/counter/offset-bulkCounter/20180708.cq4. minCycle: 17721, maxCycle: 17724 Available files: [20180711.cq4, 20180710.cq4, 20180709.cq4, 20180712.cq4]
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue$StoreSupplier.nextCycle(SingleChronicleQueue.java:935)
    at net.openhft.chronicle.queue.impl.WireStorePool.nextCycle(WireStorePool.java:107)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.nextCycle(SingleChronicleQueue.java:432)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.nextIndexWithNextAvailableCycle0(SingleChronicleQueueExcerpts.java:1278)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.nextIndexWithNextAvailableCycle(SingleChronicleQueueExcerpts.java:1234)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.beyondStartOfCycleBackward(SingleChronicleQueueExcerpts.java:1110)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.beyondStartOfCycle(SingleChronicleQueueExcerpts.java:1068)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.next0(SingleChronicleQueueExcerpts.java:1033)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readingDocument(SingleChronicleQueueExcerpts.java:956)
    at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readingDocument(SingleChronicleQueueExcerpts.java:891)
    at net.openhft.chronicle.wire.MarshallableIn.readBytes(MarshallableIn.java:63)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogOffsetTracker.readLastCommittedOffset(ChronicleLogOffsetTracker.java:128)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogOffsetTracker.getLastCommittedOffset(ChronicleLogOffsetTracker.java:109)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogTailer.toLastCommitted(ChronicleLogTailer.java:171)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogTailer.<init>(ChronicleLogTailer.java:83)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogAppender.createTailer(ChronicleLogAppender.java:207)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogManager.lambda$doCreateTailer$3(ChronicleLogManager.java:208)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at org.nuxeo.lib.stream.log.chronicle.ChronicleLogManager.doCreateTailer(ChronicleLogManager.java:207)
    at org.nuxeo.lib.stream.log.internals.AbstractLogManager.createTailer(AbstractLogManager.java:96)
    at org.nuxeo.lib.stream.computation.log.ComputationRunner.<init>(ComputationRunner.java:117)
Restarting the computation (Nuxeo) fixes the problem because the purge has already been done.
But restarting the next day raises the same problem.
Note that so far this case does not occur in Nuxeo:
the problem was visible in 10.2-SNAPSHOT because of an incomplete implementation of BAF, which is now deactivated in 10.2.
Is related to: NXP-25388 Disable setProperties action in 10.2 (Resolved)