Internal streams are used to publish metrics and processing topologies.
They are initialized at a low level (in the Nuxeo Stream library) using a basic Avro codec to encode Records. Unlike the higher-level Codec Service, this codec is not initialized with a schema store, so computations reading these internal streams cannot decode Records written with a different schema; in particular, they cannot read Records produced by a previous version.
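To illustrate the mechanism, here is a minimal sketch using only the standard Avro 1.11 single-object API (the class, method, and schema parameters are hypothetical, not Nuxeo code). Each encoded payload starts with the fingerprint of its writer schema; a decoder built without a schema store can only resolve fingerprints of schemas explicitly registered with it, which is why the 2021 Records below fail:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.message.BinaryMessageDecoder;
    import org.apache.avro.message.MissingSchemaException;

    public class SchemaStoreExample {
        // Decode an Avro single-object payload whose header carries the
        // fingerprint of the schema it was written with.
        static GenericRecord decode(Schema currentSchema, Schema schemaFrom2021, byte[] payload)
                throws IOException {
            // Without a schema store, the decoder only knows the schemas
            // explicitly registered with it (here: the current one).
            BinaryMessageDecoder<GenericRecord> decoder =
                    new BinaryMessageDecoder<>(GenericData.get(), currentSchema);
            try {
                return decoder.decode(ByteBuffer.wrap(payload));
            } catch (MissingSchemaException e) {
                // This is the error seen below: the payload fingerprint maps to
                // a schema the decoder has never seen. Registering the old
                // writer schema, as a schema store would, makes it readable.
                decoder.addSchema(schemaFrom2021);
                return decoder.decode(ByteBuffer.wrap(payload));
            }
        }
    }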
When upgrading from 2021 to 2023, this affects two computations:
1. The stream/introspection computation, which aggregates stream information, fails to read old Records:
stream/introspection: Terminate computation due to unexpected failure inside computation code: Cannot resolve schema for fingerprint: 1073870735093060760
org.apache.avro.message.MissingSchemaException: Cannot resolve schema for fingerprint: 1073870735093060760
    at org.apache.avro.message.BinaryMessageDecoder.getDecoder(BinaryMessageDecoder.java:142) ~[avro-1.11.3.jar:1.11.3]
    at org.apache.avro.message.BinaryMessageDecoder.decode(BinaryMessageDecoder.java:160) ~[avro-1.11.3.jar:1.11.3]
    at org.apache.avro.message.MessageDecoder$BaseDecoder.decode(MessageDecoder.java:148) ~[avro-1.11.3.jar:1.11.3]
    at org.nuxeo.lib.stream.codec.AvroMessageCodec.decode(AvroMessageCodec.java:78) ~[nuxeo-stream-2023.5.17.jar:?]
    at org.nuxeo.lib.stream.log.kafka.KafkaLogTailer.read(KafkaLogTailer.java:194) ~[nuxeo-stream-2023.5.17.jar:?]
    at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecord(ComputationRunner.java:431) ~[nuxeo-stream-2023.5.17.jar:?]
2. The stream/metrics computation, which collects consumer latencies by reading the last processed record of every stream, fails to read old Records:
Computation: stream/metrics fails last record: null, after retries.
org.apache.avro.message.MissingSchemaException: Cannot resolve schema for fingerprint: 1073870735093060760
    at org.apache.avro.message.BinaryMessageDecoder.getDecoder(BinaryMessageDecoder.java:142) ~[avro-1.11.3.jar:1.11.3]
    at org.apache.avro.message.BinaryMessageDecoder.decode(BinaryMessageDecoder.java:160) ~[avro-1.11.3.jar:1.11.3]
    at org.apache.avro.message.MessageDecoder$BaseDecoder.decode(MessageDecoder.java:148) ~[avro-1.11.3.jar:1.11.3]
    at org.nuxeo.lib.stream.codec.AvroMessageCodec.decode(AvroMessageCodec.java:78) ~[nuxeo-stream-2023.9.10.jar:?]
    at org.nuxeo.lib.stream.log.kafka.KafkaLogTailer.read(KafkaLogTailer.java:194) ~[nuxeo-stream-2023.9.10.jar:?]
    at org.nuxeo.lib.stream.log.internals.AbstractLogManager.getLatencyPerPartition(AbstractLogManager.java:197) ~[nuxeo-stream-2023.9.10.jar:?]
    at org.nuxeo.lib.stream.log.UnifiedLogManager.getLatencyPerPartition(UnifiedLogManager.java:175) ~[nuxeo-stream-2023.9.10.jar:?]
    at org.nuxeo.lib.stream.log.LogManager.getLatency(LogManager.java:421) ~[nuxeo-stream-2023.9.10.jar:?]
Consequences:
After a 2023 upgrade, even with no activity in progress, the above computations are reported as stream failures and no stream introspection is available:
- no metrics related to consumer positions and streams
- no introspection or scaling information on the stream management endpoint
Note that moving consumer positions to the end of the streams is not enough to get rid of these errors: the stream/metrics computation reads the last processed record of each stream, and it will keep failing as long as that record is a 2021 Record.
Once the Kafka retention period has elapsed (7 days by default), the errors stop and the above computations can resume.
A workaround is to completely flush the Kafka topics by starting 2023 on an empty Kafka cluster.
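As an alternative sketch only, not part of this ticket's workaround: old 2021 Records could be expired faster than the 7-day default by temporarily lowering retention.ms on the affected topics with the standard Kafka AdminClient. The bootstrap address and topic name below are assumptions to be adapted to the actual cluster:

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class ExpireOldRecords {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Assumed bootstrap address; adjust to the actual cluster.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // Hypothetical topic name; list the real internal topics first.
                ConfigResource topic =
                        new ConfigResource(ConfigResource.Type.TOPIC, "nuxeo-internal-metrics");
                // Temporarily drop retention to one minute so 2021 Records
                // expire, then restore the previous value afterwards.
                AlterConfigOp op = new AlterConfigOp(
                        new ConfigEntry("retention.ms", "60000"), AlterConfigOp.OpType.SET);
                Map<ConfigResource, Collection<AlterConfigOp>> configs = Map.of(topic, List.of(op));
                admin.incrementalAlterConfigs(configs).all().get();
            }
        }
    }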
Is related to: NXP-32491 Add all missing Avro schemas from 2021 in 2023 (Resolved)