Nuxeo Platform / NXP-32492

Record Codec on Internal streams are not backward compatible


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2023.11
    • Component/s: Streams

      Description

      Internal streams are used to publish metrics and processing topologies.
      They are initialized at low level (at the Nuxeo Stream library level) using a basic Avro codec to encode Records. Unlike the higher-level Codec Service, this codec is not initialized with a schema store, so computations reading these internal streams cannot read a Record written with a different schema. In particular, they cannot read Records written by a previous version.
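      To make the failure mode concrete, here is a stdlib-only Java model of Avro single-object encoding, the format read by BinaryMessageDecoder: each message starts with the marker bytes 0xC3 0x01 followed by the 8-byte little-endian CRC-64-AVRO fingerprint of the writer schema. The classes SingleObjectCodec, ModelDecoder and SchemaStoreDemo and the schema strings are hypothetical; this is not the Avro or Nuxeo API. It simplifies by fingerprinting raw strings instead of the schema's Parsing Canonical Form and by skipping schema resolution of the payload. It only sketches why a decoder that knows just the current reader schema rejects a Record written by an older version, while a decoder backed by a schema store can look up the old fingerprint.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical model of Avro single-object encoding (not the Avro or Nuxeo API). */
final class SingleObjectCodec {
    // CRC-64-AVRO (Rabin) fingerprint table, built as described in the Avro specification.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    static final long[] FP_TABLE = new long[256];
    static {
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    // Simplification: real Avro fingerprints the schema's Parsing Canonical Form.
    static long fingerprint(String schema) {
        long fp = EMPTY;
        for (byte b : schema.getBytes(StandardCharsets.UTF_8)) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    // Header: marker 0xC3 0x01, then the writer schema fingerprint (little-endian), then the body.
    static byte[] encode(String writerSchema, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(10 + body.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 0xC3).put((byte) 0x01).putLong(fingerprint(writerSchema)).put(body);
        return buf.array();
    }
}

/** Hypothetical decoder: knows its reader schema, optionally asks a schema store for others. */
final class ModelDecoder {
    private final Map<Long, String> known = new HashMap<>();
    private final LongFunction<String> schemaStore; // null models the "basic codec" case

    ModelDecoder(String readerSchema, LongFunction<String> schemaStore) {
        known.put(SingleObjectCodec.fingerprint(readerSchema), readerSchema);
        this.schemaStore = schemaStore;
    }

    /** Returns "writerSchema|payload", or throws when the writer schema cannot be resolved. */
    String decode(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != (byte) 0xC3 || buf.get() != (byte) 0x01) {
            throw new IllegalArgumentException("Not a single-object encoded message");
        }
        long fp = buf.getLong();
        String writer = known.get(fp);
        if (writer == null && schemaStore != null) {
            writer = schemaStore.apply(fp); // fall back to the schema store
        }
        if (writer == null) {
            throw new IllegalStateException("Cannot resolve schema for fingerprint: " + fp);
        }
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        return writer + "|" + new String(body, StandardCharsets.UTF_8);
    }
}

final class SchemaStoreDemo {
    static final String OLD_SCHEMA = "record-schema-2021"; // placeholder, not the real schema
    static final String NEW_SCHEMA = "record-schema-2023"; // placeholder, not the real schema

    public static void main(String[] args) {
        byte[] oldMessage = SingleObjectCodec.encode(OLD_SCHEMA, "old record");

        // Basic codec: only the current reader schema is known.
        ModelDecoder basic = new ModelDecoder(NEW_SCHEMA, null);
        try {
            basic.decode(oldMessage);
        } catch (IllegalStateException e) {
            System.out.println("basic codec: " + e.getMessage());
        }

        // Codec backed by a schema store that still holds the older writer schema.
        Map<Long, String> store = new HashMap<>();
        store.put(SingleObjectCodec.fingerprint(OLD_SCHEMA), OLD_SCHEMA);
        ModelDecoder withStore = new ModelDecoder(NEW_SCHEMA, fp -> store.get(fp));
        System.out.println("with schema store: " + withStore.decode(oldMessage));
    }
}
```

      Running SchemaStoreDemo first reports the unresolved fingerprint from the basic decoder, then decodes the old message through the store-backed decoder, mirroring the difference between the low-level codec and the Codec Service.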

      When upgrading from 2021 to 2023, two computations are affected:

      1. The stream/introspection computation that aggregates stream information fails to read old Records:

      stream/introspection: Terminate computation due to unexpected failure inside computation code: Cannot resolve schema for fingerprint: 1073870735093060760
      
      org.apache.avro.message.MissingSchemaException: Cannot resolve schema for fingerprint: 1073870735093060760
      	at org.apache.avro.message.BinaryMessageDecoder.getDecoder(BinaryMessageDecoder.java:142) ~[avro-1.11.3.jar:1.11.3]
      	at org.apache.avro.message.BinaryMessageDecoder.decode(BinaryMessageDecoder.java:160) ~[avro-1.11.3.jar:1.11.3]
      	at org.apache.avro.message.MessageDecoder$BaseDecoder.decode(MessageDecoder.java:148) ~[avro-1.11.3.jar:1.11.3]
      	at org.nuxeo.lib.stream.codec.AvroMessageCodec.decode(AvroMessageCodec.java:78) ~[nuxeo-stream-2023.5.17.jar:?]
      	at org.nuxeo.lib.stream.log.kafka.KafkaLogTailer.read(KafkaLogTailer.java:194) ~[nuxeo-stream-2023.5.17.jar:?]
      	at org.nuxeo.lib.stream.computation.log.ComputationRunner.processRecord(ComputationRunner.java:431) ~[nuxeo-stream-2023.5.17.jar:?]
      

      2. The stream/metrics computation, which collects consumer latencies, reads the last processed record of every stream and fails to read old Records:

      Computation: stream/metrics fails last record: null, after retries.
      
      org.apache.avro.message.MissingSchemaException: Cannot resolve schema for fingerprint: 1073870735093060760
      	at org.apache.avro.message.BinaryMessageDecoder.getDecoder(BinaryMessageDecoder.java:142) ~[avro-1.11.3.jar:1.11.3]
      	at org.apache.avro.message.BinaryMessageDecoder.decode(BinaryMessageDecoder.java:160) ~[avro-1.11.3.jar:1.11.3]
      	at org.apache.avro.message.MessageDecoder$BaseDecoder.decode(MessageDecoder.java:148) ~[avro-1.11.3.jar:1.11.3]
      	at org.nuxeo.lib.stream.codec.AvroMessageCodec.decode(AvroMessageCodec.java:78) ~[nuxeo-stream-2023.9.10.jar:?]
      	at org.nuxeo.lib.stream.log.kafka.KafkaLogTailer.read(KafkaLogTailer.java:194) ~[nuxeo-stream-2023.9.10.jar:?]
      	at org.nuxeo.lib.stream.log.internals.AbstractLogManager.getLatencyPerPartition(AbstractLogManager.java:197) ~[nuxeo-stream-2023.9.10.jar:?]
      	at org.nuxeo.lib.stream.log.UnifiedLogManager.getLatencyPerPartition(UnifiedLogManager.java:175) ~[nuxeo-stream-2023.9.10.jar:?]
      	at org.nuxeo.lib.stream.log.LogManager.getLatency(LogManager.java:421) ~[nuxeo-stream-2023.9.10.jar:?]
      
      

      Consequences:

      After upgrading to 2023, even when no activity is in progress, the above computations are reported as stream failures and no stream introspection is available.

      Note that moving consumer positions to the end of the streams is not enough to get rid of these errors:
      the stream/metrics computation reads the last processed record of each stream, and it keeps failing as long as that record is a 2021 Record.

      Once the Kafka retention period has passed (7 days by default), the errors stop and the above computations resume.
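      The sketch below models why repositioning consumers does not help while retention does. PartitionModel is a hypothetical, simplified single partition: retention drops individual entries rather than whole log segments, and decoding is reduced to a schema-version check. It is not the Kafka or Nuxeo API. Latency is computed by reading the last processed record at offset committed - 1, which is how the issue describes the stream/metrics computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical single-partition model (not the Kafka or Nuxeo API). */
final class PartitionModel {
    static final class Entry {
        final String schemaVersion;
        final long timestampMs;

        Entry(String schemaVersion, long timestampMs) {
            this.schemaVersion = schemaVersion;
            this.timestampMs = timestampMs;
        }
    }

    private final List<Entry> log = new ArrayList<>();
    private long logStartOffset = 0; // offsets below this were deleted by retention

    long endOffset() { return logStartOffset + log.size(); }

    void append(Entry e) { log.add(e); }

    /** Drops entries older than nowMs - retentionMs (entries are assumed time-ordered). */
    void applyRetention(long nowMs, long retentionMs) {
        while (!log.isEmpty() && log.get(0).timestampMs < nowMs - retentionMs) {
            log.remove(0);
            logStartOffset++;
        }
    }

    /** Reads the last processed record (committed - 1); an old-schema record cannot be decoded. */
    Optional<Long> latency(long committed, long nowMs, String readableSchema) {
        long last = committed - 1;
        if (last < logStartOffset || last >= endOffset()) {
            return Optional.empty(); // nothing left to read, so no error
        }
        Entry e = log.get((int) (last - logStartOffset));
        if (!e.schemaVersion.equals(readableSchema)) {
            throw new IllegalStateException("Cannot decode record at offset " + last);
        }
        return Optional.of(nowMs - e.timestampMs);
    }
}

final class LatencyDemo {
    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        PartitionModel p = new PartitionModel();
        p.append(new PartitionModel.Entry("2021", 0L)); // last record written before the upgrade
        long committed = p.endOffset();                 // consumer moved to the end of the stream
        try {
            p.latency(committed, day, "2023");
        } catch (IllegalStateException ex) {
            System.out.println("day 1: " + ex.getMessage()); // still fails: last record is old
        }
        p.applyRetention(8 * day, 7 * day);             // 7-day retention has passed
        System.out.println("day 8: " + p.latency(committed, 8 * day, "2023")); // Optional.empty
    }
}
```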

      A workaround is to flush the Kafka topics completely by starting 2023 on an empty Kafka cluster.
