Nuxeo Platform - NXP-27664

Add a Flat Record codec to interoperate with KSQL and more


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 11.4, 2021.0
    • Component/s: Streams

      Description

      The records used by the Nuxeo Stream Processor are computation Records, with the following fields (see the sketch after this list):

      • a key (string) used as a routing or partition key
      • a watermark (timestamp and sequence)
      • some internal flags (stored in a single byte)
      • the data (a byte array) representing the message
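
      For illustration, here is a minimal sketch of such an envelope; the class and field names are illustrative assumptions, not the actual Nuxeo Stream API:

          // Sketch of the computation record envelope; names are
          // illustrative, not the actual Nuxeo Stream API.
          public class ComputationRecord {
              String key;      // routing or partition key
              long watermark;  // timestamp and sequence
              byte flags;      // internal flags stored in a single byte
              byte[] data;     // the encoded message payload
          }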

      This record can be encoded using different codecs:

      • Avro message, using a local filesystem schema registry
      • Avro binary; in this case you need a hard-coded schema
      • Avro in JSON, mostly for debugging purposes
      • Java Externalizable, for complex messages or legacy code
      • Avro Confluent message, which uses the Confluent Schema Registry

      Also, the data byte array can represent different things depending on the convention between the producer and the consumer. The message itself can be encoded using any of the above codecs; the record and its message don't have to use the same codec.

      So when we have an encoded Record, we first need to decode it to get the binary data, then decode that data to get the inner message: it is a two-level codec.
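
      A minimal sketch of this double decoding, reusing the ComputationRecord sketch above and assuming a hypothetical Codec interface (not the exact Nuxeo API):

          // Two-level decoding sketch; the Codec interface is an
          // illustrative assumption, not the exact Nuxeo API.
          interface Codec<T> {
              byte[] encode(T value);
              T decode(byte[] data);
          }

          class TwoLevelDecoder<M> {
              Codec<ComputationRecord> recordCodec; // e.g. Avro Confluent
              Codec<M> messageCodec;                // may be a different codec

              M decode(byte[] encodedRecord) {
                  // level 1: decode the record envelope
                  ComputationRecord record = recordCodec.decode(encodedRecord);
                  // level 2: decode the inner message from the data byte array
                  return messageCodec.decode(record.data);
              }
          }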

      The record envelope is convenient for gathering latency statistics because all records are homogeneous, but for interoperability it is hard to put the double decoding logic in place. For instance, when using KSQL on a stream encoded with the Avro Confluent codec, only the record fields are accessible (roughly the key and the watermark).

      So we need a new codec that can be used when the message is encoded in Avro; the result will be a flat record holding both the computation Record fields and the message fields, encoded using the Avro Confluent format.
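
      A sketch of what the resulting flat Avro schema could look like, built with Avro's SchemaBuilder; the message fields (docId, eventId) are purely hypothetical:

          import org.apache.avro.Schema;
          import org.apache.avro.SchemaBuilder;

          public class FlatRecordSchema {
              public static Schema build() {
                  // Envelope fields from the computation Record, then the
                  // message's own fields, merged into one flat Avro record.
                  return SchemaBuilder.record("NuxeoFlatRecord").fields()
                          .requiredString("key")     // routing/partition key
                          .requiredLong("watermark") // timestamp and sequence
                          .requiredInt("flags")      // internal flags (Avro has no byte type)
                          // hypothetical message fields, flattened alongside
                          .requiredString("docId")
                          .requiredString("eventId")
                          .endRecord();
              }
          }

      Encoded with the Confluent serializer, every field of such a flat record is then directly visible to KSQL.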

      Not only should KSQL be able to access all the fields (record and message), but we should also be able to read from a KSQL output stream using a normal computation.

      Also, this format will be handy for other tools:

      • Kafka Streams knows how to read Avro Confluent messages (see the sketch after this list)
      • Flink is supposed to have a Kafka source and sink that can use Avro Confluent messages
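
      For instance, a minimal Kafka Streams sketch reading such flat records with Confluent's generic Avro Serde; the topic name, application id, and URLs are assumptions:

          import java.util.Map;
          import java.util.Properties;
          import org.apache.avro.generic.GenericRecord;
          import org.apache.kafka.common.serialization.Serdes;
          import org.apache.kafka.streams.KafkaStreams;
          import org.apache.kafka.streams.StreamsBuilder;
          import org.apache.kafka.streams.StreamsConfig;
          import org.apache.kafka.streams.kstream.Consumed;
          import org.apache.kafka.streams.kstream.KStream;
          import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;

          public class FlatRecordReader {
              public static void main(String[] args) {
                  // Value serde backed by the Confluent Schema Registry
                  GenericAvroSerde valueSerde = new GenericAvroSerde();
                  valueSerde.configure(Map.of("schema.registry.url", "http://localhost:8081"), false);

                  StreamsBuilder builder = new StreamsBuilder();
                  // "nuxeo-flat-stream" is a hypothetical topic name
                  KStream<String, GenericRecord> stream = builder.stream(
                          "nuxeo-flat-stream", Consumed.with(Serdes.String(), valueSerde));

                  // Envelope and message fields are all on the same level
                  stream.foreach((key, flat) -> System.out.printf("key=%s watermark=%s%n",
                          flat.get("key"), flat.get("watermark")));

                  Properties props = new Properties();
                  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "flat-record-reader");
                  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                  new KafkaStreams(builder.build(), props).start();
              }
          }

      The generic Serde fetches the writer schema from the registry by id, so the reader does not need a hard-coded schema.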



    Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0m
    • Time Spent: 1d 1h