  1. Nuxeo AI Core
  2. AICORE-99

Collect data from Nuxeo into a TFRecord stored in S3

    Details

    • Type: Epic
    • Status: Resolved
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: NXP-10.3
    • Component/s: ML Custom Model
    • Release Notes Summary:
      Export documents in a dataset to AI cloud in TFRecord format
    • Tags:

      Description

      A system that takes an NXQL query as input (or any other representation of a list of documents with the required fields), passes each document through a processing pipeline, and stores the collected data in a binary or TFRecord format.

      The processes should allow and java processes.

      Case study: the data should be extracted from Nuxeo using Bulk operations and stored in S3 for use with SageMaker. The output format is TFRecord.
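
      As an illustration of the output format, here is a minimal sketch of TFRecord framing in plain Java. Each record is an 8-byte little-endian length, a masked CRC32C of that length, the payload, and a masked CRC32C of the payload. The class and method names are hypothetical and not taken from the Nuxeo AI Core code base. The payload is assumed to be an already serialized record (for example a `tf.train.Example` protobuf); building it and uploading to S3 are out of scope. `java.util.zip.CRC32C` requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

// Hypothetical helper, for illustration only.
class TfRecordFraming {

    // Constant used by the TFRecord format to mask checksums.
    private static final int MASK_DELTA = 0xa282ead8;

    /** CRC32C of the given bytes, rotated right by 15 bits and offset by MASK_DELTA. */
    static int maskedCrc32c(byte[] data, int offset, int length) {
        CRC32C crc = new CRC32C();
        crc.update(data, offset, length);
        int value = (int) crc.getValue();
        // Java int arithmetic wraps modulo 2^32, matching the unsigned 32-bit definition.
        return ((value >>> 15) | (value << 17)) + MASK_DELTA;
    }

    /** Wraps one serialized payload in a TFRecord frame. */
    static byte[] frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + payload.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(payload.length);                          // uint64 length
        buf.putInt(maskedCrc32c(buf.array(), 0, 8));          // checksum of the length bytes
        buf.put(payload);                                     // record data
        buf.putInt(maskedCrc32c(payload, 0, payload.length)); // checksum of the data
        return buf.array();
    }

    public static void main(String[] args) {
        // Placeholder payload; a real record would carry serialized document features.
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(frame(payload).length + " bytes written for one record");
    }
}
```

      A TFRecord file is the concatenation of such frames, so the frames can be streamed to an output one document at a time without holding the whole dataset in memory.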

      The reference to this data, as well as its statistics (histograms, inputs, outputs), should be stored in an AI_Corpus document.

      Required pre-processing:

      • for text: store as a UTF-8 encoded string.
      • for image: resize to 299x299x3 (RGB), stored as floats with values between 0 and 1.
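
      The two pre-processing rules above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the image is stretched to 299x299 without preserving its aspect ratio, bilinear interpolation is used, and the floats are laid out row by row with the three channels interleaved in R, G, B order. The ticket does not specify any of these choices.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Preprocessing {

    static final int SIZE = 299;

    /** Text is stored as its UTF-8 bytes. */
    static byte[] encodeText(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Resizes to 299x299 RGB and scales each channel to a float in [0, 1]. */
    static float[] imageToFloats(BufferedImage source) {
        BufferedImage resized = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = resized.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        // Assumption: stretch to the target size, ignoring the aspect ratio.
        g.drawImage(source, 0, 0, SIZE, SIZE, null);
        g.dispose();

        float[] out = new float[SIZE * SIZE * 3];
        int i = 0;
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                int rgb = resized.getRGB(x, y);
                out[i++] = ((rgb >> 16) & 0xFF) / 255f; // red
                out[i++] = ((rgb >> 8) & 0xFF) / 255f;  // green
                out[i++] = (rgb & 0xFF) / 255f;         // blue
            }
        }
        return out;
    }
}
```

      The resulting `float[]` has 299 * 299 * 3 = 268203 entries per image, so a pipeline built this way would need to account for roughly 1 MB of float data per document before serialization.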

      INFO: No Python here for now. All needed pre-processing will be done at the Dataset level when training a model.

            People

            • Assignee:
              Unassigned
              Reporter:
              pcardoso Pedro Cardoso
              Participants:
