Nuxeo Platform / NXP-25286

Document for Corpus (training and evaluation data)

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.3
    • Component/s: ML Custom Model
    • Release Notes Summary:
      Document reference to represent Corpus objects
    • Tags:
    • Sprint:
      nxAI Sprint 10.3.1, nxAI Sprint 10.3.3

      Description

      This document acts as a reference to a TFRecord file stored in S3 and created for training custom models.
      It holds a pointer to the S3 location, along with metadata carrying statistics about the data.

      To train a custom model, one or more of these documents can be used for training, and one or more for evaluation.

      Schema

      • data_location (string): location of the TFRecord binary in S3 (in a bucket called sagemaker, for easy access)
      • training_data (bool): whether this corpus is for training or for evaluation
      • inputs [Complex(name, type)]: the input fields
      • outputs [Complex(name, type, multi_class)]: the output fields
      • input and output histogram [Complex(field, label, count)]: label histogram for each input and output field, stored as a list of (field, label) pairs, each with a count
      • documents_count (int): total number of documents in this binary
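
      The schema above can be sketched as plain Python data classes, to make the shape of a Corpus document concrete. This is only an illustration, not the Nuxeo schema definition: the class names, the histogram attribute name, the helper functions, the example S3 path and the field type strings are all hypothetical, and it assumes that training_data set to true means "training" and false means "evaluation".

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Field:
    """One entry of inputs or outputs (hypothetical class name)."""
    name: str
    type: str
    multi_class: bool = False  # only listed for outputs in the schema


@dataclass
class HistogramEntry:
    """One (field, label) pair with its counter."""
    field: str
    label: str
    count: int


@dataclass
class Corpus:
    data_location: str                # where the TFRecord binary lives in S3
    training_data: bool               # assumption: True = training, False = evaluation
    inputs: List[Field]
    outputs: List[Field]
    histogram: List[HistogramEntry]   # hypothetical name for the input/output histogram
    documents_count: int              # total number of documents in the binary


def split_corpora(corpora: List[Corpus]) -> Tuple[List[Corpus], List[Corpus]]:
    """Separate the corpora into (training, evaluation) lists."""
    training = [c for c in corpora if c.training_data]
    evaluation = [c for c in corpora if not c.training_data]
    return training, evaluation


def label_counts(corpus: Corpus, field_name: str) -> Dict[str, int]:
    """Collect the label -> count histogram of a single field."""
    counts: Dict[str, int] = {}
    for entry in corpus.histogram:
        if entry.field == field_name:
            counts[entry.label] = counts.get(entry.label, 0) + entry.count
    return counts


if __name__ == "__main__":
    # Hypothetical example values; the path and type strings are made up.
    corpus = Corpus(
        data_location="s3://sagemaker/example/train.tfrecord",
        training_data=True,
        inputs=[Field("title", "text")],
        outputs=[Field("category", "category", multi_class=False)],
        histogram=[
            HistogramEntry("category", "invoice", 3),
            HistogramEntry("category", "contract", 2),
        ],
        documents_count=5,
    )
    training, evaluation = split_corpora([corpus])
    print(len(training), len(evaluation))
    print(label_counts(corpus, "category"))
```

      Keeping the histogram as a flat list of (field, label, count) entries mirrors the Complex list in the schema; a per-field lookup such as label_counts can then be derived when needed.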

      Time Tracking

      • Estimated: Not Specified
      • Remaining: 0m
      • Logged: 1m