- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: NXP-10.3
- Component/s: ML Custom Model
- Release Notes Summary: Document reference to represent Corpus objects
- Epic Link:
- Tags:
- Sprint: nxAI Sprint 10.3.1, nxAI Sprint 10.3.3
This Document acts as a reference to a TFRecord file in S3, created for training Custom Models.
It contains a pointer to the S3 location, as well as a set of metadata with some statistics on the data.
To train a custom model, we can use one or more of these Documents for training and one or more for evaluation.
Schema
- data_location (string): where to find the TFRecord binary in S3 (in a bucket called sagemaker, for easy access)
- training_data (bool): identifies whether this is for training or for evaluation
- inputs: [Complex(name, type)]: identifies the input fields
- outputs: [Complex(name, type, multi_class)]: identifies the output fields
- input and output histogram: [Complex(field, label, count)]: histogram of the labels for each input and output field, stored as a list of (field, label) tuples, each with a counter
- documents_count (int): total number of documents in this binary
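As an illustration only, the schema above could be mirrored in Python roughly as follows. This is a minimal sketch, not the actual Nuxeo document type. The class names, the `histogram` attribute name, the string representation of `type`, the boolean `multi_class`, the reading of `training_data` as `True` for training, and the example S3 keys and field values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InputField:
    name: str
    type: str  # assumption: the field type is stored as a string


@dataclass
class OutputField:
    name: str
    type: str
    multi_class: bool  # assumption: a boolean flag


@dataclass
class HistogramEntry:
    field: str   # name of the input or output field
    label: str   # label value observed for that field
    count: int   # number of occurrences of (field, label)


@dataclass
class CorpusReference:
    """Hypothetical in-memory mirror of the schema described above."""
    data_location: str   # S3 location of the TFRecord binary
    training_data: bool  # assumption: True = training, False = evaluation
    inputs: List[InputField]
    outputs: List[OutputField]
    histogram: List[HistogramEntry]  # hypothetical attribute name
    documents_count: int


def split_corpora(
    corpora: List[CorpusReference],
) -> Tuple[List[CorpusReference], List[CorpusReference]]:
    """Separate training references from evaluation references."""
    training = [c for c in corpora if c.training_data]
    evaluation = [c for c in corpora if not c.training_data]
    return training, evaluation


# Example values; the S3 keys and field names below are made up.
train_ref = CorpusReference(
    data_location="s3://sagemaker/example/train.tfrecord",
    training_data=True,
    inputs=[InputField(name="title", type="txt")],
    outputs=[OutputField(name="category", type="category", multi_class=False)],
    histogram=[
        HistogramEntry(field="category", label="invoice", count=60),
        HistogramEntry(field="category", label="contract", count=40),
    ],
    documents_count=100,
)
eval_ref = CorpusReference(
    data_location="s3://sagemaker/example/eval.tfrecord",
    training_data=False,
    inputs=train_ref.inputs,
    outputs=train_ref.outputs,
    histogram=[HistogramEntry(field="category", label="invoice", count=20)],
    documents_count=20,
)

training, evaluation = split_corpora([train_ref, eval_ref])
total_training_documents = sum(c.documents_count for c in training)
```

Keeping the histogram as a flat list of (field, label, count) entries follows the schema description. A consumer that needs per-field lookups could group the entries by `field` on read.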