This document is a reference to a TFRecord file in S3, created for training custom models.
It contains a pointer to the S3 location, along with metadata holding some statistics on the data.
To train a custom model, we can use one or more of these documents for training and for evaluation.
- data_location (string) : where the TFRecord binary can be found in S3 (in a bucket called sagemaker, for easy access)
- training_data (bool) : identifies whether this binary is for training or for evaluation
- inputs : [Complex(name,type)] : identifies the input fields
- outputs : [Complex(name,type,multi_class)] : identifies the output fields
- input and output histogram : [Complex(field,label,count)] : histogram of the labels for each input and output field, stored as a list of (field, label) pairs, each with a count
- documents_count (int) : total number of documents in this binary
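
As a rough illustration of the fields listed above, the document could be modelled as in the following Python sketch. The class names, the attribute names input_histogram and output_histogram, the string type used for field types, the boolean type of multi_class, and the reading of training_data as "True means training" are all assumptions made for the example, not part of the actual schema. The S3 path in the usage example is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InputField:
    name: str
    type: str  # assumed to be stored as a string, e.g. "string" or "int"


@dataclass
class OutputField:
    name: str
    type: str
    multi_class: bool  # assumed to be a boolean flag


@dataclass
class HistogramEntry:
    field: str   # name of the input or output field
    label: str   # label value observed for that field
    count: int   # number of occurrences of (field, label)


@dataclass
class TFRecordReference:
    """Hypothetical model of one reference document."""
    data_location: str                       # S3 location of the TFRecord binary
    training_data: bool                      # assumed: True = training, False = evaluation
    inputs: List[InputField]
    outputs: List[OutputField]
    input_histogram: List[HistogramEntry]    # assumed attribute name
    output_histogram: List[HistogramEntry]   # assumed attribute name
    documents_count: int                     # total documents in the binary


def split_references(
    refs: List[TFRecordReference],
) -> Tuple[List[TFRecordReference], List[TFRecordReference]]:
    """Separate references into (training, evaluation) lists."""
    training = [r for r in refs if r.training_data]
    evaluation = [r for r in refs if not r.training_data]
    return training, evaluation


# Usage example with made-up values.
example = TFRecordReference(
    data_location="s3://sagemaker/example/train.tfrecord",  # hypothetical path
    training_data=True,
    inputs=[InputField(name="text", type="string")],
    outputs=[OutputField(name="category", type="string", multi_class=False)],
    input_histogram=[],
    output_histogram=[HistogramEntry(field="category", label="invoice", count=3)],
    documents_count=3,
)
training_refs, evaluation_refs = split_references([example])
```

Keeping the histograms as flat lists of (field, label, count) entries mirrors the description above; a nested mapping would work too if lookups by field are frequent.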