[AICORE-99] Collect data from Nuxeo into a TFRecord stored in S3 - Nuxeo Issue Tracker

Details

Type: Epic
Status: Resolved
Priority: Minor
Resolution: Done
Affects Version/s: None
Fix Version/s: NXP-10.3
Component/s: ML Custom Model

Release Notes Summary:
Export documents in a dataset to AI cloud in TFRecord format
Tags:
- nxAI

Description

The system takes an NXQL query as input (or any other representation of a list of documents with the required fields), passes each document through a processing pipeline, and stores the collected data in a binary or TFRecord format.
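As a rough illustration of the "pass each document through a processing pipeline" idea, here is a minimal sketch in plain Java. It is not the Nuxeo implementation: the `ExportPipeline` class, its methods, and the representation of a document as a `Map` of property names to values are all assumptions made for this example, and the NXQL query step is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a document is modelled as a map of property name -> value.
final class ExportPipeline {
    private final List<UnaryOperator<Map<String, Object>>> steps = new ArrayList<>();

    // Registers a processing step; steps run in the order they were added.
    ExportPipeline addStep(UnaryOperator<Map<String, Object>> step) {
        steps.add(step);
        return this;
    }

    // Applies every step to every document and collects the results.
    List<Map<String, Object>> run(Iterable<Map<String, Object>> documents) {
        List<Map<String, Object>> processed = new ArrayList<>();
        for (Map<String, Object> document : documents) {
            // Work on a mutable copy so the input documents are left untouched.
            Map<String, Object> current = new HashMap<>(document);
            for (UnaryOperator<Map<String, Object>> step : steps) {
                current = step.apply(current);
            }
            processed.add(current);
        }
        return processed;
    }
}
```

A step could, for example, trim a title field before export. The collected list would then be handed to whatever writer produces the binary or TFRecord output.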

The processes should allow and java processes.

Case study: the data should be extracted from Nuxeo using Bulk operations and stored in S3 so that it can be used with SageMaker. The output format is TFRecord.
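For reference, a TFRecord file is a sequence of framed records: an 8-byte little-endian length, a 4-byte masked CRC32C of the length, the payload bytes, and a 4-byte masked CRC32C of the payload. The sketch below shows only that framing in plain Java (9 or later, for `java.util.zip.CRC32C`). The `TfRecordWriter` class and its method names are hypothetical. The payload is treated as opaque bytes (in practice it would typically be a serialized `tf.train.Example`), and uploading the result to S3 is not covered.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32C;

// Hypothetical helper that frames opaque payloads as TFRecord records.
final class TfRecordWriter {
    // Constant used by the TFRecord format to mask CRC values.
    private static final int MASK_DELTA = 0xa282ead8;

    // CRC32C of the data, rotated right by 15 bits and offset by MASK_DELTA.
    static int maskedCrc32c(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        int value = (int) crc.getValue();
        return ((value >>> 15) | (value << 17)) + MASK_DELTA;
    }

    // Builds one record: length, CRC of length, payload, CRC of payload.
    static byte[] frame(byte[] payload) {
        byte[] length = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(payload.length)
                .array();
        return ByteBuffer.allocate(8 + 4 + payload.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .put(length)
                .putInt(maskedCrc32c(length))
                .put(payload)
                .putInt(maskedCrc32c(payload))
                .array();
    }

    // Appends one framed record to any output stream (file, upload buffer, ...).
    static void writeRecord(OutputStream out, byte[] payload) throws IOException {
        out.write(frame(payload));
    }
}
```

Because records are simply concatenated, one export can be written incrementally, one document at a time, without holding the whole dataset in memory.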

The reference to this data, as well as its statistics (histograms, inputs, outputs), should be stored in an AI_Corpus document.

Required pre-processing:

for text: store as a UTF-8 encoded string.
for image: resize to 299x299x3 (RGB) and store as floats with values in the range 0 to 1.
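The two pre-processing rules above can be sketched with the Java standard library alone. This is only an illustration under assumptions: the `Preprocessing` class and its method names are hypothetical, bilinear interpolation is an arbitrary choice (the issue does not say how to resize), and the aspect ratio is not preserved.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.nio.charset.StandardCharsets;

// Hypothetical helper implementing the text and image rules from the issue.
final class Preprocessing {
    static final int SIZE = 299;
    static final int CHANNELS = 3;

    // Text: encode the string as UTF-8 bytes.
    static byte[] textToUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Image: resize to 299x299 and return row-major R, G, B floats in [0, 1].
    static float[] imageToFloatRgb(BufferedImage source) {
        BufferedImage resized = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = resized.createGraphics();
        try {
            // Assumption: bilinear interpolation; the issue does not specify one.
            graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            graphics.drawImage(source, 0, 0, SIZE, SIZE, null);
        } finally {
            graphics.dispose();
        }
        float[] pixels = new float[SIZE * SIZE * CHANNELS];
        int index = 0;
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                int rgb = resized.getRGB(x, y);
                pixels[index++] = ((rgb >> 16) & 0xFF) / 255f; // red
                pixels[index++] = ((rgb >> 8) & 0xFF) / 255f;  // green
                pixels[index++] = (rgb & 0xFF) / 255f;         // blue
            }
        }
        return pixels;
    }
}
```

The resulting array has 299 * 299 * 3 = 268203 values per image, which is worth keeping in mind when estimating the size of the exported dataset.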

INFO: No Python here for now. All needed pre-processing will be done at the Dataset level when training a model.

People

Assignee: Unassigned
Reporter: Pedro Cardoso
Participants: Pedro Cardoso
Votes: 0
Watchers: 1

Dates

Created: 2018-07-02 15:48
Updated: 2020-03-11 17:19
Resolved: 2018-10-15 09:12