- Type: Epic
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: QualifiedToSchedule
- Component/s: ML Custom Model
- Tags:
- Completion Level (0 to 5): 5
When creating a new custom model, the first step is to collect a corpus from the existing Nuxeo database.
This step should begin with a quick collection of statistics on the data, giving feedback on its quality and, where possible, flagging when the data cannot be used to train a model.
These statistics feed the UI to help the user create a model definition (docType, inputs, outputs).
Statistics supplied:
- Required for defining output fields
  - Cardinality
    - If possible, with percentiles
  - Aggregation (bucketing/histogram)
  - Rules: score (green, yellow, red) and comment on values
    - Unbalanced data?
    - Enough data for each value?
    - All the same tag!
- Required for defining input fields
  - Null fields
  - Total number of documents
  - Rule: score on possible use for training
  - Cardinality
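A minimal sketch of how the statistics and rules listed above could be computed for a single field, working on an in-memory list of values. The function names, the returned keys, and all thresholds are assumptions made for illustration; the ticket does not define them, and this is not the Nuxeo implementation.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

# Hypothetical thresholds: the ticket does not define any values.
MIN_SAMPLES_PER_VALUE = 10
MAX_IMBALANCE_RATIO = 10.0
NULL_RATIO_YELLOW = 0.1
NULL_RATIO_RED = 0.5


def output_field_stats(values: List[Optional[str]]) -> Dict[str, Any]:
    """Cardinality, histogram and a green/yellow/red score for an output field."""
    histogram = Counter(v for v in values if v is not None)
    cardinality = len(histogram)
    score, comments = "green", []

    if cardinality <= 1:
        # A single tag (or no tag at all) gives a model nothing to learn.
        score = "red"
        comments.append("all documents carry the same tag")
    else:
        smallest, largest = min(histogram.values()), max(histogram.values())
        if smallest < MIN_SAMPLES_PER_VALUE:
            score = "yellow"
            comments.append("not enough data for each value")
        if largest / smallest > MAX_IMBALANCE_RATIO:
            score = "yellow"
            comments.append("unbalanced data")

    return {
        "cardinality": cardinality,
        "histogram": dict(histogram),
        "score": score,
        "comments": comments,
    }


def input_field_stats(values: List[Optional[str]]) -> Dict[str, Any]:
    """Null count, total, cardinality and a usability score for an input field."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    # An empty corpus is treated as fully null, hence unusable.
    null_ratio = nulls / total if total else 1.0
    if null_ratio > NULL_RATIO_RED:
        score = "red"
    elif null_ratio > NULL_RATIO_YELLOW:
        score = "yellow"
    else:
        score = "green"
    return {
        "total": total,
        "nulls": nulls,
        "cardinality": len({v for v in values if v is not None}),
        "score": score,
    }
```

For example, `output_field_stats(["invoice"] * 50 + ["contract"] * 3)` would score yellow with both the "not enough data" and "unbalanced data" comments, since one value has only 3 samples against 50 for the other.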
Endpoint
- docType
- Inputs
- Outputs
- WHERE clause
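As an illustration of the four endpoint parameters above, the request could be modelled roughly as follows. The class name, field names, and the `to_query` helper are hypothetical; the only assumption about Nuxeo is that the corpus is selected with an NXQL-style `SELECT ... FROM <docType> WHERE ...` query.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StatisticsRequest:
    """Hypothetical request shape: docType, inputs, outputs, WHERE clause."""
    doc_type: str
    inputs: List[str]
    outputs: List[str]
    where: Optional[str] = None

    def to_query(self) -> str:
        # Build an NXQL-style query selecting the documents to analyse.
        query = f"SELECT * FROM {self.doc_type}"
        if self.where:
            query += f" WHERE {self.where}"
        return query


request = StatisticsRequest(
    doc_type="File",
    inputs=["dc:title", "dc:description"],
    outputs=["dc:subjects"],
    where="ecm:isTrashed = 0",
)
print(request.to_query())
```

The WHERE clause is kept optional here so that statistics can also be gathered over every document of the given type.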
Tasks
- Generic service
- Generic rest API
- Define rules for quality of output
- Define rules for quality of input
- Get statistics for input
- Get statistics for output
is required by
- AICORE-245 Create REST endpoints for Content Statistics on AI Nuxeo (Resolved)