Nuxeo AI Core / AICORE-100

Run sanity checks on the data used to train a custom model


    Details

    • Tags:
    • Completion Level (0 to 5):
      5

      Description

      When creating a new custom model, the first step is to collect a corpus from the existing Nuxeo database.
      This step should quickly collect statistics on that data and report on its quality, including, where it can be determined, whether the data is unsuitable for training a model.

      The statistics are fed into the UI to help the user create a model definition (docType, inputs, outputs).
      Statistics supplied:

      • Required for defining output fields:
        • Cardinality
          • With percentiles, if possible
        • Aggregation (bucketing/histogram)
          • Rules: a score (green, yellow, red) and a comment on the values
          • Is the data unbalanced?
          • Is there enough data for each value?
          • Do all documents have the same tag?
      • Required for defining input fields:
        • Null fields
        • Total number of documents
        • Rule: a score for possible use in training
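      A minimal sketch of the output-field checks listed above, in Python. It assumes the values of one output field have already been fetched for every matching document (multi-valued fields such as tags as lists). The thresholds `MIN_DOCS_PER_VALUE` and `MAX_IMBALANCE_RATIO`, the `OutputFieldStats` class and the `output_field_stats` function are hypothetical, since the ticket does not define the rules. "Percentiles" is interpreted here as quartiles of the per-value document counts, which is only one possible reading.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import quantiles
from typing import Any, Iterable

# Hypothetical thresholds: the ticket asks for a green/yellow/red score
# but does not define the rules, so these values are placeholders.
MIN_DOCS_PER_VALUE = 10     # "Is there enough data for each value?"
MAX_IMBALANCE_RATIO = 10.0  # "Is the data unbalanced?": most / least frequent

SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class OutputFieldStats:
    field_name: str
    cardinality: int
    histogram: dict[Any, int]     # value -> number of documents
    count_quartiles: list[float]  # quartiles of the per-value counts
    score: str = "green"
    comments: list[str] = field(default_factory=list)

    def flag(self, level: str, comment: str) -> None:
        # Keep the worst score seen so far and record why.
        if SEVERITY[level] > SEVERITY[self.score]:
            self.score = level
        self.comments.append(comment)


def output_field_stats(field_name: str, values: Iterable[Any]) -> OutputFieldStats:
    histogram: Counter = Counter()
    for value in values:
        if value is None:
            continue  # nulls are reported on the input side
        if isinstance(value, (list, tuple, set)):
            histogram.update(value)  # multi-valued field, e.g. tags
        else:
            histogram[value] += 1

    counts = sorted(histogram.values())
    stats = OutputFieldStats(
        field_name=field_name,
        cardinality=len(histogram),
        histogram=dict(histogram.most_common()),
        # statistics.quantiles needs at least two data points.
        count_quartiles=quantiles(counts, n=4) if len(counts) >= 2 else [],
    )

    if stats.cardinality == 0:
        stats.flag("red", "No values: nothing to train on")
    elif stats.cardinality == 1:
        stats.flag("red", "All documents share the same value")
    else:
        rare = [v for v, c in histogram.items() if c < MIN_DOCS_PER_VALUE]
        if rare:
            stats.flag("yellow", f"{len(rare)} value(s) have fewer than "
                                 f"{MIN_DOCS_PER_VALUE} documents")
        ratio = counts[-1] / counts[0]
        if ratio > MAX_IMBALANCE_RATIO:
            stats.flag("yellow", f"Unbalanced data: most frequent value is "
                                 f"{ratio:.1f}x the least frequent")
    return stats


if __name__ == "__main__":
    values = ["invoice"] * 50 + ["contract"] * 40 + ["memo"] * 3 + [None, ["invoice", "memo"]]
    s = output_field_stats("dc:subjects", values)
    print(s.cardinality, s.score)  # prints: 3 yellow
    print(s.comments)
```

Keeping the worst score while appending every comment lets the UI show a single traffic-light colour per field together with all the reasons behind it.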

      Endpoint parameters

      • docType
      • Inputs
      • Outputs
      • WHERE clause
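      One way the endpoint could turn these parameters into a corpus query is sketched below. It assumes the corpus is selected with NXQL using docType in the FROM clause, that proxies and versions should be excluded, and that the WHERE clause is an NXQL condition ANDed onto the query. `StatsRequest` and `to_nxql` are hypothetical names, not part of Nuxeo AI Core. Identifiers are validated because they are interpolated into the query string; the free-form WHERE clause is passed through as-is, so it would still need to be executed under the calling user's permissions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical identifier check for doc types and properties such as
# "File", "dc:title" or "file:content"; they are interpolated into NXQL.
_IDENT = re.compile(r"[A-Za-z_]\w*(?::[\w/]+)?")


@dataclass
class StatsRequest:
    doc_type: str                                    # e.g. "File"
    inputs: list[str] = field(default_factory=list)  # e.g. ["dc:title"]
    outputs: list[str] = field(default_factory=list) # e.g. ["dc:subjects"]
    where: str = ""                                  # optional NXQL condition

    def validate(self) -> None:
        if not self.inputs or not self.outputs:
            raise ValueError("at least one input and one output field are required")
        for name in [self.doc_type, *self.inputs, *self.outputs]:
            if not _IDENT.fullmatch(name):
                raise ValueError(f"invalid identifier: {name!r}")
        overlap = set(self.inputs) & set(self.outputs)
        if overlap:
            raise ValueError(f"fields cannot be both input and output: {sorted(overlap)}")

    def to_nxql(self) -> str:
        self.validate()
        # Assumption: only live documents, so skip proxies and versions.
        clauses = ["ecm:isProxy = 0", "ecm:isVersion = 0"]
        if self.where.strip():
            clauses.append(f"({self.where.strip()})")
        return f"SELECT * FROM {self.doc_type} WHERE " + " AND ".join(clauses)


if __name__ == "__main__":
    req = StatsRequest("File", ["dc:title", "file:content"], ["dc:subjects"],
                       "dc:created > DATE '2019-01-01'")
    print(req.to_nxql())
    # SELECT * FROM File WHERE ecm:isProxy = 0 AND ecm:isVersion = 0 AND (dc:created > DATE '2019-01-01')
```

Wrapping the user's condition in parentheses keeps an `OR` inside it from escaping the proxy and version filters.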

      Tasks

      • Generic service
      • Generic REST API
      • Define rules for quality of output
      • Define rules for quality of input
      • Get statistics for input
      • Get statistics for output
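      The input-side tasks and the generic service could fit together roughly as follows. This sketch assumes the documents returned by the query are available as mappings from property name to value, treats missing properties, `None` and empty strings or lists as nulls, and uses hypothetical thresholds (`MIN_TOTAL_DOCS`, `MAX_NULL_RATIO_GREEN`, `MAX_NULL_RATIO_YELLOW`) for the green/yellow/red score. `dataset_statistics` is a hypothetical service entry point returning the JSON body a generic REST endpoint might send to the UI; output-field statistics would be added to the same body.

```python
import json
from typing import Any, Mapping, Sequence

# Hypothetical thresholds: the ticket asks for a score on possible use
# for training but does not define the rules.
MIN_TOTAL_DOCS = 100         # fewer documents than this: red for the dataset
MAX_NULL_RATIO_GREEN = 0.1   # up to 10% nulls: green
MAX_NULL_RATIO_YELLOW = 0.5  # up to 50% nulls: yellow, above: red


def is_null(value: Any) -> bool:
    # Missing properties, None and empty strings/lists all count as null.
    return value is None or (isinstance(value, (str, list, tuple)) and len(value) == 0)


def input_field_stats(name: str, docs: Sequence[Mapping[str, Any]]) -> dict[str, Any]:
    total = len(docs)
    nulls = sum(1 for doc in docs if is_null(doc.get(name)))
    ratio = nulls / total if total else 1.0  # no documents: treat as all null
    if ratio <= MAX_NULL_RATIO_GREEN:
        score = "green"
    elif ratio <= MAX_NULL_RATIO_YELLOW:
        score = "yellow"
    else:
        score = "red"
    return {"field": name, "nulls": nulls, "nullRatio": round(ratio, 3), "score": score}


def dataset_statistics(inputs: Sequence[str], docs: Sequence[Mapping[str, Any]]) -> str:
    """Generic service entry point: returns the JSON body for the UI."""
    total = len(docs)
    body = {
        "totalDocuments": total,
        "totalScore": "green" if total >= MIN_TOTAL_DOCS else "red",
        "inputs": [input_field_stats(name, docs) for name in inputs],
    }
    return json.dumps(body)


if __name__ == "__main__":
    docs = [{"dc:title": f"Doc {i}", "dc:description": "text" if i < 150 else ""}
            for i in range(200)]
    print(dataset_statistics(["dc:title", "dc:description"], docs))
```

Keeping the statistics logic in plain functions and doing JSON serialization only at the edge keeps the service generic: the same functions could back the REST endpoint or any other caller.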

    People

      • Assignee: Unassigned
      • Reporter: Pedro Cardoso (pcardoso)
      • Votes: 0
      • Watchers: 1
