- Type: Epic
- Status: Resolved
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: 10.3
- Component/s: Core, Nuxeo Vision
- Release Notes Summary: Defines an infrastructure for document enrichment through integration with external services. The new framework is architected to be resilient, scalable and efficient, as well as flexible at the functional level.
- Tags:
AI Service Integration Framework
The idea is to have an AI integration framework that allows us to:
- extract data from a Nuxeo document
  - picture, text content, filename, MIME type, filing plan …
  - extract the audio track or frames from videos …
- call an external AI service to compute additional metadata
  - classification, entity extraction, text transcription …
- store the additional metadata
  - in a facet/schema
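The extract / call / store steps above map naturally onto three pluggable interfaces. The following Java code is a minimal sketch under stated assumptions, not the framework's actual API: `Doc`, `DataExtractor`, `AIService`, `MetadataStore` and `EnrichmentPipeline` are hypothetical names, and the Nuxeo document is reduced to a plain property map instead of a real `DocumentModel`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Nuxeo document: just a map of property name -> value.
class Doc {
    final Map<String, Object> properties = new HashMap<>();
}

// Step 1 (hypothetical): pull the inputs the AI service needs out of the document
// (picture, text content, filename, MIME type, ...).
interface DataExtractor {
    Map<String, Object> extract(Doc doc);
}

// Step 2 (hypothetical): call an external AI service and return the computed metadata
// (classification labels, entities, transcription, ...).
interface AIService {
    Map<String, Object> compute(Map<String, Object> input);
}

// Step 3 (hypothetical): persist the additional metadata, e.g. into a dedicated facet/schema.
interface MetadataStore {
    void store(Doc doc, Map<String, Object> metadata);
}

// Chains the three pluggable steps for a single document.
class EnrichmentPipeline {
    private final DataExtractor extractor;
    private final AIService service;
    private final MetadataStore store;

    EnrichmentPipeline(DataExtractor extractor, AIService service, MetadataStore store) {
        this.extractor = extractor;
        this.service = service;
        this.store = store;
    }

    void enrich(Doc doc) {
        Map<String, Object> input = extractor.extract(doc);
        if (input.isEmpty()) {
            return; // nothing to send, e.g. the document has no usable content
        }
        Map<String, Object> metadata = service.compute(input);
        store.store(doc, metadata);
    }
}
```

Keeping the three steps behind separate interfaces means a new AI service only needs a new `AIService` implementation, while extractors and stores can be reused across services.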
This system needs to support 2 processing modes:
- on the fly / event-based
  - create/modify event => call AI enrichment
- batch
  - call AI enrichment in batch, based on a query
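Both processing modes can share the same per-document enrichment and differ only in how documents are selected. The Java sketch below illustrates this under assumptions: `EnrichmentTrigger` is a hypothetical class, the enrichment and query execution are injected as plain functions, and the event names `documentCreated` / `documentModified` are assumed to match Nuxeo's core document events.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: both processing modes funnel into the same per-document enrichment.
class EnrichmentTrigger {
    private final Consumer<String> enrichDoc;              // enriches one document, given its id
    private final Function<String, List<String>> runQuery; // resolves a query to matching document ids

    EnrichmentTrigger(Consumer<String> enrichDoc, Function<String, List<String>> runQuery) {
        this.enrichDoc = enrichDoc;
        this.runQuery = runQuery;
    }

    // On the fly: react to create/modify events for a single document.
    // The event names are assumptions modeled on Nuxeo's core document events.
    void onEvent(String eventName, String docId) {
        if ("documentCreated".equals(eventName) || "documentModified".equals(eventName)) {
            enrichDoc.accept(docId);
        }
    }

    // Batch: enrich every document matched by the query, chunk by chunk.
    // Returns the number of documents processed.
    int runBatch(String query, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> ids = runQuery.apply(query);
        int processed = 0;
        for (int start = 0; start < ids.size(); start += batchSize) {
            // Each chunk is a natural unit of work (e.g. one transaction or one async job).
            List<String> chunk = ids.subList(start, Math.min(start + batchSize, ids.size()));
            for (String id : chunk) {
                enrichDoc.accept(id);
                processed++;
            }
        }
        return processed;
    }
}
```

Chunking the batch path is one way to address the scalability goal: each chunk can be committed or retried independently instead of processing a whole query result in one go.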
In addition, we need to be able to run the AI calls in 2 modes:
- execute: call the AI services to get the result
- learn: call the AI service to train it
  - event-based, for user-initiated updates of the inferred schemas
  - batch mode, for initial training (may include data export and staging)
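The execute/learn distinction can be expressed as a mode passed alongside each call. The Java code below is a minimal sketch, assuming a hypothetical `TrainableAIService` contract with `predict` and `learn` methods; `CallMode`, `ModeDispatcher` and the toy `MemorizingService` are illustrative names, and the toy service only memorizes labels rather than training a real model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// The two ways the framework calls an AI service.
enum CallMode { EXECUTE, LEARN }

// Hypothetical contract for a service that supports both inference and training.
interface TrainableAIService {
    List<String> predict(String input);             // execute: return inferred labels
    void learn(String input, List<String> labels);  // learn: feed back validated labels
}

class ModeDispatcher {
    // Routes a call to predict() or learn() depending on the mode.
    // In LEARN mode the validated labels are echoed back as the result.
    static List<String> call(TrainableAIService service, CallMode mode, String input, List<String> labels) {
        switch (mode) {
            case EXECUTE:
                return service.predict(input);
            case LEARN:
                service.learn(input, Objects.requireNonNull(labels, "labels are required in LEARN mode"));
                return labels;
        }
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
}

// Toy in-memory "service" that simply memorizes what it was taught;
// a real implementation would call an external training API instead.
class MemorizingService implements TrainableAIService {
    private final Map<String, List<String>> memory = new HashMap<>();

    @Override
    public List<String> predict(String input) {
        return memory.getOrDefault(input, Collections.emptyList());
    }

    @Override
    public void learn(String input, List<String> labels) {
        memory.put(input, new ArrayList<>(labels));
    }
}
```

With this shape, a user correcting an inferred label would trigger a `LEARN` call for that single document, while initial training would loop over an exported batch.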
The existing Google Vision/AWS Rekognition integration could be used as starter code:
- it already contains part of the infrastructure
- Google Vision and Rekognition are 2 valid services to integrate
We may want to start the work as a clone/fork of the nuxeo-vision repository to avoid any confusion. The goal is to reuse existing code to speed up the bootstrap process, but the target scope is much wider than nuxeo-vision.
We want to build a generic infrastructure so that:
- we can easily add new services
  - build a pluggability model to
    - extract data from the document
    - call the AI service in classify or learn mode
    - store the result
- we can route/dispatch between different AI services
  - depending on the event, document type and MIME type
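Routing by event, document type and MIME type can be modeled as an ordered list of rules. The Java sketch below assumes hypothetical `Route` and `ServiceRouter` classes and uses made-up service identifiers such as `google-vision` or `aws-transcribe`; it is not an existing Nuxeo component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical routing rule; a null field acts as a wildcard.
class Route {
    final String event;
    final String docType;
    final String mimePrefix; // e.g. "image/" matches "image/png"
    final String serviceName;

    Route(String event, String docType, String mimePrefix, String serviceName) {
        this.event = event;
        this.docType = docType;
        this.mimePrefix = mimePrefix;
        this.serviceName = serviceName;
    }

    boolean matches(String ev, String type, String mimeType) {
        return (event == null || event.equals(ev))
                && (docType == null || docType.equals(type))
                && (mimePrefix == null || (mimeType != null && mimeType.startsWith(mimePrefix)));
    }
}

// Picks the AI service for a document; the first matching rule wins,
// so more specific rules must be registered before generic ones.
class ServiceRouter {
    private final List<Route> routes = new ArrayList<>();

    ServiceRouter register(Route route) {
        routes.add(route);
        return this;
    }

    Optional<String> route(String event, String docType, String mimeType) {
        for (Route r : routes) {
            if (r.matches(event, docType, mimeType)) {
                return Optional.of(r.serviceName);
            }
        }
        return Optional.empty(); // no rule matched: skip enrichment for this document
    }
}
```

A first-match list keeps the rules easy to reason about. In practice the rules could be loaded from configuration (for example a Nuxeo extension point contribution) rather than hard-coded.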
Services to integrate
- Standard image recognition
  - Google Vision and AWS Rekognition
- Object detection, face recognition
  - here we could also leverage Google and AWS services
- Specialized image recognition
  - ProductAI (we have a 2-week trial)
- Speech to text
  - AWS Transcribe?
- Text-based categorization
  - use AWS Comprehend or build our own service based on AWS SageMaker
As a short-term target, improving the Google Vision/AWS Rekognition integration and integrating ProductAI seem like easy options.