The stream importer can import files and generate random documents, but it will be more interesting to import Avro files representing documents.
If we can express a target document schema in Avro (NXP-24325),
this schema can be used as the required input format for mass-importing documents.
This is a better choice than JSON, XML, or CSV because:
- an Avro object container file contains the schema used to write the data
- the file cannot contain invalid data (data that does not respect the schema)
- the binary encoding is compact: field names are stored once in the schema rather than repeated in every record
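
The points above can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical `Document` schema (the field names `path`, `title`, and `size` are illustrative only, not the schema from NXP-24325) and hand-writes the Avro binary encoding for just two types, `string` and `long`. It encodes a single record body, not a full object container file, which would also carry a header holding the schema. A real importer would use an Avro library instead.

```python
import json

# Hypothetical document schema; field names are assumptions for illustration.
SCHEMA = {
    "type": "record",
    "name": "Document",
    "fields": [
        {"name": "path", "type": "string"},
        {"name": "title", "type": "string"},
        {"name": "size", "type": "long"},
    ],
}


def encode_long(n: int) -> bytes:
    """Avro long: zigzag encoding, then a base-128 varint (low group first)."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small negatives become small positives
    out = bytearray()
    while z & ~0x7F:  # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Only the two primitive types used by the sketch schema are supported.
ENCODERS = {"string": (str, encode_string), "long": (int, encode_long)}


def encode_record(schema: dict, record: dict) -> bytes:
    """Encode fields in schema order; reject values that violate the schema."""
    out = bytearray()
    for field in schema["fields"]:
        py_type, encoder = ENCODERS[field["type"]]
        value = record[field["name"]]
        if isinstance(value, bool) or not isinstance(value, py_type):
            raise TypeError(f"field {field['name']!r} must be {field['type']}")
        out += encoder(value)
    return bytes(out)


doc = {"path": "/default-domain/workspaces/doc1", "title": "My document", "size": 1024}
binary = encode_record(SCHEMA, doc)
as_json = json.dumps(doc).encode("utf-8")

# No field names appear in the binary form, so it is smaller than the JSON text.
print(len(binary), "bytes as Avro binary,", len(as_json), "bytes as JSON")
```

Passing a record such as `{"path": "/a", "title": "t", "size": "big"}` to `encode_record` raises a `TypeError`, which mirrors how an Avro writer refuses data that does not match the schema.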