Ideally, to get the highest import rate, we need to saturate the backend's disk IO. This is what happens when we submit only insert commands.
When we mix inserts and queries, backend CPU starts to limit the import rate. This is especially true on a sharded backend, where queries require additional communication between nodes.
For MongoDB, the CPU load that these queries put on mongos and mongod prevents us from taking advantage of the improved disk IO provided by a sharded cluster.
This can be improved by using a simple cache on states and the other shortcuts taken in this branch:
Obviously, using a cache in Nuxeo cluster mode will require an invalidation mechanism.
So if we cannot merge this change as is, maybe these optimizations can be activated conditionally.
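To make the cache and invalidation idea concrete, here is a minimal sketch of a bounded state cache with an explicit invalidation hook. It is not the code from the branch: the class name StateCache, the loader function and the backendReads counter are all hypothetical, and a real Nuxeo cluster would still need to propagate invalidate() calls between nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical bounded LRU cache of document states, keyed by document id.
 * Illustration only; this is an assumption, not the implementation in the branch.
 */
final class StateCache<S> {
    private final Map<String, S> cache;
    private final Function<String, S> backend; // stands in for a backend query
    private int backendReads;

    StateCache(int maxEntries, Function<String, S> backend) {
        this.backend = backend;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached state, querying the backend only on a cache miss. */
    synchronized S get(String id) {
        S state = cache.get(id);
        if (state == null) {
            backendReads++;
            state = backend.apply(id);
            if (state != null) {
                cache.put(id, state);
            }
        }
        return state;
    }

    /** Must be called whenever the state changes, including from other nodes. */
    synchronized void invalidate(String id) {
        cache.remove(id);
    }

    /** Number of reads that actually reached the backend. */
    synchronized int backendReads() {
        return backendReads;
    }
}
```

Repeated reads of the same id then reach the backend only once until invalidate(id) is called, which is where the saved query CPU would come from. In cluster mode the cache is only safe if every node receives the invalidation.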
To limit the number of backend queries during document creation or import, typically to improve performance on a mass import,
one can now add a flag to skip the document id duplication check: