Nuxeo Platform / NXP-17885

Use TransientStore for batch upload


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.4
    • Impact type: API change
    • Upgrade notes:

      Added:

      • BatchManager#getTransientStore()
      • BatchManager#initBatch()
      • BatchManager#addStream(String batchId, String idx, InputStream is, int chunkCount, int chunkIdx, String name, String mime, long fileSize)
      • Batch#addChunk(String idx, InputStream is, int chunkCount, int chunkIdx, String name, String mime, long fileSize)
      • BatchFileEntry
      • BatchChunkEntry

      Changed:

      • Batch extends AbstractStorageEntry
      • Batch#clear() renamed to Batch#clean()

      Description

      Main principle

      The BatchManager now relies on the TransientStore, which allows for several implementations, among them the Redis-based one, which is cluster-aware.
      Upload data (structure and streams) must be shared across Nuxeo nodes if we want the upload system to work across the cluster without having to enforce node affinity. This is required by NXP-17780.

      That is why the Batch object now fits in a StorageEntry, the main object manipulated by the TransientStore.
      The storage implementation also takes chunking into account in the data structure maintained by the transient store, as required by NXP-16951.

      Example: storage of a batch with 2 files, one of which is made up of 2 chunks

      1. The Batch is stored in the "default" transient store with its id as a key.
        It has no blobs but references in its parameters the keys of each file in the batch, stored as BatchFileEntry objects in the same transient store.
        TransientStore("default") -> {"batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08": batch (StorageEntry)}
        
        batch ->
            - blobs = []
            - params = {"0": "batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08_0", "1": "batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08_1"}
        
      2. Each BatchFileEntry is in turn stored in the "default" transient store, with the file index concatenated to the batch id as its key.
        A file that is not chunked directly references its blob in the blob list of the StorageEntry; a chunked file instead references in its parameters the keys of its chunks, stored as BatchChunkEntry objects in the same transient store.
        TransientStore("default") -> {"batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08_0": batchFileEntry0 (StorageEntry)}
        TransientStore("default") -> {"batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08_1": batchFileEntry1 (StorageEntry)}
        
        batchFileEntry0 ->
            - blobs = [blob]
            - params = {"chunked": false}
        
        batchFileEntry1 ->
            - blobs = []
            - params = {"chunked": true, "fileName": "My file.txt", "mimeType": "text/plain", "fileSize": 1024, "chunkCount": 2, "chunks": {0: "batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08_1_0", 1: "batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08_1_1"}}
        
      3. Each BatchChunkEntry is in turn stored in the "default" transient store, with the chunk index concatenated to the file key as its key.
        A chunk directly references its blob in the blob list of the StorageEntry and has no parameters.
        TransientStore("default") -> {"batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08_1_0": batchChunkEntry0 (StorageEntry)}
        TransientStore("default") -> {"batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08_1_1": batchChunkEntry1 (StorageEntry)}
        
        batchChunkEntry0 ->
            - blobs = [chunk0]
            - params = {}
        
        batchChunkEntry1 ->
            - blobs = [chunk1]
            - params = {}
        
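      The key layout above boils down to plain string concatenation. The following self-contained sketch only mirrors the scheme described in this issue; the class and helper names are hypothetical, not actual Nuxeo code:

```java
public class BatchKeyScheme {

    // BatchFileEntry key: file index appended to the batch id
    public static String fileKey(String batchId, int fileIdx) {
        return batchId + "_" + fileIdx;
    }

    // BatchChunkEntry key: chunk index appended to the file key
    public static String chunkKey(String batchId, int fileIdx, int chunkIdx) {
        return fileKey(batchId, fileIdx) + "_" + chunkIdx;
    }

    public static void main(String[] args) {
        String batchId = "batchId-a0dbccda-a36c-436d-8de6-09fe96f14e08";
        // Second file of the batch, then the first of its two chunks
        System.out.println(fileKey(batchId, 1));
        System.out.println(chunkKey(batchId, 1, 0));
    }
}
```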

      Adding a file or a chunk to a batch with the BatchManager

      1. First you need to initialize a batch by calling

      BatchManager#initBatch()
      

      which returns the batch id.
      2. Then add a whole file to the batch:

      BatchManager#addStream(String batchId, String idx, InputStream is, String name, String mime)
      

      or add a chunk to the given batch file:

      BatchManager#addStream(String batchId, String idx, InputStream is, int chunkCount, int chunkIdx, String name, String mime, long fileSize)
      

      3. To get the blob of a given file in the batch just call

      BatchManager#getBlob(String batchId, String fileId)
      

      This returns the file blob, concatenating the file chunks first if the file was uploaded in chunks.
      4. The batch can be cleaned by calling

      BatchManager#clean(String batchId)
      
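      Putting the four steps together, the call sequence can be sketched as follows. This is a minimal in-memory stand-in so the example runs standalone, not Nuxeo code: the method signatures mirror those listed above, but the backing storage is a plain map rather than a TransientStore, and getBlob returns raw bytes instead of a Blob.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

public class BatchUploadSketch {

    // batchId -> fileIdx -> chunkIdx -> bytes (whole files use chunkIdx 0)
    private final Map<String, Map<String, TreeMap<Integer, byte[]>>> store = new HashMap<>();

    // Step 1: initialize a batch; the returned id keys all further calls
    public String initBatch() {
        String batchId = "batchId-" + UUID.randomUUID();
        store.put(batchId, new HashMap<>());
        return batchId;
    }

    // Step 2: whole-file variant
    public void addStream(String batchId, String idx, InputStream is,
                          String name, String mime) throws IOException {
        addStream(batchId, idx, is, 1, 0, name, mime, -1);
    }

    // Step 2: chunked variant
    public void addStream(String batchId, String idx, InputStream is,
                          int chunkCount, int chunkIdx, String name,
                          String mime, long fileSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = is.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        store.get(batchId)
             .computeIfAbsent(idx, k -> new TreeMap<>())
             .put(chunkIdx, out.toByteArray());
    }

    // Step 3: return the file content, concatenating chunks in index order
    public byte[] getBlob(String batchId, String fileId) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : store.get(batchId).get(fileId).values()) {
            out.write(chunk);
        }
        return out.toByteArray();
    }

    // Step 4: drop all entries of the batch
    public void clean(String batchId) {
        store.remove(batchId);
    }

    public static void main(String[] args) throws IOException {
        BatchUploadSketch bm = new BatchUploadSketch();
        String batchId = bm.initBatch();
        // File "0": uploaded whole
        bm.addStream(batchId, "0",
                new ByteArrayInputStream("hello".getBytes()), "hello.txt", "text/plain");
        // File "1": uploaded as 2 chunks
        byte[] p0 = "first half ".getBytes();
        byte[] p1 = "second half".getBytes();
        bm.addStream(batchId, "1", new ByteArrayInputStream(p0),
                2, 0, "big.txt", "text/plain", p0.length + p1.length);
        bm.addStream(batchId, "1", new ByteArrayInputStream(p1),
                2, 1, "big.txt", "text/plain", p0.length + p1.length);
        // Prints the reassembled content of file "1"
        System.out.println(new String(bm.getBlob(batchId, "1")));
        bm.clean(batchId);
    }
}
```

      In the real service the same sequence goes through the BatchManager obtained from the runtime, with the chunks persisted in the transient store under the keys described above.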
