[NXDRIVE-1753] Improve uploads robustness - Nuxeo Issue Tracker

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 4.0.4
Fix Version/s: 4.1.4
Component/s: Synchronizer

Epic Link: Upload client
Sprint: nxDrive 11.1.13
Story Points: 2

Description

Status & Problem

The current upload process is:

1. upload all chunks;
2. then call the NuxeoDrive.CreateFile operation to link the uploaded blob to a given document.

If the 2nd part fails, the entire upload is restarted from scratch. This is problematic with big files, and even worse on a bad network.

We have seen a case where the 2nd part reported a failure even though the operation had actually succeeded, so Drive restarted the upload. For the record, this was the error:

Traceback (most recent call last):
  File "nxdrive/engine/processor.py", line 280, in _execute
  File "nxdrive/engine/processor.py", line 749, in _synchronize_locally_created
  File "nxdrive/client/remote_client.py", line 553, in stream_file
  File "nxdrive/client/remote_client.py", line 379, in upload
  File "nxdrive/client/remote_client.py", line 149, in execute
  File "nxdrive/client/remote_client.py", line 145, in execute
  File "site-packages/nuxeo/operations.py", line 201, in execute
  File "site-packages/nuxeo/client.py", line 209, in request
nuxeo.exceptions.HTTPError: HTTPError(502), error: b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>502 Proxy Error</title>\n</head><body>\n<h1>Proxy Error</h1>\n<p>The proxy server received an invalid\r\nresponse from an upstream server.<br />\r\nThe proxy server could not handle the request <em><a href="/nuxeo/api/v1/upload/batchId-BATCHID/0/execute/NuxeoDrive.CreateFile">POST&nbsp;/nuxeo/api/v1/upload/batchId-BATCHID/0/execute/NuxeoDrive.CreateFile</a></em>.<p>\nReason: <strong>Error reading from remote server</strong></p></p></body></html>\n', server trace: None
2019-07-05 13:44:37 700 123145313792000 DEBUG    nxdrive.engine.processor Postpone action on document(Server unavailable): <DocPair...>

Improvement 1

The idea is to completely separate the two parts of the upload process:

If the 1st part is OK, continue; otherwise retry, which is the current behavior.
If an error happens during the 2nd part:
- If the file exists on the server, the error occurred only after the operation was done, and the upload can be considered successful.
- If the file does not exist on the server, there is an actual error: put the document in error, which is the current behavior. If it was a temporary error (network, ...), the upload will be retried and, if the chunk TTL is long enough, no chunk will need to be uploaded again. Otherwise a new upload will be done.
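
The decision logic for the 2nd part could be sketched as follows. This is only an illustration of the idea above, not the actual Drive code: `link_blob`, `create_file`, `exists_on_server` and `TransientServerError` are hypothetical names, and the two callables stand in for the real NuxeoDrive.CreateFile call and for whatever server-side existence check Drive would use.

```python
from typing import Callable


class TransientServerError(Exception):
    """Hypothetical stand-in for an error such as the HTTP 502 shown above."""


def link_blob(
    create_file: Callable[[], None],
    exists_on_server: Callable[[], bool],
) -> str:
    """Run part 2 (linking the uploaded blob) without touching part 1.

    Returns "linked" when the operation succeeded, and "already_linked"
    when it raised but the file is on the server anyway.
    Re-raises the error when the file is really missing, so the caller
    can put the document in error and retry later.
    """
    try:
        create_file()
    except TransientServerError:
        if exists_on_server():
            # The error came after the operation was done: treat as success.
            return "already_linked"
        # Actual failure: let the caller handle it (current behavior).
        raise
    return "linked"
```

With this split, a proxy error raised after a successful operation no longer triggers a new upload of all chunks.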

Improvement 2

Currently, part 2 uses the upload duration of part 1 to compute the Nuxeo-Transaction-Timeout for the operation (duration * 2 seconds). This is not reliable with resumable uploads: one may have paused a big upload (20 GiB) near the end, and on resume the measured upload duration will be far from the real (total) value.
Another way of computing this timeout is needed.
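
To make the problem concrete, here is a small arithmetic sketch. It assumes the rule reads as "timeout = 2 x measured upload duration, in seconds"; the throughput (10 MiB/s) and the "resumed with 1% left" scenario are made-up values for illustration, and `timeout_from_duration` is a hypothetical helper, not a Drive function.

```python
def timeout_from_duration(upload_duration_s: float) -> float:
    """Rule described above: twice the measured upload duration, in seconds."""
    return upload_duration_s * 2


GIB = 1024 ** 3
MIB = 1024 ** 2

file_size = 20 * GIB      # size of the file, in bytes
throughput = 10 * MIB     # assumed network throughput, in bytes per second

# Upload done in one go: the measured duration covers the whole file.
full_duration = file_size / throughput

# Upload paused near the end and resumed: only the last 1% is measured.
resumed_duration = 0.01 * file_size / throughput

print(f"one go:  {timeout_from_duration(full_duration):.2f} s")
print(f"resumed: {timeout_from_duration(resumed_duration):.2f} s")
```

Under these assumptions the timeout drops from about 4096 seconds to about 41 seconds for the same 20 GiB file, although the server-side work of the operation is the same in both cases.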

Attachments

nxdrive.log.2019-07-10 (4.29 MB, 2019-07-10 12:57)
server.log (85 kB, 2019-07-10 13:18)

Issue Links

is related to

NXDRIVE-1743 Fail to upload large files through Drive (Resolved)
NXPY-112 Update uploadedSize on each and every upload iteration (Resolved)
NXDRIVE-1714 Handle files transfers above 100 GB (Resolved)

People

Assignee: Mickaël Schoentgen
Reporter: Mickaël Schoentgen
Participants: Jenkins, Mickaël Schoentgen

Dates

Created: 2019-07-05 14:14
Updated: 2020-06-05 08:37
Resolved: 2019-07-15 15:00
