  1. Nuxeo Drive
  2. NXDRIVE-1753

Improve uploads robustness

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0.4
    • Fix Version/s: 4.1.4
    • Component/s: Synchronizer

      Description

      Status & Problem

      The current upload process is:

      • upload all chunks;
      • then call the NuxeoDrive.CreateFile operation to link the uploaded blob to a given document.

      If the 2nd part fails, the entire upload restarts from zero, chunks included. This is problematic with big files, and even worse on a bad network.

      We have seen a case where the 2nd part reported a failure even though the operation had actually succeeded on the server, so Drive restarted the upload. For the record, this was the error:

      Traceback (most recent call last):
        File "nxdrive/engine/processor.py", line 280, in _execute
        File "nxdrive/engine/processor.py", line 749, in _synchronize_locally_created
        File "nxdrive/client/remote_client.py", line 553, in stream_file
        File "nxdrive/client/remote_client.py", line 379, in upload
        File "nxdrive/client/remote_client.py", line 149, in execute
        File "nxdrive/client/remote_client.py", line 145, in execute
        File "site-packages/nuxeo/operations.py", line 201, in execute
        File "site-packages/nuxeo/client.py", line 209, in request
      nuxeo.exceptions.HTTPError: HTTPError(502), error: b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>502 Proxy Error</title>\n</head><body>\n<h1>Proxy Error</h1>\n<p>The proxy server received an invalid\r\nresponse from an upstream server.<br />\r\nThe proxy server could not handle the request <em><a href="/nuxeo/api/v1/upload/batchId-BATCHID/0/execute/NuxeoDrive.CreateFile">POST&nbsp;/nuxeo/api/v1/upload/batchId-BATCHID/0/execute/NuxeoDrive.CreateFile</a></em>.<p>\nReason: <strong>Error reading from remote server</strong></p></p></body></html>\n', server trace: None
      2019-07-05 13:44:37 700 123145313792000 DEBUG    nxdrive.engine.processor Postpone action on document(Server unavailable): <DocPair...>
      

      Improvement 1

      The idea is to completely separate the two parts of the upload process:

      • If the 1st part succeeds, continue; otherwise retry, as is the current behavior.
      • If an error happens during the 2nd part:
        • If the file exists on the server, the error occurred only after the operation had completed, so the upload can be considered successful.
        • If the file does not exist on the server, there is an actual error: put the document in error, as is the current behavior. If it was a temporary error (network, ... ), the upload will be retried. If the chunk TTL is large enough, the chunks already on the server are reused and nothing is uploaded again; otherwise a new upload is done.
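
      The decision flow above can be sketched as follows. This is only an illustration under stated assumptions, not Drive's actual code: `UploadError`, `upload_and_link`, the three callables and the `max_upload_attempts` limit are all hypothetical names introduced for the example.

```python
from typing import Callable


class UploadError(Exception):
    """Hypothetical error raised by either part of the upload."""


def upload_and_link(
    upload_chunks: Callable[[], str],       # part 1, returns a batch ID
    link_blob: Callable[[str], None],       # part 2, e.g. NuxeoDrive.CreateFile
    exists_on_server: Callable[[], bool],   # check done after a part 2 failure
    max_upload_attempts: int = 3,           # assumed retry limit
) -> str:
    # Part 1: upload all chunks, retrying on failure (current behavior).
    for attempt in range(1, max_upload_attempts + 1):
        try:
            batch_id = upload_chunks()
            break
        except UploadError:
            if attempt == max_upload_attempts:
                raise

    # Part 2: link the uploaded blob to the document.
    try:
        link_blob(batch_id)
    except UploadError:
        # The error may have happened after the server completed the operation
        # (e.g. a 502 from a proxy): check before giving up.
        if exists_on_server():
            return "synced"
        # Actual error: the document goes in error and is retried later.
        return "error"
    return "synced"
```

      With this split, a proxy error like the 502 above would no longer trigger a full re-upload when the document was in fact created.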

      Improvement 2

      Currently, part 2 uses the upload duration of part 1 to compute the Nuxeo-Transaction-Timeout for the operation (duration * 2 seconds). This does not work well for resumable uploads: one may pause a big upload (20 GiB) near the end, and on resume the measured upload duration will be far from the real (total) value.
      Another way of computing this timeout is needed.
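
      To make the problem concrete, here is a small sketch of the current formula and of what happens after a resume. The numbers are invented, and the `UploadState` class is a hypothetical illustration of one possible direction (summing the duration across sessions); the issue does not say which alternative should be used.

```python
from dataclasses import dataclass


def tx_timeout_from_duration(upload_duration: float) -> int:
    """Timeout in seconds as described above: upload duration * 2."""
    return int(upload_duration * 2)


@dataclass
class UploadState:
    """Hypothetical persisted state for a resumable upload."""

    elapsed: float = 0.0  # seconds spent uploading, summed over all sessions

    def add_session(self, seconds: float) -> None:
        self.elapsed += seconds


# A large upload paused near the end: 3600 s before the pause, 30 s after.
state = UploadState()
state.add_session(3600.0)
state.add_session(30.0)

# Only the last session is measured: the timeout is far too short.
last_session_only = tx_timeout_from_duration(30.0)
# Using the cumulative duration gives a value closer to the real total.
cumulative = tx_timeout_from_duration(state.elapsed)
```

      Here `last_session_only` is 60 seconds while `cumulative` is 7260 seconds, which shows how far off the timeout can be after a resume.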

        Attachments

        1. nxdrive.log.2019-07-10
          4.29 MB
          Monique Ruggiero
        2. server.log
          85 kB
          Monique Ruggiero

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0m
                  • Logged: 4d