Nuxeo Drive / NXDRIVE-54

Drive fails to synchronize local files named with unicode characters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0806
    • Component/s: Local client
    • Tags:
    • Backlog priority: 900
    • Sprint: Sprint Drive 5.9.6-1

      Description

      To reproduce:

      1. Launch Drive
      2. In a locally synchronized folder, create a file whose name contains the \xa0 (non-breaking space) or \u2122 (trademark symbol) character. One easy way to do this is from Python:
        open(u'/home/ataillefer/Nuxeo Drive/Shared with JOE/joe\xa0.txt', 'w').close()
        

      You will get the following:

      2013-09-18 16:59:14,699 3624 139742916241152 DEBUG    nxdrive.synchronizer Cannot perform alignment of u'/home/ataillefer/Nuxeo Drive/Shared with JOE' using digest info due to concurrent file access
      2013-09-18 16:59:14,701 3624 139742916241152 DEBUG    nxdrive.model      Delaying local digest computation for /home/ataillefer/Nuxeo Drive/Shared with JOE/joe .txt due to possible concurrent file access.
      Traceback (most recent call last):
        File "/home/ataillefer/sources/nuxeo/addons/nuxeo-drive/nuxeo-drive-client/nxdrive/model.py", line 283, in update_local
          self.local_digest = local_info.get_digest()
        File "/home/ataillefer/sources/nuxeo/addons/nuxeo-drive/nuxeo-drive-client/nxdrive/client/local_client.py", line 61, in get_digest
          with open(safe_long_path(self.filepath), 'rb') as f:
      IOError: [Errno 2] No such file or directory: u'/home/ataillefer/Nuxeo Drive/Shared with JOE/joe .txt'
      

      The problem is that Drive stores file names in the local DB in NFKC-normalized form, using unicodedata.normalize(), to stay consistent with the server-side normalization done by SessionImpl#addChildNode():

      name = Normalizer.normalize(name, Normalizer.Form.NFKC);
      

      See NXP-11315 for the original normalization issue.

      Therefore, if a file is created on the file system with a name containing characters that NFKC normalization changes, Drive uses the normalized name and cannot match it to the real name on the file system, typically in LocalClient.get_digest().
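
The mismatch can be checked in isolation with a minimal sketch. The helper name `nfkc_name` is hypothetical and not part of Drive; only `unicodedata.normalize()` comes from the description above. It shows that both characters from the reproduction steps change under NFKC, so the name kept in the local DB no longer matches the one on disk.

```python
import unicodedata


def nfkc_name(name):
    """Hypothetical helper: NFKC-normalize a file name, as the local DB does."""
    return unicodedata.normalize('NFKC', name)


# File names as they exist on disk, taken from the reproduction steps.
on_disk_names = [u'joe\xa0.txt', u'joe\u2122.txt']

for name in on_disk_names:
    normalized = nfkc_name(name)
    # When the two forms differ, a path built from the normalized name
    # points to a file that does not exist on the file system.
    print(repr(name), '->', repr(normalized), 'unchanged:', name == normalized)
```

The non-breaking space becomes a regular space, which is why the traceback reports `joe .txt` as missing.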

      A possible solution is to store both the original (non-normalized) and the normalized file names in the local DB, then always use:

      • the normalized name for matching the server-side name
      • the non-normalized name for matching the file system name
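
The two-name approach could look roughly like the sketch below. This illustrates the idea only and is not the fix that was actually shipped: `make_name_record` and its keys (`local_name`, `remote_name`, `local_path`) are hypothetical names, and the real local DB schema is not shown in this issue.

```python
import os
import unicodedata


def make_name_record(parent_dir, local_name):
    """Hypothetical helper: keep both forms of a file name side by side."""
    return {
        # Exact name as found on the file system; use it for any local I/O.
        'local_name': local_name,
        # NFKC form; use it only when matching against the server-side name.
        'remote_name': unicodedata.normalize('NFKC', local_name),
        # The local path is always built from the non-normalized name.
        'local_path': os.path.join(parent_dir, local_name),
    }


record = make_name_record(u'/tmp/sync', u'joe\xa0.txt')
print(repr(record['local_name']), repr(record['remote_name']))
```

With such a record, code like LocalClient.get_digest() would open `local_path`, while server matching would compare `remote_name`.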

      Time Tracking

      • Original estimate: 2 days
      • Remaining estimate: 0 minutes
      • Time spent: 1 day