Nuxeo Drive / NXDRIVE-1981

Make safe_filename() more efficient


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.707
    • Fix Version/s: 4.4.1, 4.4.2
    • Component/s: Framework

      Description

      The current implementation uses re.sub() to replace a set of characters in filenames.

      Given the large number of filenames Drive processes, a small benchmark suggests this micro-optimization may be worthwhile.

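      from timeit import repeat
      
      # Current approach: a compiled regex matching the characters to replace.
      setup1 = r"""
      import re
      pattern = re.compile(r'([aetuo])')
      value = "azertyuiop"
      """
      
      # Proposed approach: a translation table mapping each character's
      # ordinal to its replacement string.
      setup2 = """
      repmap = {ord(c): "-" for c in "aetuo"}
      value = "azertyuiop"
      """
      
      # repeat() runs each statement 100,000 times per repetition and returns
      # a list of total times in seconds; [1] keeps the second repetition.
      print(repeat("re.sub(pattern, '-', value)", setup1, number=100000)[1])
      print(repeat("value.translate(repmap)", setup2, number=100000)[1])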
      

      The script first times the current implementation, based on re.sub().
      It then times a potentially faster implementation based on str.translate().
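Before swapping one call for the other, it is worth confirming that both produce identical output. A minimal check, reusing the same pattern, mapping, and sample value as the benchmark:

```python
import re

pattern = re.compile(r"([aetuo])")
repmap = {ord(c): "-" for c in "aetuo"}
value = "azertyuiop"

# Both approaches replace every character of the set with "-".
via_regex = re.sub(pattern, "-", value)
via_translate = value.translate(repmap)

assert via_regex == via_translate
print(via_translate)  # -z-r-y-i-p
```

This only covers single characters mapped to a fixed replacement, which is the case where str.translate() can stand in for re.sub().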

      Results (total seconds for 100,000 executions, second repetition):

      re.sub()       : 0.23668296402320266
      str.translate(): 0.05809920001775026
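The timings above keep the second repetition ([1]). The timeit documentation recommends looking at the minimum of the repeated timings instead, since higher values are usually caused by other processes interfering with the measurement. A sketch of that variant, using a smaller number of executions so it finishes quickly; absolute timings will vary by machine:

```python
from timeit import repeat

setup = """
import re
pattern = re.compile(r'([aetuo])')
repmap = {ord(c): '-' for c in 'aetuo'}
value = 'azertyuiop'
"""

# Each entry is the total time in seconds for `number` executions;
# the minimum is the least noisy estimate.
regex_best = min(repeat("re.sub(pattern, '-', value)", setup, repeat=5, number=10000))
translate_best = min(repeat("value.translate(repmap)", setup, repeat=5, number=10000))

print(f"re.sub()       : {regex_best:.4f} s")
print(f"str.translate(): {translate_best:.4f} s")
print(f"speed-up       : {regex_best / translate_best:.1f}x")
```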
      

      This is a significant performance improvement: in this benchmark, str.translate() runs about four times faster than re.sub().
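Applied to filenames, the change could look like the sketch below. The character set (FORBIDDEN_CHARS) and the "-" replacement are hypothetical, chosen for illustration only; the actual set used by Drive's safe_filename() is not given in this issue. str.maketrans() builds the same ordinal-keyed table as the dict comprehension in the benchmark.

```python
# Hypothetical set of characters to replace; not taken from Drive's code.
FORBIDDEN_CHARS = '/\\:*?"<>|'

# Build the translation table once, at import time, so each call only
# pays for str.translate().
_REPLACEMENTS = str.maketrans({c: "-" for c in FORBIDDEN_CHARS})


def safe_filename(name: str) -> str:
    """Replace every character of FORBIDDEN_CHARS in name with "-" (sketch)."""
    return name.translate(_REPLACEMENTS)


print(safe_filename('report: Q1/Q2 "draft"?.txt'))  # report- Q1-Q2 -draft--.txt
```

Building the table once at module level is what keeps each call cheap; rebuilding it inside the function would give back part of the gain.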


              People

              • Votes: 0
                Watchers: 2


                  Time Tracking

                  • Estimated: Not Specified
                    Remaining: 0m
                    Logged: 5h