Nuxeo Platform / NXP-32317

Use better default for bulk bucketSize to reduce record processing duration

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2021.50, 2023.8
    • Component/s: Bulk
    • Release Notes Summary:
      Tune bulk actions to avoid long record processing
    • Sprint:
      nxplatform #107
    • Story Points:
      3

      Description

      Some Bulk actions can be slow depending on the input document, yet they still use a high value for bucketSize, which is the number of document ids to process per record.
      This can result in very long record processing, which is a problem for several reasons:

      • when scaling (up or down), the Kafka rebalancing is blocked until the ongoing record is processed
      • when bulk commands with a single item are mixed with commands of 100 items, a traffic jam builds up on a specific partition
      • when processing 100 items we do not take advantage of parallelism: everything is processed by a single thread
      • we may need to increase the Kafka poll interval to avoid checkpoint failures

      A bucketSize of 100 makes sense for fast processing, not for processing that could take minutes per item.
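
      As a rough illustration of the trade-off, the sketch below estimates how many records a command produces, how long the longest record can hold a thread, and how many threads can work at once. It assumes every document takes the same time and that one record is handled by exactly one thread; the function name and the seconds_per_doc and threads parameters are hypothetical and not part of the Bulk service.

```python
import math


def bucket_stats(total_docs: int, bucket_size: int, seconds_per_doc: float, threads: int):
    """Rough estimate only: assumes a uniform per-document cost and
    one thread per record (both are simplifying assumptions)."""
    # Number of records needed to carry all document ids.
    records = math.ceil(total_docs / bucket_size)
    # A record holds at most bucket_size ids, so this is the longest a single
    # record can keep one thread (and a pending rebalance) busy.
    worst_record_seconds = min(bucket_size, total_docs) * seconds_per_doc
    # No more threads can be busy than there are records to process.
    busy_threads = min(records, threads)
    return records, worst_record_seconds, busy_threads


# Hypothetical workload: 100 documents, one minute each, 4 threads available.
for size in (100, 25, 2):
    print(size, bucket_stats(total_docs=100, bucket_size=size, seconds_per_doc=60.0, threads=4))
```

      Under these assumptions, a bucketSize of 100 yields a single record that occupies one thread for 100 minutes, while a bucketSize of 2 yields 50 records of at most 2 minutes each that can be spread over all 4 threads.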

      This ticket is to make sure we have good defaults.

      For instance, we may want to reconfigure bucket/batch size for the following actions:

          <action name="recomputeViews" inputStream="bulk/recomputeViews" bucketSize="100" batchSize="50" .. />
          <action name="automation" inputStream="bulk/automation" bucketSize="100" batchSize="10" ... />
          <action name="automationUi" inputStream="bulk/automationUi" bucketSize="100" batchSize="10" .../>
          <action name="documentRoutingEscalation" inputStream="bulk/documentRoutingEscalation" bucketSize="100" .../>
      

      As we already did for a few of them:

          <action name="recomputeThumbnails" inputStream="bulk/recomputeThumbnails" bucketSize="25" batchSize="1"
          <action name="recomputeVideoConversion" inputStream="bulk/recomputeVideoConversion" bucketSize="2" batchSize="1"
      
