  Nuxeo Platform / NXP-28590

Create a Bulk Action that scales using AWS Lambda

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: QualifiedToSchedule
    • Component/s: Streams

      Description

      Create a POC to demonstrate how to use a lambda inside a Bulk Action.

      The lambda is part of the action processor. It makes it possible to scale outside of Nuxeo using serverless capabilities (up to 1000 concurrent lambda executions).

      Lambda executions are not ordered, so this approach must not be used when there is a causal relationship between events.

      To decouple lambda invocation from the computations that are part of the Bulk Action, we propose to use SQS, because:

      • it is cheap (the first million requests each month are free)
      • it decouples the computation from the lambda calls, which would otherwise require backpressure and failure handling when limits are reached; a single computation thread should be able to handle a high throughput
      • the lambda code doesn't require Nuxeo dependencies
      • the lambda code doesn't require access to Nuxeo services (Kafka, Mongo, Redis ...)
      • the lambda output is pushed to SQS:
        • this is fast, and we don't want to pay for the lambda to wait for a remote service response
        • we don't want to call a REST endpoint on Nuxeo: this is slow, hard to scale, and requires Nuxeo to be up

       

      The lambda is invoked by a trigger based on an SQS standard queue (the input queue). The batch size can be set to 1 so that the function is invoked with a single event. See [Configuring a Queue as an Event Source|https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html].
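
      The trigger configuration described above could be created with boto3 along the following lines. This is only a sketch: the helper name `map_queue_to_lambda`, the queue ARN and the function name are hypothetical, and the client is passed in so the helper can be exercised without AWS credentials.

```python
def map_queue_to_lambda(lambda_client, queue_arn, function_name, batch_size=1):
    """Attach an SQS queue to a lambda as an event source.

    With batch_size=1, each invocation receives a single SQS message.
    `lambda_client` is expected to behave like boto3.client("lambda").
    """
    response = lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=batch_size,
        Enabled=True,
    )
    # The mapping UUID is needed later to update or delete the trigger.
    return response["UUID"]
```

      In a real deployment the call would look like `map_queue_to_lambda(boto3.client("lambda"), input_queue_arn, "bulk-action-worker")`, where both arguments are placeholders for the POC's actual resources.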

      The lambda code must send its result to another SQS standard queue (the output queue).
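
      A minimal sketch of such a lambda, written in Python with no Nuxeo dependency, might look like this. The message fields (`commandId`, `docId`, `fields`), the `OUTPUT_QUEUE_URL` environment variable and the `process` placeholder are assumptions made for illustration; the ticket does not define a message format.

```python
import json
import os

# Hypothetical environment variable holding the output queue URL.
OUTPUT_QUEUE_URL = os.environ.get("OUTPUT_QUEUE_URL", "")


def process(payload):
    """Placeholder for the real work: here it only counts the projected fields."""
    return {
        "commandId": payload.get("commandId"),
        "docId": payload.get("docId"),
        "fieldCount": len(payload.get("fields", {})),
    }


def handler(event, context, sqs=None, queue_url=None):
    """Entry point invoked by the SQS trigger.

    `sqs` and `queue_url` are optional so the function can be tested locally;
    when run by AWS Lambda only `event` and `context` are supplied.
    """
    if sqs is None:
        import boto3  # imported lazily so local tests do not need it
        sqs = boto3.client("sqs")
    queue_url = queue_url or OUTPUT_QUEUE_URL

    records = event.get("Records", [])
    for record in records:
        # Each SQS record carries the original message in its "body" field.
        result = process(json.loads(record["body"]))
        # Push the result to the output queue instead of calling back Nuxeo.
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(result))
    return {"processed": len(records)}
```

      With a batch size of 1 the loop runs once per invocation, but iterating over `Records` keeps the handler valid if the batch size is raised later.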

       

      The Bulk Action is a processor where:

      • a first computation reads a batch of document ids, performs some projection (loads document fields), then uses the SQS library to send messages. It then commits the position.
      • a second computation, driven by a timer, periodically polls the SQS output queue, processes the results, and outputs the Bulk Action stats.
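
      The SQS side of these two computations can be sketched as follows. The real computations would be Java classes running in Nuxeo Stream; this Python sketch only illustrates the queue interaction. The function names, the message fields and the shape of the returned stats are hypothetical. The `sqs` argument is expected to behave like `boto3.client("sqs")`.

```python
import json


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_documents(sqs, input_queue_url, command_id, docs):
    """First computation: send one message per document to the input queue."""
    sent = 0
    for group in chunks(docs, 10):  # SendMessageBatch accepts at most 10 entries
        entries = [
            {
                "Id": str(index),  # must be unique within one batch call
                "MessageBody": json.dumps(
                    {"commandId": command_id, "docId": doc["id"], "fields": doc["fields"]}
                ),
            }
            for index, doc in enumerate(group)
        ]
        response = sqs.send_message_batch(QueueUrl=input_queue_url, Entries=entries)
        if response.get("Failed"):
            # Raising lets the caller retry instead of committing the position.
            raise RuntimeError("SQS rejected %d message(s)" % len(response["Failed"]))
        sent += len(entries)
    return sent  # the caller commits the position only after a full success


def poll_results(sqs, output_queue_url):
    """Second computation, called from a timer: read one page of results."""
    response = sqs.receive_message(
        QueueUrl=output_queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=0
    )
    messages = response.get("Messages", [])
    if not messages:
        return {}
    processed = {}  # number of results received per bulk command
    for message in messages:
        result = json.loads(message["Body"])
        processed[result["commandId"]] = processed.get(result["commandId"], 0) + 1
    # Delete only after the results have been accounted for.
    sqs.delete_message_batch(
        QueueUrl=output_queue_url,
        Entries=[
            {"Id": str(index), "ReceiptHandle": message["ReceiptHandle"]}
            for index, message in enumerate(messages)
        ],
    )
    return processed
```

      Standard queues deliver messages at least once, so a real implementation would probably need to tolerate duplicate results when updating the stats.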

       

      We should get all the benefits of a Bulk Action (processing large document sets, retry logic, fallback, progress status) together with the scalability of lambda (a concurrency of 1000).

       


    People

    • Assignee: Unassigned
    • Reporter: Benoit Delbosc (bdelbosc)
    • Votes: 1
    • Watchers: 3
