Nuxeo Platform / NXP-29380

Cleanup Listener removeTasksForDeletedDocumentRoute does not scale

    Details

      Description

      Context

      Following this change that was backported in HF29:
      https://github.com/nuxeo/nuxeo/commit/4d0870e69c9282cb9b9283569207b06e26aa3d26

      Each time I start a Nuxeo worker node, I see:

      • failed work on the Nuxeo side
      • long queries stuck on MongoDB

      MongoDB traces

      connectionId: 113729
      opid: 1000273721
      lsid: Object
        id: Object
          $binary: Object
            base64: ykTeHrap7mGzEicPqntyqA==
            subType: 03
        uid: <redacted>
      planSummary: IXSCAN { ecm:primaryType: 1 }
      waitingForLock: false
      host: atlas-ibxgqb-shard-00-00.x7qzw.mongodb.net:27017
      active: true
      currentOpTime: 2020-07-07T21:53:36.937+0000
      desc: conn113729
      secs_running: 566
      microsecs_running: 566914039
      op: query
      lockStats: Object
        Global: Object
          acquireCount: Object
            r: 843176
        Database: Object
          acquireCount: Object
            r: 843176
          acquireWaitCount: Object
            r: 11
          timeAcquiringMicros: Object
            r: 29981
        Collection: Object
          acquireCount: Object
            r: 843176
      client: 10.1.12.146:54774
      appName: Nuxeo
      clientMetadata: Object
        driver: Object
          version: 3.8.1
          name: mongo-java-driver
        os: Object
          type: Linux
          name: Linux
          architecture: amd64
          version: 5.3.0-1019-aws
        platform: Java/Oracle Corporation/1.8.0_252-b09
        application: Object
          name: Nuxeo
      ns: nuxeo.default
      numYields: 843175
      locks: Object
        Global: r
        Database: r
        Collection: r
      query: Object
        $truncated: { find: "default", filter: { $and: [ { nt:processId: "7d69557a69f43278" }, { ecm:primaryType: { $in: [ "CorrespondenceIA", "TemplateRoot", "DocumentRouteModelsRoot", "CorrespondenceID", "SimpleTask", "CommentRoot", "Document", "CorrespondenceIN", "CorrespondenceIL", "RoutingTask", "CustomerFolderRoot", "Picture", "CorrespondenceAZ", "CorrespondenceRI", "CorrespondenceAL", "Note", "CorrespondenceAK", "CorrespondenceAR", "ManagementRoot", "Favorites", "TaskRoot", "CorrespondenceOH", "SavedSearch", "CorrespondenceOK", "CorrespondenceGA", "AdvancedContent", "PermissionsSearch", "UserInvitationContainer", "CorrespondenceWV", "UserWorkspacesRoot", "Folder", "BasicAuditSearch", "CorrespondenceOR", "CorrespondenceWY", "CorrespondenceNV", "CorrespondenceFL", "account-navigation_pp", "CorrespondenceNY", "CorrespondenceWA", "StepFolder", "CorrespondenceWI", "PictureBook", "DocumentRouteInstancesRoot", "Statement", "CorrespondenceHI", "HiddenFolder", "Section", "Annotation", "ExpiredSearch", "c...
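
      The trace above shows an operation that has been running for 566 seconds while using only the index on ecm:primaryType, with numYields close to the lock acquire counts. This suggests the scan walks most of the collection before nt:processId is applied. As a rough sketch of how such operations could be picked out of a list of currentOp-style documents, the helper below filters on the secs_running and active fields seen in the trace. The function names and the 60-second threshold are assumptions made for illustration, and the second sample entry is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def slow_operations(ops: Iterable[Dict[str, Any]],
                    min_seconds: int = 60) -> List[Dict[str, Any]]:
    """Return active operations running for at least min_seconds, longest first.

    Assumption: each entry is a dict shaped like the trace above, with
    'active' and 'secs_running' keys. The threshold is arbitrary.
    """
    slow = [
        op for op in ops
        if op.get("active") and op.get("secs_running", 0) >= min_seconds
    ]
    return sorted(slow, key=lambda op: op["secs_running"], reverse=True)


def summarize(op: Dict[str, Any]) -> str:
    """Build a one-line summary of the fields that matter for diagnosis."""
    return "opid={} ns={} secs_running={} plan={} numYields={}".format(
        op.get("opid"),
        op.get("ns"),
        op.get("secs_running"),
        op.get("planSummary"),
        op.get("numYields"),
    )


# First entry: values copied from the trace above.
# Second entry: hypothetical fast operation, added only for contrast.
sample_ops = [
    {
        "opid": 1000273721,
        "active": True,
        "secs_running": 566,
        "ns": "nuxeo.default",
        "planSummary": "IXSCAN { ecm:primaryType: 1 }",
        "numYields": 843175,
    },
    {
        "opid": 1,
        "active": True,
        "secs_running": 0,
        "ns": "nuxeo.default",
        "planSummary": "IXSCAN { ecm:id: 1 }",
        "numYields": 0,
    },
]

for op in slow_operations(sample_ops):
    print(summarize(op))
```

      Only the operation from the trace passes the filter here. Its plan summary names an index on ecm:primaryType and nothing on nt:processId.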
      

      Logs on the Nuxeo Side

      I did not wait long enough to see the errors on the Nuxeo side. However, when shutting down the node, the following was logged:

      Work id: 535951549538111.1351184247 title: Listener removeTasksForDeletedDocumentRoute [documentRemoved/0b83972c35477c09, documentRemoved/3bb320af03b98376], has been interrupted, it will be rescheduled, record: Record{watermark=208949231035875328, wmDate=2020-07-07 21:08:40.574, flags=[DEFAULT], key='0b83972c35477c09', data.length=4019, data="....sr.=org.nuxeo.ecm.core.event.impl.AsyncEventExecutor$ListenerWork...........I..retryCountL..bundlet.1Lorg/nuxeo/ecm/core/ev"}
      

      Proposed solution

      • filter the delete event more narrowly
        • I suspect this one comes from the delete/redeploy of the workflow template
      • add the missing index
        • running any query without an index should be a no-go
      • make the query more specific
        • I suspect we could filter on the document type, which is indexed
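
      The three proposals above can be sketched in a few lines. This is only an illustration of the idea, not the actual Nuxeo fix: the field names nt:processId and ecm:primaryType and the event name documentRemoved come from the trace and log in this issue, while the function names, the choice of "RoutingTask" as the only task type, and "DocumentRoute" as the only route type are assumptions.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Field names as they appear in the MongoDB trace in this issue.
PROCESS_ID_FIELD = "nt:processId"
PRIMARY_TYPE_FIELD = "ecm:primaryType"

# Assumption: only these document types carry nt:processId.
# "RoutingTask" appears in the traced $in list; the real set may differ.
TASK_TYPES: Tuple[str, ...] = ("RoutingTask",)

# Assumption: only removal of these types should trigger the cleanup.
ROUTE_TYPES: Tuple[str, ...] = ("DocumentRoute",)


def should_handle(event_name: str, doc_type: str,
                  route_types: Sequence[str] = ROUTE_TYPES) -> bool:
    """Proposal 1: ignore delete events for documents that are not routes."""
    return event_name == "documentRemoved" and doc_type in route_types


def process_id_index_keys() -> List[Tuple[str, int]]:
    """Proposal 2: ascending single-field index specification on nt:processId."""
    return [(PROCESS_ID_FIELD, 1)]


def build_task_filter(process_id: str,
                      task_types: Sequence[str] = TASK_TYPES) -> Dict[str, Any]:
    """Proposal 3: same query shape as the trace, with a short type list."""
    return {
        "$and": [
            {PROCESS_ID_FIELD: process_id},
            {PRIMARY_TYPE_FIELD: {"$in": list(task_types)}},
        ]
    }


if __name__ == "__main__":
    print(should_handle("documentRemoved", "Note"))
    print(process_id_index_keys())
    print(build_task_filter("7d69557a69f43278"))
```

      With a driver such as pymongo, the index specification could be passed to Collection.create_index and the filter to Collection.find. How Nuxeo itself declares indexes and builds this query is outside the scope of this sketch.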
