  1. Nuxeo Studio
  2. NXS-6386

Follow up on production incident on May 4th

      Description

      On May 3rd at 11pm CET, several errors occurred when customers tried to git clone their Studio project. The Gitty node was temporarily unavailable and returned a 503 error. The errors found in the application logs are listed below.

      • Workers ran on Gitty node
        • Failed to execute async event null on listener segmentIOEventListener
        • Unable to write stat entry for xxx-xxx-xxx-xxx on Gitty nodes
        • Exception during work: HouseKeepingWorker(/triggerHouseKeepingListener:12823062651770.1770079632, Progress(?%, ?/0), null) on Gitty nodes
        • Exception during projectRemovalListener sync listener execution, continuing to run other listeners

      => Those errors are investigated and fixed in the scope of https://jira.nuxeo.com/browse/NXS-6360

      • Git repository not found on Gitty nodes
        • Failed to clone remote repository for project: project1-habeo
        • Failed to clone remote repository for project: brendan-phillips-haley
        • Failed to clone remote repository for project: sonia-sherman-koch-and
        • Failed to clone remote repository for project: joy-nichols-mccall-and
        • Failed to clone remote repository for project: clinton-fitzgerald-webb

      => In rare cases, files locked by JGit prevent the cleanup of those trial projects. Because of the resulting exceptions, the cleanup times out and the transaction is rolled back, so those "corrupted" trials are kept in Connect and the same errors occur again the next time the cleanup is triggered.
      https://jira.nuxeo.com/browse/NXS-6374
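
The failure mode described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `LockedFileError`, `delete_repository`, and the `trial-*` project names are hypothetical stand-ins, not Studio or JGit APIs, and the per-project variant is one possible mitigation, not necessarily the fix tracked in NXS-6374.

```python
class LockedFileError(Exception):
    """Hypothetical stand-in for a delete that fails on a locked file."""


def delete_repository(project, locked):
    # Hypothetical helper: fails when the project's repository has locked files.
    if project in locked:
        raise LockedFileError(f"files locked for {project}")


def cleanup_single_transaction(projects, locked):
    """All-or-nothing cleanup: one failure rolls back every removal.

    Returns the projects still present after the run.
    """
    removed = []
    try:
        for project in projects:
            delete_repository(project, locked)
            removed.append(project)
    except LockedFileError:
        removed.clear()  # simulated rollback: nothing ends up removed
    return [p for p in projects if p not in removed]


def cleanup_per_project(projects, locked):
    """Isolated cleanup: a failing project is skipped, the others are removed.

    Returns the projects still present after the run.
    """
    remaining = []
    for project in projects:
        try:
            delete_repository(project, locked)
        except LockedFileError:
            remaining.append(project)
    return remaining


trials = ["trial-a", "trial-b", "trial-c"]
locked = {"trial-b"}
print(cleanup_single_transaction(trials, locked))  # every trial is still present
print(cleanup_per_project(trials, locked))         # only the locked trial is left
```

In the single-transaction variant, one locked repository keeps every trial in place, so the next cleanup run hits the same error again. Isolating each project limits the damage to the trials that are actually locked.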

      • Exception during jiraSynchroListener sync listener execution, continuing to run other listeners
        • org.nuxeo.ecm.core.api.NuxeoException: Error while trying to delete Jira user with username = mtienda@geoit.com.mx
           Response code : 400
           Response : {"errorMessages":["Cannot delete user, the user directory is read-only."],"errors":{}}
        • org.nuxeo.ecm.core.api.NuxeoException: Error while trying to delete Jira user with username = mtavila.1@gmail.com
           Response code : 400
           Response : {"errorMessages":["Cannot delete user, the user directory is read-only."],"errors":{}}

          => https://jira.nuxeo.com/browse/NXS-6387
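
A listener could recognise this specific response and treat it as a non-retryable condition instead of raising on every run. The sketch below assumes only the response shape shown in the logs above (a 400 status with an `errorMessages` array); the function name `is_read_only_directory_error` is hypothetical and is not part of the Nuxeo or Jira APIs.

```python
import json


def is_read_only_directory_error(status_code, body):
    """Return True when a Jira delete-user response reports a read-only directory.

    Assumes the payload shape seen in the logs:
    {"errorMessages": ["..."], "errors": {}}
    """
    if status_code != 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Malformed or truncated body: do not guess.
        return False
    if not isinstance(payload, dict):
        return False
    messages = payload.get("errorMessages")
    if not isinstance(messages, list):
        return False
    return any("read-only" in str(message) for message in messages)


body = '{"errorMessages":["Cannot delete user, the user directory is read-only."],"errors":{}}'
print(is_read_only_directory_error(400, body))
```

Matching on the message text is brittle. It is used here only because the logged response carries no dedicated error code.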

      • Couldn't find StudioProject for ConnectProject with id content-now
        • This needs further investigation, but the error is not responsible for the unavailability of the Gitty node.

      A notebook has been created in Datadog to track the metrics related to the different ELBs: https://app.datadoghq.com/notebook/749358/arnaud-5-may-2021-11-01
