Uploaded image for project: 'Nuxeo Platform'
  1. Nuxeo Platform
  2. NXP-28761

Fix TestSmtpMailServerFeature randomly stuck in JX

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 11.1
    • Component/s: CI/CD

      Description

      First time seen after NXP-28618 "Run Nuxeo Platform unit tests against Elasticsearch" was merged.
      The unit tests were stuck running TestSmtpMailServerFeature (against PostgreSQL, but the backend shouldn't matter here).

      Thread dump extract:

      "main" #1 prio=5 os_prio=0 cpu=3123.77ms elapsed=78804.45s tid=0x00007f278c013000 nid=0x12fdb in Object.wait()  [0x00007f2793807000]
         java.lang.Thread.State: WAITING (on object monitor)
              at java.lang.Object.wait(java.base@11.0.6/Native Method)
              - waiting on <0x00000000c24aee38> (a com.dumbster.smtp.SimpleSmtpServer)
              at java.lang.Object.wait(java.base@11.0.6/Object.java:328)
              at com.dumbster.smtp.SimpleSmtpServer.start(Unknown Source)
              - waiting to re-lock in wait() <0x00000000c24aee38> (a com.dumbster.smtp.SimpleSmtpServer)
              at org.nuxeo.mail.SmtpMailServerFeature.start(SmtpMailServerFeature.java:129)
              at org.nuxeo.mail.SmtpMailServerFeature.beforeMethodRun(SmtpMailServerFeature.java:113)
              at org.nuxeo.runtime.test.runner.FeaturesRunner.lambda$beforeMethodRun$2(FeaturesRunner.java:180)
              at org.nuxeo.runtime.test.runner.FeaturesRunner$$Lambda$154/0x00000008402a6c40.call(Unknown Source)
              at org.nuxeo.runtime.test.runner.FeaturesLoader.apply(FeaturesLoader.java:117)
              at org.nuxeo.runtime.test.runner.FeaturesLoader.apply(FeaturesLoader.java:104)
              at org.nuxeo.runtime.test.runner.FeaturesRunner.beforeMethodRun(FeaturesRunner.java:180)
              at org.nuxeo.runtime.test.runner.FeaturesRunner$BeforeMethodRunStatement.evaluate(FeaturesRunner.java:307)
              at org.nuxeo.runtime.test.runner.FeaturesRunner$BeforeSetupStatement.evaluate(FeaturesRunner.java:322)
              at org.nuxeo.runtime.test.runner.FeaturesRunner$AfterMethodRunStatement.evaluate(FeaturesRunner.java:345)
              at org.nuxeo.runtime.test.runner.FeaturesRunner$AfterTeardownStatement.evaluate(FeaturesRunner.java:362)
              at org.nuxeo.runtime.test.runner.FeaturesRunner$RulesFactory$1.evaluate(FeaturesRunner.java:435)
              at org.nuxeo.runtime.test.runner.RuntimeDeployment$DeploymentStatement.evaluate(RuntimeDeployment.java:326)
      

      With SimpleSmtpServer#start(int port):

        public static SimpleSmtpServer start(int port) {
          SimpleSmtpServer server = new SimpleSmtpServer(port);
          Thread t = new Thread(server);
          t.start();
          
          // Block until the server socket is created
          synchronized (server) {
            try {
              server.wait();
            } catch (InterruptedException e) {
              // Ignore don't care.
            }
          }
          return server;
        }
      

       
      The thread dump is missing the SimpleSmtpServer thread but it was waiting on serverSocket.accept() in the code below:

        public void run() {
          stopped = false;
          try {
            try {
              serverSocket = new ServerSocket(port);
              serverSocket.setSoTimeout(TIMEOUT); // Block for maximum of 1.5 seconds
            } finally {
              synchronized (this) {
                // Notify when server socket has been created
                notifyAll();
              }
            }
      
            // Server: loop until stopped
            while (!isStopped()) {
              // Start server socket and listen for client connections
              Socket socket = null;
              try {
                socket = serverSocket.accept();
              } catch (Exception e) {
                if (socket != null) {
                  socket.close();
                }
                continue; // Non-blocking socket timeout occurred: try accept() again
              }
      
              // Get the input and output streams
              BufferedReader input = new BufferedReader(new InputStreamReader(socket.getInputStream()));
              PrintWriter out = new PrintWriter(socket.getOutputStream());
      
              synchronized (this) {
                /*
                 * We synchronize over the handle method and the list update because the client call completes inside
                 * the handle method and we have to prevent the client from reading the list until we've updated it.
                 * For higher concurrency, we could just change handle to return void and update the list inside the method
                 * to limit the duration that we hold the lock.
                 */
                List msgs = handleTransaction(out, input);
                receivedMail.addAll(msgs);
              }
              socket.close();
            }
          } catch (Exception e) {
            /** @todo Should throw an appropriate exception here. */
            e.printStackTrace();
          } finally {
            if (serverSocket != null) {
              try {
                serverSocket.close();
              } catch (IOException e) {
                e.printStackTrace();
              }
            }
          }
        }
      

      We're not sure here if:

      • There is a race condition on the server port, initially fetched by SmtpMailServerFeature#getFreePort:
        protected int getFreePort() {
            int retryCount = 0;
            while (retryCount < RETRIES) {
                try (ServerSocket socket = new ServerSocket(0)) {
                    socket.setReuseAddress(true);
                    return socket.getLocalPort();
                } catch (IOException e) {
                    retryCount++;
                    log.trace("Failed to allocate port on retry {}", retryCount, e);
                }
            }
            throw new NuxeoException(String.format("Unable to find free port after %d retries", retryCount));
        }
        

        but maybe used by another test thread before the server socket is actually started and the server listens for client connections?

      • The server is started but no client asks for a connection ?
        This could be the case for instance if the dev mode tests are stuck while the PostgreSQL or MongoDB ones are running, yet it seems that when the problem occurred there was only on mode actually running.

      We're actually relying on dumbster:dumbster:1.6 for the SimpleSmtpServer.
      Yet, this fork seems interesting as it allows the server to just pick a port itself and then tell the caller which port it got.
      Let's try to use this fork instead.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 4 hours
                  4h

                    PagerDuty

                    Error rendering 'com.pagerduty.jira-server-plugin:PagerDuty'. Please contact your Jira administrators.