-
Type: Bug
-
Status: Resolved
-
Priority: Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Component/s: CI/CD
-
Epic Link:
-
Team:PLATFORM
-
Sprint:nxplatform 11.1.31
-
Story Points:2
First time seen after NXP-28618 "Run Nuxeo Platform unit tests against Elasticsearch" was merged.
The unit tests were stuck running TestSmtpMailServerFeature (against PostgreSQL, but the backend shouldn't matter here).
Thread dump extract:
"main" #1 prio=5 os_prio=0 cpu=3123.77ms elapsed=78804.45s tid=0x00007f278c013000 nid=0x12fdb in Object.wait() [0x00007f2793807000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(java.base@11.0.6/Native Method) - waiting on <0x00000000c24aee38> (a com.dumbster.smtp.SimpleSmtpServer) at java.lang.Object.wait(java.base@11.0.6/Object.java:328) at com.dumbster.smtp.SimpleSmtpServer.start(Unknown Source) - waiting to re-lock in wait() <0x00000000c24aee38> (a com.dumbster.smtp.SimpleSmtpServer) at org.nuxeo.mail.SmtpMailServerFeature.start(SmtpMailServerFeature.java:129) at org.nuxeo.mail.SmtpMailServerFeature.beforeMethodRun(SmtpMailServerFeature.java:113) at org.nuxeo.runtime.test.runner.FeaturesRunner.lambda$beforeMethodRun$2(FeaturesRunner.java:180) at org.nuxeo.runtime.test.runner.FeaturesRunner$$Lambda$154/0x00000008402a6c40.call(Unknown Source) at org.nuxeo.runtime.test.runner.FeaturesLoader.apply(FeaturesLoader.java:117) at org.nuxeo.runtime.test.runner.FeaturesLoader.apply(FeaturesLoader.java:104) at org.nuxeo.runtime.test.runner.FeaturesRunner.beforeMethodRun(FeaturesRunner.java:180) at org.nuxeo.runtime.test.runner.FeaturesRunner$BeforeMethodRunStatement.evaluate(FeaturesRunner.java:307) at org.nuxeo.runtime.test.runner.FeaturesRunner$BeforeSetupStatement.evaluate(FeaturesRunner.java:322) at org.nuxeo.runtime.test.runner.FeaturesRunner$AfterMethodRunStatement.evaluate(FeaturesRunner.java:345) at org.nuxeo.runtime.test.runner.FeaturesRunner$AfterTeardownStatement.evaluate(FeaturesRunner.java:362) at org.nuxeo.runtime.test.runner.FeaturesRunner$RulesFactory$1.evaluate(FeaturesRunner.java:435) at org.nuxeo.runtime.test.runner.RuntimeDeployment$DeploymentStatement.evaluate(RuntimeDeployment.java:326)
With SimpleSmtpServer#start(int port):
public static SimpleSmtpServer start(int port) { SimpleSmtpServer server = new SimpleSmtpServer(port); Thread t = new Thread(server); t.start(); // Block until the server socket is created synchronized (server) { try { server.wait(); } catch (InterruptedException e) { // Ignore don't care. } } return server; }
The thread dump is missing the SimpleSmtpServer thread but it was waiting on serverSocket.accept() in the code below:
public void run() { stopped = false; try { try { serverSocket = new ServerSocket(port); serverSocket.setSoTimeout(TIMEOUT); // Block for maximum of 1.5 seconds } finally { synchronized (this) { // Notify when server socket has been created notifyAll(); } } // Server: loop until stopped while (!isStopped()) { // Start server socket and listen for client connections Socket socket = null; try { socket = serverSocket.accept(); } catch (Exception e) { if (socket != null) { socket.close(); } continue; // Non-blocking socket timeout occurred: try accept() again } // Get the input and output streams BufferedReader input = new BufferedReader(new InputStreamReader(socket.getInputStream())); PrintWriter out = new PrintWriter(socket.getOutputStream()); synchronized (this) { /* * We synchronize over the handle method and the list update because the client call completes inside * the handle method and we have to prevent the client from reading the list until we've updated it. * For higher concurrency, we could just change handle to return void and update the list inside the method * to limit the duration that we hold the lock. */ List msgs = handleTransaction(out, input); receivedMail.addAll(msgs); } socket.close(); } } catch (Exception e) { /** @todo Should throw an appropriate exception here. */ e.printStackTrace(); } finally { if (serverSocket != null) { try { serverSocket.close(); } catch (IOException e) { e.printStackTrace(); } } } }
We're not sure here if:
- There is a race condition on the server port, initially fetched by SmtpMailServerFeature#getFreePort:
protected int getFreePort() { int retryCount = 0; while (retryCount < RETRIES) { try (ServerSocket socket = new ServerSocket(0)) { socket.setReuseAddress(true); return socket.getLocalPort(); } catch (IOException e) { retryCount++; log.trace("Failed to allocate port on retry {}", retryCount, e); } } throw new NuxeoException(String.format("Unable to find free port after %d retries", retryCount)); }
but maybe used by another test thread before the server socket is actually started and the server listens for client connections?
- The server is started but no client asks for a connection ?
This could be the case for instance if the dev mode tests are stuck while the PostgreSQL or MongoDB ones are running, yet it seems that when the problem occurred there was only on mode actually running.
We're actually relying on dumbster:dumbster:1.6 for the SimpleSmtpServer.
Yet, this fork seems interesting as it allows the server to just pick a port itself and then tell the caller which port it got.
Let's try to use this fork instead.
- is related to
-
NXP-28618 Run Nuxeo Platform unit tests against Elasticsearch
- Resolved