Nuxeo AI Core / AICORE-412

Fix bulk tests / Tune Elasticsearch (Data too large)


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0, 3.0.0
    • Fix Version/s: 2.6.0, 3.1.2
    • Component/s: CI/CD

      Description

      Elasticsearch (ES) randomly makes the unit tests fail.
      See https://jenkins.ai.dev.nuxeo.com/job/nuxeo/job/nuxeo-ai/job/sprint-3b/7/testReport/ ; for instance, org.nuxeo.ai.bulk.BulkEnrichmentTest fails with:

      Stack trace:
      java.lang.AssertionError: Error while invoking beforeRun on features: [org.nuxeo.ai.enrichment.EnrichmentTestFeature, org.nuxeo.runtime.test.runner.MDCFeature, org.nuxeo.runtime.test.runner.ConditionalIgnoreRule$Feature, org.nuxeo.runtime.test.runner.RandomBug$Feature, org.nuxeo.runtime.test.runner.WithFrameworkPropertyFeature, org.nuxeo.runtime.test.runner.RuntimeFeature, org.nuxeo.runtime.cluster.ClusterFeature, org.nuxeo.runtime.test.runner.TransactionalFeature, org.nuxeo.runtime.stream.RuntimeStreamFeature, org.nuxeo.ecm.core.api.local.DummyLoginFeature, org.nuxeo.ecm.core.work.WorkManagerFeature, org.nuxeo.ecm.core.bulk.CoreBulkFeature, org.nuxeo.ecm.core.test.CoreFeature, org.nuxeo.directory.test.DirectoryFeature, org.nuxeo.ecm.platform.test.UserManagerFeature, org.nuxeo.ecm.platform.test.PlatformFeature, org.nuxeo.ecm.automation.core.AutomationCoreFeature, org.nuxeo.ecm.automation.test.AutomationFeature, org.nuxeo.ecm.platform.test.NuxeoLoginFeature, org.nuxeo.runtime.test.runner.LogFeature, org.nuxeo.elasticsearch.test.RepositoryLightElasticSearchFeature, org.nuxeo.elasticsearch.test.RepositoryElasticSearchFeature]
      	at org.nuxeo.runtime.test.runner.FeaturesRunner.apply(FeaturesRunner.java:253)
      	at org.nuxeo.runtime.test.runner.FeaturesRunner.apply(FeaturesRunner.java:225)
      	at org.nuxeo.runtime.test.runner.FeaturesRunner.beforeRun(FeaturesRunner.java:189)
      	at org.nuxeo.runtime.test.runner.FeaturesRunner$BeforeClassStatement.evaluate(FeaturesRunner.java:323)
      	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
      	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
      	at org.junit.runners.Suite.runChild(Suite.java:128)
      	at org.junit.runners.Suite.runChild(Suite.java:27)
      	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
      	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
      	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
      	at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
      	at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
      	at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
      	at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
      	at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
      	at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158)
      	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
      	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
      	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
      	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
      	Suppressed: org.nuxeo.ecm.core.api.NuxeoException: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/nxutest/_refresh], status line [HTTP/1.1 429 Too Many Requests]
      {"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [909074512/866.9mb], which is larger than the limit of [906992025/864.9mb], real usage: [909074512/866.9mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, accounting=7876/7.6kb]","bytes_wanted":909074512,"bytes_limit":906992025,"durability":"PERMANENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [909074512/866.9mb], which is larger than the limit of [906992025/864.9mb], real usage: [909074512/866.9mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, accounting=7876/7.6kb]","bytes_wanted":909074512,"bytes_limit":906992025,"durability":"PERMANENT"},"status":429}
      		at org.nuxeo.elasticsearch.client.ESRestClient.performRequest(ESRestClient.java:214)
      		at org.nuxeo.elasticsearch.client.ESRestClient.performRequestWithTracing(ESRestClient.java:220)
      		at org.nuxeo.elasticsearch.client.ESRestClient.refresh(ESRestClient.java:129)
      		at org.nuxeo.elasticsearch.core.ElasticSearchAdminImpl.refreshRepositoryIndex(ElasticSearchAdminImpl.java:210)
      		at org.nuxeo.elasticsearch.core.ElasticSearchAdminImpl.refresh(ElasticSearchAdminImpl.java:273)
      		at org.nuxeo.elasticsearch.ElasticSearchComponent.refresh(ElasticSearchComponent.java:370)
      		at org.nuxeo.elasticsearch.test.RepositoryLightElasticSearchFeature.await(RepositoryLightElasticSearchFeature.java:76)
      		at org.nuxeo.runtime.test.runner.TransactionalFeature.await(TransactionalFeature.java:124)
      		at org.nuxeo.runtime.test.runner.TransactionalFeature.nextTransaction(TransactionalFeature.java:104)
      		at org.nuxeo.ecm.core.test.CoreFeature.beforeRun(CoreFeature.java:191)
      		at org.nuxeo.runtime.test.runner.FeaturesRunner.lambda$beforeRun$1(FeaturesRunner.java:189)
      		at org.nuxeo.runtime.test.runner.FeaturesRunner.apply(FeaturesRunner.java:239)
      		... 25 more
      	Caused by: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/nxutest/_refresh], status line [HTTP/1.1 429 Too Many Requests]
      {"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [909074512/866.9mb], which is larger than the limit of [906992025/864.9mb], real usage: [909074512/866.9mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, accounting=7876/7.6kb]","bytes_wanted":909074512,"bytes_limit":906992025,"durability":"PERMANENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [909074512/866.9mb], which is larger than the limit of [906992025/864.9mb], real usage: [909074512/866.9mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, accounting=7876/7.6kb]","bytes_wanted":909074512,"bytes_limit":906992025,"durability":"PERMANENT"},"status":429}
      		at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
      		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
      		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
      		at org.nuxeo.elasticsearch.client.ESRestClient.performRequest(ESRestClient.java:212)
      		... 36 more
      

      Observed on the pod during the tests:

      CPU (m)  MEM (Mi)  CPU request:limit (m)  MEM request:limit (Mi)
      2797     1690      2000:4000              2048:4096
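      To read the pod figures against the "lower requested resources" item, here is a minimal Java sketch that expresses usage as a percentage of the request and of the limit. It assumes CPU values are millicores and memory values are MiB (the usual units of the tool that printed them); the class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Compares the pod usage observed during the tests with its requests and limits.
 * Assumed units: CPU in millicores, memory in MiB.
 */
public class PodUsage {

    /** Usage expressed as a percentage of a request or a limit. */
    static double percentOf(long used, long reference) {
        return 100.0 * used / reference;
    }

    public static void main(String[] args) {
        long cpuUsed = 2797, cpuRequest = 2000, cpuLimit = 4000; // millicores (assumed)
        long memUsed = 1690, memRequest = 2048, memLimit = 4096; // MiB (assumed)
        System.out.printf(Locale.ROOT, "CPU: %.1f%% of request, %.1f%% of limit%n",
                percentOf(cpuUsed, cpuRequest), percentOf(cpuUsed, cpuLimit));
        System.out.printf(Locale.ROOT, "MEM: %.1f%% of request, %.1f%% of limit%n",
                percentOf(memUsed, memRequest), percentOf(memUsed, memLimit));
    }
}
```

      With these numbers, CPU usage sits above its request but below its limit, while memory usage stays below both.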

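       The parent circuit breaker rejects the requests because the real heap usage it measures (866.9 MB) would exceed its limit (864.9 MB). The usage breakdown in the error reports fielddata=0b and accounting=7.6kb, so the overflow comes from overall heap usage rather than from fielddata alone. The limit is tied to the JVM heap size.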
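      To make the numbers concrete, here is a minimal Java sketch of the parent breaker arithmetic, fed with the values from the 429 error above. It is a simplified model, not the Elasticsearch code. It assumes ES 7.x with the real-memory parent breaker (indices.breaker.total.use_real_memory: true), whose limit (indices.breaker.total.limit) defaults to 95% of the JVM heap, and it assumes that limit is truncated to whole bytes. The class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Simplified model of the ES 7.x parent circuit breaker check,
 * using the numbers reported in the 429 error. Not the ES implementation.
 */
public class ParentBreakerCheck {

    private static final double MIB = 1024.0 * 1024.0;

    /** The parent breaker trips when real heap usage plus the new reservation exceeds the limit. */
    static boolean wouldTrip(long realUsageBytes, long newBytesReserved, long limitBytes) {
        return realUsageBytes + newBytesReserved > limitBytes;
    }

    /** Formats bytes like the ES error message: MiB with one decimal, truncated rather than rounded. */
    static String formatMb(long bytes) {
        double tenths = Math.floor(bytes * 10.0 / MIB);
        return String.format(Locale.ROOT, "%.1fmb", tenths / 10.0);
    }

    /** Heap size implied by a limit computed as (long) (heap * ratio); ratio 0.95 is an assumption. */
    static long impliedHeapBytes(long limitBytes, double ratio) {
        return (long) Math.ceil(limitBytes / ratio);
    }

    public static void main(String[] args) {
        long realUsage = 909_074_512L; // "real usage" / bytes_wanted in the error
        long newBytes = 0L;            // "new bytes reserved: [0/0b]"
        long limit = 906_992_025L;     // bytes_limit in the error
        System.out.println("would trip: " + wouldTrip(realUsage, newBytes, limit));
        System.out.println(formatMb(realUsage) + " > " + formatMb(limit));
        long heap = impliedHeapBytes(limit, 0.95); // assumes the default 95% parent limit
        System.out.println("implied heap max: " + heap + " bytes (" + formatMb(heap) + ")");
    }
}
```

      Under those assumptions, the heap max of the JVM hosting ES works out to 954,728,448 bytes (910.5 MiB).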

       

      See https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html

       

      TODO:

      • review the code: more fields may be needlessly mapped with fielddata enabled
      • tune ES, Maven or Surefire? (e.g. -Xmx in MAVEN_OPTS)
      • enable ES in the two preview deployments; this is unrelated to the unit test issue but also worth improving
        see https://github.com/nuxeo/nuxeo-helm-chart/blob/master/nuxeo/values.yaml#L122 (the current chart version is 1.0.14)
      • consider not using the embedded ES, even for the unit tests
        => for the functional tests, wait for the related Platform improvements in their next chart
      • lower the requested resources
      • ...
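      For the Maven/Surefire tuning item above: with Surefire's default forking, the tests run in a forked JVM whose options come from the plugin's argLine parameter, while MAVEN_OPTS only sizes Maven's own JVM (unless forkCount=0, in which case tests run inside Maven's JVM). A minimal Java sketch like the one below, run from inside the test JVM (for example from a throwaway test), shows which heap that JVM actually gets. Assuming the embedded ES lives in that same JVM and keeps the ES 7.x default 95% parent limit, it also prints the breaker limit that would follow. HeapCheck and parentLimit are hypothetical names.

```java
import java.util.Locale;

/**
 * Prints the max heap of the JVM it runs in and the parent circuit breaker limit
 * ES 7.x would derive from it with the default 95% setting (assumption).
 * Only relevant to ES if ES runs in this same JVM.
 */
public class HeapCheck {

    /** Parent limit computed as (long) (heap * ratio); 0.95 is the assumed default ratio. */
    static long parentLimit(long heapMaxBytes, double ratio) {
        return (long) (heapMaxBytes * ratio);
    }

    public static void main(String[] args) {
        // Roughly the -Xmx value; may be slightly lower depending on the GC,
        // and is Long.MAX_VALUE if the JVM has no heap limit.
        long heapMax = Runtime.getRuntime().maxMemory();
        long limit = parentLimit(heapMax, 0.95);
        System.out.printf(Locale.ROOT, "heap max: %d bytes, parent limit at 95%%: %d bytes%n",
                heapMax, limit);
    }
}
```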
