- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 5.8.0-HF23
- Fix Version/s: 5.8.0-HF24
- Component/s: Core, Elasticsearch
Elasticsearch indexing on 5.8 is very slow when indexing a Folder with a lot of child documents (100K+).
Only one thread is working, and it appears to be waiting on the database:
"Nuxeo-Work-elasticSearchIndexing-1" daemon prio=10 tid=0x00007f3059a03800 nid=0x794b runnable [0x00007f309d6cf000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:881)
	at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:762)
	- locked <0x0000000673f19120> (a java.util.concurrent.ConcurrentHashMap)
	at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:477)
	at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:114)
	at net.sourceforge.jtds.jdbc.ResponseStream.peek(ResponseStream.java:99)
	at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:3999)
	at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1052)
	- locked <0x000000066d884630> (a net.sourceforge.jtds.jdbc.TdsCore)
	at net.sourceforge.jtds.jdbc.MSCursorResultSet.cursorFetch(MSCursorResultSet.java:714)
	- locked <0x000000066d884630> (a net.sourceforge.jtds.jdbc.TdsCore)
	at net.sourceforge.jtds.jdbc.MSCursorResultSet.next(MSCursorResultSet.java:1137)
	at org.apache.tomcat.dbcp.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
	at org.apache.tomcat.dbcp.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.getSelectRows(JDBCRowMapper.java:431)
	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.readSimpleRows(JDBCRowMapper.java:255)
	at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCRowMapper.read(JDBCRowMapper.java:221)
	at org.nuxeo.ecm.core.storage.sql.SoftRefCachingRowMapper.read(SoftRefCachingRowMapper.java:381)
	at org.nuxeo.ecm.core.storage.sql.PersistenceContext.getFromMapper(PersistenceContext.java:637)
	at org.nuxeo.ecm.core.storage.sql.PersistenceContext.getMulti(PersistenceContext.java:682)
	at org.nuxeo.ecm.core.storage.sql.SessionImpl.getNodesByIds(SessionImpl.java:745)
	at org.nuxeo.ecm.core.storage.sql.SessionImpl.getNodeById(SessionImpl.java:638)
	at org.nuxeo.ecm.core.storage.sql.SessionImpl.getNodeById(SessionImpl.java:654)
	at org.nuxeo.ecm.core.storage.sql.SessionImpl.getChildren(SessionImpl.java:1066)
	at org.nuxeo.ecm.core.storage.sql.ra.ConnectionImpl.getChildren(ConnectionImpl.java:242)
	at org.nuxeo.ecm.core.storage.sql.coremodel.SQLSession.getChildren(SQLSession.java:785)
	at org.nuxeo.ecm.core.storage.sql.coremodel.SQLDocumentLive.getChildren(SQLDocumentLive.java:597)
	at org.nuxeo.ecm.core.api.DocsQueryProviderFactory$2.getDocs(DocsQueryProviderFactory.java:98)
	at org.nuxeo.ecm.core.api.AbstractSession.getDocsResultChunk(AbstractSession.java:1183)
	at org.nuxeo.ecm.core.api.impl.DocumentModelIteratorImpl.retrieveNextChunk(DocumentModelIteratorImpl.java:93)
	at org.nuxeo.ecm.core.api.impl.DocumentModelIteratorImpl.nextDocument(DocumentModelIteratorImpl.java:119)
	at org.nuxeo.ecm.core.api.impl.DocumentModelIteratorImpl.next(DocumentModelIteratorImpl.java:105)
	at org.nuxeo.ecm.core.api.impl.DocumentModelIteratorImpl.next(DocumentModelIteratorImpl.java:33)
	at org.nuxeo.elasticsearch.work.ChildrenIndexingWorker.doIndexingWork(ChildrenIndexingWorker.java:60)
	at org.nuxeo.elasticsearch.work.AbstractIndexingWorker.work(AbstractIndexingWorker.java:67)
	at org.nuxeo.ecm.core.work.WorkHolder.run(WorkHolder.java:68)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
However, the thread is not actually waiting on the database: it keeps requesting the same data over and over.
Here is the explanation:
- org.nuxeo.ecm.core.api.AbstractSession.getDocsResultChunk(AbstractSession.java:1183) -> we pass final int start, final int max
- org.nuxeo.ecm.core.storage.sql.SessionImpl.getChildren(SessionImpl.java:1066) -> we fetch all the children (start and max are completely ignored)
This amounts to pseudo-pagination with a post-filter. As a result, on a Folder with 150,000 docs, 15 being the default limit, it queries the whole set of 150,000 docs 10,000 times.
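
To make the cost concrete, here is a minimal simulation of the pattern described above. It is not Nuxeo code: the class PseudoPaginationDemo, its methods and the rowsRead counter are hypothetical names made up for illustration, and the "database" is just a counter. It only assumes what the explanation states, namely that each chunk request loads every child and then keeps the [start, start + max) slice.

```java
import java.util.stream.IntStream;

public class PseudoPaginationDemo {

    /** Number of rows the simulated storage layer has handed back so far. */
    public static long rowsRead = 0;

    /**
     * Simulates a chunk request whose storage call ignores start and max:
     * every child is read, then the requested slice is cut out afterwards.
     */
    public static int[] postFilteredChunk(int totalChildren, int start, int max) {
        rowsRead += totalChildren; // the full child list is read on every call
        int end = Math.min(start + max, totalChildren);
        return IntStream.range(start, end).toArray(); // post-filter: keep one slice
    }

    /** Iterates over all children chunk by chunk; returns the number of full reads. */
    public static long iterateWithPostFilter(int totalChildren, int chunkSize) {
        rowsRead = 0;
        long fullReads = 0;
        for (int start = 0; start < totalChildren; start += chunkSize) {
            postFilteredChunk(totalChildren, start, chunkSize);
            fullReads++;
        }
        return fullReads;
    }

    public static void main(String[] args) {
        long fullReads = iterateWithPostFilter(150_000, 15);
        System.out.println("full reads of the child list: " + fullReads);
        System.out.println("total rows read: " + rowsRead);
    }
}
```

With 150,000 children and a chunk size of 15, the loop performs 150,000 / 15 = 10,000 full reads. Under this model the work grows quadratically with the number of children, whereas pushing start and max down to the storage query would keep it linear.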