- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Component/s: Core VCS
- Tags:
- Backlog priority: 900
- Sprint: nxplatform #118
- Story Points: 5
To be able to retrieve large result sets from the SQL backend we need a cursor-based ResultSet; this approach allows data retrieval in chunks, preventing the loading of all rows into memory at once.
To do this we need to follow these requirements, as explained in the PostgreSQL documentation (a minimal sketch follows this list):
- the connection must use the V3 protocol (the default since PostgreSQL 7.4, visible in the stack trace as org.postgresql.core.v3)
- the connection must not be in autocommit mode (Nuxeo raises an exception when in autocommit mode)
- the statement must be TYPE_FORWARD_ONLY (this flag is set by Nuxeo)
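For illustration, here is a minimal sketch of those requirements. The connection URL, credentials, query and fetch size below are assumptions for the example, not values taken from the Nuxeo code base:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorFetchSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative URL, credentials, query and fetch size (assumptions, not Nuxeo code).
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/nuxeo", "nuxeo", "nuxeo")) {
            conn.setAutoCommit(false); // cursor-based fetching requires autocommit off

            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT id FROM hierarchy",
                    ResultSet.TYPE_FORWARD_ONLY,   // required result set type
                    ResultSet.CONCUR_READ_ONLY)) {
                ps.setFetchSize(100); // ask the driver to fetch rows in chunks of 100

                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // next() reads from the current chunk; a new chunk is fetched
                        // from the server when the current one is exhausted
                        rs.getString(1);
                    }
                }
            }
            conn.commit();
        }
    }
}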
Nuxeo is doing the following:
PreparedStatement ps = connection.prepareStatement(q.selectInfo.sql, ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY, ResultSet.HOLD_CURSORS_OVER_COMMIT);
ps.setFetchSize(batchSize);
Unfortunately, it seems that the first executeQuery call retrieves all rows (tuples) at the PostgreSQL driver level, even though the call returns only the expected fetch-size portion to Nuxeo:
bulk/scrollerPool-00,in:0,inCheckpoint:0,out:0,lastRead:1721667132192,lastTimer:0,wm:0,loop:55110,rebalance assigned
    at org.postgresql.core.PGStream.receiveTupleV3()Lorg/postgresql/core/Tuple; (PGStream.java:619)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(Lorg/postgresql/core/ResultHandler;IZ)V (QueryExecutorImpl.java:2379)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(Lorg/postgresql/core/Query;Lorg/postgresql/core/ParameterList;Lorg/postgresql/core/ResultHandler;IIIZ)V (QueryExecutorImpl.java:371)
    at org.postgresql.jdbc.PgStatement.executeInternal(Lorg/postgresql/core/CachedQuery;Lorg/postgresql/core/ParameterList;I)V (PgStatement.java:502)
    at org.postgresql.jdbc.PgStatement.execute(Lorg/postgresql/core/CachedQuery;Lorg/postgresql/core/ParameterList;I)V (PgStatement.java:419)
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(I)Z (PgPreparedStatement.java:194)
    at org.postgresql.jdbc.PgPreparedStatement.executeQuery()Ljava/sql/ResultSet; (PgPreparedStatement.java:137)
    at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery()Ljava/sql/ResultSet; (DelegatingPreparedStatement.java:122)
    at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery()Ljava/sql/ResultSet; (DelegatingPreparedStatement.java:122)
    at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapper.scrollSearch(Ljava/lang/String;Lorg/nuxeo/ecm/core/query/QueryFilter;II)Lorg/nuxeo/ecm/core/api/ScrollResult; (JDBCMapper.java:656)
    at org.nuxeo.ecm.core.storage.sql.jdbc.JDBCMapper.scroll(Ljava/lang/String;II)Lorg/nuxeo/ecm/core/api/ScrollResult; (JDBCMapper.java:612)
    at org.nuxeo.ecm.core.storage.sql.SoftRefCachingMapper.scroll(Ljava/lang/String;II)Lorg/nuxeo/ecm/core/api/ScrollResult; (SoftRefCachingMapper.java:53)
    at org.nuxeo.ecm.core.storage.sql.SessionImpl.scroll(Ljava/lang/String;II)Lorg/nuxeo/ecm/core/api/ScrollResult; (SessionImpl.java:658)
    at org.nuxeo.ecm.core.storage.sql.coremodel.SQLSession.scroll(Ljava/lang/String;II)Lorg/nuxeo/ecm/core/api/ScrollResult; (SQLSession.java:150)
    at org.nuxeo.ecm.core.api.AbstractSession.scroll(Ljava/lang/String;II)Lorg/nuxeo/ecm/core/api/ScrollResult; (AbstractSession.java:1434)
    at org.nuxeo.ecm.core.scroll.RepositoryScroll.fetch()Z (RepositoryScroll.java:88)
    at org.nuxeo.ecm.core.scroll.RepositoryScroll.hasNext()Z (RepositoryScroll.java:81)
    at org.nuxeo.ecm.core.bulk.computation.BulkScrollerComputation.processRecord(Lorg/nuxeo/lib/stream/computation/ComputationContext;Lorg/nuxeo/lib/stream/computation/Record;)V (BulkScrollerComputation.java:189)
The code then iterates over the cursor in pages of fetchSize rows, but there are no further database round trips.
This can lead to an OOM (12 GB of tuples for 115M rows).
This looks like a regression, since benchmarks with 1B documents ran successfully on PostgreSQL years ago.
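One rough way to check whether the driver materializes the full result set on executeQuery is to compare heap usage just before and just after the call, before iterating. The sketch below is a hypothetical diagnostic helper; the query and fetch size are assumptions, and the heap measurement is only indicative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchSizeCheck {

    // Rough check: with cursor-based fetching, heap usage right after executeQuery()
    // should stay small; if the driver materialized every tuple, it grows by roughly
    // the size of the whole result set. Query and fetch size are illustrative.
    static void checkCursorBehaviour(Connection conn) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id FROM hierarchy",
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(100);

            long before = usedHeap();
            try (ResultSet rs = ps.executeQuery()) {
                long after = usedHeap();
                System.out.printf("Heap growth on executeQuery: %d MB%n",
                        (after - before) / (1024 * 1024));
                long rows = 0;
                while (rs.next()) {
                    rows++;
                }
                System.out.println("Rows read: " + rows);
            }
        }
        conn.commit();
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // best effort, only a rough indication
        return rt.totalMemory() - rt.freeMemory();
    }
}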
- is caused by: NXP-21034 New scroll API on PostgreSQL 9.5 requires cursor holdability between transaction (Resolved)
- links to