-
Type: Bug
-
Status: Resolved
-
Priority: Minor
-
Resolution: Fixed
-
Affects Version/s: 2021.0
-
Component/s: Core
-
Release Notes Summary: Formulas are now skipped when extracting cell values.
-
Tags:
-
Backlog priority: 600
-
Sprint: nxplatform #103, nxplatform #104
-
Story Points: 5
XLX2TextConverter uses the Apache POI XSSFCell class to extract text from Excel files. However, XSSFCell#getStringCellValue throws an exception when the cell contains a numeric formula (the cell is not detected as a NUMERIC cell).
Steps to reproduce:
- Create a document with the attached XLSX file
- Observe that the fulltext extraction stops with this error:
ERROR [AbstractWork] Exception during work: FulltextExtractorWork(9f647c47-8572-4bb0-883e-3662bbbb5d3a, , Progress(0.0%, ?/0), Extracting)
java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC formula cell
    at org.apache.poi.xssf.usermodel.XSSFCell.typeMismatch(XSSFCell.java:1035) ~[poi-ooxml-4.1.2.jar:4.1.2]
    at org.apache.poi.xssf.usermodel.XSSFCell.checkFormulaCachedValueType(XSSFCell.java:398) ~[poi-ooxml-4.1.2.jar:4.1.2]
    at org.apache.poi.xssf.usermodel.XSSFCell.getRichStringCellValue(XSSFCell.java:386) ~[poi-ooxml-4.1.2.jar:4.1.2]
    at org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue(XSSFCell.java:342) ~[poi-ooxml-4.1.2.jar:4.1.2]
    at org.nuxeo.ecm.core.convert.plugins.text.extractors.XLX2TextConverter.appendTextFromCell(XLX2TextConverter.java:90) ~[nuxeo-core-convert-plugins-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.convert.plugins.text.extractors.XLX2TextConverter.convert(XLX2TextConverter.java:73) ~[nuxeo-core-convert-plugins-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:340) ~[nuxeo-core-convert-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.convert.plugins.text.extractors.FullTextConverter.convert(FullTextConverter.java:70) ~[nuxeo-core-convert-plugins-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:340) ~[nuxeo-core-convert-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.storage.FulltextExtractorWork.blobToText(FulltextExtractorWork.java:281) ~[nuxeo-core-storage-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.storage.FulltextExtractorWork.joinText(FulltextExtractorWork.java:310) ~[nuxeo-core-storage-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.storage.FulltextExtractorWork.extractAndUpdateBinaryText(FulltextExtractorWork.java:236) ~[nuxeo-core-storage-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.storage.FulltextExtractorWork.extractAndUpdate(FulltextExtractorWork.java:188) ~[nuxeo-core-storage-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.storage.FulltextExtractorWork.work(FulltextExtractorWork.java:143) ~[nuxeo-core-storage-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.work.AbstractWork.runWorkWithTransaction(AbstractWork.java:524) [nuxeo-core-event-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.work.AbstractWork.run(AbstractWork.java:387) [nuxeo-core-event-2021.37.4.jar:?]
    at org.nuxeo.ecm.core.work.WorkHolder.run(WorkHolder.java:57) [nuxeo-core-event-2021.37.4.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Expected behavior: the fulltext extraction should not crash on a cell type mismatch. XLX2TextConverter should log a warning with the exception and continue extracting text from the remaining cells.
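The expected behavior could be sketched roughly as follows. This is not the actual XLX2TextConverter code or the fix that was merged: TextCell is a hypothetical stand-in for a POI cell (so the sketch runs without Apache POI on the classpath), and CellTextSketch and extractText are illustrative names. It only assumes, as the stack trace shows, that reading a cell as a string can throw IllegalStateException on a type mismatch.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class CellTextSketch {

    private static final Logger LOG = Logger.getLogger(CellTextSketch.class.getName());

    /** Hypothetical stand-in for a spreadsheet cell; this is NOT the Apache POI API. */
    interface TextCell {
        /** May throw IllegalStateException when the cell does not hold a string. */
        String getStringCellValue();
    }

    /** Joins the text of all readable cells, logging and skipping cells that fail. */
    static String extractText(List<TextCell> cells) {
        StringBuilder sb = new StringBuilder();
        for (TextCell cell : cells) {
            try {
                String value = cell.getStringCellValue();
                if (value != null && !value.isEmpty()) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(value);
                }
            } catch (IllegalStateException e) {
                // Type mismatch on one cell: warn with the exception, keep going.
                LOG.log(Level.WARNING, "Skipping cell whose value cannot be read as text", e);
            }
        }
        return sb.toString();
    }
}
```

With three cells where the middle one throws the mismatch error, extractText returns the text of the first and third cells and logs a single warning, instead of aborting the whole extraction.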