  1. Nuxeo Platform
  2. NXP-31938

Skip cell type mismatch during fulltext extraction

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2021.0
    • Fix Version/s: 2021.48, 2023.6
    • Component/s: Core
    • Release Notes Summary:
      Formulas are now skipped when extracting cell values.
    • Backlog priority:
      600
    • Sprint:
      nxplatform #103, nxplatform #104
    • Story Points:
      5

      Description

      XLX2TextConverter uses the Apache POI XSSFCell class to extract text from Excel files. However, XSSFCell#getStringCellValue throws an IllegalStateException when the cell contains a formula with a numeric result, because such a cell is not detected as a NUMERIC cell.

       

      Steps to reproduce:

      1. Create a document with the attached XLSX file
      2. Observe that the fulltext extraction stops with this error:
      ERROR [AbstractWork] Exception during work: FulltextExtractorWork(9f647c47-8572-4bb0-883e-3662bbbb5d3a, , Progress(0.0%, ?/0), Extracting)
      java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC formula cell
          at org.apache.poi.xssf.usermodel.XSSFCell.typeMismatch(XSSFCell.java:1035) ~[poi-ooxml-4.1.2.jar:4.1.2]
          at org.apache.poi.xssf.usermodel.XSSFCell.checkFormulaCachedValueType(XSSFCell.java:398) ~[poi-ooxml-4.1.2.jar:4.1.2]
          at org.apache.poi.xssf.usermodel.XSSFCell.getRichStringCellValue(XSSFCell.java:386) ~[poi-ooxml-4.1.2.jar:4.1.2]
          at org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue(XSSFCell.java:342) ~[poi-ooxml-4.1.2.jar:4.1.2]
          at org.nuxeo.ecm.core.convert.plugins.text.extractors.XLX2TextConverter.appendTextFromCell(XLX2TextConverter.java:90) ~[nuxeo-core-convert-plugins-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.convert.plugins.text.extractors.XLX2TextConverter.convert(XLX2TextConverter.java:73) ~[nuxeo-core-convert-plugins-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:340) ~[nuxeo-core-convert-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.convert.plugins.text.extractors.FullTextConverter.convert(FullTextConverter.java:70) ~[nuxeo-core-convert-plugins-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:340) ~[nuxeo-core-convert-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.storage.FulltextExtractorWork.blobToText(FulltextExtractorWork.java:281) ~[nuxeo-core-storage-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.storage.FulltextExtractorWork.joinText(FulltextExtractorWork.java:310) ~[nuxeo-core-storage-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.storage.FulltextExtractorWork.extractAndUpdateBinaryText(FulltextExtractorWork.java:236) ~[nuxeo-core-storage-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.storage.FulltextExtractorWork.extractAndUpdate(FulltextExtractorWork.java:188) ~[nuxeo-core-storage-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.storage.FulltextExtractorWork.work(FulltextExtractorWork.java:143) ~[nuxeo-core-storage-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.work.AbstractWork.runWorkWithTransaction(AbstractWork.java:524) [nuxeo-core-event-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.work.AbstractWork.run(AbstractWork.java:387) [nuxeo-core-event-2021.37.4.jar:?]
          at org.nuxeo.ecm.core.work.WorkHolder.run(WorkHolder.java:57) [nuxeo-core-event-2021.37.4.jar:?]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
          at java.lang.Thread.run(Thread.java:829) [?:?]
      

       

      Expected behavior: the fulltext extraction should not crash on a cell type mismatch. XLX2TextConverter should log a warning with the exception and continue extracting text from the other cells.
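
The expected behavior above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the actual Nuxeo fix: `SimpleCell` and `extractText` are hypothetical stand-ins for XSSFCell and XLX2TextConverter.appendTextFromCell, and java.util.logging stands in for whatever logger the converter really uses. Note that the release notes summary says the shipped fix skips formulas when extracting cell values; the sketch shows only the log-and-continue behavior described here.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: SimpleCell stands in for Apache POI's XSSFCell so that
// the example runs without POI on the classpath.
final class CellTextSketch {

    private static final Logger LOG = Logger.getLogger(CellTextSketch.class.getName());

    /** Stand-in for a spreadsheet cell; not a Nuxeo or POI type. */
    interface SimpleCell {
        /** May throw IllegalStateException on a type mismatch, as XSSFCell#getStringCellValue does. */
        String getStringCellValue();
    }

    /** Joins the text of every readable cell, skipping cells that report a type mismatch. */
    static String extractText(List<SimpleCell> cells) {
        StringBuilder sb = new StringBuilder();
        for (SimpleCell cell : cells) {
            try {
                String value = cell.getStringCellValue();
                if (value != null && !value.isEmpty()) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(value);
                }
            } catch (IllegalStateException e) {
                // Log and continue, so one unreadable cell does not abort the whole extraction.
                LOG.log(Level.WARNING, "Skipping cell: cannot read its value as text", e);
            }
        }
        return sb.toString();
    }
}
```

Catching the exception per cell keeps the failure local: the text of the remaining cells is still indexed, and the warning keeps the skipped cell visible in the logs.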
