The `lineitems` data type is used for fields that extract tabular information for a specific type of table with pre-defined columns. There are many different `lineitems` fields tailored to different extraction use-cases. For example, the `invoice.lineitems` field captures tables containing invoice line items, while the `statement.transactions` field returns credit and debit transaction rows from bank and credit card statements.
All fields with the `lineitems` data type return a common data structure. Each prediction is a list of tables, one for each table found in the source document. Each of these tables is a JSON object with three keys:
- `types` aligns each extracted column to a specific column type.
- `headers` identifies the text and position of header cells within the source document.
- `cells` contains the content of the table as an array of arrays, organised as rows that each hold one item per column.
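As a rough illustration of this structure, the sketch below builds one table by hand as a Python dictionary and checks that its parts line up. Only the three top-level keys come from the description above. The inner key names (`id`, `confidence`, `text`), the use of `None` for an unaligned column, the plain-string cells, and all values are assumptions made for this example, not the documented schema.

```python
# Hypothetical prediction for a lineitems field: a list holding one table.
# Only the top-level keys (types, headers, cells) come from the text above;
# inner key names and all values are illustrative assumptions.
prediction = [
    {
        "types": [
            {"id": "sypht.invoice.lineitems.unitPrice", "confidence": 0.97},
            None,  # assumed marker for a column with no pre-defined type
        ],
        "headers": [
            {"text": "Item Price"},
            {"text": "Notes"},
        ],
        "cells": [
            ["10.00", "fragile"],
            ["4.50", None],  # None marks an empty cell
        ],
    }
]


def column_count(table):
    """Return the number of columns, checking that every part agrees."""
    n = len(table["types"])
    assert len(table["headers"]) == n
    assert all(len(row) == n for row in table["cells"])
    return n


for table in prediction:
    print(sorted(table.keys()), column_count(table))
```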
Each item in the `types` array contains a type identifier and confidence score for the corresponding table column. These identifiers can be used to interpret the cell content for that column in the table. For example, a column labelled "Item Price" might be classified as a `sypht.invoice.lineitems.unitPrice` column and contain prices for each listed item.
A type of `null` indicates that the corresponding column does not match a pre-defined column type for the field. Header and cell content is still returned for these columns.
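One way to use the type information is to keep only the columns that were aligned with sufficient confidence. The sketch below assumes each `types` item is either `None` or an object with `id` and `confidence` keys; those key names, the second identifier, and the confidence values are hypothetical and chosen only for illustration.

```python
# Hypothetical `types` array for a three-column table. The key names `id`
# and `confidence` are assumptions, as is using None for an unaligned column.
types = [
    {"id": "sypht.invoice.lineitems.unitPrice", "confidence": 0.97},
    None,
    {"id": "example.hypothetical.columnType", "confidence": 0.62},  # made-up id
]


def aligned_columns(types, min_confidence=0.0):
    """Map column index -> type identifier for columns with a usable type."""
    result = {}
    for index, column_type in enumerate(types):
        if column_type is None:
            continue  # unaligned column: header and cells are still available
        if column_type["confidence"] >= min_confidence:
            result[index] = column_type["id"]
    return result


print(aligned_columns(types))
print(aligned_columns(types, min_confidence=0.9))
```

The threshold is a design choice for the caller: a higher value trades coverage for fewer misclassified columns.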
`headers` are not needed to interpret the content of the table for a `lineitems` field, but may be useful for understanding the content of non-aligned columns and how the data was originally presented in the source document. Each header includes `bounds` information used to locate it in the source.
Each item in the `cells` array represents a row, and each row contains one item per column in the table. Row items may be `null`, indicating an empty cell for a given column. Rows with no extracted cells are omitted from the output.
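Because every row has one item per column, rows can be turned into records keyed by column name. The sketch below is one possible approach using only the standard library: it names each column by its type identifier and falls back to the header text for unaligned columns. The inner key names (`id`, `text`), the plain-string cells, and the sample values are assumptions for this example.

```python
# Hypothetical table in the shape described above; inner key names and
# values are illustrative assumptions.
table = {
    "types": [{"id": "sypht.invoice.lineitems.unitPrice", "confidence": 0.97}, None],
    "headers": [{"text": "Item Price"}, {"text": "Notes"}],
    "cells": [
        ["10.00", "fragile"],
        ["4.50", None],  # empty cell in the second column
    ],
}


def rows_to_records(table):
    """Turn a table's cells into one dict per row, keyed by column name."""
    names = []
    for index, column_type in enumerate(table["types"]):
        if column_type is not None:
            names.append(column_type["id"])
        else:
            # Unaligned column: fall back to header text, then to position.
            header = table["headers"][index]
            names.append(header["text"] if header else f"column_{index}")
    return [dict(zip(names, row)) for row in table["cells"]]


records = rows_to_records(table)
for record in records:
    print(record)
```

If two columns share the same name, the later one overwrites the earlier one in each record, so a real implementation would need to disambiguate them.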
The following example shows how to process a `lineitems` field in Python. We utilise the `pandas` library to format tabular results.