# Line Items

### What are line-items?

The `lineitems` data type is used for fields that extract tabular information from a specific type of table with pre-defined columns. There are many different `lineitems` fields tailored to different extraction use-cases.

For example, the `invoice.lineitems` field captures tables containing invoice line items, while the `statement.transactions` field returns credit and debit transaction rows from bank and credit card statements.

{% hint style="info" %}
Specific AI products may include additional functionality on top of basic table extraction for a given use-case. For example, `ndis.lineitems` infers Support Item Reference Numbers from line-level description text. When available, always prefer the `lineitems` field that best matches your use-case over generic table extraction (i.e. `generic.table`).
{% endhint %}

### Basic data structure

Fields with the `lineitems` data type return a common data structure. Each prediction is a list of tables, one for each table found in the source document. Each of these tables is a JSON object with three keys:

```javascript
{
   "types": [...],
   "headers": [...],
   "cells": [[...], ...],
}
```

* `types` aligns each extracted column to a specific column type.
* `headers` identifies the specific text and position of header cells within the source document.
* `cells` contains the content of the table as an array of arrays, organised as rows by columns.

The following sections unpack each part of this data structure in detail.
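
As a quick orientation before the details, the sketch below builds a hand-written stand-in for a single table and checks its shape. The values are illustrative only, the helper names `table_shape` and `is_rectangular` are hypothetical, and real objects carry more keys than shown here. It assumes `types` holds one entry per column, as described above.

```python
# Hand-written stand-in for one table in a `lineitems` prediction.
# Values are illustrative and real objects carry more keys than shown here.
table = {
    "types": [
        {"type": "sypht.invoice.lineitems.unitPrice", "score": 0.97207},
        {"type": None},  # column with no matching pre-defined type
    ],
    "headers": [{"text": "ITEM PRICE"}, {"text": "NOTES"}],
    "cells": [
        [{"text": "50.00"}, None],  # None marks an empty cell
        [{"text": "75.00"}, {"text": "Rush order"}],
    ],
}


def table_shape(table):
    """Return (row count, column count) for a single table object."""
    return len(table["cells"]), len(table["types"])


def is_rectangular(table):
    """Check that every row has exactly one item per column."""
    n_cols = len(table["types"])
    return all(len(row) == n_cols for row in table["cells"])


print(table_shape(table), is_rectangular(table))
```

A shape check like this can be a cheap sanity test before handing the rows to other tooling.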

#### Types

Each item in the `types` array contains a type identifier and confidence score for the corresponding table column. These identifiers can be used to interpret the cell content for that column in the table. For example, a column labelled "Item Price" might be classified as a `sypht.invoice.lineitems.unitPrice` column and contain prices for each listed item.

A type of `null` indicates the corresponding column does not match a pre-defined column type for the field. Header and cell content is still returned for these columns.

```javascript
[
    {
        "type": "sypht.invoice.lineitems.unitPrice",
        "score": 0.97207
    },
...
]
```
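
One way to use the `types` array is to build a lookup from column type to column position, skipping unaligned columns. The following is a minimal sketch: the helper name `column_index_by_type` and the `min_score` threshold are assumptions made for illustration, not part of the Sypht API, and the threshold value is not a recommendation.

```python
def column_index_by_type(types, min_score=0.5):
    """Map each recognised column type to its column position.

    `min_score` is an illustrative cut-off, not a recommended value.
    """
    index = {}
    for position, entry in enumerate(types):
        if not entry:
            continue  # defensively skip a missing entry
        col_type = entry.get("type")
        score = entry.get("score") or 0.0
        if col_type is None or score < min_score:
            continue  # unaligned or low-confidence column
        index.setdefault(col_type, position)  # keep the first match only
    return index


types = [
    {"type": "sypht.invoice.lineitems.unitPrice", "score": 0.97207},
    {"type": None, "score": 0.0},  # made-up score for an unaligned column
]
print(column_index_by_type(types))
```

The resulting dictionary can then be used to pick a cell out of each row by position.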

#### Headers

Headers encode information about the header cells detected on each table. In general, `headers` are not needed to interpret the content of the table for a `lineitems` field, but they may be useful for understanding the content of columns that did not align to a pre-defined type and how the data was originally presented in the source document.

Each header object contains `text` and `bounds` information used to locate the header in the source document.

```javascript
{
    "id": "table.label.head:price:69",
    "type": "table.label",
    "text": "ITEM PRICE",
    "page_idx": 0,
    "tokens": [
        "ITEM",
        "PRICE"
    ],
    "bounds": {
        "pageNum": 1,
        "topLeft": {
            "x": 0.6981132075471698,
            "y": 0.4591194968553459
        },
        "bottomRight": {
            "x": 0.7458379578246392,
            "y": 0.47570040022870214
        },
        "tokenIds": [
            68,
            69
        ]
    }
}
```
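
The `bounds` values in the example above fall between 0 and 1, which suggests they are fractions of the page width and height measured from the top-left corner. Treating that as an assumption rather than a documented guarantee, a bounding box could be converted to pixel coordinates for a rendered page as sketched below. The helper name `bounds_to_pixels` and the page dimensions are hypothetical.

```python
def bounds_to_pixels(bounds, page_width, page_height):
    """Convert a `bounds` object to (left, top, right, bottom) in pixels.

    Assumes x and y are fractions of the page size with a top-left origin.
    """
    top_left = bounds["topLeft"]
    bottom_right = bounds["bottomRight"]
    return (
        round(top_left["x"] * page_width),
        round(top_left["y"] * page_height),
        round(bottom_right["x"] * page_width),
        round(bottom_right["y"] * page_height),
    )


# Illustrative bounds on a hypothetical 1000 x 2000 pixel page image
example_bounds = {
    "pageNum": 1,
    "topLeft": {"x": 0.5, "y": 0.25},
    "bottomRight": {"x": 0.75, "y": 0.5},
}
box = bounds_to_pixels(example_bounds, page_width=1000, page_height=2000)
print(box)
```

A box in this form can be passed to most image libraries to highlight the header on the page.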

#### Cells

Each element in the `cells` array represents a row, and each row contains one item per column in the table. Row items may be `null`, indicating an empty cell for a given column. Rows with no extracted cells are omitted from the output.

When cells are present, they share a similar data structure with headers, including both `text` and `bounds` information.

```javascript
"cells": [
    [
        null,
        {
            "type": "table.value",
            "text": "50.00",
            "page_idx": 0,
            "tokens": [
                "50.00"
            ],
            "bounds": {
                "pageNum": 1,
                "topLeft": {
                    "x": 0.5268220495745468,
                    "y": 0.49914236706689535
                },
                "bottomRight": {
                    "x": 0.6182019977802442,
                    "y": 0.5105774728416238
                },
                "tokenIds": [
                    87
                ]
            }
        },
        ...
    ],
    [
        ...
    ]
]
```
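
Because any row item may be `null`, code that reads cell text needs to handle empty cells explicitly. A minimal sketch, using a hypothetical helper `row_texts` and hand-written cells that keep only the `type` and `text` keys:

```python
def row_texts(row, empty=""):
    """Return the text of each cell in a row, substituting `empty` for null cells."""
    return [cell["text"] if cell is not None else empty for cell in row]


# Illustrative rows, with most cell keys omitted for brevity
cells = [
    [None, {"type": "table.value", "text": "50.00"}],
    [
        {"type": "table.value", "text": "Widget"},
        {"type": "table.value", "text": "75.00"},
    ],
]
text_rows = [row_texts(row) for row in cells]
print(text_rows)
```

Substituting a placeholder keeps every row the same length, so column positions still line up with `types` and `headers`.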

### Examples

Line-item fields are a densely packed source of structured information. While there is a lot of information available, it's usually quite simple in practice to pull out the specific information you need.

Here we provide an end-to-end example of uploading a document using the Sypht API and interpreting the results of a `lineitems` field in Python. We use the `pandas` library to format tabular results.

```python
import pandas as pd
from sypht.client import SyphtClient
sc = SyphtClient()

# upload a document and run the invoices product
with open('invoice.png', 'rb') as f:
    doc_id = sc.upload(f, products=["invoices"])

# collect the extraction results
results = sc.fetch_results(doc_id)

# grab the lineitems field in this case
tables = results['invoice.lineitems']

for table in tables:
    # construct a DataFrame representing the table using the source doc headers
    # empty cells are returned as null (None), so guard before reading text
    df = pd.DataFrame(
        [
            [cell['text'] if cell else None for cell in row]
            for row in table['cells']
        ],
        columns=[header['text'] for header in table['headers']]
    )
    print(df)
```

Depending on the input file content, this sample produces a DataFrame with original document headers for columns and cell content in each row, e.g.:

| Date     | Product Description | Misc. | Total ($/AUD) |
| -------- | ------------------- | ----- | ------------- |
| 1/1/2020 | Foo                 | Hello | $50.00        |
| 1/1/2021 | Bar                 | World | $100.00       |

Alternatively, we can use the aligned column types rather than the raw header text to construct a DataFrame:

```python
# `table` is one element of `tables` from the previous example
df = pd.DataFrame(
    [
        [cell['text'] if cell else None for cell in row]
        for row in table['cells']
    ],
    columns=[col['type'] for col in table['types']]
)
```

This produces an equivalent table with columns aligned to specific `invoice.lineitems` types:

| `invoice.lineitems.date` | `invoice.lineitems.description` | `null` | `invoice.lineitems.total` |
| ------------------------ | ------------------------------- | ------ | ------------------------- |
| 1/1/2020                 | Foo                             | Hello  | $50.00                    |
| 1/1/2021                 | Bar                             | World  | $100.00                   |
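
When only one column is needed, it can also be read directly without `pandas`. The sketch below pulls the text of the column aligned to a given type and sums it as currency. The helper names `column_texts` and `parse_amount` are hypothetical, the table is hand-written to mirror the example above, and the parsing assumes amounts formatted like `$1,050.00`.

```python
from decimal import Decimal, InvalidOperation


def column_texts(table, column_type):
    """Return the cell text of the first column aligned to `column_type`."""
    for position, entry in enumerate(table["types"]):
        if entry and entry.get("type") == column_type:
            return [
                row[position]["text"] if row[position] else None
                for row in table["cells"]
            ]
    return []  # no column of this type in the table


def parse_amount(text):
    """Parse text such as "$1,050.00" into a Decimal, or None on failure."""
    if text is None:
        return None
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


# Hand-written table mirroring the example above (scores are made up)
table = {
    "types": [
        {"type": "invoice.lineitems.description", "score": 0.9},
        {"type": None},
        {"type": "invoice.lineitems.total", "score": 0.9},
    ],
    "cells": [
        [{"text": "Foo"}, {"text": "Hello"}, {"text": "$50.00"}],
        [{"text": "Bar"}, None, {"text": "$100.00"}],
    ],
}

amounts = [parse_amount(t) for t in column_texts(table, "invoice.lineitems.total")]
total = sum((a for a in amounts if a is not None), Decimal("0"))
print(total)
```

`Decimal` is used instead of `float` here to avoid rounding surprises when adding currency values.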
