# Line Items

### What are line-items?

The `lineitems` data type is used for fields that extract tabular information for a specific type of table and pre-defined columns. There are many different `lineitems` fields tailored to different extraction use-cases.

For example, the `invoice.lineitems` field captures tables containing invoice line items, while the `statement.transactions` field returns credit and debit transaction rows from bank and credit card statements.

{% hint style="info" %}
Specific AI products may contain additional functionality on top table basic extraction for a given use-case. For example,`ndis.lineitems` includes inference of Support Item Reference Numbers from line level description text. Always prefer the best matching `lineitems`field to your use-case over generic table extraction (i.e.`generic.table)`when available.
{% endhint %}

### Basic data structure

Fields with the `lineitems` data type return a common data structure. Each prediction is a list of tables, one for each table found in the source document. Each of these tables is a JSON object with three keys:

```javascript
{
   "types": [...],
   "headers": [...],
   "cells": [[...], ...],
}
```

* `types` aligns each extracted column to a specific column type.
* `headers` identify the specific text and position of header cells within the source document.
* `cells` contain the content of the table arranged as an array-of-arrays; organised rows by columns.

The following sections unpack each part of this data structure in detail.

#### Types

Each item in the `types` array contains a type identifier and confidence score for the corresponding table column. These identifiers can be used to interpret the corresponding cell content for that column in the table. For example, a column labelled "Item Price" might be classified as a `sypht.invoice.lineitems.unitPrice` column) and contain prices for each listed item.

A type of `null` indicates the corresponding column does not match a pre-defined column type for the field. Header and cell content is still returned for these columns.

```javascript
[
    {
        "type": "sypht.invoice.lineitems.unitPrice",
        "score": 0.97207
    },
...
]
```

#### Headers

Headers encode information about the header cells detected on each table. In general `headers` are not needed to interpret the content of the table for a `lineitems` field, but may be useful to understand the content of non-aligned columns and how the data was originally presented in the source document.

Each object contains `text` and `bounds` information used to locate headers in the source.

```javascript
{
    "id": "table.label.head:price:69",
    "type": "table.label",
    "text": "ITEM PRICE",
    "page_idx": 0,
    "tokens": [
        "ITEM",
        "PRICE"
    ],
    "bounds": {
        "pageNum": 1,
        "topLeft": {
            "x": 0.6981132075471698,
            "y": 0.4591194968553459
        },
        "bottomRight": {
            "x": 0.7458379578246392,
            "y": 0.47570040022870214
        },
        "tokenIds": [
            68,
            69
        ]
    },
},
```

#### Cells

Each element in the `cells` array represents a row, and each row contains one item per column in the table. Row items may be `null` indicating an empty cell for a given column. Rows with no extracted cells are omitted from the output.

When cells are present they contain a similar data structure to headers. This includes both `text` and `bounds` information.

```javascript
"cells": [
    [
        null,
        {
            "type": "table.value",
            "text": "50.00",
            "page_idx": 0,
            "tokens": [
                "50.00"
            ],
            "bounds": {
                "pageNum": 1,
                "topLeft": {
                    "x": 0.5268220495745468,
                    "y": 0.49914236706689535
                },
                "bottomRight": {
                    "x": 0.6182019977802442,
                    "y": 0.5105774728416238
                },
                "tokenIds": [
                    87
                ]
            }
        },
        ...
    ],
    [
        ...
    ]
]
```

### Examples

Line-item fields are a densely packed source of structured information. While there is a lot of information available, it's usually quite simple in practice to pull out the specific information you need.

Here we provide an end-to-end example uploading a document using the Sypht API and interpreting the results of a `lineitems` field in Python.  We utilise the `pandas` library to format tabular results.

```python
import pandas as pd
from sypht.client import SyphtClient
sypht = SyphtClient()

# upload a document and run the invoices product
with open('invoice.png', 'rb') as f:
    doc_id = sc.upload(f, products=["invoices"])

# collect the extraction results
results = sc.fetch_results(doc_id)

# grab the lineitems field in this case
tables = results['invoice.lineitems']

for table in tables:
    # construct a DataFrame representing the table using the source doc headers
    df = pd.DataFrame(
        [
            [cell['text'] for cell in row]
            for row in table['cells']
        ],
        columns=[header['text'] for header in table['headers']]
    )
    print(df)
```

Depending on the input file content, this sample produces a DataFrame with original document headers for columns and cell content in each row, e.g.:

| Date     | Product Description | Misc. | Total ($/AUD) |
| -------- | ------------------- | ----- | ------------- |
| 1/1/2020 | Foo                 | Hello | $50.00        |
| 1/1/2021 | Bar                 | World | $100.00       |

Alternately we can use the aligned column types rather than raw text to construct a DataFrame like so:

```python
df = pd.DataFrame(
    [
        [cell['text'] for cell in row]
        for row in table['cells']
    ],
    columns=[header['type'] for header in table['types']]
)
```

This produces an equivalent table with columns aligned to specific `invoice.lineitems` types:

| `invoice.lineitems.date` | `invoice.lineitems.description` | `null` | `invoice.lineitems.total` |
| ------------------------ | ------------------------------- | ------ | ------------------------- |
| 1/1/2020                 | Foo                             | Hello  | $50.00                    |
| 1/1/2021                 | Bar                             | World  | $100.00                   |


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.sypht.com/sypht/field-types/line-items.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
