# Smart document split

## How it works

When a single PDF file contains multiple underlying documents, smart-split automatically detects and segments the sub-documents, even when they vary in length and format.

When a source file is uploaded, it is processed and assigned a `fileId`. The `/results/` response for the original file then contains one or more child document `fileId` values, each of which can be queried to obtain the corresponding sub-document results.

Document splitting can be used in conjunction with other standard workflows like prediction or validation.

![Sample combination of document splitting and validation.](https://2266347786-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MFTZ3y4gwn_iaPw5B8H%2F-MHPL2Li39J4tXZ671Q2%2F-MHPLVXnSgpg7JzX_BdV%2Fimage.png?alt=media\&token=cf642b9b-95ad-4e12-84cd-35a9dad1746c)

## Getting started

To automatically split files on upload, a few changes to the standard `fileupload` form-data parameters are required:

* Specify the document-splitting workflow type by setting: `workflowId=split`
* Specify a `childWorkflowId` to define what workflow to run on each generated sub-document
* Optionally specify `childWorkflowOptions` to parameterise the workflow run on each generated sub-document

{% hint style="info" %}
Split workflows are a `BETA` feature and subject to change.\
\
This guide is under construction.
{% endhint %}

### 1. Upload

`workflowId` = `split`

`workflowOptions`

{% code title="Sample" %}

```javascript
{
    "prediction": {
        "childWorkflow": "prediction",
        "childWorkflowOptions": {
            "prediction": {
                "fieldSets": ["sypht.invoice"]
            }
        }
    }
}
```

{% endcode %}

#### Response:

```javascript
{
    "fileId": "00000000-0000-0000-0000-000000000000",
    "uploadedAt": "2020-08-20T03:19:07.319Z",
    "status": "RECEIVED"
}
```
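
As a rough illustration of the upload step, the sketch below assembles the two form-data fields described above, with `workflowOptions` serialised as a JSON string that mirrors the sample. The helper name `build_split_upload_fields` is hypothetical. The upload URL, the bearer-token header and the `fileToUpload` field name shown in the trailing comment are assumptions that this guide does not confirm.

```python
import json


def build_split_upload_fields(child_workflow, child_workflow_options):
    """Return form-data fields for a split upload (hypothetical helper).

    The nesting under "prediction" mirrors the sample workflowOptions above.
    """
    workflow_options = {
        "prediction": {
            "childWorkflow": child_workflow,
            "childWorkflowOptions": child_workflow_options,
        }
    }
    return {
        "workflowId": "split",
        # Form-data values are strings, so the options are sent as JSON text.
        "workflowOptions": json.dumps(workflow_options),
    }


fields = build_split_upload_fields(
    "prediction",
    {"prediction": {"fieldSets": ["sypht.invoice"]}},
)
print(fields)

# Sending the request is left as a comment because the endpoint, the auth
# header and the file field name are assumptions, not taken from this guide:
#
#   import requests
#   with open("combined.pdf", "rb") as pdf:
#       response = requests.post(
#           "https://api.sypht.com/fileupload",            # assumed URL
#           headers={"Authorization": "Bearer <token>"},   # assumed auth
#           files={"fileToUpload": pdf},                   # assumed field name
#           data=fields,
#       )
```

The `fileId` in the upload response is the one to use when collecting the split results in the next step.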

### 2. Collect the split results

GET <https://api.sypht.com/result/final/00000000-0000-0000-0000-000000000000>

```javascript
{
    "fileId": "815c63f6-...-f07223d057cb",
    "status": "FINALISED",
    "results": {
        "fields": [
            {
                "name": "components.children",
                "value": [
                    {"file_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"},
                    {"file_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"}
                ]
            }
        ]
    }
}
```
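
A minimal sketch of pulling the child `file_id` values out of a split result follows. It assumes the response has the shape shown above, with the children listed under a field named `components.children`. The function name `child_file_ids` is hypothetical.

```python
def child_file_ids(split_result):
    """Return the child file IDs listed in a split result (hypothetical helper)."""
    ids = []
    fields = split_result.get("results", {}).get("fields", [])
    for field in fields:
        # Only the components.children field carries sub-document references.
        if field.get("name") == "components.children":
            ids.extend(child["file_id"] for child in field.get("value", []))
    return ids


# Sample payload copied from the response shape above.
sample_split_result = {
    "status": "FINALISED",
    "results": {
        "fields": [
            {
                "name": "components.children",
                "value": [
                    {"file_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"},
                    {"file_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"},
                ],
            }
        ]
    },
}

print(child_file_ids(sample_split_result))
```

A result with no `components.children` field yields an empty list rather than an error, which makes it easy to tell "no children yet" apart from a malformed response.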

### 3. Collect results for each sub-document

GET <https://api.sypht.com/result/final/aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa>

{% code title="Response" %}

```javascript
{
    "fileId": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
    "status": "FINALISED",
    "results": {
        "timestamp": "2020-08-20T03:30:09.703Z",
        "fields": [
            {
                "name": "invoice.total",
                "value": "1485.00",
                "confidence": 0.9958282699555642,
                ...
            },
            ...
        ]
    }
}
```

{% endcode %}

GET <https://api.sypht.com/result/final/bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb>

{% code title="Response" %}

```javascript
{
    "fileId": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
    "status": "FINALISED",
    "results": {
        "timestamp": "2020-08-20T03:30:09.703Z",
        "fields": [
            {
                "name": "invoice.total",
                "value": "2485.00",
                "confidence": 0.99582,
                ...
            },
            ...
        ]
    }
}
```

{% endcode %}
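
The per-child requests above can be looped over. The sketch below is one possible way to do that: it takes the list of child file IDs and a `fetch` callable, and returns the field values for each sub-document. The names `fetch_final_result` and `collect_field_values` are hypothetical, and the bearer-token header is an assumption that this guide does not confirm. The demo at the end uses a stub in place of a real request.

```python
import json
import urllib.request

RESULT_URL = "https://api.sypht.com/result/final/{file_id}"


def fetch_final_result(file_id, token):
    """GET the final result for one fileId. Bearer auth is an assumption."""
    request = urllib.request.Request(
        RESULT_URL.format(file_id=file_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_field_values(file_ids, fetch):
    """Map each sub-document fileId to a {field name: value} dict."""
    collected = {}
    for file_id in file_ids:
        result = fetch(file_id)
        fields = result.get("results", {}).get("fields", [])
        collected[file_id] = {f["name"]: f["value"] for f in fields}
    return collected


# Demo with a stub standing in for the network call, using the sample values.
def stub_fetch(file_id):
    totals = {
        "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa": "1485.00",
        "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb": "2485.00",
    }
    return {
        "fileId": file_id,
        "status": "FINALISED",
        "results": {"fields": [{"name": "invoice.total", "value": totals[file_id]}]},
    }


values = collect_field_values(
    ["aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa", "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"],
    stub_fetch,
)
print(values)
```

For real use, `fetch` could be `lambda file_id: fetch_final_result(file_id, token)`. Passing the fetcher in keeps the collection logic testable without network access.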

## Limitations, errors and recommendations

* Uploading a document for the split workflow does not enforce any page limit checks. You may upload a document of any size, but recent tests have shown that documents of more than **50 pages** cannot be processed at this time.
* Each split sub-document is checked against your page limit. To avoid rejections, please ask to have your page limit increased to your expected maximum.
* If a split document is rejected due to page or file size limits, the split workflow will eventually be marked as a failure. Some split documents may still upload successfully, however. This partial outcome is not ideal and can be avoided by increasing your page limit as above.
