
Smart document split

Add automatic document splitting to workflows.


Last updated 2 years ago


How it works

When a single PDF file contains multiple underlying documents, smart-split automatically detects and segments the sub-documents, even when they vary in length and format.

When a source file is uploaded, it is processed and assigned a corresponding fileId. The results for the original file then contain one or more child document file IDs, which can be queried to obtain the results for each sub-document.

Document splitting can be used in conjunction with other standard workflows like prediction or validation.

Getting started

To automatically split files on upload, a few changes to the standard fileupload form-data parameters are required:

  • Specify the document-splitting workflow type by setting: workflowId=split

  • Specify a childWorkflowId to define what workflow to run on each generated sub-document

  • Optionally specify childWorkflowOptions to parameterise the workflow run on each generated sub-document

Split workflows are a BETA feature and subject to change. This guide is under construction.

1. Upload

Form-data parameters:

  • workflowId = split

  • workflowOptions (sample below)

Sample
{
    "prediction": {
        "childWorkflow": "prediction",
        "childWorkflowOptions": {
            "prediction": {
                "fieldSets": ["sypht.invoice"]
            }
        }
    }
}
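
As a minimal sketch, the extra form-data fields for a split upload could be assembled in Python as follows. The helper name build_split_upload_fields is hypothetical, and sending workflowOptions as a JSON-encoded string is an assumption rather than documented behaviour. The key names mirror the sample, which uses childWorkflow rather than childWorkflowId.

```python
import json


def build_split_upload_fields(field_sets, child_workflow="prediction"):
    """Return the extra form-data fields for a split upload (hypothetical helper)."""
    workflow_options = {
        "prediction": {
            "childWorkflow": child_workflow,
            "childWorkflowOptions": {
                "prediction": {"fieldSets": list(field_sets)},
            },
        }
    }
    return {
        "workflowId": "split",
        # Assumption: the options travel as a JSON-encoded string in the form data.
        "workflowOptions": json.dumps(workflow_options),
    }


fields = build_split_upload_fields(["sypht.invoice"])
print(fields["workflowOptions"])
```

These fields would be sent together with the PDF file and your usual authentication in the standard upload request.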

Response:

{
    "fileId": "00000000-0000-0000-0000-000000000000",
    "uploadedAt": "2020-08-20T03:19:07.319Z",
    "status": "RECEIVED"
}

2. Collect the split results

{
    "fileId": "815c63f6-...-f07223d057cb",
    "status": "FINALISED",
    "results": {
        "fields": [
            {
                "name": "components.children",
                "value": [
                    {"file_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"},
                    {"file_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"}
                ]
            }
        ]
    }
}
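
The child file IDs can be pulled out of a split result like the one above with a few lines of Python. This is only a sketch: child_file_ids is a hypothetical helper, and it assumes the response always has the shape shown in the sample.

```python
def child_file_ids(split_result):
    """Return the sub-document file IDs listed under 'components.children'."""
    fields = split_result.get("results", {}).get("fields", [])
    ids = []
    for field in fields:
        if field.get("name") == "components.children":
            ids.extend(child["file_id"] for child in field.get("value", []))
    return ids


# Sample split result, copied from the response shown above.
split_result = {
    "fileId": "815c63f6-...-f07223d057cb",
    "status": "FINALISED",
    "results": {
        "fields": [
            {
                "name": "components.children",
                "value": [
                    {"file_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"},
                    {"file_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"},
                ],
            }
        ]
    },
}

children = child_file_ids(split_result)
print(children)
```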

3. Collect results for each sub-document

Response
{
    "fileId": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
    "status": "FINALISED",
    "results": {
        "timestamp": "2020-08-20T03:30:09.703Z",
        "fields": [
            {
                "name": "invoice.total",
                "value": "1485.00",
                "confidence": 0.9958282699555642,
                ...
            },
            ...
        ]
    }
}

Response
{
    "fileId": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
    "status": "FINALISED",
    "results": {
        "timestamp": "2020-08-20T03:30:09.703Z",
        "fields": [
            {
                "name": "invoice.total",
                "value": "2485.00",
                "confidence": 0.99582,
                ...
            },
            ...
        ]
    }
}
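
The per-sub-document collection step can be sketched in Python as below. The helper names are hypothetical, and fetch is a caller-supplied function (an assumption) that performs the authenticated GET and returns the parsed JSON. That keeps the sketch independent of any particular HTTP client. The URL pattern follows the result URLs used in this guide.

```python
RESULT_URL = "https://api.sypht.com/result/final/{file_id}"


def result_url(file_id):
    """Build the final-result URL for a file ID."""
    return RESULT_URL.format(file_id=file_id)


def summarise_fields(result):
    """Map each field name to a (value, confidence) pair."""
    return {
        field["name"]: (field.get("value"), field.get("confidence"))
        for field in result["results"]["fields"]
    }


def collect_sub_documents(file_ids, fetch):
    """Fetch each sub-document result and summarise the finalised ones."""
    summaries = {}
    for file_id in file_ids:
        result = fetch(result_url(file_id))
        if result.get("status") != "FINALISED":
            continue  # Not ready yet; the caller may retry later.
        summaries[file_id] = summarise_fields(result)
    return summaries


# Canned responses standing in for real API calls (values from the samples above).
ID_A = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
ID_B = "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"
canned = {
    result_url(ID_A): {
        "fileId": ID_A,
        "status": "FINALISED",
        "results": {"fields": [
            {"name": "invoice.total", "value": "1485.00", "confidence": 0.9958282699555642},
        ]},
    },
    result_url(ID_B): {
        "fileId": ID_B,
        "status": "FINALISED",
        "results": {"fields": [
            {"name": "invoice.total", "value": "2485.00", "confidence": 0.99582},
        ]},
    },
}

summaries = collect_sub_documents([ID_A, ID_B], canned.__getitem__)
print(summaries[ID_B]["invoice.total"])
```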

Limitations, Errors and Recommendations

  • Uploading a document to the split workflow does not enforce any page limit checks. You may upload a document of any size, but recent tests have shown that documents of more than 50 pages cannot be processed at this time.

  • Each split sub-document is checked against your page limit. To avoid sub-documents being rejected, please ask to have your page limit increased to your expected maximum.

  • If a split sub-document is rejected due to page or file size limits, the split workflow will eventually be marked as failed. Some sub-documents may still upload successfully, however. This is not ideal and can be avoided by increasing your page limit as above.
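
Because the upload itself does not enforce a page limit, a client-side guard before uploading may be useful. The sketch below is hypothetical: the 50-page figure comes from the limitation noted above and may change, and obtaining the page count (for example from a PDF library) is left to the caller.

```python
MAX_SPLIT_PAGES = 50  # Taken from the limitation above; subject to change.


def within_split_limit(page_count, limit=MAX_SPLIT_PAGES):
    """Return True if a source file is small enough for the split workflow."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return page_count <= limit


print(within_split_limit(12))
print(within_split_limit(51))
```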

Requests used in steps 2 and 3:

GET https://api.sypht.com/result/final/00000000-0000-0000-0000-000000000000
GET https://api.sypht.com/result/final/aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
GET https://api.sypht.com/result/final/bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
Sample combination of document splitting and validation.