Upload and Extract (v1)

Make predictions on uploaded documents

Getting Started

Making a prediction to extract data via the Sypht API has two basic steps:

  1. Upload a document and specify the prediction workflow options

  2. Retrieve the results

File upload requests return a fileId, which identifies the file in subsequent requests and workflows, for example when retrieving the extraction results later.

Extraction workflows are asynchronous, so you may upload multiple files in parallel before fetching the results. Result fetches, however, are synchronous by default: a request blocks until all pending workflows on the requested file have completed. This behaviour can be changed via request parameters, but we recommend the default for best performance.
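The upload-then-fetch pattern described above can be sketched in Python as follows. This is a minimal illustration, not the behaviour of any provided client: upload_file and fetch_result are hypothetical callables that the caller supplies to perform the two HTTP calls, and max_workers is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def extract_many(
    paths: List[str],
    upload_file: Callable[[str], str],
    fetch_result: Callable[[str], dict],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Upload every file in parallel, then fetch each result.

    upload_file(path) -> fileId and fetch_result(fileId) -> result dict are
    hypothetical callables supplied by the caller.
    """
    # Step 1: uploads are independent of each other, so run them concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        file_ids = list(pool.map(upload_file, paths))

    # Step 2: each fetch is expected to block until that file's workflows finish.
    return {path: fetch_result(file_id) for path, file_id in zip(paths, file_ids)}
```

Passing the two calls in as parameters keeps the orchestration independent of whichever HTTP library is used.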

1. Upload POST /fileupload/

This guide will demonstrate how to make uploads and predictions using API calls in Postman. Please see our provided clients to do this programmatically.

Request headers

  • Content-Type = multipart/form-data

  • Accept = application/json

  • Authorization = Bearer {{access token}}
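For readers scripting the request rather than using Postman, the headers above might be assembled as in the sketch below. The function name is hypothetical. The boundary parameter is something a multipart/form-data Content-Type needs when the body is built by hand; many HTTP clients generate it themselves, in which case Content-Type is best left unset.

```python
import uuid
from typing import Dict, Tuple


def build_upload_headers(access_token: str) -> Tuple[Dict[str, str], str]:
    """Return (headers, boundary) for a hand-built multipart upload request."""
    # A multipart body is split by a boundary string that must also be
    # declared in the Content-Type header.
    boundary = uuid.uuid4().hex
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": "application/json",
        "Authorization": f"Bearer {access_token}",
    }
    return headers, boundary
```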

Request Payload

Field extraction parameters

There are three ways to specify the fields to be extracted. Exactly one of the following must be included in the request payload:

  1. (preferred) products: list(string) of AI product_ids, which can be found in the marketplace under the documentation tab of each product, e.g. ['invoices', 'ndis-claims']

  2. fieldSets: list(string) of individual fieldsets to be extracted, e.g. ['sypht.invoice', 'sypht.bank']

  3. fieldSet: (string) of a single fieldset to extract
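The "exactly one" rule can be enforced before the request is sent. The sketch below is illustrative only: the function name is hypothetical, and encoding list values as JSON strings in the form field is an assumption, since this page does not state how lists are serialised.

```python
import json
from typing import Dict, List, Optional


def field_selection(
    products: Optional[List[str]] = None,
    field_sets: Optional[List[str]] = None,
    field_set: Optional[str] = None,
) -> Dict[str, str]:
    """Return the single form field that selects what to extract."""
    candidates = {"products": products, "fieldSets": field_sets, "fieldSet": field_set}
    chosen = {key: value for key, value in candidates.items() if value}
    if len(chosen) != 1:
        raise ValueError("specify exactly one of products, fieldSets or fieldSet")
    key, value = next(iter(chosen.items()))
    # Assumption: list values are sent as JSON-encoded strings.
    return {key: value if isinstance(value, str) else json.dumps(value)}
```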

File Upload parameter

  • fileToUpload: file binary of a PDF, PNG or JPG to extract fields from. The maximum supported file size is 20 MB, with a limit of 16 pages.
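A client-side check can catch unsupported files before they are uploaded. The sketch below covers file type and size only; it does not count pages. The names are hypothetical, the size limit is interpreted as 20 × 1024 × 1024 bytes, and accepting the .jpeg extension alongside .jpg is an assumption.

```python
import os

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: the stated limit read as 20 MiB
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}  # .jpeg is an assumption


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is outside the stated limits."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes, limit is {MAX_UPLOAD_BYTES}")
```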

A file can also be uploaded as a base64 string via the /fileupload/json endpoint, in which case fileToUpload is the base64-encoded string of the document (PDF, JPG, PNG, etc.).
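A request body for the /fileupload/json endpoint might be built as in the sketch below. The function name is hypothetical, and sending products as a native JSON list alongside fileToUpload is an assumption; this page only specifies that fileToUpload is a base64 string.

```python
import base64
import json
from typing import List


def build_json_upload_body(file_bytes: bytes, products: List[str]) -> bytes:
    """Encode a document and its product selection as a JSON request body."""
    payload = {
        # base64 output is ASCII, so it can be embedded directly in JSON.
        "fileToUpload": base64.b64encode(file_bytes).decode("ascii"),
        # Assumption: the selection is passed as a JSON list in this endpoint.
        "products": products,
    }
    return json.dumps(payload).encode("utf-8")
```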

Other parameters

  • tags: An optional string tag to identify a set of documents for convenience

  • workflowId <todo>

  • workflowOptions <todo>

  • locale <todo>

Response

{
    "fileId": "f15ead36-dd97-4e1e-8db8-cad0fb9f5e07",
    "uploadedAt": "2020-11-02T01:14:43.542Z",
    "status": "RECEIVED"
}

The fileId in the response uniquely identifies the uploaded document and is used to retrieve its results, as described in the following section.
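Pulling the fileId out of the upload response is a one-step JSON lookup. The helper below is a hypothetical sketch that fails loudly if the field is missing.

```python
import json


def file_id_from_upload(response_text: str) -> str:
    """Return the fileId from an upload response body."""
    body = json.loads(response_text)
    file_id = body.get("fileId")
    if not file_id:
        raise ValueError(f"upload response has no fileId: {body!r}")
    return file_id
```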

2. GET /result/final/<fileId>

After a /fileupload/ request is submitted, the extraction is queued and runs asynchronously. Results can be retrieved via a GET request to /result/final/<fileId>, using the fileId from the upload response.
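The result request can be prepared with the Python standard library as sketched below, using the /result/final/<fileId> path from this step's heading. The function name is hypothetical. The https scheme and the reuse of the upload step's Accept and Authorization headers are assumptions.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.sypht.com"  # assumption: https on the documented host


def build_result_request(file_id: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request for a file's final result."""
    url = f"{BASE_URL}/result/final/{quote(file_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            # Assumption: same bearer token as the upload request.
            "Authorization": f"Bearer {access_token}",
        },
        method="GET",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; by default the call is expected to block until pending workflows on the file have completed.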

For the upload response shown above, a GET request to api.sypht.com/result/final/f15ead36-dd97-4e1e-8db8-cad0fb9f5e07 returns:

{
    "fileId": "f15ead36-dd97-4e1e-8db8-cad0fb9f5e07",
    "status": "FINALISED",
    "results": {
        "timestamp": "2020-11-02T01:15:10.535Z",
        "fields": [
            {
                "name": "issuer.businessNo",
                "value": "13313135031",
                "confidence": 0.9869603735738959,
                "boundingBox": {
                    "pageNum": 1,
                    "topLeft": {
                        "x": 0.8033333333333334,
                        "y": 0.12333333333333334
                    },
                    "bottomRight": {
                        "x": 0.92152943176447052,
                        "y": 0.17342135454545454
                    },
                    "tokenIds": [
                        13
                    ]
                }
            }...
            {
                "name": "recipient.referenceNo",
                "value": null,
                "confidence": 0.909761641998803,
                "boundingBox": null
            }...
        ]
    }
}

In the response JSON, extractions are returned under results > fields, with one entry for each field in the AI product selected in the request (see the marketplace for details of each field).
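Once the response is parsed, the fields list can be flattened into a name-to-value mapping. The sketch below is illustrative: the function name and the min_confidence filter are hypothetical conveniences, not part of the API.

```python
from typing import Any, Dict


def fields_by_name(result: dict, min_confidence: float = 0.0) -> Dict[str, Any]:
    """Map each extracted field name to its value, dropping low-confidence fields."""
    fields = result.get("results", {}).get("fields", [])
    return {
        field["name"]: field["value"]
        for field in fields
        # A value of None means the field was not found in the document.
        if field.get("confidence", 0.0) >= min_confidence
    }
```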
