Upload Annotation
This endpoint allows you to feed data back to Sypht for training and evaluation. This helps our models learn faster and lets us evaluate and compare them against other systems and reference extractions. For reference, here is a working example of the endpoint in our Python client.
Annotations submitted by external companies via the API are not automatically included in the training data set. When submitting annotations using the method described here, please notify your Sypht representative so that the annotations can be included for training and evaluation. To ensure all submitted annotations are added to the training/evaluation sets, please provide a list of the document IDs for which annotations were submitted.
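Because the representative needs the list of document IDs, it can help to record which uploads actually succeeded. The sketch below assumes a hypothetical `send(doc_id, payload)` callable that performs the upload and returns `True` on success; the real endpoint URL and authentication are not covered here, so substitute the call from the Python client.

```python
from typing import Callable, Dict, List

# Hypothetical transport: send(doc_id, payload) -> True if the upload succeeded.
SendFn = Callable[[str, dict], bool]


def submit_annotations(annotations: Dict[str, dict], send: SendFn) -> List[str]:
    """Submit one annotation payload per document and return the IDs that succeeded.

    The returned list is what you would pass on to your Sypht representative.
    """
    submitted: List[str] = []
    for doc_id, payload in annotations.items():
        if send(doc_id, payload):
            submitted.append(doc_id)
    return submitted


if __name__ == "__main__":
    # Stand-in transport for illustration only; it does not contact Sypht.
    def fake_send(doc_id: str, payload: dict) -> bool:
        return doc_id != "doc-2"  # pretend the upload for doc-2 failed

    payloads = {
        "doc-1": {"origin": "external", "fields": []},
        "doc-2": {"origin": "external", "fields": []},
    }
    print(submit_annotations(payloads, fake_send))  # ['doc-1']
```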
The request body should contain a JSON-encoded object with the gold-standard (i.e. human-annotated) extraction data in the following format:
{
  "origin": "external",
  "fields": [
    {
      "id": "issuer.name",
      "type": "simple",
      "data": {
        "value": "John Smith"
      }
    },
    {
      "id": "invoice.total",
      "type": "simple",
      "data": {
        "value": "100.00"
      }
    }
  ]
}
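A request body in this shape can be assembled from a plain mapping of field IDs to label values. The sketch below uses only the standard `json` module; `build_annotation_payload` is a hypothetical helper name, and how the body is actually sent (URL, authentication) is left to the Python client.

```python
import json
from typing import Dict


def build_annotation_payload(labels: Dict[str, str]) -> str:
    """Return the JSON request body for one document.

    `labels` maps each field ID (e.g. "issuer.name") to its gold-standard
    value as it appears on the document.
    """
    body = {
        "origin": "external",  # annotation source for API uploads
        "fields": [
            {"id": field_id, "type": "simple", "data": {"value": str(value)}}
            for field_id, value in labels.items()
        ],
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_annotation_payload({"issuer.name": "John Smith", "invoice.total": "100.00"}))
```

Values are coerced with `str()` because the `value` attribute expects a string.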
Attribute definitions:
  • "origin" is the annotation source (use "external" when uploading annotations via the API)
  • "fields" is an array of field annotations submitted for the document
  • "id" is the ID of the field the annotation is submitted for
  • "type" is the annotation data type (use "simple" when uploading annotations via the API)
  • "data" contains the annotation value for the field
  • "value" is the exact label value for the field, as a string
External annotations are currently supported only for text-type fields (such as issuer.name, invoice.documentDate, invoice.issuerName and invoice.total). Line items (i.e. tables), classification fields (such as invoice.currencyCode) and entity matching fields are not currently supported for external annotations submitted via the API.
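The attribute rules and field restrictions above can be checked locally before uploading. This sketch is written from this page only: `validate_payload` and `split_by_support` are hypothetical helpers, the field-kind labels ("text", "classification") are assumed names for a mapping you would maintain yourself, and the server may apply further checks.

```python
from typing import Any, Dict, List, Tuple

# Assumed kind label for fields that accept external annotations.
SUPPORTED_KINDS = {"text"}


def validate_payload(body: Any) -> List[str]:
    """Return the problems found in an annotation body (empty list if it looks valid)."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    problems: List[str] = []
    if body.get("origin") != "external":
        problems.append('"origin" should be "external" for API uploads')
    fields = body.get("fields")
    if not isinstance(fields, list):
        problems.append('"fields" must be an array')
        return problems
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"fields[{i}] must be an object")
            continue
        field_id = field.get("id")
        if not isinstance(field_id, str) or not field_id:
            problems.append(f"fields[{i}].id must be a non-empty string")
        if field.get("type") != "simple":
            problems.append(f'fields[{i}].type should be "simple"')
        data = field.get("data")
        if not isinstance(data, dict) or not isinstance(data.get("value"), str):
            problems.append(f"fields[{i}].data.value must be a string")
    return problems


def split_by_support(
    labels: Dict[str, str], field_kinds: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Keep labels for text-type fields; list the field IDs that must be skipped.

    Fields missing from `field_kinds` are treated as unsupported, to be safe.
    """
    supported: Dict[str, str] = {}
    skipped: List[str] = []
    for field_id, value in labels.items():
        if field_kinds.get(field_id) in SUPPORTED_KINDS:
            supported[field_id] = value
        else:
            skipped.append(field_id)
    return supported, skipped


if __name__ == "__main__":
    kinds = {"issuer.name": "text", "invoice.currencyCode": "classification"}
    labels = {"issuer.name": "John Smith", "invoice.currencyCode": "AUD"}
    print(split_by_support(labels, kinds))
    # ({'issuer.name': 'John Smith'}, ['invoice.currencyCode'])

    body = {"origin": "external", "fields": [
        {"id": "issuer.name", "type": "simple", "data": {"value": "John Smith"}}
    ]}
    print(validate_payload(body))  # []
```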
Useful hints
Ideally, the annotated string values (i.e. labels) should exactly match the relevant text on the document, so that each annotated value can be associated with the relevant token(s) on the document.
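To see why exact values matter, consider a simplified matcher that looks for a label's words as a contiguous run of document tokens. This is an illustrative sketch assuming plain whitespace tokenisation; it is not Sypht's matching logic, and `find_label_tokens` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def find_label_tokens(tokens: List[str], label: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices where `label` appears verbatim, else None.

    `end` is exclusive. Matching is exact and whitespace-tokenised.
    """
    target = label.split()
    n = len(target)
    if n == 0:
        return None
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == target:
            return start, start + n
    return None


if __name__ == "__main__":
    page = "Invoice from John Smith Total 100.00".split()
    print(find_label_tokens(page, "John Smith"))  # (2, 4)
    print(find_label_tokens(page, "J. Smith"))    # None
```

A label such as "J. Smith" cannot be tied to any tokens here, even though a human would recognise it as the same name.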
However, Sypht applies some post-processing normalisation to the text tokens to facilitate matching document tokens to the relevant label values. Below are some acceptable variations:
  • Date and time values are normalised before matching, so the exact format of the value is not critical as long as it represents a date or time. Example: "20 Jan 2020" is equivalent to "20/01/2020"
  • Numerical values for currency fields (such as invoice.total and invoice.amountDue) are normalised into floating-point numbers with two decimal places. Example: "100" is equivalent to "100.00"
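The two normalisations above can be approximated locally to check whether two label values would be treated as equivalent. This is a rough sketch, not Sypht's implementation: the list of accepted date formats and the comma handling are assumptions, and amounts are rounded with `Decimal` rather than floats to avoid binary rounding surprises.

```python
from datetime import date, datetime
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional

# Assumed set of input formats; the service's own list is not documented here.
DATE_FORMATS = ("%d %b %Y", "%d %B %Y", "%d/%m/%Y", "%Y-%m-%d")


def normalise_date(text: str) -> Optional[date]:
    """Parse a date string in any of DATE_FORMATS; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def normalise_amount(text: str) -> str:
    """Render a currency amount with exactly two decimal places."""
    # Assumption: ',' is only ever a thousands separator.
    cleaned = text.replace(",", "").strip()
    return str(Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    print(normalise_date("20 Jan 2020") == normalise_date("20/01/2020"))  # True
    print(normalise_amount("100") == normalise_amount("100.00"))          # True
```

Unparseable amounts raise `decimal.InvalidOperation`, which a caller may want to catch.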