bounds
A "bounds" value represents the location of an extraction. It includes the page number and the coordinates of two vertices, the top left and the bottom right, that form a rectangle (bounding box) enclosing the area represented by the value.
The x and y coordinates are encoded as floating point numbers. These numbers are ratios relative to the size of the page. For example, if a page has a width of 900 pixels, an x value of 0.5 refers to the absolute position 450. Likewise, if a page has a height of 1200 pixels, a y value of 0.25 refers to the absolute position 300.
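The ratio-to-pixel conversion described above can be sketched in Python. The dictionary layout and the key names (page, top_left, bottom_right) are assumptions made for illustration; the actual encoded form may differ.

```python
def to_pixels(bounds, page_width, page_height):
    """Convert ratio-based corner coordinates to absolute pixel positions."""
    def scale(point):
        return (point["x"] * page_width, point["y"] * page_height)

    return {
        "page": bounds["page"],
        "top_left": scale(bounds["top_left"]),
        "bottom_right": scale(bounds["bottom_right"]),
    }


# Hypothetical encoded bounds; key names are illustrative only.
example_bounds = {
    "page": 1,
    "top_left": {"x": 0.5, "y": 0.25},
    "bottom_right": {"x": 0.75, "y": 0.5},
}

pixels = to_pixels(example_bounds, page_width=900, page_height=1200)
print(pixels["top_left"])  # (450.0, 300.0)
```

Because the coordinates are ratios, the same bounds can be drawn on a page image rendered at any resolution by passing that image's width and height.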
token
A "token" value represents an extraction from a document that contains a single token (i.e. a word or unit). It contains information such as the characters that make up the token (with or without white space), the location of the token on the page (via a nested encoded "bounds" value), and the index of the token relative to the rest of the tokens in the document. The index is useful for creating an ordered list of tokens.
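As a rough illustration of how the index can be used to build an ordered list, the sketch below sorts a few tokens into document order. The key names (text, index) and the sample values are hypothetical, and the nested bounds are omitted for brevity.

```python
# Hypothetical encoded tokens; key names and values are illustrative only.
tokens = [
    {"text": "due", "index": 2},
    {"text": "Amount", "index": 1},
    {"text": "Total", "index": 0},
]


def in_document_order(token_list):
    """Return the tokens sorted by their document-wide index."""
    return sorted(token_list, key=lambda t: t["index"])


ordered_text = [t["text"] for t in in_document_order(tokens)]
print(ordered_text)  # ['Total', 'Amount', 'due']
```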
tokens
A "tokens" value represents a sequence of tokens that form a single extraction. In its encoded form, it contains text, an encoded "bounds" value, and a list of encoded "token" values.
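The relationship between the three parts can be sketched as follows. This is only an illustration under stated assumptions: the key names are hypothetical, the text is assumed to be the token texts joined by single spaces, and the bounds are assumed to be the smallest box enclosing every token on the same page.

```python
def make_box(x0, y0, x1, y1, page=1):
    """Build a hypothetical encoded bounds value."""
    return {
        "page": page,
        "top_left": {"x": x0, "y": y0},
        "bottom_right": {"x": x1, "y": y1},
    }


def make_tokens_value(token_list):
    """Combine single tokens (assumed to share a page) into one value."""
    ordered = sorted(token_list, key=lambda t: t["index"])
    boxes = [t["bounds"] for t in ordered]
    return {
        "text": " ".join(t["text"] for t in ordered),
        "bounds": make_box(
            min(b["top_left"]["x"] for b in boxes),
            min(b["top_left"]["y"] for b in boxes),
            max(b["bottom_right"]["x"] for b in boxes),
            max(b["bottom_right"]["y"] for b in boxes),
            page=boxes[0]["page"],
        ),
        "tokens": ordered,
    }


# Illustrative sample tokens, not real extraction output.
invoice = {"text": "Invoice", "index": 0, "bounds": make_box(0.1, 0.1, 0.3, 0.15)}
number = {"text": "Number", "index": 1, "bounds": make_box(0.32, 0.1, 0.5, 0.15)}

value = make_tokens_value([number, invoice])
print(value["text"])  # Invoice Number
```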
entity
An "entity" value represents an Entity, and is the output of an entity match field prediction. An entity has an id, entity type, company_id, and data. These values represent the entity match reference data that was sent to Sypht via the entities API (see the documentation on entity matching and the entities API for further information).
As an example, consider an encoded entity of type ndis-support-category with the unique identifier 07_101_0106_6_3. It came from the Sypht company (9333c845-875b-47d7-bb35-0873770f23d5), and its data has keys and values that describe this particular entity.
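A minimal sketch of such an entity as a Python dictionary, with a check that the four parts are present. The id, entity type and company id are the ones quoted above; the exact key spelling "entity_type" is an assumption, and the contents of "data" are a placeholder because the real keys depend on the reference data that was uploaded.

```python
REQUIRED_KEYS = {"id", "entity_type", "company_id", "data"}

entity = {
    "id": "07_101_0106_6_3",
    "entity_type": "ndis-support-category",  # key spelling is an assumption
    "company_id": "9333c845-875b-47d7-bb35-0873770f23d5",
    # Placeholder only: real keys come from the uploaded reference data.
    "data": {"example_key": "example value"},
}


def missing_keys(candidate):
    """Return the required entity keys that the candidate lacks, sorted."""
    return sorted(REQUIRED_KEYS - set(candidate))


print(missing_keys(entity))  # []
print(missing_keys({"id": "07_101_0106_6_3"}))
```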
list
A "list" value is a generic ordered collection of values. Each item within the list is itself an encoded value. All items in a list should have the same value type.
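The same-type expectation can be checked with a few lines of Python. The sketch assumes, hypothetically, that each encoded value carries a "type" key naming its value type; the real encoding may identify the type differently.

```python
def is_homogeneous(items):
    """True when every item in the list reports the same value type."""
    return len({item["type"] for item in items}) <= 1


# Hypothetical encoded items; the "type" key is an assumption.
same = [{"type": "tokens", "text": "a"}, {"type": "tokens", "text": "b"}]
mixed = [{"type": "tokens", "text": "a"}, {"type": "entity", "id": "x"}]

print(is_homogeneous(same))   # True
print(is_homogeneous(mixed))  # False
```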
table
A "table" value represents a prediction that has extracted a table from a document. An encoded table contains a list of columns. There are two classes of column: "column" and "derived-column".
A "column" is a column whose data has been extracted directly from the document (as opposed to inferred or derived). Most columns fall into this class. A column has a category, a header, and a list of cells. The category identifies what kind of column it is, which is especially useful for line-item extractions from invoices; "description" and "date" are two such line-item categories.
The "header" is an encoded "tokens" value identifying the tokens that make up the header of the column. The "cells" are a list of encoded values, the type of which depends on the column category. For example, a column with category "description" would contain cells each of value type "tokens". (Note: currently all columns, regardless of category, have cells of value type "tokens". This could be improved by using "date" values for "date" columns, etc.)
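Since a table is stored column by column, a consumer usually has to pivot it to read line items row by row. The sketch below shows one way to do that, assuming hypothetical key names (columns, category, header, cells, text), invented sample data, and columns of equal length.

```python
# Hypothetical encoded table; key names and cell contents are illustrative.
table = {
    "columns": [
        {
            "category": "description",
            "header": {"text": "Description"},
            "cells": [{"text": "Widget"}, {"text": "Gadget"}],
        },
        {
            "category": "date",
            "header": {"text": "Date"},
            "cells": [{"text": "2020-01-01"}, {"text": "2020-01-02"}],
        },
    ]
}


def rows_by_category(encoded_table):
    """Pivot the column-wise table into one dict per row, keyed by category."""
    columns = encoded_table["columns"]
    categories = [column["category"] for column in columns]
    cell_texts = [[cell["text"] for cell in column["cells"]] for column in columns]
    return [dict(zip(categories, row)) for row in zip(*cell_texts)]


rows = rows_by_category(table)
print(rows[0])  # {'description': 'Widget', 'date': '2020-01-01'}
```

Reading the cell text is enough here because all cells are currently "tokens" values; typed cells would need a per-category conversion step.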