Entity matching
Match data to an external data source
Entity fields match information on a source document to user-provided reference data. This lets you establish a link between documents and records from an existing business database or directory.
A common use case for entity matching is to link an invoice issuer to a supplier database. Entity fields automatically learn a fuzzy match between information on the document (e.g. a supplier name, address or business number) and the reference data fields you've uploaded. The reference data is then returned as a standard prediction result, allowing you to build on these matches to power complex automation rules, derived field values and validation checks.
To configure and use an entity match field, the following basic process applies:
Upload entity data to Sypht via the entity storage API
Configure and train an entity match field
Extract the entity field from documents to establish a match
Because entities are pushed from your database to Sypht, there is no need to open up network or API access into secure internal data stores. You have complete control over what data is pushed and when.
When an entity field is extracted on a document, data is matched against all available entities in the Sypht data store at that time. As reference data changes over time you can push smaller differential updates via the entity storage API to replicate addition, removal or modification of entities.
We recommend establishing a regular automated ETL process to keep reference data in sync.
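As a rough illustration of the differential updates described above, the sketch below compares two snapshots of reference data and works out which entities would need to be stored again and which would need to be deleted. The function name `diff_entities` and the `{entity_id: data}` snapshot shape are hypothetical and not part of the Sypht API; the actual pushes would go through the entity storage endpoints documented in this section.

```python
def diff_entities(previous, current):
    """Compare two {entity_id: data} snapshots (hypothetical shape).

    Returns (upserts, deletes):
      upserts - entities that are new or whose data changed
      deletes - sorted ids of entities that no longer exist
    """
    upserts = {
        entity_id: data
        for entity_id, data in current.items()
        if previous.get(entity_id) != data
    }
    deletes = sorted(set(previous) - set(current))
    return upserts, deletes


# Example snapshots taken from two runs of a sync job.
previous = {"s1": {"name": "Acme"}, "s2": {"name": "Globex"}}
current = {"s1": {"name": "Acme Pty Ltd"}, "s3": {"name": "Initech"}}

upserts, deletes = diff_entities(previous, current)
print(upserts)  # entities to store again
print(deletes)  # entity ids to delete
```

Keeping the previous snapshot alongside the sync job means each run only has to send the entities that actually changed.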
This section explains the available API endpoints for storage, retrieval and search of entity data. Entities are stored in isolated collections for a given company_id, and within that company there may be multiple distinct entity types (e.g. there may be distinct entity types for supplier and the receiving office, each having distinct attributes and match logic).
Check out the open-source Sypht Python Client on GitHub for a reference implementation using the entities API.
PUT storage/{company_id}/entity/{entity_type}/{entity_id}
Path Parameters:
company_id
your Sypht Company Id
entity_type
type of entity e.g. supplier, vehicle or employee
entity_id
a unique identifier for the entity
Request Body:
JSON-encoded entity data
An object with keys and values representing attributes of the entity
Complex JSON data structures (e.g. nested objects or lists) may be stored, but are currently not supported as reference fields for search and match
Empty values should be represented as null
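To make the request shape concrete, here is a minimal sketch that builds (but does not send) a PUT request for a single entity using only the Python standard library. The base URL, the bearer-token header and the helper name `build_put_entity_request` are assumptions for illustration; substitute whatever host and authentication your Sypht account uses. Note how Python's `None` is serialised as JSON `null` for empty values.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: replace with the API host for your account.
BASE_URL = "https://api.sypht.com"


def build_put_entity_request(company_id, entity_type, entity_id, data, token):
    """Build a PUT request for storage/{company_id}/entity/{entity_type}/{entity_id}."""
    url = "{}/storage/{}/entity/{}/{}".format(
        BASE_URL,
        quote(company_id, safe=""),
        quote(entity_type, safe=""),
        quote(entity_id, safe=""),
    )
    body = json.dumps(data).encode("utf-8")  # None becomes null
    headers = {
        "Authorization": "Bearer " + token,  # assumption: bearer-token auth
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


request = build_put_entity_request(
    "acme",
    "supplier",
    "s-001",
    {"name": "Acme Pty Ltd", "abn": None},  # flat attributes; empty value as None
    token="<access token>",
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```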
POST storage/{company_id}/bulkentity/{entity_type}/
Path Parameters:
company_id
your Sypht Company Id
entity_type
type of entity e.g. supplier, vehicle or employee
Request Body:
JSON-encoded list of objects, each with an entity_id and the data to be stored
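The bulk request body described above could be assembled along these lines. This is only a sketch: the helper name `build_bulk_payload` and the sample supplier records are hypothetical, while the `entity_id` and `data` keys follow the description of the request body.

```python
import json


def build_bulk_payload(entities):
    """Turn a {entity_id: data} mapping into a list of entity_id/data objects."""
    return [
        {"entity_id": entity_id, "data": data}
        for entity_id, data in entities.items()
    ]


suppliers = {
    "s-001": {"name": "Acme Pty Ltd", "abn": None},
    "s-002": {"name": "Globex", "abn": "12345678901"},
}

payload = build_bulk_payload(suppliers)
print(json.dumps(payload, indent=2))  # JSON body for the bulkentity endpoint
```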
DELETE storage/{company_id}/entity/{entity_type}/{entity_id}
Path Parameters:
company_id
your Sypht Company Id
entity_type
type of entity e.g. supplier, vehicle or employee
entity_id
a unique identifier for the entity
GET storage/{company_id}/entity/{entity_type}/{entity_id}
Path Parameters:
company_id
your Sypht Company Id
entity_type
type of entity e.g. supplier, vehicle or employee
entity_id
a unique identifier for the entity
Response Body:
POST storage/{company_id}/entitysearch/{entity_type}/
Path Parameters:
company_id
your Sypht Company Id
entity_type
type of entity e.g. supplier, vehicle or employee
Request Body:
JSON-encoded string containing exact and fuzzy match search constraints
Each of these should be an object with keys denoting an attribute to search against and value denoting the query string to search for
e.g. To search for exact match for "rego_no" == "qwer"
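A search body for the example above might be built as follows. This is a sketch under assumptions: the top-level key names `exact` and `fuzzy` are taken from the description of the request body, and it is assumed that a constraint type you don't need can simply be omitted. The helper name `build_search_body` and the `owner_name` attribute are hypothetical.

```python
import json


def build_search_body(exact=None, fuzzy=None):
    """Build a JSON-encoded search body with exact and/or fuzzy constraints.

    Each constraint is a mapping of attribute name -> query string.
    """
    body = {}
    if exact:
        body["exact"] = dict(exact)
    if fuzzy:
        body["fuzzy"] = dict(fuzzy)
    return json.dumps(body)


# Exact match for "rego_no" == "qwer"
print(build_search_body(exact={"rego_no": "qwer"}))

# Exact and fuzzy constraints combined (owner_name is a made-up attribute)
print(build_search_body(exact={"rego_no": "qwer"}, fuzzy={"owner_name": "jon smith"}))
```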
Response Body:
POST storage/{company_id}/entitysearch/{entity_type}/by_id
Path Parameters:
company_id
your Sypht Company Id
entity_type
type of entity e.g. supplier, vehicle or employee
Request Body:
JSON-encoded list of objects with entity_id
Response Body:
GET storage/{company_id}/entitysearch/{entity_type}
Path Parameters:
company_id
your Sypht Company Id
entity_type
type of entity e.g. supplier, vehicle or employee
Query Parameters:
page
page token; if not provided, the first page is returned. To request a later page, pass the next_page value from the previous response
limit
maximum number of entity_ids to return
Response Body:
pip install sypht
This client method is a wrapper that loops over the pagination endpoint to get all entity_ids for the specified entity_type
Returns a list of objects if verbose (the default)
[{"entity_id": "id_0"}, {"entity_id": "id_1"}, ...]
Returns a list of entity_id values if not verbose
["id_0", "id_1", ...]
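The pagination loop that such a wrapper performs can be sketched as below. This is not the Sypht client's actual implementation: `get_all_entity_ids` here is a stand-alone illustration, `fetch_page` is a hypothetical callable standing in for the GET request, and the `entities` key in the response is an assumption (only `next_page` is named in the endpoint description).

```python
def get_all_entity_ids(fetch_page, verbose=True, limit=100):
    """Collect every entity id by following next_page tokens.

    fetch_page(page=..., limit=...) is a hypothetical callable returning a dict
    with an "entities" list (assumed key) and an optional "next_page" token.
    """
    results = []
    page = None  # no token: request the first page
    while True:
        response = fetch_page(page=page, limit=limit)
        results.extend(response.get("entities", []))
        page = response.get("next_page")
        if not page:  # no further pages
            break
    if verbose:
        return results
    return [item["entity_id"] for item in results]


# Stand-in for the HTTP call, serving two fixed pages.
_PAGES = {
    None: {"entities": [{"entity_id": "id_0"}], "next_page": "token-1"},
    "token-1": {"entities": [{"entity_id": "id_1"}]},
}


def fake_fetch_page(page=None, limit=100):
    return _PAGES[page]


print(get_all_entity_ids(fake_fetch_page))
print(get_all_entity_ids(fake_fetch_page, verbose=False))
```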