# Entity matching

## How it works

Entity fields match information on a source document to user-provided reference data. This lets you establish a link between documents and records from an existing business database or directory.

A common use case for entity matching is linking an invoice issuer to a supplier database. Entity fields automatically learn a fuzzy match between information on the document (e.g. a supplier name, address or business number) and the reference data fields you've uploaded. The reference data is then returned as a standard prediction result, so you can build on these matches to power complex automation rules, derived field values and validation checks.

## Getting started

To configure and use an entity match field, the following basic process applies:

1. Upload entity data to Sypht via the entity storage API
2. Configure and train an entity match field
3. Extract the entity field from documents to establish a match

Because entities are pushed *from* your database *to* Sypht, there is no need to open up network or API access into secure internal data stores. You have complete control over what data is pushed and when.

![Overview of the basic entity matching process.](https://2266347786-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MFTZ3y4gwn_iaPw5B8H%2F-MH71M6xdY_KuEkg3vQX%2F-MH71ZR58wckbW78z1GI%2Fimage.png?alt=media\&token=9dec81b6-7364-4828-b341-3db52517b0cb)

#### Keeping reference data in sync

When an entity field is extracted on a document, data is matched against all available entities in the Sypht data store at that time. As reference data changes over time you can push smaller differential updates via the entity storage API to replicate addition, removal or modification of entities.

{% hint style="info" %}
We recommend establishing a regular automated ETL process to keep reference data in sync.
{% endhint %}
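
As a rough sketch of the differential update idea, the snippet below compares the previous and current snapshots of your reference data and works out which entities would need a PUT and which a DELETE. The function name `diff_entities`, the snapshot shape (a dict of `entity_id` to attributes) and the sample records are illustrative assumptions, not part of the Sypht API.

```python
def diff_entities(previous, current):
    """Compare two snapshots mapping entity_id -> attribute dict.

    Returns (to_put, to_delete): entities to push again, and ids to remove.
    """
    to_put = {
        entity_id: data
        for entity_id, data in current.items()
        if previous.get(entity_id) != data  # new or modified since last sync
    }
    # Ids present last time but gone now should be deleted from the store.
    to_delete = sorted(set(previous) - set(current))
    return to_put, to_delete


# Illustrative snapshots (hypothetical data).
previous = {
    "s1": {"name": "Joe's Plumbing", "expense_code": "GL1234"},
    "s2": {"name": "Acme Electrical", "expense_code": "GL2000"},
}
current = {
    "s1": {"name": "Joe's Plumbing", "expense_code": "GL9999"},   # modified
    "s3": {"name": "Harbour Freight", "expense_code": "GL3000"},  # added
}

to_put, to_delete = diff_entities(previous, current)
print(sorted(to_put), to_delete)
```

Each id in `to_put` would then be pushed and each id in `to_delete` removed using the endpoints described in the entities API section.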

## Using the entities API

This section explains the available API endpoints for storage, retrieval and search of entity data. Entities are stored in isolated collections for a given `company_id`, and within that company there may be multiple distinct entity types (e.g. `supplier` and the receiving `office`, each with its own attributes and match logic).

{% hint style="info" %}
Check out the open-source [Sypht Python Client](https://github.com/sypht-team/sypht-python-client) on GitHub for a reference implementation using the entities API.
{% endhint %}

### Pushing entities

**PUT** storage/`{company_id}`/entity/`{entity_type}`/`{entity_id}`

Path Parameters:

* `company_id` your Sypht Company Id
* `entity_type` type of entity e.g. supplier, vehicle or employee
* `entity_id` a unique identifier for the entity

Request Body:

* JSON-encoded entity data
  * An object with keys and values representing attributes of the entity
  * Complex JSON data structures (e.g. nested objects or lists) may be stored but are currently not supported as reference fields for search and match
  * Empty values should be represented as null

{% code title="Example Supplier Entity data" %}

```javascript
{
   "name": "Joe's Plumbing",
   "address": "123 Water Street, Chippendale, 2008",
   "expense_code": "GL1234",
   "project_id": "PRJ-9876",
   "last_sync": "2021-01-01"
}
```

{% endcode %}
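
A minimal sketch of how such a PUT request might be assembled with the Python standard library. It only builds the request and does not send it. `BASE_URL` is a placeholder rather than the real Sypht host, the bearer-token header is an assumed authentication scheme, and `build_put_entity_request` is a hypothetical helper name. Empty strings are converted to `null`, following the request body rules above.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder, NOT the real Sypht host


def build_put_entity_request(company_id, entity_type, entity_id, data, token):
    """Build (but do not send) a PUT request for a single entity."""
    # Represent empty values as null, as the request body rules require.
    cleaned = {key: (None if value == "" else value) for key, value in data.items()}
    url = "{}/storage/{}/entity/{}/{}".format(
        BASE_URL,
        quote(company_id, safe=""),
        quote(entity_type, safe=""),
        quote(entity_id, safe=""),
    )
    return Request(
        url,
        data=json.dumps(cleaned).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
    )


req = build_put_entity_request(
    "my-company",
    "supplier",
    "sup-001",
    {"name": "Joe's Plumbing", "expense_code": "GL1234", "project_id": ""},
    "<access_token>",
)
print(req.get_method(), req.full_url)
```

The request could then be sent with `urllib.request.urlopen(req)` once the real host and credentials are in place.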

**POST** storage/`{company_id}`/bulkentity/`{entity_type}`/

Path Parameters:

* `company_id` your Sypht Company Id
* `entity_type` type of entity e.g. supplier, vehicle or employee

Request Body:

* JSON-encoded list of objects, each with an `entity_id` and the `data` to be stored

```javascript
[
  {
    "entity_id": "id0",
    "data": {
      "rego_no": "qwer",
      "Contract Start": "2020-11-04"
    }
  },
  {
    "entity_id": "id1",
    "data": {
      "rego_no": "wert",
      "Contract Start": "2020-11-05"
    }
  },
  {
    "entity_id": "id2",
    "data": {
      "rego_no": "erty",
      "Contract Start": "2020-11-06"
    }
  }
]
```
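
The bulk request body can be built from an in-memory mapping of records, as in the sketch below. The helper names `to_bulk_payload` and `chunked` are hypothetical, and the batch size of 2 is purely illustrative: the source does not state a maximum payload size, so splitting into batches is a precaution rather than a documented requirement.

```python
import json


def to_bulk_payload(records):
    """Turn {entity_id: attributes} into the list shape used by the bulk endpoint."""
    return [
        {"entity_id": entity_id, "data": data}
        for entity_id, data in records.items()
    ]


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


records = {
    "id0": {"rego_no": "qwer", "Contract Start": "2020-11-04"},
    "id1": {"rego_no": "wert", "Contract Start": "2020-11-05"},
    "id2": {"rego_no": "erty", "Contract Start": "2020-11-06"},
}

payload = to_bulk_payload(records)
# Each batch would be the JSON body of one POST to the bulkentity endpoint.
batches = [json.dumps(batch) for batch in chunked(payload, 2)]
print(len(payload), len(batches))
```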

### Removing entities

**DELETE** storage/`{company_id}`/entity/`{entity_type}`/`{entity_id}`

Path Parameters:

* `company_id` your Sypht Company Id
* `entity_type` type of entity e.g. supplier, vehicle or employee
* `entity_id` a unique identifier for the entity

### Retrieving entities

**GET** storage/`{company_id}`/entity/`{entity_type}`/`{entity_id}`

Path Parameters:

* `company_id` your Sypht Company Id
* `entity_type` type of entity e.g. supplier, vehicle or employee
* `entity_id` a unique identifier for the entity

Response Body:

```javascript
{
   "name": "Joe's Plumbing",
   "address": "123 Water Street, Chippendale, 2008",
   "expense_code": "GL1234",
   "project_id": "PRJ-9876",
   "last_sync": "2021-01-01"
}
```

### Searching entities

**POST** storage/`{company_id}`/entitysearch/`{entity_type}`/

Path Parameters:

* `company_id` your Sypht Company Id
* `entity_type` type of entity e.g. supplier, vehicle or employee

Request Body:

* JSON-encoded object containing `exact` and `fuzzy` match search constraints
  * Each of these should be an object whose keys name the attribute to search against and whose values give the query string to search for
* e.g. to run a fuzzy search for "Plumbing" on the `name` attribute (an exact-match constraint such as "rego\_no" == "qwer" would go under `exact` instead)

```javascript
{
    "exact": {},
    "fuzzy": {"name":"Plumbing"}
}
```

Response Body:

```javascript
[
    {
        "item": {
            "name": "Joe's Plumbing",
            "address": "123 Water Street, Chippendale, 2008",
            "expense_code": "GL1234",
            "project_id": "PRJ-9876",
            "last_sync": "2021-01-01"
        },
        "score": 0.5753642
    },
    ...
]
```
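
The sketch below shows one way to build the search body and pick a result from the response. `build_search_body` and `best_match` are hypothetical helpers, and two things are assumptions: that a higher `score` indicates a closer match, and that 0.5 is a sensible cut-off. Tune the threshold against your own data.

```python
import json


def build_search_body(exact=None, fuzzy=None):
    """Serialise search constraints; both keys are always present."""
    return json.dumps({"exact": exact or {}, "fuzzy": fuzzy or {}})


def best_match(results, min_score=0.5):
    """Return the highest-scoring result at or above min_score, else None.

    Assumes a higher score means a closer match.
    """
    candidates = [result for result in results if result["score"] >= min_score]
    return max(candidates, key=lambda result: result["score"], default=None)


body = build_search_body(fuzzy={"name": "Plumbing"})

# Sample response in the shape shown above.
results = [
    {
        "item": {
            "name": "Joe's Plumbing",
            "address": "123 Water Street, Chippendale, 2008",
            "expense_code": "GL1234",
        },
        "score": 0.5753642,
    },
]

match = best_match(results)
print(match["item"]["name"] if match else "no match")
```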

### Searching entities by id

**POST** storage/`{company_id}`/entitysearch/`{entity_type}`/by\_id

Path Parameters:

* `company_id` your Sypht Company Id
* `entity_type` type of entity e.g. supplier, vehicle or employee

Request Body:

* JSON-encoded list of objects with `entity_id`

```javascript
[
  {
    "entity_id": "id_0"
  },
  {
    "entity_id": "id_1"
  }
]
```

Response Body:

```javascript
{
    "entities": [
        {
            "entity_id": "id_0",
            "data": {
                "name": "Joe's Plumbing",
                "address": "123 Water Street, Chippendale, 2008",
                "expense_code": "GL1234"
            },
            "error": false
        },
        {
            "entity_id": "id_1",
            "data": null,
            "error": true
        }
    ]
}
```
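
Because the response reports failures per entity rather than for the whole request, callers will usually want to separate the ids that resolved from those that did not. A minimal sketch, assuming the response shape shown above; `split_by_id_response` is a hypothetical helper name.

```python
def split_by_id_response(response):
    """Split a by_id response into found entities and ids that errored."""
    found = {}
    missing = []
    for entry in response["entities"]:
        if entry["error"] or entry["data"] is None:
            missing.append(entry["entity_id"])
        else:
            found[entry["entity_id"]] = entry["data"]
    return found, missing


response = {
    "entities": [
        {
            "entity_id": "id_0",
            "data": {"name": "Joe's Plumbing", "expense_code": "GL1234"},
            "error": False,
        },
        {"entity_id": "id_1", "data": None, "error": True},
    ]
}

found, missing = split_by_id_response(response)
print(sorted(found), missing)
```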

### Retrieving a list of entity\_ids

**GET** storage/`{company_id}`/entitysearch/`{entity_type}`

Path Parameters:

* `company_id` your Sypht Company Id
* `entity_type` type of entity e.g. supplier, vehicle or employee

Query Parameters:

* `page` page token. If omitted, the first page is returned; otherwise pass the `next_page` value from the previous response to request the following page
* `limit` maximum number of entity\_ids to return

Response Body:

```javascript
{
    "next_page": "page token",
    "entities": [
        "102013",
        "102015",
        "102019",
        "102034",
        "102051",
        "102056",
        "102057",
        "102068",
        "102072",
        "102074"
    ]
}
```
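
Walking through every page can be sketched as a generator that keeps passing `next_page` back as `page`. The `fetch_page` callable stands in for whatever performs the GET request and is a hypothetical parameter. The stop condition, a missing or empty `next_page` on the last page, is an assumption, since the source does not say how the final page is signalled. A fake fetcher is used here so the example runs without network access.

```python
def iter_entity_ids(fetch_page, limit=100):
    """Yield entity ids across all pages.

    fetch_page(page=..., limit=...) must return a dict shaped like the
    response body above. Iteration stops when next_page is missing or
    empty (an assumed end-of-list signal).
    """
    page = None  # no token requests the first page
    while True:
        response = fetch_page(page=page, limit=limit)
        yield from response["entities"]
        page = response.get("next_page")
        if not page:
            break


# Fake two-page backend used only for illustration.
FAKE_PAGES = {
    None: {"next_page": "token-2", "entities": ["102013", "102015"]},
    "token-2": {"next_page": None, "entities": ["102019"]},
}


def fake_fetch_page(page=None, limit=100):
    return FAKE_PAGES[page]


all_ids = list(iter_entity_ids(fake_fetch_page, limit=2))
print(all_ids)
```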

## Using Sypht Client

{% embed url="<https://github.com/sypht-team/sypht-python-client>" %}

> pip install sypht

### Retrieving all entity\_ids

This client method wraps the paginated endpoint, looping over its pages to collect **all** `entity_ids` for the specified `entity_type`.

* Returns a list of objects if **verbose** (the default)

  > \[{"entity\_id": "id\_0"}, {"entity\_id": "id\_1"}, ...]
* Returns a list of entity\_id strings if not verbose

  > \["id\_0", "id\_1", ...]

```python
from sypht.client import SyphtClient

sc = SyphtClient('<client_id>', '<client_secret>')
sc.get_all_entity_ids(entity_type='test_type')
```
