Real-time validation (sypht.validate)

Add real-time human-in-the-loop validation to predictions. (coming soon to v2)

Validation workflows are a BETA feature and subject to change. This guide is under construction.

How it works

Validation rules

With v2 validation rules, checks are built from a range of validation conditions. In v1 you chose from a set of available condition types, such as a field-confidence-range condition with a specified minimum. In v2 you achieve the same result with a less-than condition that compares the prediction confidence with a constant threshold value.

For example, here is a v1 validation rule that would create a task if the bpay.crn field's confidence is below 50%:

{
    "type": "field-confidence-range",
    "field": "bpay.crn",
    "min": 0.5
}

And here is a v2 validation rule that would perform the same check:

{
    "condition": {
        "type": "less-than",
        "values": [
            "$.predictions['sypht.bpay.crn'].confidence",
            0.50
        ],
        "compare_as": "numeric"
    },
    "note": "Low confidence",
    "field_ids": [
        "sypht.bpay.crn"
    ]
}

While this may initially appear more complicated or verbose, the benefit is that simple rules can be combined with and, or and not operations to build more complicated validation rules, which can then be used to get users to check one or more fields.

The rules schema consists of an array of one or more checks. Each check must contain the following properties:

  • condition: the condition that determines whether the check triggers a validation task

  • note: a note that appears in the task UI, next to the fields that need to be reviewed

  • field_ids: the fields that are highlighted in the task view if the check's condition is satisfied
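As a rough illustration of the check structure described above, the following Python sketch builds a low-confidence check and reports any required properties that are missing. The helper names (low_confidence_check, missing_check_keys) are hypothetical and not part of any Sypht SDK; they only produce and inspect plain JSON-compatible data.

```python
import json

# The three properties every check must contain, per the list above.
REQUIRED_CHECK_KEYS = ("condition", "note", "field_ids")


def low_confidence_check(field_id, threshold, note="Low confidence"):
    """Build a check that triggers when a field's confidence is below threshold.

    Hypothetical helper: it mirrors the v2 rule shown earlier in this guide.
    """
    return {
        "condition": {
            "type": "less-than",
            "values": [f"$.predictions['{field_id}'].confidence", threshold],
            "compare_as": "numeric",
        },
        "note": note,
        "field_ids": [field_id],
    }


def missing_check_keys(check):
    """Return the required properties that are absent from a check."""
    return [key for key in REQUIRED_CHECK_KEYS if key not in check]


check = low_confidence_check("sypht.bpay.crn", 0.5)
print(json.dumps(check, indent=4))
print(missing_check_keys(check))
```

Running a check through missing_check_keys before storing it is one cheap way to catch an incomplete rule locally; it does not replace whatever validation the service itself performs.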

Conditions can be grouped into two categories: logic operators and comparison operators.

Logic operators consist of the and, or and not condition types. The and and or types contain a nested array of conditions; the not type contains a single condition. For example:

{
    "checks":[
        {
            "condition":{
                "type":"and",
                "conditions":[{},{},...]
            },
            "field_ids":[...],
            "note":"..."
        },
        {
            "condition":{
                "type":"or",
                "conditions":[{},{},...]
            },
            "field_ids":[...],
            "note":"..."
        },
        {
            "condition":{
                "type":"not",
                "condition":{}
            },
            "field_ids":[...],
            "note":"..."
        }
    ]
}
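The nesting above can be sketched with a few small builder functions. This is a minimal Python sketch, assuming the condition shapes shown in this guide; the function names (all_of, any_of, negate, less_than, greater_than) are hypothetical conveniences, not part of any Sypht library.

```python
def all_of(*conditions):
    """Wrap conditions in an 'and' logic operator."""
    return {"type": "and", "conditions": list(conditions)}


def any_of(*conditions):
    """Wrap conditions in an 'or' logic operator."""
    return {"type": "or", "conditions": list(conditions)}


def negate(condition):
    """Wrap a single condition in a 'not' logic operator."""
    return {"type": "not", "condition": condition}


def less_than(left, right):
    """Numeric less-than comparison between two values."""
    return {"type": "less-than", "values": [left, right], "compare_as": "numeric"}


def greater_than(left, right):
    """Numeric greater-than comparison between two values."""
    return {"type": "greater-than", "values": [left, right], "compare_as": "numeric"}


# Example: flag large invoice totals that were also predicted with low confidence.
total = "$.predictions['sypht.invoice.total']"
check = {
    "condition": all_of(
        greater_than(f"{total}.value_norm", 10000),
        less_than(f"{total}.confidence", 0.9),
    ),
    "note": "Large invoice total with low confidence",
    "field_ids": ["sypht.invoice.total"],
}
rules = {"checks": [check]}
```

Because each builder returns an ordinary dict, the operators nest freely, for example negate(any_of(a, b)).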

Comparison operators consist of the greater-than, less-than and equal types for comparing two values, and the is-set and has-value types to check if a variable has been initialised and/or has a value.

In almost all cases you will want your conditions to check predicted values in a document. These are available on the reserved $.predictions workflow variable, which is an associative array that uses field ids as keys. The values in the predictions array are objects that match the schema found in the results endpoint, with confidence, value and value_norm properties:

{
    "checks":[
        {
            "condition":{
                "type":"greater-than",
                "values":[
                    "$.predictions['sypht.invoice.total'].value_norm",
                    10000
                ],
                "compare_as":"numeric"
            },
            "field_ids":[...],
            "note":"..."
        },
        {
            "condition":{
                "type":"less-than",
                "values":[
                    "$.predictions['sypht.invoice.total'].confidence",
                    0.9
                ],
                "compare_as":"numeric"
            },
            "field_ids":[...],
            "note":"..."
        },
        {
            "condition":{
                "type":"equal",
                "values":[
                    "$.predictions['sypht.issuer.abn'].value_norm",
                    "12123456789"
                ],
                "compare_as":null
            },
            "field_ids":[...],
            "note":"..."
        },
        {
            "condition":{
                "type":"is-set",
                "path":"$.predictions['sypht.invoice.total'].value_norm"
            },
            "field_ids":[...],
            "note":"..."
        },
        {
            "condition":{
                "type":"has-value",
                "item":"$.predictions['sypht.invoice.total'].value_norm"
            },
            "field_ids":[...],
            "note":"..."
        }
    ]
}
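Path strings such as those in the example above are easy to mistype. The following Python sketch shows one way to generate them; prediction_path is a hypothetical helper, and the property set is only the three properties this guide mentions, which may not be exhaustive.

```python
# Properties of a prediction object named in this guide (assumed, possibly incomplete).
PREDICTION_PROPERTIES = {"confidence", "value", "value_norm"}


def prediction_path(field_id, prop):
    """Return the $.predictions path for one property of a predicted field."""
    if prop not in PREDICTION_PROPERTIES:
        raise ValueError(f"Unknown prediction property: {prop!r}")
    return f"$.predictions['{field_id}'].{prop}"


# Build an is-set condition in the same shape as the example above.
condition = {
    "type": "is-set",
    "path": prediction_path("sypht.invoice.total", "value_norm"),
}
print(condition["path"])
```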

1. Store rules in sypht

PUT /workflows/validation_rules/{rules_id}

rules_id is a self-assigned id for the rules; any string value is valid. Note the id for use in subsequent requests.

The request body consists of data and schema properties at the top level, with the checks array nested within data:

Request body:
{
    "data":{
        "checks":[
            {
                "condition":{
                    "type":"greater-than",
                    "values":[
                        "$.predictions['sypht.invoice.total'].value_norm",
                        10000
                    ],
                    "compare_as":"numeric"
                },
                "field_ids":[...],
                "note":"..."
            },
            {
                "condition":{
                    "type":"less-than",
                    "values":[
                        "$.predictions['sypht.invoice.total'].confidence",
                        0.9
                    ],
                    "compare_as":"numeric"
                },
                "field_ids":[...],
                "note":"..."
            }
        ]
    },
    "schema":true
}
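A minimal Python sketch of preparing this request with the standard library follows. The base URL https://api.sypht.com, the bearer-token Authorization header, and the function name build_store_rules_request are assumptions made for illustration; check them against your account's API details before use. The sketch only builds the request and does not send it.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.sypht.com"  # assumption: not stated in this guide


def build_store_rules_request(rules_id, checks, token):
    """Build (but do not send) the PUT request that stores validation rules."""
    body = {"data": {"checks": checks}, "schema": True}
    return urllib.request.Request(
        url=f"{API_BASE}/workflows/validation_rules/{quote(rules_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
    )


checks = [
    {
        "condition": {
            "type": "less-than",
            "values": ["$.predictions['sypht.invoice.total'].confidence", 0.9],
            "compare_as": "numeric",
        },
        "field_ids": ["sypht.invoice.total"],
        "note": "Low confidence",
    }
]
request = build_store_rules_request("my-rules", checks, "<access token>")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```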

2. Invoke sypht.validate workflow

POST /workflows/sypht.validate/jobs

Request body:
{
    "inputs":{
        "file_id":"73a958f8-61e9-4c44-82bc-05ab7db95de2",
        "product_id":"ndis-claims:2",
        "rules_id":""
    }
}

file_id is the file id returned by the file upload endpoint.

product_id is the id of the product you want to extract and validate with. Note that with the validate workflow you can only supply a single product id at a time. Product ids for subscribed products can be found in the sypht marketplace.

rules_id is the id for a validation rules config that you have stored in sypht using the rules storage endpoint.
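The job request can be prepared in the same way. This Python sketch assumes the base URL https://api.sypht.com and bearer-token authentication, neither of which is stated in this guide; build_validate_request is a hypothetical helper name. It builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.sypht.com"  # assumption: not stated in this guide


def build_validate_request(file_id, product_id, rules_id, token):
    """Build (but do not send) the POST request that starts a sypht.validate job."""
    if not isinstance(product_id, str):
        # The validate workflow accepts a single product id at a time.
        raise TypeError("product_id must be a single string")
    body = {
        "inputs": {
            "file_id": file_id,
            "product_id": product_id,
            "rules_id": rules_id,
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/workflows/sypht.validate/jobs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
    )


request = build_validate_request(
    "73a958f8-61e9-4c44-82bc-05ab7db95de2",
    "ndis-claims:2",
    "my-rules",  # hypothetical rules_id stored in step 1
    "<access token>",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```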

Response
{
    "job": {
        "id": "", //uuid
        "company_id": "", //uuid
        "created": "2023-03-28T22:36:03.911887",
        "workflow_id": "", //workflowId
        "inputs": {
            ...
            //key-value pairs of workflow inputs
        },
        "settings": {
            // key-value pairs of your sypht settings
            ...
        },
        "usage": null,
        "file_id": "",
        "status": "new",
        "version": 3
    },
    "message_id": "" //uuid
}

3. Complete review tasks

Go to your tasks page at https://app.sypht.com/tasks. If a task has been created as per your validation rules, it will be listed here.

Click on the task to open it. In the task view screen, the fields to be reviewed (as per your validation rules) are highlighted.

4. Collecting results

While tasks are in progress, the results endpoint will block requests for up to 30 seconds and return an IN PROGRESS status.

GET /result/final/{file_id}

Response
{
    "fileId": "1111111-6152-467c-9182-f07223d057cb",
    "status": "IN PROGRESS"
}

Once the task is complete, or if the predictions passed all validation checks, results will be returned normally.

Response
{
    "fileId": "1111111-6152-467c-9182-f07223d057cb",
    "status": "FINALISED",
    "results": {
        "timestamp": "2020-08-20T03:30:09.703Z",
        "fields": [
            {
                "name": "vehicle.odometerKm",
                "value": "148500",
                "confidence": 0.9958282699555642,
                ...
            },
            ...
        ]
    }
}
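Since the endpoint returns IN PROGRESS until the task is complete, a client typically retries until the status changes. The following Python sketch shows one such loop; wait_for_final_result and its parameters are hypothetical, and fetch_result stands in for whatever function performs GET /result/final/{file_id} and returns the parsed JSON. A stub is used here so the sketch runs without network access.

```python
import time


def wait_for_final_result(fetch_result, file_id, max_attempts=20, pause_seconds=1.0):
    """Call fetch_result(file_id) until the status is no longer IN PROGRESS."""
    for _ in range(max_attempts):
        result = fetch_result(file_id)
        if result.get("status") != "IN PROGRESS":
            return result
        time.sleep(pause_seconds)
    raise TimeoutError(f"No final result for {file_id} after {max_attempts} attempts")


# Stub responses shaped like the examples in this guide (not real API output).
_responses = iter([
    {"fileId": "1111111-6152-467c-9182-f07223d057cb", "status": "IN PROGRESS"},
    {"fileId": "1111111-6152-467c-9182-f07223d057cb", "status": "FINALISED",
     "results": {"fields": []}},
])


def fake_fetch(file_id):
    """Stand-in for a real HTTP call to the results endpoint."""
    return next(_responses)


final = wait_for_final_result(
    fake_fetch, "1111111-6152-467c-9182-f07223d057cb", pause_seconds=0
)
print(final["status"])
```

Because the endpoint itself blocks for up to 30 seconds per request, the pause between attempts can be short; the attempt limit mainly guards against a review task that is never completed.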
