Signals
Get to know Signals field concepts and output data formats
Signals are a unique type of field that leverages previously extracted data to derive aggregated metrics or values. Depending on the signal type, you may compute statistics over your own document dataset or over a shared pool of aggregated information.
Sample use-cases for signal field types include:
- Comparing the Total on an invoice to the historical average to detect anomalies
- Detecting document duplicates by searching for old data matching key fields
- Computing the probability of observed field combinations to detect potential fraud
Signals are currently available in private beta.
They come in three forms: probability models, document match models, and statistical signals.
Probability models compute the probability of observing the extracted field values with respect to other values present in the document. In the example below, we calculate the probability of the observed bank account details appearing for the detected Australian Business Number (ABN).
This can be a useful signal for detecting a common form of invoice fraud, where a third party replaces the legitimate payment information of a known invoice issuer with their own payment details. These new details will not match the historical payment information for this issuer and therefore return a low probability of being legitimate.

Sample config for computing the conditional probability of an observed value
- observed: the set of fields being observed for probability estimates
- conditioned: the set of fields being conditioned on
A float value indicating the probability P(observed|conditioned), computed from historical data. Field confidence for probability models is correlated with the size of the document dataset being searched. More source data yields higher-confidence probability estimates.
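The estimate described above can be sketched as a simple frequency count over historical records. This is a minimal illustration, not the product's implementation: the function name, the record layout, and the field names (abn, bsb, account) are all hypothetical, and the real model may smooth or weight its estimates differently.

```python
def conditional_probability(history, observed, conditioned, query):
    """Estimate P(observed | conditioned) for `query` by counting matches.

    history:     list of dicts of previously extracted values (assumed layout)
    observed:    field names whose values are being checked
    conditioned: field names the estimate is conditioned on
    query:       dict of values extracted from the current document
    """
    cond_key = tuple(query[f] for f in conditioned)
    obs_key = tuple(query[f] for f in observed)

    cond_count = 0   # records sharing the conditioned values
    joint_count = 0  # of those, records also sharing the observed values
    for record in history:
        if tuple(record.get(f) for f in conditioned) != cond_key:
            continue
        cond_count += 1
        if tuple(record.get(f) for f in observed) == obs_key:
            joint_count += 1

    if cond_count == 0:
        return None  # no history for the conditioned values, so no estimate
    return joint_count / cond_count


# Hypothetical historical data: three invoices from one issuer, one from another.
history = [
    {"abn": "11 111 111 111", "bsb": "012-345", "account": "1000001"},
    {"abn": "11 111 111 111", "bsb": "012-345", "account": "1000001"},
    {"abn": "11 111 111 111", "bsb": "012-345", "account": "1000001"},
    {"abn": "22 222 222 222", "bsb": "987-654", "account": "2000002"},
]

legitimate = {"abn": "11 111 111 111", "bsb": "012-345", "account": "1000001"}
suspicious = {"abn": "11 111 111 111", "bsb": "555-555", "account": "9999999"}
unknown = {"abn": "33 333 333 333", "bsb": "012-345", "account": "1000001"}

p_legitimate = conditional_probability(history, ["bsb", "account"], ["abn"], legitimate)
p_suspicious = conditional_probability(history, ["bsb", "account"], ["abn"], suspicious)
p_unknown = conditional_probability(history, ["bsb", "account"], ["abn"], unknown)
```

In this sketch the replaced bank details score 0.0 for the known issuer, while an issuer with no history yields no estimate at all, which mirrors why a larger dataset gives more reliable results.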
Document match models dynamically search and match previously uploaded documents based on values extracted from a query document. This can be used to power a 3-way match of an invoice to its purchase and delivery documentation or, as in the example below, to detect near-duplicates and avoid an invoice being processed multiple times:

Sample configuration to match documents by reference number and date
- fields: a list of field IDs to match against
- exact: boolean value indicating whether to use exact (true) or fuzzy (false) value matching
Document match searches all previously uploaded documents where the match fields have been extracted.
A list of documentreference values, one for each matched document. For example:
[
{
"file_id": "aaaaaa-bbbbb-..."
},
{
"file_id": "cccccc-ddddd-..."
}
...
]
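The matching behaviour described above can be approximated with a short sketch. Everything here is an assumption made for illustration: the function names, the stored document layout, the field IDs (reference, date), and the use of difflib string similarity with a 0.9 threshold for fuzzy matching. The actual matching logic is not specified in this document.

```python
from difflib import SequenceMatcher


def _values_match(a, b, exact, threshold):
    """Compare two field values either exactly or by string similarity."""
    if exact:
        return a == b
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match_documents(query, previous_docs, fields, exact=True, threshold=0.9):
    """Return one {"file_id": ...} entry per previously uploaded document
    whose values agree with `query` on every field in `fields`."""
    matches = []
    for doc in previous_docs:
        values = doc["fields"]
        # Only documents where all match fields were extracted are searched.
        if any(f not in values for f in fields):
            continue
        if all(_values_match(str(query[f]), str(values[f]), exact, threshold)
               for f in fields):
            matches.append({"file_id": doc["file_id"]})
    return matches


# Hypothetical previously uploaded documents.
previous_docs = [
    {"file_id": "file-a", "fields": {"reference": "INV-1001", "date": "2023-01-15"}},
    {"file_id": "file-b", "fields": {"reference": "INV1001", "date": "2023-01-15"}},
    {"file_id": "file-c", "fields": {"reference": "INV-2002", "date": "2023-02-01"}},
    {"file_id": "file-d", "fields": {"date": "2023-01-15"}},  # reference not extracted
]
query = {"reference": "INV-1001", "date": "2023-01-15"}

exact_matches = match_documents(query, previous_docs, ["reference", "date"], exact=True)
fuzzy_matches = match_documents(query, previous_docs, ["reference", "date"], exact=False)
```

With exact matching only file-a is returned, whereas fuzzy matching also picks up file-b, whose reference number differs by a single hyphen.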
Statistical signals return basic metrics for numerical field values with respect to historical data. This can be used for data analysis and benchmarking of extractions against historical values or market data precedents.
For example, by configuring a value statistic signal over the invoice.total field, you may assess the percentile_rank of the Total on an invoice and use this in combination with field conditions to dynamically flag invoices with unusually high values (e.g. 99th percentile) for human review.
Sample configuration to extract field value statistics for the invoice.total field
- source: the field identifier to compute historical metrics for. Only fields with numerical data types are accepted.
A dictionary containing descriptive metrics including:
- min: the minimum observed value of this field in previously extracted data
- max: the maximum observed value of this field in previously extracted data
- avg: the average or mean observed value for this field in previously extracted data
- variance: the numerical variance of this field in previously extracted data
- percentile_rank: the percentile rank of the observed value with respect to previously extracted data
{
"min": -594.30,
"max": 275474.84,
"avg": 1961.09,
"variance": 5829136.35,
"percentile_rank": 70.21478
}
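As a rough illustration of how such metrics could be derived, the sketch below computes the same five keys from a list of historical values. It rests on assumptions not confirmed by this document: variance is taken as the population variance, and percentile_rank is taken as the percentage of historical values less than or equal to the observed value. The function name and the sample values are hypothetical.

```python
from statistics import mean, pvariance


def value_statistics(history, observed):
    """Compute descriptive metrics for `observed` against historical values.

    Assumes population variance and a "less than or equal" percentile rank.
    """
    if not history:
        raise ValueError("at least one historical value is required")
    at_or_below = sum(1 for v in history if v <= observed)
    return {
        "min": min(history),
        "max": max(history),
        "avg": mean(history),
        "variance": pvariance(history),
        "percentile_rank": 100.0 * at_or_below / len(history),
    }


# Hypothetical historical invoice totals and a newly observed total.
history = [100.0, 200.0, 300.0, 400.0]
stats = value_statistics(history, observed=300.0)

# Flag unusually high totals for human review, e.g. at the 99th percentile.
needs_review = stats["percentile_rank"] >= 99.0
```

A field condition like the final line is one way the percentile_rank value could drive a review flag; the 99.0 cut-off simply mirrors the example threshold mentioned earlier.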