Get to know Signals field concepts and output data formats
Signals are a unique type of field that leverages previously extracted data to derive aggregated metrics or values. Depending on the signal type, you may compute statistics over your own document data set or over a shared pool of aggregated information.
Sample use-cases for signal field types include:
- Comparing the Total on an invoice to the historical average to detect anomalies
- Detecting document duplicates by searching for old data matching key fields
- Computing the probability of observed field combinations to detect potential fraud
Signals are currently available for access via private Beta.
They come in the following forms:
Probability models compute the probability of observing the extracted field values with respect to other values present in the document. In the example below, we calculate the probability of the observed bank account details appearing for the detected Australian Business Number (ABN).
This can be a useful signal for detecting a common form of invoice fraud, in which a third party replaces the legitimate payment information of a known invoice issuer with their own payment details. These new details will not match historical payment information for this issuer and will therefore return a low probability of being legitimate.
Sample config for computing the conditional probability of an observed value
- observed: the set of fields being observed for probability estimates
- conditioned: the set of fields being conditioned on
The output is a float value indicating the probability P(observed|conditioned), computed from historical data.
Field confidence for probability models is correlated with the size of the document dataset being searched. More source data yields higher-confidence probability estimates.
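To make the conditional probability concrete, the following is a minimal sketch of how P(observed|conditioned) could be estimated by counting matching records in historical data. It is an illustration only and not the service's actual implementation: the function name `conditional_probability`, the field names `abn` and `bank_account`, and the sample records are all hypothetical.

```python
def conditional_probability(history, observed, conditioned, query):
    """Estimate P(observed | conditioned) from historical records by counting.

    history:     list of dicts holding previously extracted field values
    observed:    field names whose values are being tested
    conditioned: field names being conditioned on
    query:       dict of field values extracted from the new document
    Returns None when no historical record matches the conditioned fields.
    """
    cond_count = 0
    joint_count = 0
    for doc in history:
        # Keep only records that share the conditioned values (e.g. the same ABN).
        if all(doc.get(f) == query[f] for f in conditioned):
            cond_count += 1
            # Of those, count records that also share the observed values.
            if all(doc.get(f) == query[f] for f in observed):
                joint_count += 1
    if cond_count == 0:
        return None
    return joint_count / cond_count


# Hypothetical history: one issuer seen four times, three times with the same account.
history = [
    {"abn": "ABN-A", "bank_account": "ACC-1"},
    {"abn": "ABN-A", "bank_account": "ACC-1"},
    {"abn": "ABN-A", "bank_account": "ACC-1"},
    {"abn": "ABN-A", "bank_account": "ACC-2"},
    {"abn": "ABN-B", "bank_account": "ACC-9"},
]

known = conditional_probability(
    history, ["bank_account"], ["abn"], {"abn": "ABN-A", "bank_account": "ACC-1"}
)
swapped = conditional_probability(
    history, ["bank_account"], ["abn"], {"abn": "ABN-A", "bank_account": "ACC-7"}
)
unseen = conditional_probability(
    history, ["bank_account"], ["abn"], {"abn": "ABN-Z", "bank_account": "ACC-1"}
)
```

In this sketch the familiar account scores 0.75, the swapped account scores 0.0, and an issuer with no history returns `None` rather than a misleading number. With only a handful of records the estimate is coarse, which is consistent with the note above that more source data yields higher-confidence estimates.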
Document match models dynamically search for and match previously uploaded documents based on values extracted from a query document. This can be used to power a 3-way match of an invoice to purchase and delivery documentation or, as in the example below, to detect near-duplicates and avoid the invoice being processed multiple times:
Sample configuration to match documents by reference number and date
- fields: a list of field IDs to match against
- exact: boolean value indicating whether to use fuzzy or exact value matching
Document match searches all previously uploaded documents where the match fields have been extracted.
The output is a set of documentreference values, one for each matched document.
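The difference between exact and fuzzy matching can be sketched as follows. This is an assumption-laden illustration, not the product's matching algorithm: the similarity measure (Python's `difflib.SequenceMatcher`), the 0.85 threshold, the helper names, and the `document_id`, `reference` and `date` field names are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def values_match(a, b, exact, threshold=0.85):
    """Compare two field values exactly, or by string similarity when exact is False."""
    if exact:
        return a == b
    ratio = SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()
    return ratio >= threshold  # threshold is an arbitrary value chosen for this sketch


def match_documents(query, history, fields, exact=True):
    """Return the IDs of historical documents whose match fields agree with the query."""
    matches = []
    for doc in history:
        # Skip documents where any of the match fields was not extracted.
        if any(doc.get(f) is None for f in fields):
            continue
        if all(values_match(query[f], doc[f], exact) for f in fields):
            matches.append(doc["document_id"])
    return matches


# Hypothetical previously uploaded documents.
history = [
    {"document_id": "doc-1", "reference": "INV-1001", "date": "2023-04-01"},
    {"document_id": "doc-2", "reference": "INV-1O01", "date": "2023-04-01"},  # letter O
    {"document_id": "doc-3", "reference": "INV-2000", "date": "2023-05-10"},
    {"document_id": "doc-4", "reference": None, "date": "2023-04-01"},
]
query = {"reference": "INV-1001", "date": "2023-04-01"}

exact_matches = match_documents(query, history, ["reference", "date"], exact=True)
fuzzy_matches = match_documents(query, history, ["reference", "date"], exact=False)
```

Here exact matching finds only `doc-1`, while fuzzy matching also picks up `doc-2`, whose reference differs by a single misread character. `doc-4` is never considered because its reference was not extracted.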
Statistical signals return basic metrics for numerical field values with respect to historical data. This can be used for data analysis and benchmarking of extractions against historical values or market data precedents.
For example, by configuring a value statistic signal over the invoice.total field, you may assess the percentile_rank of the Total on an invoice and use this in combination with field conditions to dynamically flag invoices with unusually high values (e.g. 99th percentile) for human review.
Sample configuration to extract field value statistics for the invoice.total field
- source: the field identifier to compute historical metrics for. Only fields with numerical data types are accepted.
A dictionary containing descriptive metrics including:
- min: the minimum observed value of this field in previously extracted data
- max: the maximum observed value of this field in previously extracted data
- avg: the average (mean) observed value of this field in previously extracted data
- variance: the numerical variance of this field in previously extracted data
- percentile_rank: the percentile rank of the observed value with respect to previously extracted data
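As a rough illustration of these metrics, the sketch below computes them for a list of historical totals and applies the 99th-percentile review rule mentioned earlier. Several details are assumptions rather than documented behaviour: the source does not say whether variance is the population or sample variance (population variance is used here), nor how percentile rank handles ties (half of the tied values are counted as below). The function names are hypothetical.

```python
import statistics


def value_statistics(history, observed):
    """Compute descriptive metrics for an observed value against historical values."""
    n = len(history)
    below = sum(1 for v in history if v < observed)
    equal = sum(1 for v in history if v == observed)
    return {
        "min": min(history),
        "max": max(history),
        "avg": statistics.fmean(history),
        # Assumption: population variance; the service may use sample variance.
        "variance": statistics.pvariance(history),
        # Assumption: ties contribute half their count to the rank.
        "percentile_rank": 100.0 * (below + 0.5 * equal) / n,
    }


def needs_review(stats, threshold=99.0):
    """Flag values at or above the given percentile rank for human review."""
    return stats["percentile_rank"] >= threshold


# Hypothetical historical invoice totals.
totals = [100.0, 200.0, 300.0, 400.0]

typical = value_statistics(totals, 350.0)
outlier = value_statistics(totals, 10000.0)
```

For these sample totals, an invoice of 350.0 has a percentile rank of 75.0 and is not flagged, while an invoice of 10000.0 ranks at 100.0 and would be routed to human review.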