Torch provides a data reconciliation policy feature that ensures consistent data quality between data assets. With it, data quality can be established, validated, and maintained for data held in different assets. This applies to data of similar types, or between any assets that can be profiled.
Data quality is a measure of data based on its accuracy, consistency, reliability and how current it is. Data is considered high quality if it is accurate and aligns with its intended operational and decision-making purposes. When data in different sources are used together, poor quality data from one source can “contaminate” data in the other sources with which it connects.
The data reconciliation policy lets users customize checks based on the specific data they want to use from each source. It allows data teams to join both data sources on the ID column provided in a source, and it evaluates the conditions defined on the data columns selected by the user. If one or more of the provided conditions fail, the policy execution fails. Greater than/less than operators are also allowed.
Acceldata allows checking the integrity of migrated data by comparing the source and target datasets. It also helps you perform root cause analysis (RCA) on migrated workloads that are not functioning as expected.
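As a rough illustration of what comparing source and target datasets can involve, the following minimal sketch finds IDs that exist on only one side of a migration. It is not Torch's implementation; the function name `diff_ids`, the `id` column, and the sample rows are all hypothetical.

```python
def diff_ids(source_rows, target_rows, id_column="id"):
    """Return IDs present on only one side (hypothetical helper, not a Torch API)."""
    source_ids = {row[id_column] for row in source_rows}
    target_ids = {row[id_column] for row in target_rows}
    return {
        "missing_in_target": sorted(source_ids - target_ids),
        "unexpected_in_target": sorted(target_ids - source_ids),
    }


# Illustrative sample data, invented for this sketch.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target = [{"id": 1, "amount": 10}, {"id": 3, "amount": 30}, {"id": 4, "amount": 40}]

report = diff_ids(source, target)
print(report)
```

A non-empty list on either side is a natural starting point for root cause analysis, since it narrows the investigation to specific records.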
The following steps explain how to configure a reconciliation policy in Torch:
- Click Discover from the top menu bar. The Discover window is displayed.
- Search for an asset by its name in the search bar.
- On finding the asset, click the stacked, three-dot icon and click Add Reconciliation Policy from the drop-down list. This asset is added as the Source Asset for the reconciliation policy.
- Similarly, search for a second asset to be added as the Sink Asset, click the stacked, three-dot icon, and click Add Reconciliation Policy from the drop-down list. This asset is added to the reconciliation panel.
- Click the Continue button on the reconciliation panel. The Create Reconciliation Policy window is displayed.
The following tabs are displayed in the Create Reconciliation Policy window:
- Source & Sink Asset Info
- Sample Data
- Rule Definitions
- Check Incrementally
- Schedule Execution
- Alert Configurations
The following section explains the panels of the Create Reconciliation Policy window:
Source & Sink Asset Info
The Source & Sink Asset Info panel displays the hierarchy of both the source asset and the sink asset, along with all the asset tags that were generated when the assets were crawled. To add tags, click the Add Tag button.
In the Info panel, enter a name for the reconciliation policy and specify a description. To add tags to the policy, click Add Tag.
Sample Data
The Sample Data panel displays all the columns of the source and sink assets. Select the columns for which you would like to add rule definitions. Only the selected columns appear when you add a rule definition. If no columns are selected, all columns appear when you add a rule definition.
Rule Definitions
Select the type of reconciliation match from the drop-down menu:
- Data Equality: Data equality match is a reconciliation policy where the system joins both data sources on the ID column provided and evaluates the conditions defined on the selected columns. If one or more of the provided conditions fail, the policy execution fails.
- Hashed Data Equality: Hashed data equality match is a reconciliation policy where the system joins both data sources on the ID column provided and computes a hash of the complete row on each side. It then compares the two computed hashes, and if they do not match, the policy execution fails.
- Profile Equality Match: Profile equality match is a reconciliation policy where the system fetches the profile of the data from each side independently and then compares them. If the profiles do not match, the policy execution fails.
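The hashed and profile match types can be sketched in a few lines. This is an illustration under stated assumptions, not Torch's actual algorithm: the hash function (SHA-256 over a JSON serialization), the profile metrics (count, distinct, min, max), and all function names are hypothetical choices made for the example.

```python
import hashlib
import json


def row_hash(row):
    """Hash a complete row; keys are sorted so column order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def hashed_mismatches(source_rows, sink_rows, id_column):
    """Return source IDs whose row hash differs from, or is absent in, the sink."""
    sink_hashes = {row[id_column]: row_hash(row) for row in sink_rows}
    return [
        row[id_column]
        for row in source_rows
        if sink_hashes.get(row[id_column]) != row_hash(row)
    ]


def column_profile(rows, column):
    """Build a tiny profile of one column (assumed metrics, for illustration)."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


# Illustrative sample data, invented for this sketch.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
sink = [{"id": 2, "amount": 20}, {"id": 1, "amount": 11}]

mismatched_ids = hashed_mismatches(source, sink, "id")
profiles_match = column_profile(source, "amount") == column_profile(sink, "amount")
print(mismatched_ids, profiles_match)
```

The hashed comparison pinpoints which rows differ without comparing every column individually, while the profile comparison needs no join at all, which is why it can run on each side independently.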
Also, select the values for the following properties:
- Left Column: Select the column name from the left column.
- Operator: Select an operator to compare the left-hand column with the right-hand column. The available operators are Equal, Not Equal, Greater Than or Equal, Greater Than, Less Than or Equal, and Less Than.
- Right Column: Select a column name from the right column.
Check the Join Column checkbox to join both the columns. Check the Ignore null values? checkbox to ignore any null values in the columns.
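To make the rule properties concrete, here is a minimal sketch of how a single rule (left column, operator, right column, join column, and the null option) might be evaluated. The function `evaluate_rule` and the sample rows are hypothetical. The handling of nulls is an assumption: a null on either side counts as a failure unless nulls are ignored. Rows without a matching join key are skipped, which is also an assumption.

```python
import operator

# Operator names as listed in the rule definition; the mapping is illustrative.
OPERATORS = {
    "Equal": operator.eq,
    "Not Equal": operator.ne,
    "Greater Than or Equal": operator.ge,
    "Greater Than": operator.gt,
    "Less Than or Equal": operator.le,
    "Less Than": operator.lt,
}


def evaluate_rule(source_rows, sink_rows, join_column, left_column,
                  op_name, right_column, ignore_nulls=False):
    """Return join keys of rows that fail the condition (hypothetical helper)."""
    compare = OPERATORS[op_name]
    sink_by_key = {row[join_column]: row for row in sink_rows}
    failures = []
    for left in source_rows:
        right = sink_by_key.get(left[join_column])
        if right is None:
            continue  # assumed inner join: unmatched keys are not compared
        left_value, right_value = left[left_column], right[right_column]
        if left_value is None or right_value is None:
            if not ignore_nulls:
                failures.append(left[join_column])  # assumed: null counts as a failure
            continue
        if not compare(left_value, right_value):
            failures.append(left[join_column])
    return failures


# Illustrative sample data, invented for this sketch.
source = [{"order_id": 1, "total": 100}, {"order_id": 2, "total": None},
          {"order_id": 3, "total": 50}]
sink = [{"order_id": 1, "total": 100}, {"order_id": 2, "total": 75},
        {"order_id": 3, "total": 40}]

strict = evaluate_rule(source, sink, "order_id", "total", "Equal", "total")
lenient = evaluate_rule(source, sink, "order_id", "total", "Equal", "total",
                        ignore_nulls=True)
print(strict, lenient)
```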
Success Threshold: Specify a value for the Success Threshold, ranging from 0 to 100. If the quality score is less than the success threshold, the policy fails.
Warning Threshold: Specify a value for the Warning Threshold, ranging from 0 to 100. If the quality score exceeds the warning threshold, the status of the policy is displayed as "WARNING".
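One possible reading of how the two thresholds interact is sketched below. It assumes the warning threshold is set below the success threshold, so that a score at or above the success threshold succeeds, a score between the two is a warning, and anything else fails. This ordering, the function name `policy_status`, and the status labels other than "WARNING" are assumptions, not confirmed product behavior.

```python
def policy_status(quality_score, success_threshold, warning_threshold):
    """Map a quality score to a status (assumed interpretation of the thresholds)."""
    for name, value in (("quality_score", quality_score),
                        ("success_threshold", success_threshold),
                        ("warning_threshold", warning_threshold)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    if quality_score >= success_threshold:
        return "SUCCESS"
    if quality_score > warning_threshold:
        return "WARNING"  # below success, but above the warning threshold
    return "FAILED"


# Example with a success threshold of 90 and a warning threshold of 70.
for score in (95, 80, 60):
    print(score, policy_status(score, 90, 70))
```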
Check Incrementally
Click the two-circled icon to check the conditions incrementally. Select one of the available incremental strategies and specify the required values accordingly.
Schedule Execution
Click the two-circled icon to schedule execution. To schedule, select a frequency tag such as minute, hour, day, week, month, or year, and enable the Start Scheduler Runs toggle.
Alert Configurations
Click the Alert on drop-down button to select whether to receive notifications on error, warning, or success of the rule execution. Select one or more of the following channels to receive alerts when the reconciliation policy has succeeded or when an error has occurred:
- Email: Email notifications are sent to your default email address. Additional recipients can be added to receive alerts as well.
- Slack: Slack notifications are sent to the default channel. Additional channels can also be added.
- Webhook: Webhook notifications are sent every time a rule execution fails or succeeds.
Click the Enable Policy icon to start receiving alerts. To configure the notification channels, see here.
Click the Save Policy button.
To view the details of the reconciliation policy execution result, learn more here.