Data contracts specification
Backwards compatibility
While we are in closed preview, we do not guarantee backwards compatibility: version 0.0.2 of the template is not backwards compatible with 0.0.1.
The following is the template for a data contract. Fields whose annotations below are not marked (Optional) are mandatory:
---
kind: DataContract # (1)
status: draft # (2)
template_version: 0.0.2 # (3)
dataset: sale_txn # (4)
type: Table # (5)
description: This is the ... # (6)
datasource: snowflake # (7)
owners: # (8)
  users:
    - jdoe
    - jsmith
  groups:
    - data_producers_group
certification: # (9)
  status: VERIFIED # (10)
  message: Verified by data producers
announcement: # (11)
  type: Informational # (12)
  title: Informational announcement
  description: Explanation of the ...
terms: # (13)
  - Sales
  - Transactions
tags: # (14)
  - name: PII
    propagate: false
    restrict_propagation_through_lineage: true
    restrict_propagation_through_hierarchy: false
  - name: GDPR
    propagate: false
    restrict_propagation_through_lineage: true
    restrict_propagation_through_hierarchy: false
customMetadata: # (15)
  Data Quality:
    Completeness Score: 100
    Failed Checks:
      - 884438be-82cc-4e04-bfe1-fba59276df38
      - afa0e560-a916-4862-a2f2-c491f19f39f5
columns:
  - name: txn_ref_dt # (16)
    business_name: transaction date # (17)
    description: null # (18)
    is_primary: false # (19)
    data_type: date # (20)
    logical_type: date # (21)
    invalid_format: null # (22)
    valid_format: null # (23)
    invalid_regex: null # (24)
    valid_regex: null # (25)
    missing_regex: null # (26)
    invalid_values: [] # (27)
    valid_values: [] # (28)
    missing_values: [] # (29)
    not_null: true # (30)
    valid_length: null # (31)
    valid_max_length: null # (32)
    valid_min: null # (33)
    valid_max: null # (34)
    valid_min_length: null # (35)
    unique: false # (36)
checks: # (37)
  - missing_count(txn_ref_dt) = 0
  - missing_count(txn_ref_dt) = 100
  - current_time - date(record_date) < 5
...
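To illustrate the shape of the template, here is a minimal Python sketch that checks a contract after it has been parsed from YAML into a plain dict. The helper validate_contract and its constants are hypothetical, not part of any Atlan tooling. It assumes the mandatory fields are the top-level fields whose annotations below are not marked (Optional), and it copies the allowed values from those annotations.

```python
# Minimal sketch (hypothetical helper, not an Atlan tool): check a data contract
# that has already been parsed from YAML into a plain Python dict.

# Assumed mandatory fields: top-level fields not marked (Optional) in the annotations.
MANDATORY_FIELDS = ("kind", "status", "template_version", "dataset", "datasource")

# Allowed values, copied from the annotations for template version 0.0.2.
CONTRACT_STATUSES = {"draft", "verified"}
DATASET_TYPES = {"Table", "View", "MaterialisedView"}
CERTIFICATION_STATUSES = {"DRAFT", "VERIFIED", "DEPRECATED"}
SUPPORTED_TEMPLATE_VERSION = "0.0.2"  # 0.0.2 is not backwards compatible with 0.0.1


def validate_contract(contract: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing mandatory field: {name}"
                for name in MANDATORY_FIELDS if name not in contract]

    if "kind" in contract and contract["kind"] != "DataContract":
        problems.append("kind must be DataContract")
    if "status" in contract and contract["status"] not in CONTRACT_STATUSES:
        problems.append(f"unknown contract status: {contract['status']}")
    # A YAML parser normally loads 0.0.2 as a string; str() guards other inputs.
    if ("template_version" in contract
            and str(contract["template_version"]) != SUPPORTED_TEMPLATE_VERSION):
        problems.append(f"unsupported template_version: {contract['template_version']}")

    # Optional fields are only checked when they are present.
    if "type" in contract and contract["type"] not in DATASET_TYPES:
        problems.append(f"unknown dataset type: {contract['type']}")
    certification = contract.get("certification") or {}
    if "status" in certification and certification["status"] not in CERTIFICATION_STATUSES:
        problems.append(f"unknown certification status: {certification['status']}")
    return problems


example = {
    "kind": "DataContract",
    "status": "draft",
    "template_version": "0.0.2",
    "dataset": "sale_txn",
    "type": "Table",
    "datasource": "snowflake",
    "certification": {"status": "VERIFIED", "message": "Verified by data producers"},
}
print(validate_contract(example))  # []
```

Returning a list of problems rather than raising on the first one lets an author fix every issue in a single pass.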
- (1) Must always be DataContract.
- (2) State of the contract:
  - draft: contract is still being defined (work in progress)
  - verified: contract is published and ready to be used
- (3) Version of the template for the data contract.
- (4) Name of the asset as it exists inside Atlan.
- (5) (Optional) Type of the dataset in Atlan:
  - Table: a database table
  - View: a database view
  - MaterialisedView: a materialized view in a database
- (6) (Optional) Description of this dataset, which can be synced to the asset being governed.
- (7) Name that must match a data source defined in your config file.
- (8) (Optional) Owners of the dataset, which can include users (by username) and/or groups (by internal Atlan alias), and can be synced to the asset being governed.
- (9) (Optional) Certification to apply to the dataset, which can be synced to the asset being governed.
- (10) Valid values:
  - DRAFT: dataset is still being defined (work in progress)
  - VERIFIED: dataset is trusted and ready to be used
  - DEPRECATED: dataset should no longer be trusted or used
- (11) (Optional) Announcement to apply to the dataset, which can be synced to the asset being governed.
- (12) Valid values:
  - information: something should be noted about the dataset (appears blue in the UI)
  - warning: something is problematic with the dataset (appears yellow in the UI)
  - issue: something is wrong with the dataset (appears red in the UI)
- (13) (Optional) Glossary terms to assign to the dataset, which can be synced to the asset being governed.
- (14) (Optional) List of the names of tags for this dataset, which can be synced to the asset being governed. For each tag you can optionally also specify:
  - propagate: whether the tag should propagate to other assets
  - restrict_propagation_through_lineage: if propagation is enabled, whether to prevent the tag from propagating through lineage
  - restrict_propagation_through_hierarchy: if propagation is enabled, whether to prevent the tag from propagating to child assets
- (15) (Optional) Dictionary of custom metadata for this dataset, which can be synced to the asset being governed. Specify the name of the custom metadata and its attributes using their human-readable names. Provide the values of multi-valued attributes as a list.
- (16) Name of the column as it is defined in the source system (often technical).
- (17) (Optional) Alias for the column, to make its name more readable.
- (18) (Optional) Description of this column, for documentation purposes.
- (19) (Optional) When true, this column is the primary key for the table.
- (20) (Optional) Physical data type of values in this column (for example, varchar(20)).
- (21) (Optional) Logical data type of values in this column (for example, string).
- (22) (Optional) Format of data to consider invalid.
- (23) (Optional) Format of data to consider valid.
- (24) (Optional) Regular expression to match invalid values.
- (25) (Optional) Regular expression to match valid values.
- (26) (Optional) Regular expression to match missing values.
- (27) (Optional) Enumeration of values that should be considered invalid.
- (28) (Optional) Enumeration of values that should be considered valid.
- (29) (Optional) Enumeration of values that should be considered missing.
- (30) (Optional) When true, this column must not contain empty (null) values.
- (31) (Optional) Exact length for a string to be considered valid.
- (32) (Optional) Maximum length for a string to be considered valid.
- (33) (Optional) Minimum numeric value considered valid.
- (34) (Optional) Maximum numeric value considered valid.
- (35) (Optional) Minimum length for a string to be considered valid.
- (36) (Optional) When true, this column must contain unique values.
- (37) (Optional) List of checks to run to verify the data quality of this dataset.
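The column-level rules can be read as constraints over a column's values. The following Python sketch evaluates several of them against an in-memory list. It is an illustration under stated assumptions, not how Atlan runs these checks: check_column, is_missing and missing_count are hypothetical names, regular expressions are assumed to require a full match, and missing values (None, entries in missing_values, or matches for missing_regex) are assumed to be excluded from the other checks.

```python
# Minimal sketch of evaluating column-level rules from the template locally.
# check_column, is_missing and missing_count are hypothetical helpers; the
# semantics (full-match regexes, missing values skipped by other checks) are
# assumptions for illustration, not how Atlan evaluates contracts.
import re


def is_missing(value, rules: dict) -> bool:
    """Treat None, listed missing_values and missing_regex matches as missing."""
    if value is None or value in (rules.get("missing_values") or []):
        return True
    pattern = rules.get("missing_regex")
    return isinstance(value, str) and pattern is not None and re.fullmatch(pattern, value) is not None


def missing_count(values: list, rules: dict) -> int:
    """Count missing values, in the spirit of the missing_count(...) checks."""
    return sum(1 for v in values if is_missing(v, rules))


def check_column(values: list, rules: dict) -> list[str]:
    """Return a list of rule violations for one column's values."""
    problems = []
    missing = missing_count(values, rules)
    if rules.get("not_null") and missing:
        problems.append(f"not_null: {missing} missing value(s)")

    present = [v for v in values if not is_missing(v, rules)]
    valid_values = rules.get("valid_values") or []
    invalid_values = rules.get("invalid_values") or []
    for v in present:
        if valid_values and v not in valid_values:
            problems.append(f"{v!r} not in valid_values")
        if v in invalid_values:
            problems.append(f"{v!r} in invalid_values")
        if isinstance(v, str):
            # String rules: regexes and lengths; "is not None" keeps 0 as a real bound.
            if rules.get("valid_regex") is not None and re.fullmatch(rules["valid_regex"], v) is None:
                problems.append(f"{v!r} does not match valid_regex")
            if rules.get("invalid_regex") is not None and re.fullmatch(rules["invalid_regex"], v) is not None:
                problems.append(f"{v!r} matches invalid_regex")
            if rules.get("valid_length") is not None and len(v) != rules["valid_length"]:
                problems.append(f"{v!r} length is not {rules['valid_length']}")
            if rules.get("valid_min_length") is not None and len(v) < rules["valid_min_length"]:
                problems.append(f"{v!r} shorter than valid_min_length")
            if rules.get("valid_max_length") is not None and len(v) > rules["valid_max_length"]:
                problems.append(f"{v!r} longer than valid_max_length")
        elif isinstance(v, (int, float)):
            # Numeric rules: inclusive minimum and maximum.
            if rules.get("valid_min") is not None and v < rules["valid_min"]:
                problems.append(f"{v!r} below valid_min")
            if rules.get("valid_max") is not None and v > rules["valid_max"]:
                problems.append(f"{v!r} above valid_max")

    if rules.get("unique") and len(set(present)) != len(present):
        problems.append("unique: duplicate values found")
    return problems


dates = ["2024-01-01", "N/A", "2024-01-01", "bad"]
date_rules = {"not_null": True, "missing_values": ["N/A"],
              "valid_regex": r"\d{4}-\d{2}-\d{2}", "unique": True}
print(check_column(dates, date_rules))
```

Here the placeholder "N/A" counts as missing, so the column fails not_null, while "bad" fails valid_regex and the repeated date fails unique. Keeping missing-value handling in one function means not_null and a missing_count(...) = 0 style check always agree on what "missing" means.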