Data contracts specification

Backwards compatibility

While data contracts are in closed preview, we do not guarantee backwards compatibility. Template version 0.0.2 is not backwards compatible with 0.0.1.

The following is the template for a data contract. A field is mandatory unless its numbered note marks it as (Optional):

---
kind: DataContract                   # (1)
status: draft                        # (2)
template_version: 0.0.2              # (3)

dataset: sale_txn                    # (4)
type: Table                          # (5)
description: This is the ...         # (6)

datasource: snowflake                # (7)

owners:                              # (8)
  users:
    - jdoe
    - jsmith
  groups:
    - data_producers_group

certification:                       # (9)
  status: VERIFIED                   # (10)
  message: Verified by data producers

announcement:                        # (11)
  type: information                  # (12)
  title: Informational announcement
  description: Explanation of the ...

terms:                               # (13)
  - Sales
  - Transactions

tags:                                # (14)
  - name: PII
    propagate: false
    restrict_propagation_through_lineage: true
    restrict_propagation_through_hierarchy: false
  - name: GDPR
    propagate: false
    restrict_propagation_through_lineage: true
    restrict_propagation_through_hierarchy: false

columns:
  - name: txn_ref_dt                 # (15)
    business_name: transaction date  # (16)
    description: null                # (17)
    is_primary: false                # (18)
    data_type: date                  # (19)
    logical_type: date               # (20)
    invalid_format: null             # (21)
    valid_format: null               # (22)
    invalid_regex: null              # (23)
    valid_regex: null                # (24)
    missing_regex: null              # (25)
    invalid_values: []               # (26)
    valid_values: []                 # (27)
    missing_values: []               # (28)
    not_null: true                   # (29)
    valid_length: null               # (30)
    valid_max_length: null           # (31)
    valid_min: null                  # (32)
    valid_max: null                  # (33)
    valid_min_length: null           # (34)
    unique: false                    # (35)

checks:                              # (36)
  - missing_count(txn_ref_dt) = 0
  - missing_count(txn_ref_dt) = 100
  - current_time - date(record_date) < 5
...
  1. Must always be DataContract.
  2. State of the contract:
    • draft: contract is still being defined (work in progress)
    • verified: contract is published and ready to be used
  3. Version of the template for the data contract.
  4. Name of the asset as it exists inside Atlan.
  5. (Optional) Type of the dataset in Atlan:
    • Table: a database table
    • View: a database view
    • MaterialisedView: a materialized view in a database
  6. (Optional) Description of this dataset, for documentation purposes.
  7. Name that must match a data source defined in your config file.
  8. (Optional) Owners of the dataset, which can include users (by username), groups, or both.
  9. (Optional) Certification to apply to the dataset.
  10. Valid values:
    • DRAFT: dataset is still being defined (work in progress)
    • VERIFIED: dataset is trusted and ready to be used
    • DEPRECATED: dataset should no longer be trusted or used
  11. (Optional) Announcement to apply to the dataset.
  12. Valid values:
    • information: something should be noted about the dataset (appears blue in the UI)
    • warning: something is problematic with the dataset (appears yellow in the UI)
    • issue: something is wrong with the dataset (appears red in the UI)
  13. (Optional) Glossary terms to assign to the dataset.
  14. (Optional) List of the names of tags for this dataset, for documentation purposes. For each tag you can optionally also specify:
    • propagate: whether the tag should propagate to other assets
    • restrict_propagation_through_lineage: if propagation is enabled, whether to prevent the tag from propagating through lineage
    • restrict_propagation_through_hierarchy: if propagation is enabled, whether to prevent the tag from propagating to child assets
  15. Name of the column as it is defined in the source system (often technical).
  16. (Optional) Alias for the column, to make its name more readable.
  17. (Optional) Description of this column, for documentation purposes.
  18. (Optional) When true, this column is the primary key for the table.
  19. (Optional) Physical data type of values in this column (e.g. varchar(20)).
  20. (Optional) Logical data type of values in this column (e.g. string).
  21. (Optional) Format of data to consider invalid.
  22. (Optional) Format of data to consider valid.
  23. (Optional) Regular expression to match invalid values.
  24. (Optional) Regular expression to match valid values.
  25. (Optional) Regular expression to match missing values.
  26. (Optional) Enumeration of values that should be considered invalid.
  27. (Optional) Enumeration of values that should be considered valid.
  28. (Optional) Enumeration of values that should be considered missing.
  29. (Optional) When true, this column cannot be empty (without values).
  30. (Optional) Fixed length for a string to be considered valid.
  31. (Optional) Maximum length for a string to be considered valid.
  32. (Optional) Minimum numeric value considered valid.
  33. (Optional) Maximum numeric value considered valid.
  34. (Optional) Minimum length for a string to be considered valid.
  35. (Optional) When true, this column must have unique values.
  36. (Optional) List of checks to run to verify data quality of this dataset.
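
As a rough illustration of the notes above, the sketch below pares the template down to the fields that are not marked (Optional), then adds a second column that uses a few of the optional validity settings. The `store_code` column and its regular expression are hypothetical and exist only for illustration. Whether `columns` itself may be omitted, and which regular-expression dialect is applied, are not stated above, so treat both as assumptions.

```yaml
---
kind: DataContract              # must always be DataContract
status: draft                   # draft or verified
template_version: 0.0.2
dataset: sale_txn               # name of the asset as it exists inside Atlan
datasource: snowflake           # must match a data source in your config file

columns:
  - name: txn_ref_dt            # name is the only column field not marked optional
    data_type: date
    not_null: true
  - name: store_code            # hypothetical column, for illustration only
    data_type: varchar(4)
    logical_type: string
    valid_length: 4             # fixed length for a string to be considered valid
    valid_regex: "^[A-Z0-9]{4}$"  # illustrative pattern; regex dialect is an assumption
...
```

Any optional field left out here (owners, certification, announcement, terms, tags, checks) can be added back following the full template.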