Data contracts specification

The following is the template for a data contract:

---
kind: DataContract                   # (1)
status: draft                        # (2)
template_version: 0.0.1              # (3)

dataset: sale_txn                    # (4)
type: Table                          # (5)
description: This is the ...         # (6)

data_source: snowflake               # (7)

tags:                                # (8)
  - name: PII
  - name: GDPR

certificate: DRAFT                   # (9)

columns:
  - name: txn_ref_dt                 # (10)
    business_name: transaction date  # (11)
    description: null                # (12)
    is_primary: false                # (13)
    data_type: date                  # (14)
    logical_type: date
    invalid_format: null             # (15)
    valid_format: null               # (16)
    invalid_regex: null              # (17)
    valid_regex: null                # (18)
    missing_regex: null              # (19)
    invalid_values: []               # (20)
    valid_values: []                 # (21)
    missing_values: []               # (22)
    not_null: true                   # (23)
    valid_length: null               # (24)
    valid_max_length: null           # (25)
    valid_min: null                  # (26)
    valid_max: null                  # (27)
    valid_min_length: null           # (28)
    unique: false                    # (29)
    reference:                       # (30)
      dataset: iso_3166-2
      column: available_dates
      samples_limit: 20
    quality:
      tool_name: DQ Platform

checks:                              # (31)
  - missing_count(txn_ref_dt) = 0
  - missing_count(txn_ref_dt) = 100
  - current_time - date(record_date) < 5
...
  1. Must always be DataContract.
  2. State of the contract:
    • draft: contract is still being defined (work in progress)
    • verified: contract is published and ready to be used
  3. Version of the template for the data contract.
  4. Name of the asset as it exists inside Atlan.
  5. Type of the asset in Atlan:
    • Table: a database table
    • View: a database view
    • MaterialisedView: a materialized view in a database
  6. (Optional) Description of this dataset, for documentation purposes.

    Read-only

    Treat this as read-only — any changes you make will not be synced back to Atlan.

  7. Name of the data source, which must match a data source defined in your config file.

  8. (Optional) List of the names of tags for this dataset, for documentation purposes.

    Read-only

    Treat this as read-only — any changes you make will not be synced back to Atlan.

  9. (Optional) Status of the dataset:

    • DRAFT: dataset is still being defined (work in progress)
    • VERIFIED: dataset is trusted and ready to be used
    • DEPRECATED: dataset should no longer be trusted or used

    Read-only

    Treat this as read-only — any changes you make will not be synced back to Atlan.

  10. Name of the column as it is defined in the source system (often technical).

  11. (Optional) Alias for the column, to make its name more readable.

    Read-only

    Treat this as read-only — any changes you make will not be synced back to Atlan.

  12. (Optional) Description of this column, for documentation purposes.

    Read-only

    Treat this as read-only — any changes you make will not be synced back to Atlan.

  13. (Optional) When true, this column is the primary key for the table.

    Read-only

    Treat this as read-only — any changes you make will not be synced back to Atlan.

  14. Physical data type of values in this column.

  15. (Optional) Format of data to consider invalid.
  16. (Optional) Format of data to consider valid.
  17. (Optional) Regular expression to match invalid values.
  18. (Optional) Regular expression to match valid values.
  19. (Optional) Regular expression to match missing values.
  20. (Optional) Enumeration of values that should be considered invalid.
  21. (Optional) Enumeration of values that should be considered valid.
  22. (Optional) Enumeration of values that should be considered missing.
  23. (Optional) When true, this column cannot contain empty (null) values.
  24. (Optional) Fixed length for a string to be considered valid.
  25. (Optional) Maximum length for a string to be considered valid.
  26. (Optional) Minimum numeric value considered valid.
  27. (Optional) Maximum numeric value considered valid.
  28. (Optional) Minimum length for a string to be considered valid.
  29. (Optional) When true, this column must have unique values.
  30. (Optional) Values in this column must match values in the referenced dataset and column.
  31. (Optional) List of checks to run to verify data quality of this dataset.
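
To illustrate how the optional validity keys (items 15 to 30) might be filled in, here is a minimal sketch of two column entries. Only the keys come from the template above. The column names (country_code, txn_amount), the referenced dataset (ref_countries), the data types, and every value shown are hypothetical, and the exact semantics of each key depend on the tool that evaluates the contract.

```yaml
columns:
  - name: country_code             # hypothetical column
    data_type: varchar             # assumed physical type
    not_null: true                 # column may not contain empty values
    valid_length: 2                # fixed length of exactly two characters
    valid_regex: "^[A-Z]{2}$"      # two uppercase letters
    missing_values: ["N/A", "-"]   # placeholders to treat as missing
    reference:                     # values must match the referenced column
      dataset: ref_countries       # hypothetical dataset
      column: code                 # hypothetical column in that dataset
  - name: txn_amount               # hypothetical column
    data_type: number              # assumed physical type
    valid_min: 0                   # smallest numeric value considered valid
    valid_max: 1000000             # largest numeric value considered valid
    unique: false                  # duplicates are allowed
```

Keys that are not needed for a column can presumably be left as null or an empty list, as in the template.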