Lineage generator package

The lineage generator (no transformation) package automatically detects assets with the same (or similar) name between two connections and creates the lineage between them.

Configuration

Recommendation

To avoid letting the package create lineage blindly, it provides an option to preview the output. The typical way to use this package is:

  1. Ask the package to generate a preview of the lineage.
  2. If you are happy with the output, ask the package to generate the lineage on Atlan.

The package also provides a way to delete lineage that the package itself created.

To generate lineage by automatically detecting assets with similar names between two connections:

Generate lineage for assets
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.packages import LineageGenerator

client = AtlanClient()

workflow = (
    LineageGenerator() # (1)
    .config( # (2)
        source_asset_type=LineageGenerator.SourceAssetType.Table,
        source_qualified_name="default/snowflake/1737456702/DB/SCHEMA/TABLE",
        target_asset_type=LineageGenerator.TargetAssetType.View,
        target_qualified_name="default/mssql/1737456702/DB/SCHEMA/VIEW",
        case_sensitive_match=False,
        match_on_schema=False,
        output_type=LineageGenerator.OutputType.PREVIEW,
        generate_on_child_assets=False,
        regex_match="regex_match/*",
        regex_replace="regex_replace/*",
        regex_match_schema="regex_match_schema/*",
        regex_replace_schema="regex_replace_schema/*",
        regex_match_schema_name="regex_match_schema_name/*",
        regex_replace_schema_name="regex_replace_schema_name/*",
        match_prefix="test-prefix",
        match_suffix="test-suffix",
        file_advanced_seperator="/",
        file_advanced_position=3,
        process_connection_qn="default/mssql/1737456702/DB/SCHEMA/MVIEW",
    )
).to_workflow() # (3)

response = client.workflow.run(workflow)  # (4)
  1. The lineage generator (no transformation) package automatically detects assets with the same (or similar) name between two connections and creates the lineage between them.
  2. Set up the lineage generator using config() with the following:

    source_asset_type: type name of the lineage input assets (sources).
    source_qualified_name: qualified name prefix of the lineage input assets (sources).
    target_asset_type: type name of the lineage output assets (targets).
    target_qualified_name: qualified name prefix of the lineage output assets (targets).
    case_sensitive_match: whether to match asset names using case-sensitive logic, default: False.
    match_on_schema: whether to include the schema name when matching source and target assets, default: False. Ignored if either asset type is not relational (relational types are, e.g., Table, View, Materialized View, Calculation View, Column, or MongoDB Collection).
    output_type: determines the type of lineage generation, default: PREVIEW.

    • PREVIEW: generates a CSV preview of the lineage.
    • GENERATE: creates the lineage on Atlan.
    • DELETE: removes the lineage on Atlan.

    generate_on_child_assets: whether to generate lineage on child assets of the specified source and target types, default: False.
    regex_match (optional): regex pattern to identify renaming between source and target.
    regex_replace (optional): replacement characters for renaming identified by regex_match.
    regex_match_schema (optional): regex pattern for renaming between source and target schemas (used only if match_on_schema is False).
    regex_replace_schema (optional): replacement characters for schema renaming identified by regex_match_schema.
    regex_match_schema_name (optional): regex pattern for renaming source and target name + schema (used only if match_on_schema is True; overrides other regex patterns).
    regex_replace_schema_name (optional): replacement characters for schema name renaming identified by regex_match_schema_name.
    match_prefix (optional): prefix added to source assets to match with target assets.
    match_suffix (optional): suffix added to source assets to match with target assets.
    file_advanced_seperator (optional): separator used to split the qualified name (applies to file-based assets). For example, / splits default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv into ["default", "s3", "1707397085", "arn:aws:s3:::mybucket", "prefix", "myobject.csv"].
    file_advanced_position (optional): number of substrings (counted from the right) to use for matching (applies to file-based assets). For example, a value of 3 applied to the split above results in ["arn:aws:s3:::mybucket", "prefix", "myobject.csv"].
    process_connection_qn (optional): connection for process assets. Defaults to the source asset connection if blank.

  3. Convert the package into a Workflow object.

  4. Run the workflow by invoking the run() method on the workflow client, passing the created object.

    Workflows run asynchronously

    Remember that workflows run asynchronously. See the packages and workflows introduction for details on how to check the status and wait until the workflow has been completed.
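As a rough illustration of the name-matching options described in annotation 2, the sketch below mimics the documented behavior in plain Python. It is not the package's implementation: the helper names file_match_key and renamed_source, and the order in which the regex and the prefix/suffix are applied, are assumptions made only for this example.

```python
import re
from typing import List, Optional


def file_match_key(qualified_name: str, separator: str, position: int) -> List[str]:
    """Split a qualified name on the separator and keep the last `position` parts."""
    parts = qualified_name.split(separator)
    return parts[-position:]


def renamed_source(
    name: str,
    regex_match: Optional[str] = None,
    regex_replace: str = "",
    match_prefix: str = "",
    match_suffix: str = "",
    case_sensitive: bool = False,
) -> str:
    """Hypothetical helper: derive the name a source asset is compared under."""
    if regex_match is not None:
        # Rewrite the part of the name identified by the pattern.
        name = re.sub(regex_match, regex_replace, name)
    # Add the prefix and suffix to the source name (assumed to happen after the regex).
    name = f"{match_prefix}{name}{match_suffix}"
    # Lower-case the name when the comparison is case-insensitive.
    return name if case_sensitive else name.lower()


qn = "default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv"
print(file_match_key(qn, "/", 3))
print(renamed_source("RAW_ORDERS", regex_match=r"^RAW_", match_prefix="stg_"))
```

The first call reproduces the file-based example from the parameter descriptions. The second shows how a source named RAW_ORDERS could be lined up with a target named stg_orders under these assumptions.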

Create the workflow via UI only

We recommend creating the workflow only via the UI. To re-run an existing workflow, see the steps below.

Re-run existing workflow

To re-run an existing lineage generator workflow:

Re-run existing lineage generator workflow
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.enums import WorkflowPackage

client = AtlanClient()

existing = client.workflow.find_by_type(  # (1)
  prefix=WorkflowPackage.LINEAGE_GENERATOR, max_results=5
)

# Determine which lineage generator workflow (n)
# from the list of results you want to re-run.
n = 0  # assumption for illustration: pick the first result
response = client.workflow.rerun(existing[n]) # (2)
  1. You can find workflows by their type using the workflow client find_by_type() method and providing the prefix for one of the packages. In this example, we do so for the LineageGenerator. (You can also specify the maximum number of resulting workflows you want to retrieve as results.)
  2. Once you've found the workflow you want to re-run, you can simply call the workflow client rerun() method.

    • Optionally, you can use rerun(idempotent=True) to avoid re-running a workflow that is already running or pending. If such a workflow is found, its details are returned instead of starting a new run. By default, idempotent is set to False.

    Workflows run asynchronously

    Remember that workflows run asynchronously. See the packages and workflows introduction for details on how you can check the status and wait until the workflow has been completed.

Requires multiple steps through the raw REST API

  1. Find the existing workflow.
  2. Send through the resulting re-run request.
POST /api/service/workflows/indexsearch
{
  "from": 0,
  "size": 5,
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "metadata",
            "query": {
              "prefix": {
                "metadata.name.keyword": {
                  "value": "csa-lineage-generator" // (1)
                }
              }
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "metadata.creationTimestamp": {
        "nested": {
          "path": "metadata"
        },
        "order": "desc"
      }
    }
  ],
  "track_total_hits": true
}
  1. Searching by the csa-lineage-generator prefix will ensure you only find existing lineage generator workflows.

    Name of the workflow

    The name of the workflow will be nested within the _source.metadata.name property of the response object. (Remember since this is a search, there could be multiple results, so you may want to use the other details in each result to determine which workflow you really want.)
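The search request above can be built, and its results inspected, with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and the hits.hits wrapper in the sample response is an assumption based on the usual Elasticsearch response layout. Only the _source.metadata.name path comes from the note above.

```python
import json
from typing import Any, Dict, List


def build_search_body(prefix: str, size: int = 5) -> Dict[str, Any]:
    """Build the indexsearch body shown above for a workflow-name prefix."""
    return {
        "from": 0,
        "size": size,
        "query": {"bool": {"filter": [{"nested": {
            "path": "metadata",
            "query": {"prefix": {"metadata.name.keyword": {"value": prefix}}},
        }}]}},
        "sort": [{"metadata.creationTimestamp": {
            "nested": {"path": "metadata"}, "order": "desc"}}],
        "track_total_hits": True,
    }


def workflow_names(response: Dict[str, Any]) -> List[str]:
    """Collect _source.metadata.name from each result (assumes a hits.hits wrapper)."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"]["metadata"]["name"] for hit in hits]


body = build_search_body("csa-lineage-generator")
print(json.dumps(body, indent=2))

# Hand-written sample in the assumed response shape, not real output.
sample_response = {
    "hits": {"hits": [
        {"_source": {"metadata": {"name": "csa-lineage-generator-1684500411"}}},
    ]}
}
print(workflow_names(sample_response))
```

Because results are sorted by creation timestamp in descending order, the first name in the list would be the most recently created workflow.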

POST /api/service/workflows/submit
{
  "namespace": "default",
  "resourceKind": "WorkflowTemplate",
  "resourceName": "csa-lineage-generator-1684500411" // (1)
}
  1. Send the name of the workflow as the resourceName to re-run it.
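A minimal sketch of preparing the submit request with Python's standard library. The tenant URL, the token value, and the bearer-token Authorization header are placeholders and assumptions for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_rerun_request(
    base_url: str, api_token: str, workflow_name: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that re-runs a workflow by name."""
    payload = {
        "namespace": "default",
        "resourceKind": "WorkflowTemplate",
        "resourceName": workflow_name,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/service/workflows/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_rerun_request(
    "https://tenant.example.com",  # hypothetical tenant URL
    "YOUR_API_TOKEN",  # placeholder token
    "csa-lineage-generator-1684500411",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the payload before anything is submitted.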