Lineage generator package
The lineage generator (no transformation) package automatically detects assets with the same (or similar) name between two connections and creates the lineage between them.
Configuration
Recommendation
To avoid letting the package create lineage blindly, it provides an option to preview the output. The typical way to use this package is:
- Ask the package to generate the lineage preview.
- If you are happy with the output, ask the package to generate the lineage on Atlan.
The package also provides a method to delete lineage created by the package itself.
To generate lineage by automatically detecting assets with similar names between two connections:
Coming soon
Generate lineage for assets
- The lineage generator (no transformation) package automatically detects assets with the same (or similar) name between two connections and creates the lineage between them.
- Set up the lineage generator using config() with the following:
    - source_asset_type: type name of the lineage input assets (sources).
    - source_qualified_name: qualified name prefix of the lineage input assets (sources).
    - target_asset_type: type name of the lineage output assets (targets).
    - target_qualified_name: qualified name prefix of the lineage output assets (targets).
    - case_sensitive_match: whether to match asset names using case-sensitive logic, default: False.
    - match_on_schema: whether to include the schema name to match source and target assets, default: False. Ignored if one of the asset types is not relational (e.g., Table, View, Materialized View, Calculation View, Column, or MongoDB Collection).
    - output_type: determines the type of lineage generation, default: PREVIEW.
        - PREVIEW: generates a CSV preview of the lineage.
        - GENERATE: creates the lineage on Atlan.
        - DELETE: removes the lineage on Atlan.
    - generate_on_child_assets: whether to generate lineage on child assets of the specified source and target types, default: False.
    - regex_match (optional): regex pattern to identify renaming between source and target.
    - regex_replace (optional): replacement characters for renaming identified by regex_match.
    - regex_match_schema (optional): regex pattern for renaming between source and target schemas (used only if match_on_schema is False).
    - regex_replace_schema (optional): replacement characters for schema renaming identified by regex_match_schema.
    - regex_match_schema_name (optional): regex pattern for renaming source and target name + schema (used only if match_on_schema is True; overrides other regex patterns).
    - regex_replace_schema_name (optional): replacement characters for schema name renaming identified by regex_match_schema_name.
    - match_prefix (optional): prefix added to source assets to match with target assets.
    - match_suffix (optional): suffix added to source assets to match with target assets.
    - file_advanced_separator (optional): separator used to split the qualified name (applicable to file-based assets). For example, / splits default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv into ["default", "s3", "1707397085", "arn:aws:s3:::mybucket", "prefix", "myobject.csv"].
    - file_advanced_position (optional): number of substrings (from the right) to use for matching (applies to file-based assets). For example, if the value is 3, it results in ["arn:aws:s3:::mybucket", "prefix", "myobject.csv"].
    - process_connection_qn (optional): connection for process assets. Defaults to the source asset connection if blank.
- Convert the package into a Workflow object.
- Run the workflow by invoking the run() method on the workflow client, passing the created object.
Workflows run asynchronously
Remember that workflows run asynchronously. See the packages and workflows introduction for details on how to check the status and wait until the workflow has been completed.
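To make the matching options more concrete, here is a minimal, standalone sketch of how the renaming and file-splitting parameters could combine. It does not call the SDK and is not the package's actual implementation: the helper names (rename, file_match_key, names_match) are hypothetical, and the order in which the regex, prefix, and suffix are applied is an assumption made for illustration.

```python
import re


def rename(name, regex_match=None, regex_replace="", match_prefix="", match_suffix=""):
    """Apply an optional regex rename, then add the optional prefix and suffix.

    Assumption: the regex is applied before the prefix and suffix.
    """
    if regex_match:
        name = re.sub(regex_match, regex_replace, name)
    return f"{match_prefix}{name}{match_suffix}"


def file_match_key(qualified_name, separator="/", position=1):
    """Split a qualified name on `separator` and keep the last `position` parts."""
    if position < 1:
        raise ValueError("position must be at least 1")
    return qualified_name.split(separator)[-position:]


def names_match(source_name, target_name, case_sensitive_match=False, **rename_options):
    """Compare a (possibly renamed) source name with a target name."""
    candidate = rename(source_name, **rename_options)
    if case_sensitive_match:
        return candidate == target_name
    # Case-insensitive comparison, mirroring the documented default of False.
    return candidate.casefold() == target_name.casefold()


if __name__ == "__main__":
    qn = "default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv"
    print(file_match_key(qn, separator="/", position=3))
    # Hypothetical asset names: a staging prefix is stripped before comparing.
    print(names_match("STG_ORDERS", "orders", regex_match="^STG_", regex_replace=""))
```

With a separator of / and a position of 3, file_match_key reproduces the example from the parameter descriptions above. It is worth running the package with the PREVIEW output type first to confirm how your own patterns actually behave.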
Coming soon
Create the workflow via UI only
We recommend creating the workflow only via the UI. To rerun an existing workflow, see the steps below.
Re-run existing workflow
To re-run an existing lineage generator workflow:
Coming soon
Re-run existing lineage generator workflow
- You can find workflows by their type using the workflow client find_by_type() method and providing the prefix for one of the packages. In this example, we do so for the LineageGenerator. (You can also specify the maximum number of workflows you want to retrieve as results.)
- Once you've found the workflow you want to re-run, you can simply call the workflow client rerun() method.
    - Optionally, you can use rerun(idempotent=True) to avoid re-running a workflow that is already running or pending. If such a workflow is found, its details are returned instead. By default, idempotent is set to False.
Workflows run asynchronously
Remember that workflows run asynchronously. See the packages and workflows introduction for details on how you can check the status and wait until the workflow has been completed.
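The effect of the idempotent flag can be illustrated with a small in-memory model. This is only a toy sketch of the behavior described above, not the SDK: ToyWorkflowClient, Run, the phase labels "Pending" and "Running", and the workflow name used in the example are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumed labels for runs that count as "already running or pending".
ACTIVE_PHASES = {"Pending", "Running"}


@dataclass
class Run:
    workflow_name: str
    run_id: int
    phase: str = "Pending"


@dataclass
class ToyWorkflowClient:
    """Hypothetical stand-in that only models the idempotent re-run check."""

    runs: list = field(default_factory=list)

    def rerun(self, workflow_name: str, idempotent: bool = False) -> Run:
        if idempotent:
            # Return the existing active run instead of starting a new one.
            for run in self.runs:
                if run.workflow_name == workflow_name and run.phase in ACTIVE_PHASES:
                    return run
        new_run = Run(workflow_name, run_id=len(self.runs) + 1)
        self.runs.append(new_run)
        return new_run


if __name__ == "__main__":
    client = ToyWorkflowClient()
    first = client.rerun("csa-lineage-generator-example")
    again = client.rerun("csa-lineage-generator-example", idempotent=True)
    print(again is first)  # the active run is reused rather than duplicated
```

Without idempotent=True, every call in this model starts a new run, which matches the documented default of False.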
Coming soon
Requires multiple steps through the raw REST API
- Find the existing workflow.
- Send through the resulting re-run request.
POST /api/service/workflows/indexsearch
- Searching by the csa-lineage-generator prefix will ensure you only find existing lineage generator workflows.
Name of the workflow
The name of the workflow will be nested within the _source.metadata.name property of the response object. (Remember that since this is a search, there could be multiple results, so you may want to use the other details in each result to determine which workflow you really want.)
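As a rough sketch of this search step, the request body and the name extraction could look like the following. Only the csa-lineage-generator prefix and the _source.metadata.name property come from the text above. The query structure (a nested prefix filter on metadata.name.keyword), the hits.hits response layout, and the function names are assumptions that should be checked against your tenant's API.

```python
import json


def build_search_request(prefix="csa-lineage-generator", size=5):
    """Build a search body that filters workflows by name prefix.

    Assumption: the endpoint accepts an Elasticsearch-style nested query.
    """
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {
                        "nested": {
                            "path": "metadata",
                            "query": {
                                "prefix": {"metadata.name.keyword": {"value": prefix}}
                            },
                        }
                    }
                ]
            }
        },
    }


def workflow_names(response):
    """Collect _source.metadata.name from every result in the response.

    Assumption: results are listed under hits.hits.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"]["metadata"]["name"] for hit in hits]


if __name__ == "__main__":
    # Body to send with POST /api/service/workflows/indexsearch
    print(json.dumps(build_search_request(), indent=2))
```

Because a search may return several workflows, workflow_names returns a list, so you can decide which one to re-run.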
POST /api/service/workflows/submit
- Send the name of the workflow as the resourceName to re-run it.
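A minimal sketch of the re-run request body follows. Only the resourceName field is confirmed by the text above. The namespace and resourceKind fields, their values, the function name, and the workflow name in the example are assumptions and should be verified before use.

```python
import json


def build_rerun_request(workflow_name):
    """Build the body for POST /api/service/workflows/submit."""
    if not workflow_name:
        raise ValueError("workflow_name must not be empty")
    return {
        "namespace": "default",  # assumed value
        "resourceKind": "WorkflowTemplate",  # assumed value
        "resourceName": workflow_name,  # name found via the search step
    }


if __name__ == "__main__":
    # Hypothetical workflow name, as returned in _source.metadata.name.
    print(json.dumps(build_rerun_request("csa-lineage-generator-example")))
```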