DocumentationTemplateLoader¶
This content has moved!
We have moved our samples to a separate, dedicated site: https://solutions.atlan.com.
This document is no longer being maintained.
Creates (or updates, if it already exists) metadata about:
- Tabular Assets: databases, schemas, tables, and columns
- Object Store Assets: accounts, buckets (containers), and objects
- Lineage: between these objects, through processes that can have one or more inputs and produce one or more outputs
By default, unless overridden through the `FILENAME` environment variable, the loader will try to read from a file named `atlan-documentation-template.xlsx` in the directory in which it is run.
You can find the template, with some example values pre-populated, at the root of the repository in `atlan-documentation-template.xlsx`, or online in Google Sheets.
Use this loader primarily for creating assets
Use this loader primarily for creating assets that you do not anticipate crawling through an out-of-the-box crawler. For enriching assets that you crawl into Atlan, instead use the EnrichmentReporter and EnrichmentLoader. These will allow you to:
- enrich additional metadata (custom metadata, READMEs, and glossaries)
- reduce the risk of typos in the key identity characteristics of an asset accidentally creating rather than updating assets
Alternatively, you can also set the environment variable `UPDATE_ONLY` to `true` and the loader will only update any assets defined on the `Tabular Assets` and `Object Store Assets` sheets that already exist in Atlan. (It will not create them if they do not yet exist.)
Column details¶
The names of these columns are what matter; their order in the spreadsheet is irrelevant.
Any cell that expects a boolean value can be given any of the following values (case-insensitive) to mean `true`:
- `X`
- `Y`
- `YES`
- `TRUE`
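The boolean rule above can be sketched as follows. This is only an illustration in Python (the loader itself is Java), the function name `is_true` is hypothetical, and it assumes surrounding whitespace in a cell should be ignored, which the text does not state.

```python
# Values that mean true, per the list above (compared case-insensitively).
TRUE_VALUES = {"X", "Y", "YES", "TRUE"}


def is_true(cell):
    """Return True only when the cell holds one of the accepted true values."""
    if cell is None:
        return False  # a blank cell means false
    # Assumption: leading and trailing whitespace is not significant.
    return str(cell).strip().upper() in TRUE_VALUES
```

Any other value, including an empty string, is treated as false in this sketch.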
Only the bold fields are required
In each sheet, only the fields in bold are required. All other fields are optional: they are not mandatory for creating or updating an asset in Atlan.
In all cases, these column names must be in the second row of the spreadsheet. The first row is ignored, so you can use it for any groupings or instructions you want to give users above the fixed set of column names.
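As a rough illustration of why column order does not matter, the sketch below looks up values by the names found in the second row and skips the first row entirely. It works on plain lists of cell values rather than a real workbook; `rows_as_dicts` and the sample sheet are hypothetical and are not taken from the loader's code.

```python
def rows_as_dicts(rows):
    """Map each data row to {column name: value}, using row 2 as the header."""
    if len(rows) < 2:
        return []
    header = rows[1]  # rows[0] (the first row) is ignored entirely
    records = []
    for row in rows[2:]:
        record = {}
        for index, name in enumerate(header):
            if name is None:
                continue  # skip columns without a header name
            record[name] = row[index] if index < len(row) else None
        records.append(record)
    return records


# The header columns appear in an arbitrary order here, on purpose.
sheet = [
    ["Instructions: fill in one asset per row", None, None],
    ["Connection Name", "Connector", "Description"],
    ["production", "redshift", "Main warehouse"],
]
records = rows_as_dicts(sheet)
```

Because every value is keyed by its column name, rearranging the columns in the spreadsheet would give the same records.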
Tabular assets¶
Expects a sheet in the Excel workbook named `Tabular Assets`. Each row in the sheet should uniquely identify information about a tabular asset:
- If a row has only the `Connector` and `Connection Name` columns populated, a connection will be created and the blue columns will populate metadata on that connection.
- If a row has only these columns + the `Database Name` column populated, a database will be created and the blue columns will populate metadata on that database.
- If a row has only these columns + the `Schema Name` column populated, a schema will be created and the blue columns will populate metadata on that schema.
- If a row has only these columns + the `Container Name` and `Container Type` columns populated, a container (table, view, etc) will be created and the blue columns will populate metadata on that container.
- If a row has only these columns + the `Column Name` and `Column Data Type` columns populated, a column will be created and the blue columns will populate metadata on that column.
In this way, you can use a single sheet to define the entire hierarchy of tabular assets.
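The row rules above amount to "the deepest populated level decides which asset the row describes". A minimal sketch of that reading follows. It assumes "these columns" is cumulative (each level also needs the levels above it filled in), and the function name `tabular_asset_level` is hypothetical rather than part of the loader.

```python
def tabular_asset_level(row):
    """Return which kind of tabular asset a row describes, or None if it is unusable."""

    def filled(name):
        value = row.get(name)
        return value is not None and str(value).strip() != ""

    # Every row needs at least the connector and connection name.
    if not (filled("Connector") and filled("Connection Name")):
        return None
    # Check from the deepest level upwards, so the most specific match wins.
    if filled("Column Name") and filled("Column Data Type"):
        return "column"
    if filled("Container Name") and filled("Container Type"):
        return "container"
    if filled("Schema Name"):
        return "schema"
    if filled("Database Name"):
        return "database"
    return "connection"
```

For example, a row with only `Connector`, `Connection Name`, and `Database Name` populated is classified as a database in this sketch.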
The following header columns are expected (on row 2 of the spreadsheet):
Column name | Usage |
---|---|
Connector | Name of the connector (must be a valid Atlan connector type). |
Connection Name | Name of the connection. |
Database Name | Name of the database. |
Schema Name | Name of the schema. |
Container Name | Name of the container (table, view, etc) object. |
Container Type | Type of the container (Table, View or MaterialisedView). |
Column Name | Name of the column. |
Column Data Type | A SQL-style data type — for example, VARCHAR(30) or DECIMAL(3,5). |
Primary Key? | If `true`, indicates this column is a primary key (blank means `false`). |
Foreign Key? | If `true`, indicates this column is a foreign key (blank means `false`). |
Description | A system-level description of the asset. |
Owner Users | A pipe-delimited list of usernames for the owners of the asset. |
Owner Groups | A pipe-delimited list of the group names for the owners of the asset. |
Certificate | The certificate to apply to the asset: one of `VERIFIED`, `DRAFT`, or `DEPRECATED`. |
Certificate Message | An optional message to associate with the certificate. |
Announcement | The type of announcement to apply to the asset: one of `information`, `warning`, or `issue`. |
Announcement Title | A subject line for the announcement. |
Announcement Message | A more detailed message to associate with the announcement. |
Atlan Tags | A pipe-delimited list of the names of the Atlan tags you want to directly apply to the asset. Note that these will only be appended to the asset, so any pre-existing Atlan tags assigned to the asset in Atlan will not be overwritten. |
Special interpretation of the owners
When a row defines a connection, these will be set as the connection admins rather than owners:
- Owner Users
- Owner Groups
Furthermore, if the group name starts with `$` then it will be interpreted as a role (for example `$admin` is for "All Admins").
Finally, note that if no owners at all are listed in the spreadsheet, `$admin` ("All Admins") will be used as the default for connection admins.
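The owner handling for connection rows described above could look roughly like this. It is a sketch under assumptions: the function name `connection_admins` and the returned dictionary shape are hypothetical, and keeping the leading `$` on role names is a guess rather than something the text specifies.

```python
def connection_admins(owner_users, owner_groups, delimiter="|"):
    """Split the owner cells of a connection row into admin users, groups and roles."""

    def split(cell):
        if not cell:
            return []
        return [part.strip() for part in cell.split(delimiter) if part.strip()]

    users = split(owner_users)
    groups, roles = [], []
    for name in split(owner_groups):
        if name.startswith("$"):
            roles.append(name)  # a leading $ marks a role, for example $admin
        else:
            groups.append(name)
    if not users and not groups and not roles:
        roles = ["$admin"]  # default when no owners are listed at all
    return {"users": users, "groups": groups, "roles": roles}
```

With `Owner Users` set to `jdoe|asmith` and `Owner Groups` set to `finance|$admin`, the sketch yields two admin users, one admin group, and the `$admin` role.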
Object store assets¶
Expects a sheet in the Excel workbook named `Object Store Assets`. Each row in the sheet should uniquely identify information about an object store asset:
- If a row has only the `Connector` and `Connection Name` columns populated, a connection will be created and the blue columns will populate metadata on that connection.
- If a row has only these columns + the `Account Name` column populated, an ADLS account will be created and the blue columns will populate metadata on that account.

  Only for ADLS
  The `Account Name` column is only used for ADLS objects. If you are defining objects in some other object store (GCS, S3, etc) you can always leave this column blank.
- If a row has only these columns + the `Bucket Name` column populated, a bucket (or ADLS container) will be created and the blue columns will populate metadata on that bucket.

  Bucket ARN is required for S3
  Unique to AWS, the `Bucket ARN` column must have a unique value when defining an AWS S3 bucket or object.
- If a row has only these columns + the `Object Name` column populated, an object will be created and the blue columns will populate metadata on that object.

  Object ARN is required for S3
  Unique to AWS, the `Object ARN` column must have a unique value when defining an AWS S3 object.
In this way, you can use a single sheet to define the entire hierarchy of object store assets.
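Before running the loader, it can help to check the S3-specific ARN requirement described above. The sketch below is a hypothetical pre-flight check, not part of the loader. It assumes the connector value for AWS S3 is `s3` (as the connection names in the sample log output suggest) and it only checks that the ARN cells are filled, not that their values are unique.

```python
def missing_arns(row):
    """Return the ARN columns that an S3 row still needs to have filled in."""

    def filled(name):
        value = row.get(name)
        return value is not None and str(value).strip() != ""

    # Assumption: the connector value for AWS S3 is "s3", compared case-insensitively.
    if str(row.get("Connector", "")).strip().lower() != "s3":
        return []  # ARNs are only required for AWS
    missing = []
    if filled("Bucket Name") and not filled("Bucket ARN"):
        missing.append("Bucket ARN")
    if filled("Object Name") and not filled("Object ARN"):
        missing.append("Object ARN")
    return missing
```

An empty list means the row satisfies the ARN rule as far as this check can tell.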
The following header columns are expected (on row 2 of the spreadsheet):
Column name | Usage |
---|---|
Connector | Name of the connector (must be a valid Atlan connector type). |
Connection Name | Name of the connection. |
Account Name | Name of the ADLS account (only used for ADLS assets). |
Bucket Name | Name of the bucket (or container, for ADLS). |
Bucket ARN | Unique Amazon Resource Name (ARN), required for an AWS bucket. |
Object Name | Name of the object. |
Object ARN | Unique Amazon Resource Name (ARN), required for an AWS object. |
Object Path | (Optional) Path to the object, used as the `objectKey`. |
Object Size | (Optional) Size of the object, in bytes. |
Content Type | (Optional) Content type of the object, for example `text/csv`. |
Description | A system-level description of the asset. |
Owner Users | A pipe-delimited list of usernames for the owners of the asset. |
Owner Groups | A pipe-delimited list of the group names for the owners of the asset. |
Certificate | The certificate to apply to the asset: one of `VERIFIED`, `DRAFT`, or `DEPRECATED`. |
Certificate Message | An optional message to associate with the certificate. |
Announcement | The type of announcement to apply to the asset: one of `information`, `warning`, or `issue`. |
Announcement Title | A subject line for the announcement. |
Announcement Message | A more detailed message to associate with the announcement. |
Atlan Tags | A pipe-delimited list of the names of the Atlan tags you want to directly apply to the asset. Note that these will only be appended to the asset, so any pre-existing Atlan tags assigned to the asset in Atlan will not be overwritten. |
Special interpretation of the owners
When a row defines a connection, these will be set as the connection admins rather than owners:
- Owner Users
- Owner Groups
Furthermore, if the group name starts with `$` then it will be interpreted as a role (for example `$admin` is for "All Admins").
Finally, note that if no owners at all are listed in the spreadsheet, `$admin` ("All Admins") will be used as the default for connection admins.
Lineage¶
Expects a sheet in the Excel workbook named `Lineage`. Each row in the sheet should uniquely identify an input to and output from a lineage process between other assets.
Only creates and describes process assets
- The lineage loader will only create connections and process assets for the orchestrators — the source and target assets (and their connections) must already exist. (If you need these to be created, use the sheets above to first create the source and target assets.)
- The common columns in this sheet (description, certificate, announcement and so on) will all be used to describe the lineage process itself — not the source or target asset(s).
The following header columns are expected (on row 2 of the spreadsheet):
Column name | Usage |
---|---|
Connector (s) | Name of the connector for the source asset (must be a valid Atlan connector type). |
Connection (s) | Name of the connection for the source asset. |
Source Asset Type | Type of the source asset. |
Source Asset | Partial or fully-qualified name of the source asset. |
Orchestrator | Name of the software tool or system that orchestrated the process to be represented in lineage. Each unique value in this column will result in a connection in Atlan (of a type given by `Process Type`), in which all lineage processes for that orchestrator will be contained. |
Process Type | Type of process that was run. This influences the icon that will appear in lineage for the process, and when combined with the Orchestrator value will uniquely identify the connection in which the lineage process will be created. |
Process ID | Unique name of the process that should be represented in lineage. All rows that share the combination of Orchestrator and Process ID will be combined into a single lineage process (with potentially multiple sources and targets). |
Target Asset | Partial or fully-qualified name of the target asset. |
Target Asset Type | Type of the target asset. |
Connector (t) | Name of the connector for the target asset (must be a valid Atlan connector type). |
Connection (t) | Name of the connection for the target asset. |
SQL / Code | (Optional) Any SQL or other code you want to use to describe in technical detail what occurred within the integration process. |
Process URL | (Optional) a URL to the integration process itself. This could be to the details of the process running in the orchestrator tool, to a GitHub repository holding the code for this integration process, or to any other arbitrary URL describing the process. |
Description | A system-level description of the asset. |
Owner Users | A pipe-delimited list of usernames for the owners of the asset. |
Owner Groups | A pipe-delimited list of the group names for the owners of the asset. |
Certificate | The certificate to apply to the asset: one of `VERIFIED`, `DRAFT`, or `DEPRECATED`. |
Certificate Message | An optional message to associate with the certificate. |
Announcement | The type of announcement to apply to the asset: one of `information`, `warning`, or `issue`. |
Announcement Title | A subject line for the announcement. |
Announcement Message | A more detailed message to associate with the announcement. |
Atlan Tags | A pipe-delimited list of the names of the Atlan tags you want to directly apply to the asset. Note that these will only be appended to the asset, so any pre-existing Atlan tags assigned to the asset in Atlan will not be overwritten. |
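The table above says that rows sharing the same `Orchestrator` and `Process ID` are combined into one lineage process. A minimal sketch of that grouping, using plain dictionaries keyed by column name, might look like this. The function name `group_processes` and the output shape are hypothetical, and the asset names in the usage example are made up.

```python
def group_processes(rows):
    """Combine lineage rows into processes keyed by (Orchestrator, Process ID)."""
    processes = {}
    for row in rows:
        key = (row["Orchestrator"], row["Process ID"])
        process = processes.setdefault(key, {"inputs": [], "outputs": []})
        source, target = row["Source Asset"], row["Target Asset"]
        # Avoid listing the same asset twice when several rows repeat it.
        if source not in process["inputs"]:
            process["inputs"].append(source)
        if target not in process["outputs"]:
            process["outputs"].append(target)
    return processes


rows = [
    {"Orchestrator": "Airflow", "Process ID": "load_orders",
     "Source Asset": "raw/orders.csv", "Target Asset": "db/schema/orders"},
    {"Orchestrator": "Airflow", "Process ID": "load_orders",
     "Source Asset": "raw/customers.csv", "Target Asset": "db/schema/orders"},
]
grouped = group_processes(rows)
```

Here the two rows collapse into a single process with two inputs and one output.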
Partial vs fully-qualified names
If you provide a full `qualifiedName` (for example, copied and pasted from Atlan) then the `Connector` and `Connection` columns are ignored. If you provide a partial `qualifiedName` (for example, `db/schema/table`) the connection portion of the qualified name will be looked up using the `Connector` and `Connection` details.
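The lookup described above can be sketched as follows. Two things here are assumptions rather than documented behaviour: that a full qualified name can be recognised by a `default/` prefix (as seen in the connection names in the sample log output), and that a partial name is appended to the connection's qualified name with a `/`. The function `resolve_qualified_name` and the `known_connections` mapping are hypothetical.

```python
def resolve_qualified_name(asset_name, connector, connection, known_connections):
    """Return a full qualified name for a source or target asset."""
    # Assumption: full qualified names start with "default/".
    if asset_name.startswith("default/"):
        return asset_name  # Connector and Connection are ignored in this case
    # Otherwise look up the connection portion from the connector and connection name.
    connection_qn = known_connections[(connector, connection)]
    return f"{connection_qn}/{asset_name}"


known = {("redshift", "production"): "default/redshift/1694020655"}
full = resolve_qualified_name("db/schema/table", "redshift", "production", known)
```

A `KeyError` in this sketch corresponds to a connection that does not yet exist, which matches the requirement that source and target connections be created beforehand.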
Configuration¶
You can configure the following options for this loader as environment variables:
ATLAN_BASE_URL¶
URL for your Atlan tenant (for example: `https://tenant.atlan.com`).
ATLAN_API_KEY¶
API token to use when accessing Atlan.
FILENAME¶
Defines the filename the loader will use as its input. (Defaults to `atlan-documentation-template.xlsx` if not specified.)
DELIMITER¶
Defines the character used to separate values in multi-valued cells (like assigned terms or Atlan tags). If you expect the default pipe (`|`) to appear within any of these values, you should change this to some other character.
UPDATE_ONLY¶
Specifies how to handle assets that do not already exist in the environment. By default (`false`), the loader will create any assets that do not already exist in the environment. If set to `true`, the loader will only update an asset if it already exists and will not create any new assets. (Note that when `true` there is additional work to look up existing assets, so it will be slower.)
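To make the defaults above concrete, here is a small sketch of reading these options from the environment. It is illustrative only: `load_config` and its dictionary keys are hypothetical, and treating only the literal `true` (in any case) as enabling update-only mode is an assumption.

```python
import os


def load_config(env=None):
    """Read the loader options described above, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("ATLAN_BASE_URL"),
        "api_key": env.get("ATLAN_API_KEY"),
        "filename": env.get("FILENAME", "atlan-documentation-template.xlsx"),
        "delimiter": env.get("DELIMITER", "|"),
        # Assumption: only the literal "true" (any case) enables update-only mode.
        "update_only": env.get("UPDATE_ONLY", "false").strip().lower() == "true",
    }
```

Passing a plain dictionary instead of relying on `os.environ` makes the defaults easy to check without touching the real environment.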
Logging¶
By default, only messages at `INFO` and above will be logged (to the console). You can change this level by copying the `main/resources/log4j2.xml` from the source repo and modifying it to give:
- more detail (`DEBUG` will log every API request and response, including their full payloads)
- less detail (`WARN` will only print warnings and errors, `ERROR` only errors).
Running the loaders locally¶
You can run the loaders locally on any machine with network access to your Atlan tenant.
When run in this mode, the loader will read the input Excel file locally on the machine running the loader. If you want to change any of the settings, you can simply prepend the command with the environment variables.
You can run the `DocumentationTemplateLoader` by downloading the pre-compiled jar files for both:
With this approach you only need a JRE installed, as you will be using pre-compiled code.
Requires at least Java 11, though 17+ is recommended
You must have a Java 11 (or higher) JRE installed to use the samples. If possible, we recommend using Java 17 (or higher) for significantly improved performance.
Run DocumentationTemplateLoader from pre-compiled jar files
- Export environment variables for the configuration you want to use. At a minimum:
  - `ATLAN_BASE_URL` giving the URL of your Atlan tenant
  - `ATLAN_API_KEY` giving the value of an API token with access to that tenant to load the assets
  - `FILENAME` giving the name of the Excel file from which to load metadata

  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the loader in the same way. For example, you could use `export BATCH_SIZE=20` to change the batch size.

  (Optional) If you want to override the logging settings, also add `-Dlog4j.configurationFile=./log4j2.xml` just before the canonical classname, pointing to your modified version of the `log4j2.xml` file.
- Within the directory containing the downloaded jar files, run the `java` command providing at least two arguments:
  - The two jar files, separated by a `:`.
  - The canonical classname of the `DocumentationTemplateLoader`.

  Replace the `...` with proper version numbers
  Don't forget to replace the `...` in the `-cp` argument's values with proper version numbers, or running this will not work.
Run DocumentationTemplateLoader from pre-compiled jar files
- Export environment variables for the configuration you want to use. At a minimum:
  - `ATLAN_BASE_URL` giving the URL of your Atlan tenant
  - `ATLAN_API_KEY` giving the value of an API token with access to that tenant to load the assets
  - `FILENAME` giving the name of the Excel file from which to load metadata

  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the loader in the same way. For example, you could use `$env:BATCH_SIZE="20"` to change the batch size.

  (Optional) If you want to override the logging settings, also add `-Dlog4j.configurationFile=./log4j2.xml` just before the canonical classname, pointing to your modified version of the `log4j2.xml` file.

  Note that on Windows you must surround the value of the variable in double-quotes!
- In our experience, it seems that the `java -cp` command does not always work on Windows. Therefore, ensure you set the `$env:CLASSPATH` environment variable to include the names of the downloaded jar files, separated by a `;`.

  Replace the `...` with proper version numbers
  Don't forget to replace the `...` in the `-cp` argument's values with proper version numbers, or running this will not work.
- Within the directory containing the downloaded jar files, run the `java` command providing the canonical classname of the `DocumentationTemplateLoader`.
You can also run the `DocumentationTemplateLoader` from a clone of the samples repository:
If you want to change the sample code provided to include additional logic, this is your best option. You have full control over all of the code this way and can make any changes you like before running it.
Additional requirements
You will need a JDK installed to run this way, not only the JRE, as the code must be compiled locally before it can be executed. The computer on which you run the code must also have network access to be able to configure Gradle itself and download the necessary dependencies of the code from Maven Central.
FILENAME=atlan-documentation-template.xlsx \
./gradlew DocumentationTemplateLoader # (1)!
- Within the root directory of your local clone of the repo, run the Gradle task for the `DocumentationTemplateLoader`.

  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the loader. You can either do this in the same line where you run the command, or by exporting them first, for example using `export FILENAME=atlan-documentation-template.xlsx`.
10:02:04.101 [main] INFO com.atlan.samples.loaders.DocumentationTemplateLoader - Retrieving configuration and context...
10:02:06.483 [main] INFO com.atlan.samples.loaders.DocumentationTemplateLoader - Loading tabular assets from: atlan-documentation-template.xlsx::Tabular Assets
10:02:06.500 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ... looking for existing (1) connections...
10:02:07.622 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/redshift/1694020655)
10:02:07.847 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ... processed 1/1 (100%)
10:02:08.306 [main] INFO com.atlan.samples.loaders.models.DatabaseDetails - ... processed 1/1 (100%)
10:02:08.538 [main] INFO com.atlan.samples.loaders.models.SchemaDetails - ... processed 3/3 (100%)
10:02:08.807 [main] INFO com.atlan.samples.loaders.models.ContainerDetails - ... processed 3/3 (100%)
10:02:09.238 [main] INFO com.atlan.samples.loaders.models.ColumnDetails - ... processed 17/17 (100%)
10:02:09.238 [main] INFO com.atlan.samples.loaders.DocumentationTemplateLoader - Updating assets with counts...
10:02:12.507 [main] INFO com.atlan.samples.loaders.DocumentationTemplateLoader - Loading object store assets from: atlan-documentation-template.xlsx::Object Store Assets
10:02:12.515 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ... looking for existing (4) connections...
10:02:12.935 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/s3/1694020669)
10:02:13.355 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/adls/1694020672)
10:02:13.776 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/gcs/1694020674)
10:02:14.220 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: raw (default/s3/1694020676)
10:02:14.456 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ... processed 4/4 (100%)
10:02:15.746 [main] INFO com.atlan.samples.loaders.models.AccountDetails - ... processed 1/1 (100%)
10:02:16.045 [main] INFO com.atlan.samples.loaders.models.BucketDetails - ... processed 4/4 (100%)
10:02:16.502 [main] INFO com.atlan.samples.loaders.models.ObjectDetails - ... processed 5/5 (100%)
10:02:16.502 [main] INFO com.atlan.samples.loaders.DocumentationTemplateLoader - Updating assets with counts...
10:02:17.600 [main] INFO com.atlan.samples.loaders.DocumentationTemplateLoader - Loading lineage from: atlan-documentation-template.xlsx::Lineage
10:02:17.626 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ... looking for existing (3) connections...
10:02:18.032 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: DataGrip (default/api/1694020706)
10:02:18.447 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: MWAA (default/airflow/1694020708)
10:02:18.874 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: Airflow (default/airflow/1694020711)
10:02:19.090 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ... processed 3/3 (100%)
10:02:20.332 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/redshift/1694020655)
10:02:20.761 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/s3/1694020669)
10:02:21.174 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/adls/1694020672)
10:02:21.594 [main] INFO com.atlan.samples.loaders.models.ConnectionDetails - ...... found: raw (default/s3/1694020676)
10:02:22.159 [main] INFO com.atlan.samples.loaders.models.LineageDetails - ... processed 5/5 (100%)
More details
- The first line will always tell you the name of the Excel file the loader is processing.
- The loader will only attempt to load those sheets that exist within the Excel workbook, and only the data in those sheets.