DocumentationTemplateLoader

This content has moved!

We have moved our samples to a separate, dedicated site: https://solutions.atlan.com.

This document is no longer being maintained.

Creates (or updates, if it already exists) metadata about:

  • Tabular Assets: databases, schemas, tables, and columns
  • Object Store Assets: accounts, buckets (containers), and objects
  • Lineage: between these objects, through processes that can have one or more inputs and produce one or more outputs

By default, unless overridden through the FILENAME environment variable, the loader will try to read from a file named atlan-documentation-template.xlsx in the directory in which it is run.

You can find the template, with some example values pre-populated, at the root of the repository in atlan-documentation-template.xlsx, or online in Google Sheets.

Use this loader primarily for creating assets

Use this loader primarily for creating assets that you do not anticipate crawling through an out-of-the-box crawler. For enriching assets that you crawl into Atlan, instead use the EnrichmentReporter and EnrichmentLoader. These will allow you to:

  • enrich additional metadata (custom metadata, READMEs, and glossaries)
  • reduce the risk that a typo in the key identity characteristics of an asset accidentally creates a new asset rather than updating the existing one

Alternatively, you can set the environment variable UPDATE_ONLY to true, and the loader will only update assets defined in the Tabular Assets and Object Store Assets sheets that already exist in Atlan. (It will not create them if they do not yet exist.)

Column details

The names of these columns are what matter; their order in the spreadsheet is irrelevant.

Any cell that expects a boolean value can be provided any of the following, case-insensitive, to mean true:

  • X
  • Y
  • YES
  • TRUE
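
As a rough illustration of the rule above, the following Python sketch treats a cell as true only when it matches one of the listed values, ignoring case. The function name parse_boolean_cell is hypothetical, and ignoring surrounding whitespace is an assumption rather than documented behavior of the loader.

```python
# Values the documentation lists as meaning "true" (compared case-insensitively).
TRUE_VALUES = {"X", "Y", "YES", "TRUE"}


def parse_boolean_cell(value):
    """Return True for X, Y, YES or TRUE in any letter case; False otherwise."""
    if value is None:
        return False  # a blank cell means false
    # Assumption: surrounding whitespace is ignored.
    return str(value).strip().upper() in TRUE_VALUES
```

With this sketch, parse_boolean_cell("yes") and parse_boolean_cell("x") both evaluate to True, while a blank cell or any other text evaluates to False.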

Only the bold fields are required

In each sheet, only the fields in bold are required. All other fields are optional: they are not needed to create or update an asset in Atlan.

In all cases, these column names must be in the second row of the spreadsheet. The first row is ignored, so you can use it for any groupings or instructions you want to give users above the fixed set of column names.
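
To make the header handling concrete, here is a small Python sketch that turns the raw rows of a sheet into records keyed by column name. It assumes the sheet has already been read into a list of lists (for example by a spreadsheet library). The function name rows_as_records is hypothetical, and this is not the loader's actual implementation.

```python
def rows_as_records(sheet_rows):
    """Map each data row to {column name: value}, using row 2 as the header."""
    if len(sheet_rows) < 2:
        return []  # no header row, so there is nothing to load
    # Row 1 (index 0) is ignored; row 2 (index 1) holds the column names.
    header = ["" if cell is None else str(cell).strip() for cell in sheet_rows[1]]
    records = []
    for raw in sheet_rows[2:]:
        # Values are looked up by name, so the order of the columns does not matter.
        records.append({name: value for name, value in zip(header, raw) if name})
    return records
```

Because each record is keyed by column name, swapping two columns in the spreadsheet produces the same records.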

Tabular assets

Expects a sheet in the Excel workbook named Tabular Assets. Each row in the sheet should uniquely identify information about a tabular asset:

  • If a row has only the Connector and Connection Name columns populated, a connection will be created and the blue columns will populate metadata on that connection.
  • If a row has only these columns + the Database Name column populated, a database will be created and the blue columns will populate metadata on that database.
  • If a row has only these columns + the Schema Name column populated, a schema will be created and the blue columns will populate metadata on that schema.
  • If a row has only these columns + the Container Name and Container Type columns populated, a container (table, view, etc) will be created and the blue columns will populate metadata on that container.
  • If a row has only these columns + the Column Name and Column Data Type columns populated, a column will be created and the blue columns will populate metadata on that column.

In this way, you can use a single sheet to define the entire hierarchy of tabular assets.
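
The row rules above can be sketched in Python as follows. This is only an illustration: it assumes the rules are cumulative (a column row also carries its container, schema, and database), so it picks the most specific level whose columns are populated. The function name classify_tabular_row and the returned labels are hypothetical.

```python
def classify_tabular_row(row):
    """Return the kind of asset a Tabular Assets row describes, or None."""

    def filled(name):
        value = row.get(name)
        return value is not None and str(value).strip() != ""

    # Every row needs at least the connector and connection name.
    if not (filled("Connector") and filled("Connection Name")):
        return None
    # Assumption: the most specific populated level decides the asset type.
    if filled("Column Name") and filled("Column Data Type"):
        return "column"
    if filled("Container Name") and filled("Container Type"):
        return "container"
    if filled("Schema Name"):
        return "schema"
    if filled("Database Name"):
        return "database"
    return "connection"
```

For example, a row with only Connector and Connection Name populated is classified as a connection, while adding Database Name makes it a database.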

The following header columns are expected (on row 2 of the spreadsheet):

| Column name | Usage |
| --- | --- |
| Connector | Name of the connector (must be a valid Atlan connector type). |
| Connection Name | Name of the connection. |
| Database Name | Name of the database. |
| Schema Name | Name of the schema. |
| Container Name | Name of the container (table, view, etc) object. |
| Container Type | Type of the container (Table, View or MaterialisedView). |
| Column Name | Name of the column. |
| Column Data Type | A SQL-style data type, for example VARCHAR(30) or DECIMAL(3,5). |
| Primary Key? | If true, indicates this column is a primary key (blank means false). |
| Foreign Key? | If true, indicates this column is a foreign key (blank means false). |
| Description | A system-level description of the asset. |
| Owner Users | A pipe-delimited list of usernames for the owners of the asset. |
| Owner Groups | A pipe-delimited list of the group names for the owners of the asset. |
| Certificate | The certificate to apply to the asset: one of VERIFIED, DRAFT, or DEPRECATED. |
| Certificate Message | An optional message to associate with the certificate. |
| Announcement | The type of announcement to apply to the asset: one of information, warning, or issue. |
| Announcement Title | A subject line for the announcement. |
| Announcement Message | A more detailed message to associate with the announcement. |
| Atlan Tags | A pipe-delimited list of the names of the Atlan tags you want to directly apply to the asset. Note that these will only be appended to the asset, so any pre-existing Atlan tags assigned to the asset in Atlan will not be overwritten. |

Special interpretation of the owners

When a row defines a connection, these will be set as the connection admins rather than owners:

  • Owner Users
  • Owner Groups

Furthermore, if the group name starts with $ then it will be interpreted as a role (for example $admin is for "All Admins").

Finally, note that if no owners at all are listed in the spreadsheet, $admin ("All Admins") will be used as the default for connection admins.
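
The owner rules for connection rows could be sketched in Python like this. The function name resolve_connection_admins and the shape of the returned dictionary are hypothetical. The sketch only mirrors the three rules stated above (owners become admins, a leading $ marks a role, and $admin is the fallback).

```python
def resolve_connection_admins(owner_users, owner_groups, delimiter="|"):
    """Split the owner cells of a connection row into admin users, groups and roles."""

    def split(cell):
        if not cell:
            return []
        return [part.strip() for part in cell.split(delimiter) if part.strip()]

    users = split(owner_users)
    names = split(owner_groups)
    # A group name that starts with "$" is interpreted as a role.
    roles = [name for name in names if name.startswith("$")]
    groups = [name for name in names if not name.startswith("$")]
    # With no owners at all, fall back to $admin ("All Admins").
    if not (users or groups or roles):
        roles = ["$admin"]
    return {"users": users, "groups": groups, "roles": roles}
```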

Object store assets

Expects a sheet in the Excel workbook named Object Store Assets. Each row in the sheet should uniquely identify information about an object store asset:

  • If a row has only the Connector and Connection Name columns populated, a connection will be created and the blue columns will populate metadata on that connection.
  • If a row has only these columns + the Account Name column populated, an ADLS account will be created and the blue columns will populate metadata on that account.

    Only for ADLS

    The Account Name column is only used for ADLS objects. If you are defining objects in some other object store (GCS, S3, etc) you can always leave this column blank.

  • If a row has only these columns + the Bucket Name column populated, a bucket (or ADLS container) will be created and the blue columns will populate metadata on that bucket.

    Bucket ARN is required for S3

    Unique to AWS, the Bucket ARN column must have a unique value when defining an AWS S3 bucket or object.

  • If a row has only these columns + the Object Name column populated, an object will be created and the blue columns will populate metadata on that object.

    Object ARN is required for S3

    Unique to AWS, the Object ARN column must have a unique value when defining an AWS S3 object.

In this way, you can use a single sheet to define the entire hierarchy of object store assets.
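
Before running the loader, it can help to check the S3-specific ARN requirements described above. The Python sketch below is one way to do that, assuming that S3 rows are identified by a Connector value of s3 (as suggested by qualified names such as default/s3/... in the example output). The function name missing_s3_arns is hypothetical.

```python
def missing_s3_arns(row):
    """Return the ARN columns that an S3 row needs but leaves blank."""

    def filled(name):
        value = row.get(name)
        return value is not None and str(value).strip() != ""

    # Assumption: S3 rows use the connector value "s3".
    if str(row.get("Connector") or "").strip().lower() != "s3":
        return []  # ARNs are only required for AWS S3
    missing = []
    # A bucket ARN is needed when defining either a bucket or an object.
    if (filled("Bucket Name") or filled("Object Name")) and not filled("Bucket ARN"):
        missing.append("Bucket ARN")
    # An object ARN is needed when defining an object.
    if filled("Object Name") and not filled("Object ARN"):
        missing.append("Object ARN")
    return missing
```

An empty list means the row satisfies the ARN rules; rows for other object stores (GCS, ADLS) always return an empty list.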

The following header columns are expected (on row 2 of the spreadsheet):

| Column name | Usage |
| --- | --- |
| Connector | Name of the connector (must be a valid Atlan connector type). |
| Connection Name | Name of the connection. |
| Account Name | Name of the ADLS account (only used for ADLS assets). |
| Bucket Name | Name of the bucket (or container, for ADLS). |
| Bucket ARN | Unique Amazon Resource Name (ARN), required for an AWS bucket. |
| Object Name | Name of the object. |
| Object ARN | Unique Amazon Resource Name (ARN), required for an AWS object. |
| Object Path | (Optional) Path to the object, used as the objectKey. |
| Object Size | (Optional) Size of the object, in bytes. |
| Content Type | (Optional) Content type of the object, for example text/csv. |
| Description | A system-level description of the asset. |
| Owner Users | A pipe-delimited list of usernames for the owners of the asset. |
| Owner Groups | A pipe-delimited list of the group names for the owners of the asset. |
| Certificate | The certificate to apply to the asset: one of VERIFIED, DRAFT, or DEPRECATED. |
| Certificate Message | An optional message to associate with the certificate. |
| Announcement | The type of announcement to apply to the asset: one of information, warning, or issue. |
| Announcement Title | A subject line for the announcement. |
| Announcement Message | A more detailed message to associate with the announcement. |
| Atlan Tags | A pipe-delimited list of the names of the Atlan tags you want to directly apply to the asset. Note that these will only be appended to the asset, so any pre-existing Atlan tags assigned to the asset in Atlan will not be overwritten. |

Special interpretation of the owners

When a row defines a connection, these will be set as the connection admins rather than owners:

  • Owner Users
  • Owner Groups

Furthermore, if the group name starts with $ then it will be interpreted as a role (for example $admin is for "All Admins").

Finally, note that if no owners at all are listed in the spreadsheet, $admin ("All Admins") will be used as the default for connection admins.

Lineage

Expects a sheet in the Excel workbook named Lineage. Each row in the sheet should uniquely identify an input to and output from a lineage process between other assets.

Only creates and describes process assets

  • The lineage loader will only create connections and process assets for the orchestrators — the source and target assets (and their connections) must already exist. (If you need these to be created, use the sheets above to first create the source and target assets.)
  • The common columns in this sheet (description, certificate, announcement and so on) will all be used to describe the lineage process itself — not the source or target asset(s).

The following header columns are expected (on row 2 of the spreadsheet):

| Column name | Usage |
| --- | --- |
| Connector (s) | Name of the connector for the source asset (must be a valid Atlan connector type). |
| Connection (s) | Name of the connection for the source asset. |
| Source Asset Type | Type of the source asset. |
| Source Asset | Partial or fully-qualified name of the source asset. |
| Orchestrator | Name of the software tool or system that orchestrated the process to be represented in lineage. Each unique value in this column will result in a connection in Atlan (of a type given by Process Type), in which all lineage processes for that orchestrator will be contained. |
| Process Type | Type of process that was run. This influences the icon that will appear in lineage for the process, and when combined with the Orchestrator value will uniquely identify the connection in which the lineage process will be created. |
| Process ID | Unique name of the process that should be represented in lineage. All rows that share the combination of Orchestrator and Process ID will be combined into a single lineage process (with potentially multiple sources and targets). |
| Target Asset | Partial or fully-qualified name of the target asset. |
| Target Asset Type | Type of the target asset. |
| Connector (t) | Name of the connector for the target asset (must be a valid Atlan connector type). |
| Connection (t) | Name of the connection for the target asset. |
| SQL / Code | (Optional) Any SQL or other code you want to use to describe in technical detail what occurred within the integration process. |
| Process URL | (Optional) A URL to the integration process itself. This could be to the details of the process running in the orchestrator tool, to a GitHub repository holding the code for this integration process, or to any other arbitrary URL describing the process. |
| Description | A system-level description of the asset. |
| Owner Users | A pipe-delimited list of usernames for the owners of the asset. |
| Owner Groups | A pipe-delimited list of the group names for the owners of the asset. |
| Certificate | The certificate to apply to the asset: one of VERIFIED, DRAFT, or DEPRECATED. |
| Certificate Message | An optional message to associate with the certificate. |
| Announcement | The type of announcement to apply to the asset: one of information, warning, or issue. |
| Announcement Title | A subject line for the announcement. |
| Announcement Message | A more detailed message to associate with the announcement. |
| Atlan Tags | A pipe-delimited list of the names of the Atlan tags you want to directly apply to the asset. Note that these will only be appended to the asset, so any pre-existing Atlan tags assigned to the asset in Atlan will not be overwritten. |
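
To illustrate how rows sharing the same Orchestrator and Process ID combine into a single lineage process, here is a minimal Python sketch. It works on rows already read into dictionaries keyed by column name. The function name group_lineage_rows and the returned structure are hypothetical, not the loader's internal model.

```python
def group_lineage_rows(rows):
    """Combine rows that share Orchestrator + Process ID into one process each."""
    processes = {}
    for row in rows:
        key = (row["Orchestrator"], row["Process ID"])
        process = processes.setdefault(key, {"sources": [], "targets": []})
        for column, bucket in (("Source Asset", "sources"), ("Target Asset", "targets")):
            name = row.get(column)
            # Record each source and target once per process, in sheet order.
            if name and name not in process[bucket]:
                process[bucket].append(name)
    return processes
```

Two rows with the same Orchestrator and Process ID but different sources therefore yield one process with two inputs, while the same Process ID under a different Orchestrator yields a separate process.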

Partial vs fully-qualified names

If you provide a full qualifiedName (for example, copied and pasted from Atlan) then the Connector and Connection columns are ignored. If you provide a partial qualifiedName (for example, db/schema/table) the connection portion of the qualified name will be looked up using the Connector and Connection details.
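
The distinction can be sketched in Python as below. Two assumptions are made purely for illustration: a fully-qualified name is recognized by a leading default/ prefix (as in the connection names shown in the example output, such as default/redshift/1694020655), and the connection's qualified name has already been looked up from the Connector and Connection columns. How the loader itself tells the two apart is not documented here, and resolve_qualified_name is a hypothetical name.

```python
def resolve_qualified_name(asset_name, connection_qualified_name):
    """Return a full qualifiedName for a partial or fully-qualified asset name."""
    # Assumption: fully-qualified names start with "default/".
    if asset_name.startswith("default/"):
        return asset_name  # already complete, so the connection details are ignored
    # Partial name: prefix it with the looked-up connection's qualified name.
    return connection_qualified_name + "/" + asset_name
```

Note that this simple prefix test would misread a partial name whose database happens to be called default, so treat it as a sketch only.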

Configuration

You can configure the following options for this loader as environment variables:

ATLAN_BASE_URL

URL for your Atlan tenant (for example: https://tenant.atlan.com).

ATLAN_API_KEY

API token to use when accessing Atlan.

FILENAME

Defines the filename the loader will use as its input. (Defaults to atlan-documentation-template.xlsx if not specified.)

DELIMITER

Defines the character used to separate values in multi-valued cells (such as assigned terms or Atlan tags). If you expect the default pipe (|) to appear within any of these values, change this to some other character.
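
As an illustration of what the delimiter controls, this Python sketch splits a multi-valued cell using the DELIMITER environment variable, falling back to the pipe. The function name split_multi_valued is hypothetical, and trimming whitespace around each value is an assumption.

```python
import os


def split_multi_valued(cell, delimiter=None):
    """Split a multi-valued cell into its individual values."""
    if delimiter is None:
        # Fall back to the default pipe when DELIMITER is not set.
        delimiter = os.environ.get("DELIMITER", "|")
    if cell is None:
        return []
    # Assumption: whitespace around each value is dropped, as are empty values.
    return [part.strip() for part in str(cell).split(delimiter) if part.strip()]
```

With the delimiter changed to a semicolon, a value that itself contains a pipe stays intact: split_multi_valued("a|b;c", ";") gives ["a|b", "c"].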

UPDATE_ONLY

Specifies how to handle assets that do not already exist in the environment. By default (false), the loader will create any assets that do not already exist in the environment. If set to true, the loader will only update an asset if it already exists and will not create any new assets. (Note that when true there is additional work to look up existing assets, so it will be slower.)
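
The effect of UPDATE_ONLY on each asset can be summarized in a short Python sketch. Both function names are hypothetical, and treating only the text true (in any letter case) as enabling the option is an assumption about how the variable is parsed.

```python
import os


def update_only_enabled(env=None):
    """Read UPDATE_ONLY from the environment; anything but 'true' means false."""
    env = os.environ if env is None else env
    # Assumption: the value is compared case-insensitively against "true".
    return env.get("UPDATE_ONLY", "false").strip().lower() == "true"


def plan_action(exists_in_atlan, update_only):
    """Decide what happens to one asset defined in the spreadsheet."""
    if exists_in_atlan:
        return "update"  # existing assets are updated in both modes
    # Missing assets are created by default, but skipped when UPDATE_ONLY is true.
    return "skip" if update_only else "create"
```

The extra lookup mentioned above corresponds to working out exists_in_atlan for every asset before deciding whether to send it.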

Logging

By default, only messages at INFO and above will be logged (to the console). You can change this level by copying the main/resources/log4j2.xml from the source repo and modifying it to give:

  • more detail (DEBUG will log every API request and response, including their full payloads)
  • less detail (WARN will only print warnings and errors, ERROR only errors).

Running the loaders locally

You can run the loaders locally on any machine with network access to your Atlan tenant.

When run in this mode, the loader will read the input Excel file locally on the machine running the loader. If you want to change any of the settings, you can simply prepend the command with the environment variables.

You can run the DocumentationTemplateLoader by downloading the pre-compiled jar files for both:

  • Java SDK
  • Samples

With this approach you only need a JRE installed, as you will be using pre-compiled code.

Requires at least Java 11, though 17+ is recommended

You must have a Java 11 (or higher) JRE installed to use the samples. If possible, we recommend using Java 17 (or higher) for significantly improved performance.

Run DocumentationTemplateLoader from pre-compiled jar files (macOS or Linux shell)
export ATLAN_BASE_URL=https://tenant.atlan.com  # (1)
export ATLAN_API_KEY=eyNnCJd2T9Y8fEsbdx...
export FILENAME=atlan-documentation-template.xlsx
java \
    -cp atlan-java-...-jar-with-dependencies.jar:atlan-java-samples-...-jar-with-dependencies.jar \
    com.atlan.samples.loaders.DocumentationTemplateLoader  # (2)
  1. Export environment variables for the configuration you want to use. At a minimum:

    • ATLAN_BASE_URL giving the URL of your Atlan tenant
    • ATLAN_API_KEY giving the value of an API token with access to that tenant to load the assets
    • FILENAME giving the name of the Excel file from which to load metadata

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the loader in the same way. For example, you could use export BATCH_SIZE=20 to change the batch size.

    (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.

  2. Within the directory containing the downloaded jar files, run the java command providing at least two arguments:

    1. The two jar files, separated by a :.
    2. The canonical classname of the DocumentationTemplateLoader.

    Replace the ... with proper version numbers

    Don't forget to replace the ... in the -cp argument's values with proper version numbers, or running this will not work.

Run DocumentationTemplateLoader from pre-compiled jar files (Windows PowerShell)
$env:ATLAN_BASE_URL="https://tenant.atlan.com"  # (1)
$env:ATLAN_API_KEY="eyNnCJd2T9Y8fEsbdx..."
$env:FILENAME="atlan-documentation-template.xlsx"
$env:CLASSPATH="atlan-java-...-jar-with-dependencies.jar;atlan-java-samples-...-jar-with-dependencies.jar"  # (2)
java com.atlan.samples.loaders.DocumentationTemplateLoader  # (3)
  1. Export environment variables for the configuration you want to use. At a minimum:

    • ATLAN_BASE_URL giving the URL of your Atlan tenant
    • ATLAN_API_KEY giving the value of an API token with access to that tenant to load the assets
    • FILENAME giving the name of the Excel file from which to load metadata

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the loader in the same way. For example, you could use $env:BATCH_SIZE="20" to change the batch size.

    (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.

    Note that on Windows you must surround the value of the variable in double-quotes!

  2. In our experience, it seems that the java -cp command does not always work on Windows. Therefore, ensure you set the $env:CLASSPATH environment variable to include the names of the downloaded jar files, separated by a ;.

    Replace the ... with proper version numbers

    Don't forget to replace the ... in the $env:CLASSPATH value with proper version numbers, or running this will not work.

  3. Within the directory containing the downloaded jar files, run the java command providing the canonical classname of the DocumentationTemplateLoader.

You can also run the DocumentationTemplateLoader from a clone of the samples repository:

Repo

If you want to change the sample code provided to include additional logic, this is your best option. You have full control over all of the code this way and can make any changes you like before running it.

Additional requirements

You will need a JDK installed to run this way, not only the JRE, as the code must be compiled locally before it can be executed. The computer on which you run the code must also have network access to be able to configure Gradle itself and download the necessary dependencies of the code from Maven Central.

Run DocumentationTemplateLoader from a clone of the GitHub repo
FILENAME=atlan-documentation-template.xlsx \
./gradlew DocumentationTemplateLoader  # (1)
  1. Within the root directory of your local clone of the repo, run the Gradle task for the DocumentationTemplateLoader.

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the loader. You can do this either on the same line where you run the command, or by exporting them first, for example using export FILENAME=atlan-documentation-template.xlsx.

Example output
10:02:04.101 [main] INFO  com.atlan.samples.loaders.DocumentationTemplateLoader - Retrieving configuration and context...
10:02:06.483 [main] INFO  com.atlan.samples.loaders.DocumentationTemplateLoader - Loading tabular assets from: atlan-documentation-template.xlsx::Tabular Assets
10:02:06.500 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ... looking for existing (1) connections...
10:02:07.622 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/redshift/1694020655)
10:02:07.847 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails -  ... processed 1/1 (100%)
10:02:08.306 [main] INFO  com.atlan.samples.loaders.models.DatabaseDetails -  ... processed 1/1 (100%)
10:02:08.538 [main] INFO  com.atlan.samples.loaders.models.SchemaDetails -  ... processed 3/3 (100%)
10:02:08.807 [main] INFO  com.atlan.samples.loaders.models.ContainerDetails -  ... processed 3/3 (100%)
10:02:09.238 [main] INFO  com.atlan.samples.loaders.models.ColumnDetails -  ... processed 17/17 (100%)
10:02:09.238 [main] INFO  com.atlan.samples.loaders.DocumentationTemplateLoader - Updating assets with counts...
10:02:12.507 [main] INFO  com.atlan.samples.loaders.DocumentationTemplateLoader - Loading object store assets from: atlan-documentation-template.xlsx::Object Store Assets
10:02:12.515 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ... looking for existing (4) connections...
10:02:12.935 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/s3/1694020669)
10:02:13.355 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/adls/1694020672)
10:02:13.776 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/gcs/1694020674)
10:02:14.220 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: raw (default/s3/1694020676)
10:02:14.456 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails -  ... processed 4/4 (100%)
10:02:15.746 [main] INFO  com.atlan.samples.loaders.models.AccountDetails -  ... processed 1/1 (100%)
10:02:16.045 [main] INFO  com.atlan.samples.loaders.models.BucketDetails -  ... processed 4/4 (100%)
10:02:16.502 [main] INFO  com.atlan.samples.loaders.models.ObjectDetails -  ... processed 5/5 (100%)
10:02:16.502 [main] INFO  com.atlan.samples.loaders.DocumentationTemplateLoader - Updating assets with counts...
10:02:17.600 [main] INFO  com.atlan.samples.loaders.DocumentationTemplateLoader - Loading lineage from: atlan-documentation-template.xlsx::Lineage
10:02:17.626 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ... looking for existing (3) connections...
10:02:18.032 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: DataGrip (default/api/1694020706)
10:02:18.447 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: MWAA (default/airflow/1694020708)
10:02:18.874 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: Airflow (default/airflow/1694020711)
10:02:19.090 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails -  ... processed 3/3 (100%)
10:02:20.332 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/redshift/1694020655)
10:02:20.761 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/s3/1694020669)
10:02:21.174 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: production (default/adls/1694020672)
10:02:21.594 [main] INFO  com.atlan.samples.loaders.models.ConnectionDetails - ...... found: raw (default/s3/1694020676)
10:02:22.159 [main] INFO  com.atlan.samples.loaders.models.LineageDetails -  ... processed 5/5 (100%)
More details
  • Each "Loading ... from:" line tells you the name of the Excel file (and the sheet within it) that the loader is processing.
  • The loader will only attempt to load those sheets that exist within the Excel workbook, and only the data in those sheets.