EnrichmentLoader

This content has moved!

We have moved our samples to a separate, dedicated site: https://solutions.atlan.com.

This document is no longer being maintained.

Updates (or creates, if they do not exist) the metadata of assets in Atlan, including:

  • Glossaries
  • Categories
  • Terms
  • Database objects
  • Business intelligence objects
  • Object store objects
  • and so on.

This uses the same Excel workbook extracted by the EnrichmentReporter, composed of 4 sheets:

  • Glossary enrichment containing details of all glossaries
  • Category enrichment containing details of all categories
  • Term enrichment containing details of all terms
  • Asset enrichment containing details of all other assets

By default, unless overridden through the FILENAME environment variable, the loader will try to read from a file named atlan-enrichment.xlsx in the directory in which it is run.
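
The filename rule above can be pictured with a minimal Python sketch. The function name `resolve_input_file` is hypothetical and is not part of the loader; treating an empty FILENAME as unset is also an assumption.

```python
import os
from pathlib import Path

# Default workbook name, as stated in the text above.
DEFAULT_FILENAME = "atlan-enrichment.xlsx"


def resolve_input_file(env=None):
    """Return the workbook path: FILENAME if set, otherwise the default name.

    A relative path is interpreted against the directory the process runs in.
    Assumption: an empty FILENAME is treated the same as an unset one.
    """
    env = os.environ if env is None else env
    name = env.get("FILENAME") or DEFAULT_FILENAME
    return Path(name)


if __name__ == "__main__":
    print(resolve_input_file())
```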

Column details

The names of these columns are what matter; their order in the spreadsheet is irrelevant.

Any cell that expects a boolean value can be provided any of the following, case-insensitive, to mean true:

  • X
  • Y
  • YES
  • TRUE
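
As an illustration of the rule above, here is a minimal Python sketch of how such a cell could be interpreted. The helper name `parse_boolean_cell` is hypothetical. Trimming surrounding whitespace and treating every other value (including a blank cell) as false are assumptions, not confirmed loader behavior.

```python
# The four values the text lists as meaning true (compared case-insensitively).
TRUE_VALUES = {"X", "Y", "YES", "TRUE"}


def parse_boolean_cell(value):
    """Return True only when the cell holds X, Y, YES, or TRUE in any letter case.

    Assumption: blank cells and any other value are treated as false.
    """
    if value is None:
        return False
    return str(value).strip().upper() in TRUE_VALUES


if __name__ == "__main__":
    for cell in ["x", "Yes", "TRUE", "no", "", None]:
        print(repr(cell), parse_boolean_cell(cell))
```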

Only the bold fields are required

In each sheet, only the fields shown in bold with an asterisk (*) are required. All other fields are optional when creating or updating an asset in Atlan.

All other fields are ignored

Only the fields listed below are used in the loader. All other fields (the various created / updated maintenance details) are ignored, as they are managed by Atlan itself and cannot be overridden. (This is to ensure the validity of auditing details.)

Common columns

The following fields are common to all sheets:

Column name | Usage
Description | A system-level description, which is a fallback if no User Description is provided.
User Description | A user-entered description, typically via the user interface. It takes precedence over the Description.
Owner Users | A comma-separated list of usernames for the owners of the asset.
Owner Groups | A comma-separated list of the group names for the owners of the asset.
Certificate | The certificate to apply to the asset: one of VERIFIED, DRAFT, or DEPRECATED.
Certificate Message | An optional message to associate with the certificate.
Announcement | The type of announcement to apply to the asset: one of information, warning, or issue.
Announcement Title | A subject line for the announcement.
Announcement Message | A more detailed message to associate with the announcement.
README | Content to use for the README of the asset. This should be written as HTML code.
<CustomMetadataName>|<PropertyName> | Unique columns for each possible custom metadata property in your tenant. These will vary depending on the custom metadata available in your environment, and the values for these columns should match the type of the custom metadata property.

Owners and Atlan tags must already exist

Any owners or Atlan tags you specify in the spreadsheet must already exist in Atlan. For example, if you want to set the PII Atlan tag on an asset through the spreadsheet, you must already have PII defined as a kind of Atlan tag in Atlan beforehand.

Custom metadata specifics

  • Dates should be entered as a Unix epoch timestamp, in milliseconds
  • Boolean values can be entered as true, yes, or x to mean true; all other values are treated as false
  • Properties that allow multiple values should have their values separated by the configured delimiter
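
When preparing a spreadsheet programmatically, these conventions could be applied with helpers along the following lines. This is a minimal Python sketch; the function names are hypothetical, and trimming whitespace around each value is an assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def join_values(values, delimiter=","):
    """Build the cell text for a multi-valued property."""
    return delimiter.join(values)


def split_values(cell, delimiter=","):
    """Split a multi-valued cell, dropping empty entries (assumed behavior)."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


if __name__ == "__main__":
    print(to_epoch_millis(datetime(2023, 7, 10, tzinfo=timezone.utc)))
    print(split_values("red, green,blue"))
```

If a value can itself contain a comma, pass the same alternative delimiter to both helpers that you configure for the loader.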

Glossary enrichment

These columns are unique to the Glossary enrichment sheet:

Column name | Usage
Glossary Name | The name of the glossary. This uniquely identifies a glossary, so if changed you will create a new glossary rather than updating an existing one. There must be a value for this column in the spreadsheet for the rest of the row to be applied.

Category enrichment

These columns are unique to the Category enrichment sheet:

Column name | Usage
Glossary Name | The name of the glossary the category is within. This must exist, so if you want to add a category to a new glossary, make sure the glossary itself is defined on the Glossary enrichment sheet.
Category Path | The hierarchical location of the category. This should be @-separated starting with the top-level ancestor's name down to the name of this category. For example, a 4-level deep category would be written as Ancestor Name@Grandparent Name@Parent Name@Category Name. The loader will process in multiple passes to ensure ancestors are created before children, but make sure there are separate rows in the sheet for each ancestor. (In this example you would need a row for Ancestor Name, another row for Ancestor Name@Grandparent Name, another for Ancestor Name@Grandparent Name@Parent Name, and finally one for Ancestor Name@Grandparent Name@Parent Name@Category Name.)
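
The requirement for one row per ancestor can be checked before loading. The following Python sketch is illustrative only: both function names are hypothetical, and it looks at the paths of a single glossary at a time.

```python
def ancestor_paths(category_path, separator="@"):
    """List every path from the top-level ancestor down to the category itself."""
    parts = category_path.split(separator)
    return [separator.join(parts[: i + 1]) for i in range(len(parts))]


def missing_ancestor_rows(paths):
    """Return ancestor paths that have no row of their own, in first-seen order."""
    present = set(paths)
    missing = []
    for path in paths:
        for ancestor in ancestor_paths(path):
            if ancestor not in present and ancestor not in missing:
                missing.append(ancestor)
    return missing


if __name__ == "__main__":
    deepest = "Ancestor Name@Grandparent Name@Parent Name@Category Name"
    for row in ancestor_paths(deepest):
        print(row)
```

Running `missing_ancestor_rows` over the Category Path values of one glossary gives the rows that would still need to be added to the sheet.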

Term enrichment

These columns are unique to the Term enrichment sheet:

Column name | Usage
Glossary Name | The name of the glossary the term is within. This must exist, so if you want to add a term to a new glossary, make sure the glossary itself is defined on the Glossary enrichment sheet.
Term Name* | The name of the term itself. This, combined with the glossary name, uniquely identifies a term, so if changed you will create a new term rather than updating an existing one.
Categories | A comma-separated list of the category paths this term is organized within. These follow the same format as the Category Path field in the Category enrichment worksheet, and must exist (make sure any you use here are defined first in the Category enrichment sheet).
Atlan Tag | A comma-separated list of the names of the Atlan tags you want to directly apply to the asset. This should not include propagated Atlan tags, so if you are producing the spreadsheet from the EnrichmentReporter make sure the DIRECT_ATLAN_TAGS_ONLY option is set to true.
Related Terms | A comma-separated list of the terms that should be related to this term.
Recommended Terms | A comma-separated list of the terms that are preferred to this term.
Synonyms | A comma-separated list of the terms that have the same meaning as this term.
Antonyms | A comma-separated list of the terms that have an opposite meaning to this term.
Translated Terms | A comma-separated list of the terms that are translations of this term.
Valid Values For | A comma-separated list of the terms this term is a valid value for.
Classifies | A comma-separated list of the terms this term classifies.

Term relationship format

The separate columns for term relationships will all be processed in a second pass over the worksheet, so you do not need to worry about dependencies or the order of your rows. Just ensure that any terms you reference here are defined in their own row somewhere in this worksheet.

  • In all cases these should be written as Term Name@Glossary Name, since you can have relationships between terms across glossaries.
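
To illustrate the reference format, here is a minimal Python sketch that parses Term Name@Glossary Name values and reports references that have no row of their own. The function names are hypothetical. Splitting on the last @ (so a term name may itself contain an @) is an assumption, not confirmed loader behavior.

```python
def parse_term_reference(reference):
    """Split 'Term Name@Glossary Name' into a (term_name, glossary_name) tuple.

    Assumption: the glossary name follows the LAST '@' in the reference.
    """
    term, separator, glossary = reference.rpartition("@")
    if not separator or not term.strip() or not glossary.strip():
        raise ValueError(f"expected 'Term Name@Glossary Name', got {reference!r}")
    return term.strip(), glossary.strip()


def undefined_references(defined_terms, cell, delimiter=","):
    """Return references in a relationship cell that are not in defined_terms.

    defined_terms is a set of (term_name, glossary_name) tuples, one per row
    of the Term enrichment sheet.
    """
    references = [parse_term_reference(r) for r in cell.split(delimiter) if r.strip()]
    return [ref for ref in references if ref not in defined_terms]


if __name__ == "__main__":
    defined = {("Revenue", "Finance")}
    print(undefined_references(defined, "Revenue@Finance,Income@Sales"))
```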

Asset enrichment

These columns are unique to the Asset enrichment sheet:

Column name | Usage
Qualified Name | The unique name of the asset to be enriched. This should be left unchanged from what was extracted using the EnrichmentReporter, or should be copy/pasted from the asset in Atlan (from the Properties tab).
Type | The type of the asset. This name must match exactly the type used within Atlan, so is best left as extracted using the EnrichmentReporter.
Name | The basic name of the asset. For example, for a database schema this would be only the name of the schema, not including the database or connection.
Atlan Tags | A comma-separated list of the names of the Atlan tags you want to directly apply to the asset. This should not include propagated Atlan tags, so if you are producing the spreadsheet from the EnrichmentReporter make sure the DIRECT_ATLAN_TAGS_ONLY option is set to true.
Assigned Terms | A comma-separated list of the terms that should be linked to this asset.

Term relationship format

The column for assigned terms should have values written as Term Name@Glossary Name, each separated by a comma (,) if multiple terms should be assigned to a given asset.

Configuration

You can configure the following options for this loader as environment variables:

ATLAN_BASE_URL

URL for your Atlan tenant (for example: https://tenant.atlan.com).

ATLAN_API_KEY

API token to use when accessing Atlan.

FILENAME

Defines the filename the loader will use as its input. (Defaults to atlan-enrichment.xlsx if not specified.)

DELIMITER

Defines the character used to separate values in multi-valued cells (such as assigned terms and Atlan tags). If you expect the default comma (,) to appear inside any of these values, change this to some other character. (If you are loading a file extracted by the EnrichmentReporter, use the same delimiter that was used for the export.)

REPLACE_ATLAN_TAGS

Specifies how to handle the Atlan tags provided in the input file. By default (false), the loader only appends the Atlan tags listed against each asset; it does not overwrite or remove any Atlan tags that already exist on the assets in Atlan. If set to true, the loader instead makes the set of Atlan tags on the asset match what is in the input file, overwriting any that already exist on that asset in Atlan. Note that this applies only to direct Atlan tags: propagated Atlan tags remain regardless.
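
The difference between the two modes can be sketched with plain sets. This Python snippet only illustrates the semantics described above; the function name is hypothetical and it is not the loader's implementation.

```python
def resolve_direct_tags(existing_direct, from_file, replace=False):
    """Return the direct Atlan tags an asset would end up with.

    replace=False (default): tags from the file are added to the existing ones.
    replace=True: the result matches the file exactly.
    Propagated tags are outside the scope of this sketch.
    """
    if replace:
        return set(from_file)
    return set(existing_direct) | set(from_file)


if __name__ == "__main__":
    print(sorted(resolve_direct_tags({"PII"}, {"Confidential"})))
    print(sorted(resolve_direct_tags({"PII"}, {"Confidential"}, replace=True)))
```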

REPLACE_CUSTOM_METADATA

Specifies how to handle the custom metadata values provided in the input file. By default (false), the loader only updates the specific custom metadata values listed against each asset; a blank cell does not overwrite or remove a value that is already populated on the asset in Atlan. If set to true, the loader instead makes the custom metadata on the asset match what is in the input file, overwriting existing values and removing any value in Atlan whose cell is blank in the input file.
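
A simplified picture of the two modes, using dictionaries of property name to value. This Python sketch is an assumption-laden illustration rather than the loader's code: the function name is hypothetical, and a blank cell is modeled as None or an empty string.

```python
def resolve_custom_metadata(existing, from_file, replace=False):
    """Return the custom metadata values an asset would end up with.

    replace=False (default): only non-blank cells are applied; blank cells
    leave existing values untouched.
    replace=True: the result contains only the non-blank cells from the file,
    so values that are blank in the file are removed.
    """
    provided = {name: value for name, value in from_file.items() if value not in (None, "")}
    if replace:
        return provided
    merged = dict(existing)
    merged.update(provided)
    return merged


if __name__ == "__main__":
    current = {"Steward": "alice", "Score": 1}
    row = {"Steward": "bob", "Score": ""}
    print(resolve_custom_metadata(current, row))
    print(resolve_custom_metadata(current, row, replace=True))
```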

UPDATE_ONLY

Specifies how to handle assets that do not already exist in the environment. By default (false), the loader will create any assets that do not already exist in the environment. If set to true, the loader will only update an asset if it already exists and will not create any new assets. (Note that when true there is additional work to look up existing assets, so it will be slower.)

Logging

By default, only messages at INFO and above will be logged (to the console). You can change this level by copying the main/resources/log4j2.xml from the source repo and modifying it to give:

  • more detail (DEBUG will log every API request and response, including their full payloads)
  • less detail (WARN will only print warnings and errors, ERROR only errors).

Running the loaders locally

You can run the loaders locally on any machine with network access to your Atlan tenant.

When run in this mode, the loader will read the input Excel file locally on the machine running the loader. If you want to change any of the settings, you can simply prepend the command with the environment variables.

You can run the EnrichmentLoader by downloading the pre-compiled jar files for both:

  • Java SDK
  • Samples

With this approach you only need a JRE installed, as you will be using pre-compiled code.

Requires at least Java 11, though 17+ is recommended

You must have a Java 11 (or higher) JRE installed to use the samples. If possible, we recommend using Java 17 (or higher) for significantly improved performance.

Run EnrichmentLoader from pre-compiled jar files (Bash-style shell, such as on Linux or macOS)
export ATLAN_BASE_URL=https://tenant.atlan.com  # (1)
export ATLAN_API_KEY=eyNnCJd2T9Y8fEsbdx...
export FILENAME=enrichment-report-20230710-085713-454.xlsx
java \
    -cp atlan-java-...-jar-with-dependencies.jar:atlan-java-samples-...-jar-with-dependencies.jar \
    com.atlan.samples.loaders.EnrichmentLoader  # (2)
  1. Export environment variables for the configuration you want to use. At a minimum:

    • ATLAN_BASE_URL giving the URL of your Atlan tenant
    • ATLAN_API_KEY giving the value of an API token with access to that tenant to load the assets
    • FILENAME giving the name of the Excel file from which to load asset information

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the loader in the same way. For example, you could use export BATCH_SIZE=20 to change the batch size.

    (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.

  2. Within the directory containing the downloaded jar files, run the java command providing at least two arguments:

    1. The two jar files, separated by a :.
    2. The canonical classname of the EnrichmentLoader.

    Replace the ... with proper version numbers

    Don't forget to replace the ... in the -cp argument's values with proper version numbers, or running this will not work.

Run EnrichmentLoader from pre-compiled jar files (PowerShell on Windows)
$env:ATLAN_BASE_URL="https://tenant.atlan.com"  # (1)
$env:ATLAN_API_KEY="eyNnCJd2T9Y8fEsbdx..."
$env:FILENAME="enrichment-report-20230710-085713-454.xlsx"
$env:CLASSPATH="atlan-java-...-jar-with-dependencies.jar;atlan-java-samples-...-jar-with-dependencies.jar"  # (2)
java com.atlan.samples.loaders.EnrichmentLoader  # (3)
  1. Export environment variables for the configuration you want to use. At a minimum:

    • ATLAN_BASE_URL giving the URL of your Atlan tenant
    • ATLAN_API_KEY giving the value of an API token with access to that tenant to load the assets
    • FILENAME giving the name of the Excel file from which to load asset information

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the loader in the same way. For example, you could use $env:BATCH_SIZE="20" to change the batch size.

    (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.

    Note that on Windows you must surround the value of the variable in double quotes!

  2. In our experience, the java -cp option does not always work on Windows. Instead, set the $env:CLASSPATH environment variable to the names of the downloaded jar files, separated by a ;.

    Replace the ... with proper version numbers

    Don't forget to replace the ... in the $env:CLASSPATH value with proper version numbers, or running this will not work.

  3. Within the directory containing the downloaded jar files, run the java command providing the canonical classname of the EnrichmentLoader.

You can also run the EnrichmentLoader from a clone of the samples repository:

If you want to change the sample code provided to include additional logic, this is your best option. You have full control over all of the code this way and can make any changes you like before running it.

Additional requirements

You will need a JDK installed to run this way, not only the JRE, as the code must be compiled locally before it can be executed. The computer on which you run the code must also have network access to be able to configure Gradle itself and download the necessary dependencies of the code from Maven Central.

Run EnrichmentLoader from a clone of the GitHub repo
FILENAME=enrichment-report-20230710-085713-454.xlsx \
./gradlew EnrichmentLoader  # (1)
  1. Within the root directory of your local clone of the repo, run the Gradle task for the EnrichmentLoader.

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the loader. You can do this either on the same line as the command or by exporting them first, for example using export FILENAME=enrichment-report-20230710-085713-454.xlsx.

Example output
18:00:04.816 [main] INFO  com.atlan.samples.loaders.EnrichmentLoader - Retrieving configuration and context...
18:00:06.483 [main] INFO  com.atlan.samples.loaders.EnrichmentLoader - Loading enrichment details from: enrichment-report-20230906-134813-692.xlsx
18:00:06.823 [main] INFO  com.atlan.samples.loaders.EnrichmentLoader - Processing sheet Glossary enrichment, with 4 rows.
18:00:08.484 [main] INFO  com.atlan.samples.loaders.models.GlossaryEnrichmentDetails -  ... processed 1/4 (25%)
18:00:09.142 [main] INFO  com.atlan.samples.loaders.models.GlossaryEnrichmentDetails -  ... processed 2/4 (50%)
18:00:09.799 [main] INFO  com.atlan.samples.loaders.models.GlossaryEnrichmentDetails -  ... processed 3/4 (75%)
18:00:10.496 [main] INFO  com.atlan.samples.loaders.models.GlossaryEnrichmentDetails -  ... processed 4/4 (100%)
18:00:10.505 [main] INFO  com.atlan.samples.loaders.EnrichmentLoader - Processing sheet Category enrichment, with 4 rows.
18:00:11.543 [main] INFO  com.atlan.samples.loaders.models.CategoryEnrichmentDetails -  ... processed 1/4 (25%)
18:00:12.605 [main] INFO  com.atlan.samples.loaders.models.CategoryEnrichmentDetails -  ... processed 2/4 (50%)
18:00:12.802 [main] INFO  com.atlan.samples.loaders.models.CategoryEnrichmentDetails -  ... processed 3/4 (75%)
18:00:13.011 [main] INFO  com.atlan.samples.loaders.models.CategoryEnrichmentDetails -  ... processed 4/4 (100%)
18:00:13.018 [main] INFO  com.atlan.samples.loaders.EnrichmentLoader - Processing sheet Term enrichment, with 8 rows.
18:00:13.648 [main] INFO  com.atlan.samples.loaders.models.TermEnrichmentDetails -  ... processed 1/8 (13%)
18:00:14.287 [main] INFO  com.atlan.samples.loaders.models.TermEnrichmentDetails -  ... processed 2/8 (25%)
18:00:14.907 [main] INFO  com.atlan.samples.loaders.models.TermEnrichmentDetails -  ... processed 3/8 (38%)
18:00:15.533 [main] INFO  com.atlan.samples.loaders.models.TermEnrichmentDetails -  ... processed 4/8 (50%)
18:00:16.146 [main] INFO  com.atlan.samples.loaders.models.TermEnrichmentDetails -  ... processed 5/8 (63%)
18:00:16.765 [main] INFO  com.atlan.samples.loaders.models.TermEnrichmentDetails -  ... processed 6/8 (75%)
18:00:17.597 [main] INFO  com.atlan.samples.loaders.models.TermEnrichmentDetails -  ... processed 7/8 (88%)
18:00:18.240 [main] INFO  com.atlan.samples.loaders.models.TermEnrichmentDetails -  ... processed 8/8 (100%)
18:00:18.768 [main] INFO  com.atlan.samples.loaders.EnrichmentLoader - Processing sheet Asset enrichment, with 133 rows.
18:00:19.240 [main] INFO  com.atlan.samples.loaders.models.AssetEnrichmentDetails -  ... processed 0/133 (0%)
18:00:32.047 [main] INFO  com.atlan.samples.loaders.models.AssetEnrichmentDetails -  ... processed 50/133 (38%)
18:00:44.833 [main] INFO  com.atlan.samples.loaders.models.AssetEnrichmentDetails -  ... processed 100/133 (75%)
18:00:52.715 [main] INFO  com.atlan.samples.loaders.models.AssetDetails - ... selectively updating custom metadata QD on asset 9f6319ef-e6fb-4261-929b-4d6c198dde30
18:00:52.913 [main] INFO  com.atlan.samples.loaders.models.AssetDetails - ... selectively updating custom metadata QD on asset 9f46a3f8-d1bc-4790-bb55-6bc6d6bf8cea
18:00:53.110 [main] INFO  com.atlan.samples.loaders.models.AssetDetails - ... selectively updating custom metadata QD on asset dafeb71f-3b70-4c20-9194-2e5cb182de0b
More details
  • The line beginning "Loading enrichment details from:", near the top of the output, tells you the name of the Excel file the loader is processing.
  • The loader will only attempt to load those sheets that exist within the Excel workbook, and only the data in those sheets.