EnrichmentLoader¶
Atlan University
See it in action in our code samples course.
This content has moved!
We have moved our samples to a separate, dedicated site: https://solutions.atlan.com.
This document is no longer being maintained.
Updates (or creates, if it does not exist) metadata about any kind of asset in Atlan, including:
- Glossaries
- Categories
- Terms
- Database objects
- Business intelligence objects
- Object store objects
- and so on.
This uses the same Excel workbook extracted by the EnrichmentReporter, composed of 4 sheets:
- Glossary enrichment: details of all glossaries
- Category enrichment: details of all categories
- Term enrichment: details of all terms
- Asset enrichment: details of all other assets
By default, unless overridden through the FILENAME environment variable, the loader will try to read from a file named atlan-enrichment.xlsx in the directory in which it is run.
Column details¶
The names of these columns are what matter; their order in the spreadsheet is irrelevant.
Any cell that expects a boolean value can be given any of the following values, case-insensitive, to mean true:
- X
- Y
- YES
- TRUE
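To illustrate the rule above, here is a minimal Java sketch (the samples themselves are written in Java) of how such a cell could be interpreted. The class and method names are hypothetical, and trimming surrounding whitespace and treating an empty cell as false are assumptions of this sketch, not documented loader behavior.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not the loader's actual code: interprets a spreadsheet
// cell as a boolean using the truthy values listed in this document.
final class BooleanCell {
    // Documented truthy values; comparison is case-insensitive.
    private static final Set<String> TRUTHY = Set.of("X", "Y", "YES", "TRUE");

    static boolean isTrue(String cell) {
        if (cell == null) {
            return false; // assumption: an empty cell means false
        }
        // assumption: surrounding whitespace is ignored
        return TRUTHY.contains(cell.strip().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isTrue("yes")); // true
        System.out.println(isTrue(" x ")); // true
        System.out.println(isTrue("no"));  // false
    }
}
```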
Only the bold fields are required
In each sheet, only the fields in bold are required. All other fields are optional: they are not needed to create or update an asset in Atlan.
All other fields are ignored
Only the fields listed below are used in the loader. All other fields (the various created / updated maintenance details) are ignored, as they are managed by Atlan itself and cannot be overridden. (This is to ensure the validity of auditing details.)
Common columns¶
The following fields are common to all sheets:
Column name | Usage |
---|---|
Description | A system-level description, used as a fallback if no User Description is provided. |
User Description | A user-entered description, typically entered through the user interface. It takes precedence over the Description. |
Owner Users | A comma-separated list of usernames for the owners of the asset. |
Owner Groups | A comma-separated list of the group names for the owners of the asset. |
Certificate | The certificate to apply to the asset: one of VERIFIED, DRAFT, or DEPRECATED. |
Certificate Message | An optional message to associate with the certificate. |
Announcement | The type of announcement to apply to the asset: one of information, warning, or issue. |
Announcement Title | A subject line for the announcement. |
Announcement Message | A more detailed message to associate with the announcement. |
README | Content to use for the README of the asset. This should be written as HTML. |
<CustomMetadataName>\|<PropertyName> | Unique columns for each possible custom metadata property in your tenant. These will vary depending on the custom metadata available in your environment, and the values for these columns should match the type of the custom metadata property. |
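The description precedence and the allowed certificate and announcement values in the table above can be checked before loading. The following is a minimal Java sketch with hypothetical names; matching values exactly as they are written in the table (case-sensitive) and allowing empty cells are assumptions of this sketch, not confirmed loader behavior.

```java
import java.util.Set;

// Hypothetical pre-load checks for some common columns; not the loader's API.
final class CommonColumns {
    // Values as documented; matched exactly (case-sensitive) in this sketch.
    static final Set<String> CERTIFICATES = Set.of("VERIFIED", "DRAFT", "DEPRECATED");
    static final Set<String> ANNOUNCEMENT_TYPES = Set.of("information", "warning", "issue");

    // User Description takes precedence; Description is only the fallback.
    static String effectiveDescription(String userDescription, String description) {
        if (userDescription != null && !userDescription.isBlank()) {
            return userDescription;
        }
        return description;
    }

    // An empty cell is accepted (nothing to apply); otherwise it must be a documented value.
    static boolean isAllowed(Set<String> allowed, String cell) {
        return cell == null || cell.isBlank() || allowed.contains(cell.strip());
    }

    public static void main(String[] args) {
        System.out.println(effectiveDescription("", "Loaded nightly")); // Loaded nightly
        System.out.println(isAllowed(CERTIFICATES, "VERIFIED"));        // true
        System.out.println(isAllowed(ANNOUNCEMENT_TYPES, "urgent"));    // false
    }
}
```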
Owners and Atlan tags must already exist
Any owners or Atlan tags you specify in the spreadsheet must already exist in Atlan. For example, if you want to set the PII Atlan tag on an asset through the spreadsheet, you must have already defined PII as an Atlan tag in Atlan.
Custom metadata specifics
- Dates should be entered as a Unix-style epoch, in milliseconds.
- Boolean values can be entered as true, yes, or x for true values; all other values are treated as false.
- Properties that allow multiple values should have their values separated by the configured delimiter.
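If you are preparing date values for custom metadata, the epoch-in-milliseconds value can be computed with the standard `java.time` API. This is a minimal sketch with hypothetical names; using midnight UTC for a calendar date is an assumption, so pick whatever time zone your dates are meant to represent.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper for preparing custom metadata date cells.
final class EpochMillis {
    // Midnight UTC on the given date (assumption), as milliseconds since 1970-01-01T00:00:00Z.
    static long toEpochMillis(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Converts a cell value back to an Instant, handy for checking what you entered.
    static Instant fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        long millis = toEpochMillis(LocalDate.of(2023, 7, 10));
        System.out.println(millis);                  // 1688947200000
        System.out.println(fromEpochMillis(millis)); // 2023-07-10T00:00:00Z
    }
}
```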
Glossary enrichment¶
These columns are unique to the Glossary enrichment sheet:
Column name | Usage |
---|---|
Glossary Name | The name of the glossary. This uniquely identifies a glossary, so if changed you will create a new glossary rather than updating an existing one. There must be a value for this column in the spreadsheet for the rest of the row to be applied. |
Category enrichment¶
These columns are unique to the Category enrichment sheet:
Column name | Usage |
---|---|
Glossary Name | The name of the glossary the category is within. This must exist, so if you want to add a category to a new glossary, make sure the glossary itself is defined on the Glossary enrichment sheet. |
Category Path | The hierarchical location of the category. This should be @-separated, starting with the top-level ancestor's name down to the name of this category. For example, a category 4 levels deep would be written as Ancestor Name@Grandparent Name@Parent Name@Category Name. The loader processes the sheet in multiple passes to ensure ancestors are created before children, but make sure there is a separate row in the sheet for each ancestor. (In this example you would need a row for Ancestor Name, another for Ancestor Name@Grandparent Name, another for Ancestor Name@Grandparent Name@Parent Name, and finally one for Ancestor Name@Grandparent Name@Parent Name@Category Name.) |
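Because every ancestor needs its own row, it can help to check a sheet for missing ancestor rows before loading. The following Java sketch is a hypothetical pre-load check, not part of the loader; it assumes category names themselves never contain @.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-load check for the Category enrichment sheet.
final class CategoryPaths {
    // "A@B@C" -> ["A", "A@B", "A@B@C"]: every ancestor's path, then the category itself.
    static List<String> ancestorPaths(String categoryPath) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String name : categoryPath.split("@")) {
            if (current.length() > 0) {
                current.append('@');
            }
            current.append(name);
            paths.add(current.toString());
        }
        return paths;
    }

    // Returns ancestor paths that are implied by some row but have no row of their own.
    static List<String> missingRows(List<String> pathsInSheet) {
        Set<String> present = new HashSet<>(pathsInSheet);
        Set<String> missing = new LinkedHashSet<>(); // keeps first-seen order, no duplicates
        for (String path : pathsInSheet) {
            for (String ancestor : ancestorPaths(path)) {
                if (!present.contains(ancestor)) {
                    missing.add(ancestor);
                }
            }
        }
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) {
        List<String> sheet = List.of("Ancestor Name", "Ancestor Name@Grandparent Name@Parent Name");
        System.out.println(missingRows(sheet)); // [Ancestor Name@Grandparent Name]
    }
}
```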
Term enrichment¶
These columns are unique to the Term enrichment sheet:
Column name | Usage |
---|---|
Glossary Name | The name of the glossary the term is within. This must exist, so if you want to add a term to a new glossary, make sure the glossary itself is defined on the Glossary enrichment sheet. |
Term Name* | The name of the term itself. This, combined with the glossary name, uniquely identifies a term, so if changed you will create a new term rather than updating an existing one. |
Categories | A comma-separated list of the category paths this term is organized within. These follow the same format as the Category Path field in the Category enrichment worksheet, and must exist (make sure any you use here are defined first in the Category enrichment sheet). |
Atlan Tag | A comma-separated list of the names of the Atlan tags you want to directly apply to the asset. This should not include propagated Atlan tags, so if you are producing the spreadsheet from the EnrichmentReporter make sure the DIRECT_ATLAN_TAGS_ONLY option is set to true. |
Related Terms | A comma-separated list of the terms that should be related to this term. |
Recommended Terms | A comma-separated list of the terms that are preferred to this term. |
Synonyms | A comma-separated list of the terms that have the same meaning as this term. |
Antonyms | A comma-separated list of the terms that have an opposite meaning to this term. |
Translated Terms | A comma-separated list of this term's translated terms. |
Valid Values For | A comma-separated list of the terms this term is a valid value for. |
Classifies | A comma-separated list of the terms this term classifies. |
Term relationship format
The separate columns for term relationships are all processed in a second pass over the worksheet, so you do not need to worry about dependencies or the order of your rows. Just ensure that any terms you reference here are defined in their own row somewhere in this worksheet.
- In all cases these should be written as Term Name@Glossary Name, since you can have relationships between terms across glossaries.
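The reference format and the "defined in their own row" requirement can be sketched in Java as a parse step plus a second pass that reports undefined references. All names here are hypothetical, and splitting at the first @ assumes term names never contain @; this is an illustration of the rules above, not the loader's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the loader's code.
final class TermRefs {
    static final class TermRef {
        final String term;
        final String glossary;

        TermRef(String term, String glossary) {
            this.term = term;
            this.glossary = glossary;
        }

        // Term name plus glossary name is what uniquely identifies a term.
        String key() {
            return term + "@" + glossary;
        }
    }

    // Assumes the term name contains no '@', so the first '@' is the separator.
    static TermRef parse(String reference) {
        int at = reference.indexOf('@');
        if (at <= 0 || at == reference.length() - 1) {
            throw new IllegalArgumentException("Expected Term Name@Glossary Name, got: " + reference);
        }
        return new TermRef(reference.substring(0, at).strip(), reference.substring(at + 1).strip());
    }

    // Second pass: definedTerms holds the "Term Name@Glossary Name" key of every row in the sheet.
    static List<String> undefinedReferences(Set<String> definedTerms, List<String> references) {
        List<String> undefined = new ArrayList<>();
        for (String reference : references) {
            if (!definedTerms.contains(parse(reference).key())) {
                undefined.add(reference);
            }
        }
        return undefined;
    }

    public static void main(String[] args) {
        Set<String> defined = Set.of("Revenue@Finance", "Income@Finance");
        // Prints [Cost@Finance]: it is referenced but has no row of its own.
        System.out.println(undefinedReferences(defined, List.of("Income@Finance", "Cost@Finance")));
    }
}
```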
Asset enrichment¶
These columns are unique to the Asset enrichment sheet:
Column name | Usage |
---|---|
Qualified Name | The unique name of the asset to be enriched. This should be left unchanged from what was extracted using the EnrichmentReporter, or should be copy/pasted from the asset in Atlan (from the Properties tab). |
Type | The type of the asset. This must exactly match the type name used within Atlan, so it is best left as extracted using the EnrichmentReporter. |
Name | The basic name of the asset. For example, for a database schema this would be only the name of the schema, not including the database or connection. |
Atlan Tags | A comma-separated list of the names of the Atlan tags you want to directly apply to the asset. This should not include propagated Atlan tags, so if you are producing the spreadsheet from the EnrichmentReporter make sure the DIRECT_ATLAN_TAGS_ONLY option is set to true. |
Assigned Terms | A comma-separated list of the terms that should be linked to this asset. |
Term relationship format
Values in the Assigned Terms column should be written as Term Name@Glossary Name, separated by a comma (,), or by the configured DELIMITER, if multiple terms should be assigned to a given asset.
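If you generate the Asset enrichment sheet programmatically, an Assigned Terms cell can be built by joining Term Name@Glossary Name values with the delimiter. This Java sketch uses hypothetical names; rejecting a value that contains the delimiter is a safety check of this sketch, since such a value would otherwise be split apart when loaded.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for building an Assigned Terms cell from (term, glossary) pairs.
final class AssignedTermsCell {
    static String format(List<Map.Entry<String, String>> termsAndGlossaries, String delimiter) {
        StringJoiner cell = new StringJoiner(delimiter);
        for (Map.Entry<String, String> pair : termsAndGlossaries) {
            String value = pair.getKey() + "@" + pair.getValue();
            // A value containing the delimiter would be split into two values on load.
            if (value.contains(delimiter)) {
                throw new IllegalArgumentException(
                        "'" + value + "' contains the delimiter '" + delimiter + "'; choose another DELIMITER");
            }
            cell.add(value);
        }
        return cell.toString();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> terms =
                List.of(Map.entry("Revenue", "Finance"), Map.entry("Cost", "Finance"));
        System.out.println(format(terms, ",")); // Revenue@Finance,Cost@Finance
    }
}
```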
Configuration¶
You can configure the following options for this loader as environment variables:
ATLAN_BASE_URL¶
URL for your Atlan tenant (for example: https://tenant.atlan.com).
ATLAN_API_KEY¶
API token to use when accessing Atlan.
FILENAME¶
Defines the filename the loader will use as its input. (Defaults to atlan-enrichment.xlsx if not specified.)
DELIMITER¶
Defines the character used to separate values in multi-valued cells (like assigned terms and Atlan tags). If you anticipate the default comma (,) appearing within any of these values, you should change this to some other character. (If you are loading a file extracted by the EnrichmentReporter, use the same delimiter that was used for the export.)
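A minimal Java sketch of splitting a multi-valued cell by the configured delimiter follows. Reading DELIMITER with a comma default mirrors the option described above; the class name is hypothetical, and trimming whitespace and dropping empty entries are assumptions of this sketch rather than documented loader behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of splitting a multi-valued cell (Atlan tags, assigned terms).
final class MultiValuedCell {
    // DELIMITER from the environment, falling back to the documented default comma.
    static String configuredDelimiter() {
        String delimiter = System.getenv("DELIMITER");
        return (delimiter == null || delimiter.isEmpty()) ? "," : delimiter;
    }

    static List<String> split(String cell, String delimiter) {
        List<String> values = new ArrayList<>();
        if (cell == null) {
            return values;
        }
        // Pattern.quote treats the delimiter literally, so characters like '|' work.
        for (String part : cell.split(Pattern.quote(delimiter))) {
            String value = part.strip(); // assumption: surrounding whitespace is ignored
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(split("PII, Confidential", ","));               // [PII, Confidential]
        System.out.println(split("Cost, Net@Finance|Revenue@Finance", "|")); // [Cost, Net@Finance, Revenue@Finance]
        System.out.println("Configured delimiter: " + configuredDelimiter());
    }
}
```

Quoting the delimiter matters because `String.split` takes a regular expression; an unquoted `|` would split between every character.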
REPLACE_ATLAN_TAGS¶
Specifies how to handle the Atlan tags provided in the input file. By default (false), the loader will only append the Atlan tags listed against each asset; it will not overwrite or remove any Atlan tags that might already exist on the assets in Atlan. If set to true, the loader will instead ensure the set of Atlan tags on the asset matches what is in the input file, overwriting any that might already exist on that asset in Atlan. This only applies to direct Atlan tags: propagated Atlan tags will remain regardless.
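The two modes can be modeled as set operations on an asset's direct Atlan tags. This Java sketch (hypothetical names) only computes the resulting set and does not call Atlan; propagated tags are deliberately outside the calculation, matching the note above. How the loader treats an empty cell when the option is true is not documented, so the sketch's behavior there (removing all direct tags) is an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of REPLACE_ATLAN_TAGS for one asset's direct tags.
final class TagMerge {
    static Set<String> resultingDirectTags(Set<String> existingDirect, Set<String> fromFile, boolean replace) {
        if (replace) {
            // true: direct tags end up exactly matching the input file
            return new TreeSet<>(fromFile);
        }
        // false (default): append only; nothing already on the asset is removed
        Set<String> merged = new TreeSet<>(existingDirect);
        merged.addAll(fromFile);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(resultingDirectTags(Set.of("PII"), Set.of("Confidential"), false)); // [Confidential, PII]
        System.out.println(resultingDirectTags(Set.of("PII"), Set.of("Confidential"), true));  // [Confidential]
    }
}
```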
REPLACE_CUSTOM_METADATA¶
Specifies how to handle the custom metadata values provided in the input file. By default (false), the loader will only update the specific custom metadata values listed against each asset; blank values in the input file will not overwrite or remove values that might already be populated on the assets in Atlan. If set to true, the loader will instead ensure the custom metadata on the asset matches what is in the input file, overwriting existing values and removing any values in Atlan that are blank in the input file.
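The difference between the two modes can be modeled as a map merge for a single asset. In this hypothetical Java sketch, keys are custom metadata properties that have a column in the input file and values are cell contents ("" for a blank cell). Leaving properties without any column untouched is an assumption; the sketch does not call Atlan.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of REPLACE_CUSTOM_METADATA for one asset.
final class CustomMetadataMerge {
    static Map<String, String> merge(Map<String, String> existing, Map<String, String> fromFile, boolean replace) {
        Map<String, String> result = new TreeMap<>(existing);
        for (Map.Entry<String, String> cell : fromFile.entrySet()) {
            String value = cell.getValue();
            boolean blank = value == null || value.isBlank();
            if (!blank) {
                result.put(cell.getKey(), value); // both modes: non-blank values are written
            } else if (replace) {
                result.remove(cell.getKey());     // true: a blank cell clears the value
            }                                     // false (default): blank cells are ignored
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = Map.of("Steward", "jsmith", "Score", "3");
        Map<String, String> fromFile = Map.of("Steward", "", "Score", "5");
        System.out.println(merge(existing, fromFile, false)); // {Score=5, Steward=jsmith}
        System.out.println(merge(existing, fromFile, true));  // {Score=5}
    }
}
```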
UPDATE_ONLY¶
Specifies how to handle assets that do not already exist in the environment. By default (false), the loader will create any assets that do not already exist. If set to true, the loader will only update an asset if it already exists and will not create any new assets. (Note that when true, the loader does additional work to look up existing assets, so it will be slower.)
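The options above can be summarized as a small configuration object with the documented defaults. This Java sketch is hypothetical: the class and field names are illustrative, and parsing the flags with `Boolean.parseBoolean` (case-insensitive "true", anything else false) is an assumption about how the loader reads them.

```java
import java.util.Map;

// Hypothetical sketch of reading the documented environment variables.
final class LoaderConfig {
    final String baseUrl;
    final String apiKey;
    final String filename;
    final String delimiter;
    final boolean replaceAtlanTags;
    final boolean replaceCustomMetadata;
    final boolean updateOnly;

    private LoaderConfig(Map<String, String> env) {
        baseUrl = required(env, "ATLAN_BASE_URL");
        apiKey = required(env, "ATLAN_API_KEY");
        filename = env.getOrDefault("FILENAME", "atlan-enrichment.xlsx");
        delimiter = env.getOrDefault("DELIMITER", ",");
        // Boolean.parseBoolean(null) is false, matching the documented defaults.
        replaceAtlanTags = Boolean.parseBoolean(env.get("REPLACE_ATLAN_TAGS"));
        replaceCustomMetadata = Boolean.parseBoolean(env.get("REPLACE_CUSTOM_METADATA"));
        updateOnly = Boolean.parseBoolean(env.get("UPDATE_ONLY"));
    }

    // Takes a map (rather than reading System.getenv() directly) so it is easy to test.
    static LoaderConfig from(Map<String, String> env) {
        return new LoaderConfig(env);
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(name + " must be set");
        }
        return value;
    }

    public static void main(String[] args) {
        LoaderConfig config = LoaderConfig.from(System.getenv());
        System.out.println("Loading from " + config.filename + (config.updateOnly ? " (update only)" : ""));
    }
}
```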
Logging¶
By default, only messages at INFO and above will be logged (to the console). You can change this level by copying main/resources/log4j2.xml from the source repo and modifying it to give:
- more detail (DEBUG will log every API request and response, including their full payloads)
- less detail (WARN will only print warnings and errors, ERROR only errors).
Running the loaders locally¶
You can run the loaders locally on any machine with network access to your Atlan tenant.
When run in this mode, the loader will read the input Excel file locally on the machine running the loader. If you want to change any of the settings, you can simply prepend the command with the environment variables.
You can run the EnrichmentLoader by downloading the pre-compiled jar files for both:
With this approach you only need a JRE installed, as you will be using pre-compiled code.
Requires at least Java 11, though 17+ is recommended
You must have a Java 11 (or higher) JRE installed to use the samples. If possible, we recommend using Java 17 (or higher) for significantly improved performance.
Run EnrichmentLoader from pre-compiled jar files (macOS / Linux)
- Export environment variables for the configuration you want to use. At a minimum:
  - ATLAN_BASE_URL giving the URL of your Atlan tenant
  - ATLAN_API_KEY giving the value of an API token with access to that tenant to load the assets
  - FILENAME giving the name of the Excel file from which to load asset information
  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the loader in the same way. For example, you could use export BATCH_SIZE=20 to change the batch size.
  (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.
- Within the directory containing the downloaded jar files, run the java command providing at least two arguments:
  - The two jar files, separated by a : (as the value of the -cp argument).
  - The canonical classname of the EnrichmentLoader (com.atlan.samples.loaders.EnrichmentLoader).
  Replace the ... with proper version numbers
  Don't forget to replace the ... in the -cp argument's values with proper version numbers, or running this will not work.
Run EnrichmentLoader from pre-compiled jar files (Windows PowerShell)
- Set environment variables for the configuration you want to use. At a minimum:
  - ATLAN_BASE_URL giving the URL of your Atlan tenant
  - ATLAN_API_KEY giving the value of an API token with access to that tenant to load the assets
  - FILENAME giving the name of the Excel file from which to load asset information
  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the loader in the same way. For example, you could use $env:BATCH_SIZE="20" to change the batch size.
  (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.
  Note that on Windows you must surround the value of the variable in double quotes.
- In our experience, the java -cp command does not always work on Windows. Therefore, set the $env:CLASSPATH environment variable to include the names of the downloaded jar files, separated by a ;.
  Replace the ... with proper version numbers
  Don't forget to replace the ... in the jar file names with proper version numbers, or running this will not work.
- Within the directory containing the downloaded jar files, run the java command providing the canonical classname of the EnrichmentLoader (com.atlan.samples.loaders.EnrichmentLoader).
You can also run the EnrichmentLoader from a clone of the samples repository:
If you want to change the sample code provided to include additional logic, this is your best option. You have full control over all of the code this way and can make any changes you like before running it.
Additional requirements
You will need a JDK installed to run this way, not only the JRE, as the code must be compiled locally before it can be executed. The computer on which you run the code must also have network access to be able to configure Gradle itself and download the necessary dependencies of the code from Maven Central.
FILENAME=enrichment-report-20230710-085713-454.xlsx \
./gradlew EnrichmentLoader
- Within the root directory of your local clone of the repo, run the Gradle task for the EnrichmentLoader.
  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the loader. You can either do this on the same line where you run the command, or by exporting them first, for example using export FILENAME=enrichment-report-20230710-085713-454.xlsx.
Running the loader produces output like the following:
18:00:04.816 [main] INFO com.atlan.samples.loaders.EnrichmentLoader - Retrieving configuration and context...
18:00:06.483 [main] INFO com.atlan.samples.loaders.EnrichmentLoader - Loading enrichment details from: enrichment-report-20230906-134813-692.xlsx
18:00:06.823 [main] INFO com.atlan.samples.loaders.EnrichmentLoader - Processing sheet Glossary enrichment, with 4 rows.
18:00:08.484 [main] INFO com.atlan.samples.loaders.models.GlossaryEnrichmentDetails - ... processed 1/4 (25%)
18:00:09.142 [main] INFO com.atlan.samples.loaders.models.GlossaryEnrichmentDetails - ... processed 2/4 (50%)
18:00:09.799 [main] INFO com.atlan.samples.loaders.models.GlossaryEnrichmentDetails - ... processed 3/4 (75%)
18:00:10.496 [main] INFO com.atlan.samples.loaders.models.GlossaryEnrichmentDetails - ... processed 4/4 (100%)
18:00:10.505 [main] INFO com.atlan.samples.loaders.EnrichmentLoader - Processing sheet Category enrichment, with 4 rows.
18:00:11.543 [main] INFO com.atlan.samples.loaders.models.CategoryEnrichmentDetails - ... processed 1/4 (25%)
18:00:12.605 [main] INFO com.atlan.samples.loaders.models.CategoryEnrichmentDetails - ... processed 2/4 (50%)
18:00:12.802 [main] INFO com.atlan.samples.loaders.models.CategoryEnrichmentDetails - ... processed 3/4 (75%)
18:00:13.011 [main] INFO com.atlan.samples.loaders.models.CategoryEnrichmentDetails - ... processed 4/4 (100%)
18:00:13.018 [main] INFO com.atlan.samples.loaders.EnrichmentLoader - Processing sheet Term enrichment, with 8 rows.
18:00:13.648 [main] INFO com.atlan.samples.loaders.models.TermEnrichmentDetails - ... processed 1/8 (13%)
18:00:14.287 [main] INFO com.atlan.samples.loaders.models.TermEnrichmentDetails - ... processed 2/8 (25%)
18:00:14.907 [main] INFO com.atlan.samples.loaders.models.TermEnrichmentDetails - ... processed 3/8 (38%)
18:00:15.533 [main] INFO com.atlan.samples.loaders.models.TermEnrichmentDetails - ... processed 4/8 (50%)
18:00:16.146 [main] INFO com.atlan.samples.loaders.models.TermEnrichmentDetails - ... processed 5/8 (63%)
18:00:16.765 [main] INFO com.atlan.samples.loaders.models.TermEnrichmentDetails - ... processed 6/8 (75%)
18:00:17.597 [main] INFO com.atlan.samples.loaders.models.TermEnrichmentDetails - ... processed 7/8 (88%)
18:00:18.240 [main] INFO com.atlan.samples.loaders.models.TermEnrichmentDetails - ... processed 8/8 (100%)
18:00:18.768 [main] INFO com.atlan.samples.loaders.EnrichmentLoader - Processing sheet Asset enrichment, with 133 rows.
18:00:19.240 [main] INFO com.atlan.samples.loaders.models.AssetEnrichmentDetails - ... processed 0/133 (0%)
18:00:32.047 [main] INFO com.atlan.samples.loaders.models.AssetEnrichmentDetails - ... processed 50/133 (38%)
18:00:44.833 [main] INFO com.atlan.samples.loaders.models.AssetEnrichmentDetails - ... processed 100/133 (75%)
18:00:52.715 [main] INFO com.atlan.samples.loaders.models.AssetDetails - ... selectively updating custom metadata QD on asset 9f6319ef-e6fb-4261-929b-4d6c198dde30
18:00:52.913 [main] INFO com.atlan.samples.loaders.models.AssetDetails - ... selectively updating custom metadata QD on asset 9f46a3f8-d1bc-4790-bb55-6bc6d6bf8cea
18:00:53.110 [main] INFO com.atlan.samples.loaders.models.AssetDetails - ... selectively updating custom metadata QD on asset dafeb71f-3b70-4c20-9194-2e5cb182de0b
More details
- Near the start of the output, the loader always logs the name of the Excel file it is processing (the "Loading enrichment details from:" line).
- The loader will only attempt to load those sheets that exist within the Excel workbook, and only the data in those sheets.