EnrichmentReporter

This content has moved!

We have moved our samples to a separate, dedicated site: https://solutions.atlan.com.

This document is no longer being maintained.

Extracts metadata about assets and their enrichment, producing an Excel workbook consisting of 4 worksheets:

  • Glossary enrichment contains details about each of the glossaries
  • Category enrichment contains details about each of the categories, across all glossaries
  • Term enrichment contains details about each term, across all glossaries
  • Asset enrichment contains details about all non-glossary assets that fit the filter criteria (see below)

Filter criteria

To filter the assets to include in the Asset enrichment worksheet, you must choose one of the following criteria:

  • GROUP (the default) will include all assets that have at least one group owner defined
  • ATLAN_TAG will include all assets that are assigned a specified Atlan tag
  • PREFIX will include all assets whose qualifiedName starts with a specified string
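As a rough illustration of how the three criteria map onto configuration, the following shell sketch checks that the companion variable for the chosen criterion is set before you launch the reporter. The check_filter function is a hypothetical helper and not part of the samples; how the reporter itself reacts to an empty value is not covered here, so treat this purely as a pre-flight check under that assumption.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the Atlan samples.
check_filter() {
  local filter_by="${FILTER_BY:-GROUP}"   # GROUP is the documented default
  case "$filter_by" in
    GROUP)
      echo "GROUP: assets with at least one group owner" ;;
    ATLAN_TAG)
      # The tag name must be supplied separately in ATLAN_TAG
      [ -n "${ATLAN_TAG:-}" ] || { echo "ATLAN_TAG must be set" >&2; return 1; }
      echo "ATLAN_TAG: assets tagged '${ATLAN_TAG}'" ;;
    PREFIX)
      # The qualifiedName prefix must be supplied separately in PREFIX
      [ -n "${PREFIX:-}" ] || { echo "PREFIX must be set" >&2; return 1; }
      echo "PREFIX: assets whose qualifiedName starts with '${PREFIX}'" ;;
    *)
      echo "Unknown FILTER_BY: ${filter_by}" >&2; return 1 ;;
  esac
}

export FILTER_BY=PREFIX
export PREFIX=default/snowflake/1234567890
check_filter
```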

Configuration

You can configure the following options for this reporter as environment variables:

ATLAN_BASE_URL

URL for your Atlan tenant (for example: https://tenant.atlan.com).

ATLAN_API_KEY

API token to use when accessing Atlan.

FILTER_BY

Defines which filter criteria to use for the non-glossary assets: GROUP, ATLAN_TAG, PREFIX. (Will default to GROUP if not specified.)

ATLAN_TAG

Defines which Atlan tag you want to report on (only assets with this Atlan tag will be included in the report). Note that this only has any effect when FILTER_BY is set to ATLAN_TAG. (Defaults to empty.)

PREFIX

Defines the string that all assets' qualifiedNames must start with to be reported. Note that this only has any effect when FILTER_BY is set to PREFIX. (Defaults to empty.)

INCLUDE_FIELD_LEVEL

Specifies whether to include details of field-level assets like columns (true). By default (false), the reporter will only report on container-level objects (such as tables) and summarize field-level details through counts.

DIRECT_ATLAN_TAGS_ONLY

Specifies whether to include only direct Atlan tags on each asset (true). By default (false), all Atlan tags — both direct and propagated — are listed for each asset. If you are using this as input for the EnrichmentLoader, you will want to set this to true.

DELIMITER

Defines the character to use to separate values in multi-valued cells (like assigned terms or Atlan tags). If you expect the default comma (,) to appear in the names of any of these multi-valued objects, change this to some other character.

BATCH_SIZE

Defines how many records the API calls attempt to retrieve on each request. (Defaults to 50 if not specified.)

FILE_PREFIX

Defines what the Excel file's name will start with. (Defaults to enrichment-report if not specified.)

REGION

Defines the AWS region of the S3 bucket where you want to write the Excel file output.

BUCKET

Defines the S3 bucket where you want the reporter to store the Excel file output. (There is no default value for this: you must set it when running the utility as a Lambda function so that it produces consumable output. If left blank, the reporter produces a local Excel file.)
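To make the DELIMITER option described above more concrete, here is a small sketch of how a multi-valued cell could be split back into individual values once the delimiter has been changed. The cell content and the split_cell helper are hypothetical, and the pipe character is only one possible alternative to the default comma.

```shell
#!/usr/bin/env bash
# Use a pipe instead of the default comma, because the (made-up) term
# names below themselves contain commas.
export DELIMITER='|'

# Hypothetical helper: print each value of a multi-valued cell on its own line.
split_cell() {
  local cell="$1" delim="${DELIMITER:-,}"
  # tr maps the single delimiter character to a newline
  printf '%s\n' "$cell" | tr "$delim" '\n'
}

split_cell 'Revenue, Net|Revenue, Gross'
```

With the default comma, the same cell would be split into four fragments rather than two names, which is the situation the option is meant to avoid.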

Details can be loaded using EnrichmentLoader

Note: the details extracted by this reporter can be loaded back into Atlan from the same spreadsheet using the EnrichmentLoader.

Lambda configuration

To use this reporter as an AWS Lambda function, you can configure the same variables as above, but send them in a JSON structure:

Lambda configuration
{
  "FILTER_BY": "GROUP",
  "ATLAN_TAG": "(name of Atlan tag to filter by)",
  "PREFIX": "(qualifiedName prefix to filter by)",
  "INCLUDE_FIELD_LEVEL": "false",
  "DIRECT_ATLAN_TAGS_ONLY": "false",
  "DELIMITER": ",",
  "BATCH_SIZE": "50",
  "FILE_PREFIX": "enrichment-report",
  "REGION": "ap-south-1",
  "BUCKET": ""
}
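
As a sketch of how such a payload could be sent, the following writes the JSON to a file and defines a function that passes it to aws lambda invoke. The function name enrichment-reporter, the tag name, the bucket name and the file names are all placeholders. This assumes AWS CLI v2, a Lambda function that already has the reporter deployed, and that ATLAN_BASE_URL and ATLAN_API_KEY are configured on the function itself.

```shell
#!/usr/bin/env bash
# Placeholder working directory and payload file
workdir="$(mktemp -d)"
payload_file="$workdir/payload.json"

# Only the options you want to override need to appear in the payload.
cat > "$payload_file" <<'EOF'
{
  "FILTER_BY": "ATLAN_TAG",
  "ATLAN_TAG": "PII",
  "REGION": "ap-south-1",
  "BUCKET": "my-report-bucket"
}
EOF

# Hypothetical wrapper; "enrichment-reporter" is an assumed function name.
invoke_reporter() {
  aws lambda invoke \
    --function-name enrichment-reporter \
    --cli-binary-format raw-in-base64-out \
    --payload "file://$payload_file" \
    "$workdir/response.json"
}

echo "Payload written to $payload_file"
# Call invoke_reporter once your AWS credentials and the function exist.
```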

Logging

By default, only messages at INFO and above will be logged (to the console). You can change this level by copying the main/resources/log4j2.xml from the source repo and modifying it to give:

  • more detail (DEBUG will log every API request and response, including their full payloads)
  • less detail (WARN will only print warnings and errors, ERROR only errors).

Running the reporters locally

You can run the reporters locally on any machine with network access to your Atlan tenant.

When run in this mode, the reporter will write the output Excel file locally on the machine running the reporter. If you want to change any of the settings, you can simply prepend the command with the environment variables.

You can run the EnrichmentReporter by downloading the pre-compiled jar files for both:

  • Java SDK
  • Samples

With this approach you only need a JRE installed, as you will be using pre-compiled code.

Requires at least Java 11, though 17+ is recommended

You must have a Java 11 (or higher) JRE installed to use the samples. If possible, we recommend using Java 17 (or higher) for significantly improved performance.
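
If you want to confirm the installed version before running the samples, a check along the following lines may help. It is only a sketch: java_major_version and check_java are hypothetical helpers, and the parsing assumes the usual first line of java -version output, in which the version appears in double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a version string to its major version.
# Accepts strings such as 17.0.8, 11.0.20 or the legacy 1.8.0_382.
java_major_version() {
  local v="$1" major
  major="${v%%.*}"
  if [ "$major" = "1" ]; then      # legacy 1.x scheme (Java 8 and earlier)
    v="${v#1.}"
    major="${v%%.*}"
  fi
  major="${major%%[!0-9]*}"        # drop suffixes such as -ea
  printf '%s\n' "$major"
}

# Hypothetical helper: succeed only if the installed java is 11 or higher.
check_java() {
  local raw
  raw="$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)"
  [ -n "$raw" ] && [ "$(java_major_version "$raw")" -ge 11 ]
}

java_major_version "17.0.8"
```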

Run EnrichmentReporter from pre-compiled jar files
export ATLAN_BASE_URL=https://tenant.atlan.com  # (1)
export ATLAN_API_KEY=eyNnCJd2T9Y8fEsbdx...
export FILTER_BY=PREFIX
export PREFIX=default/snowflake/1234567890
java \
    -cp atlan-java-...-jar-with-dependencies.jar:atlan-java-samples-...-jar-with-dependencies.jar \
    com.atlan.samples.reporters.EnrichmentReporter  # (2)
  1. Export environment variables for the configuration you want to use. At a minimum:

    • ATLAN_BASE_URL giving the URL of your Atlan tenant
    • ATLAN_API_KEY giving the value of an API token with access to read the assets in that tenant

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the reporter in the same way. For example, you could use export FILTER_BY=PREFIX to export assets based on a specific prefix, and export PREFIX=... to specify that prefix.

    (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.

  2. Within the directory containing the downloaded jar files, run the java command providing at least two arguments:

    1. The two jar files, separated by a :.
    2. The canonical classname of the EnrichmentReporter.

    Replace the ... with proper version numbers

    Don't forget to replace the ... in the -cp argument's values with proper version numbers, or running this will not work.
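
If you would rather not type the version numbers by hand, one option is to let the shell find the two downloaded jar files. This is only a sketch: build_classpath and run_reporter are hypothetical helpers, and the approach assumes the directory contains exactly one SDK jar and one samples jar matching the naming pattern shown above, with no older versions alongside them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join all matching jar files with ':' for use with -cp.
# The pattern matches both the SDK jar and the samples jar.
build_classpath() {
  local dir="${1:-.}" cp="" jar
  for jar in "$dir"/atlan-java-*-jar-with-dependencies.jar; do
    [ -e "$jar" ] || continue        # skip the pattern itself if nothing matched
    cp="${cp:+$cp:}$jar"             # add a ':' only between entries
  done
  [ -n "$cp" ] || { echo "No jar files found in $dir" >&2; return 1; }
  printf '%s\n' "$cp"
}

# Hypothetical wrapper around the java command shown above.
run_reporter() {
  java -cp "$(build_classpath .)" com.atlan.samples.reporters.EnrichmentReporter
}
```

Call run_reporter from the directory containing the downloaded jar files, after exporting the environment variables as before.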

Run EnrichmentReporter from pre-compiled jar files
$env:ATLAN_BASE_URL="https://tenant.atlan.com"  # (1)
$env:ATLAN_API_KEY="eyNnCJd2T9Y8fEsbdx..."
$env:FILTER_BY="PREFIX"
$env:PREFIX="default/snowflake/1234567890"
$env:CLASSPATH="atlan-java-...-jar-with-dependencies.jar;atlan-java-samples-...-jar-with-dependencies.jar"  # (2)
java com.atlan.samples.reporters.EnrichmentReporter  # (3)
  1. Export environment variables for the configuration you want to use. At a minimum:

    • ATLAN_BASE_URL giving the URL of your Atlan tenant
    • ATLAN_API_KEY giving the value of an API token with access to read the assets in that tenant

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the reporter in the same way. For example, you could use $env:FILTER_BY="PREFIX" to export assets based on a specific prefix, and $env:PREFIX="..." to specify that prefix.

    (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.

    Note that on Windows you must surround the value of each variable with double quotes!

  2. In our experience, the java -cp option does not always work on Windows. Instead, set the $env:CLASSPATH environment variable to include the names of the downloaded jar files, separated by a ;.

    Replace the ... with proper version numbers

    Don't forget to replace the ... in the $env:CLASSPATH value with proper version numbers, or running this will not work.

  3. Within the directory containing the downloaded jar files, run the java command providing the canonical classname of the EnrichmentReporter.

You can also run the EnrichmentReporter from a clone of the samples repository:

Repo

If you want to change the sample code provided to include additional logic, this is your best option. You have full control over all of the code this way and can make any changes you like before running it.

Additional requirements

You will need a JDK installed to run this way, not only the JRE, as the code must be compiled locally before it can be executed. The computer on which you run the code must also have network access to be able to configure Gradle itself and download the necessary dependencies of the code from Maven Central.

Run EnrichmentReporter from a clone of the GitHub repo
export FILTER_BY=PREFIX
export PREFIX=default/snowflake/1234567890
./gradlew EnrichmentReporter # (1)
  1. Within the root directory of your local clone of the repo, run the Gradle task for the EnrichmentReporter.

    Overriding the configuration

    Note that you can specify any optional overrides as environment variables before running the reporter. You can either do this in the same line where you run the command, or by exporting them first, for example using export FILTER_BY=PREFIX and export PREFIX=default/snowflake/1234567890.

Example output
18:01:43.172 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter - Creating Excel file (streaming)...
18:01:46.194 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter - Retrieving 4 glossaries from https://tenant.atlan.com in batches of: 50
18:01:46.502 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter -  ... processed 0/4 (0%)
18:01:46.715 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter - Retrieving 8 terms from https://tenant.atlan.com in batches of: 50
18:01:47.548 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter -  ... processed 0/8 (0%)
18:01:48.794 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter - Retrieving 133 asset details from https://tenant.atlan.com in batches of: 50
18:01:52.731 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter -  ... processed 0/133 (0%)
18:01:56.402 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter -  ... processed 50/133 (38%)
18:01:58.325 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter -  ... processed 100/133 (75%)
18:01:58.751 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter - Retrieving 4 categories from https://tenant.atlan.com in batches of: 50
18:01:59.046 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter -  ... processed 0/4 (0%)
18:01:59.083 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter - Writing report to file: enrichment-report-20230906-170144-883.xlsx
More details
  • The reporter will always extract glossaries, categories and terms.
  • The reporter will extract only those data assets that match your defined FILTER_BY criteria. If there are many assets, you will see multiple lines (one per batch) about the extraction of the data assets.
  • The final line will always tell you the name of the Excel file that it produced.
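
Because the final log line names the output file, a wrapper script can pick the name up for a follow-on step. The sketch below assumes the log line format shown in the example output above and uses a hypothetical report_file_from_log helper; that format is not a documented contract, so treat this as illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print the report file name.
report_file_from_log() {
  # Keep only the text after the marker, taking the last matching line
  sed -n 's/^.*Writing report to file: //p' | tail -n 1
}

# Two lines copied from the example output above
sample_log='18:01:59.046 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter -  ... processed 0/4 (0%)
18:01:59.083 [main] INFO  com.atlan.samples.reporters.EnrichmentReporter - Writing report to file: enrichment-report-20230906-170144-883.xlsx'

printf '%s\n' "$sample_log" | report_file_from_log
# prints enrichment-report-20230906-170144-883.xlsx
```

In a real run you would pipe the reporter's console output through the function instead of the sample text.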