EnrichmentReporter¶
Atlan University
See it in action in our code samples course.
This content has moved!
We have moved our samples to a separate, dedicated site: https://solutions.atlan.com.
This document is no longer being maintained.
Extracts metadata about assets and their enrichment, producing an Excel workbook consisting of 4 worksheets:
- Glossary enrichment: contains details about each of the glossaries
- Category enrichment: contains details about each of the categories, across all glossaries
- Term enrichment: contains details about each term, across all glossaries
- Asset enrichment: contains details about all non-glossary assets that fit the filter criteria (see below)
Filter criteria¶
To filter the assets to include in the Asset enrichment worksheet, you must choose one of the following criteria:
- GROUP (the default) will include all assets that have at least one group owner defined
- ATLAN_TAG will include all assets that are assigned a specified Atlan tag
- PREFIX will include all assets whose qualifiedName starts with a specified string
Configuration¶
You can configure the following options for this reporter as environment variables:
ATLAN_BASE_URL¶
URL for your Atlan tenant (for example: https://tenant.atlan.com).
ATLAN_API_KEY¶
API token to use when accessing Atlan.
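The run instructions on this page list these two variables as the minimum configuration, so a quick pre-flight check can catch a missing one before the reporter starts. The following is a minimal sketch assuming a POSIX-style shell; the `check_required` helper and the token value are hypothetical and not part of the samples.

```shell
# Hypothetical pre-flight helper (not part of the samples): verifies that the
# two connection variables are set and non-empty before the reporter is run.
check_required() {
  cr_missing=0
  for cr_name in ATLAN_BASE_URL ATLAN_API_KEY; do
    # Look up the value of the variable whose name is held in cr_name.
    eval "cr_value=\${$cr_name:-}"
    if [ -z "$cr_value" ]; then
      echo "Missing required variable: $cr_name" >&2
      cr_missing=1
    fi
  done
  return "$cr_missing"
}

export ATLAN_BASE_URL="https://tenant.atlan.com"
export ATLAN_API_KEY="example-token"   # placeholder, replace with a real API token
check_required && echo "Configuration looks complete"
```

The helper returns a non-zero status when either variable is empty, so it can be chained in front of whichever run command you use.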
FILTER_BY¶
Defines which filter criteria to use for the non-glossary assets: GROUP, ATLAN_TAG, PREFIX. (Will default to GROUP if not specified.)
ATLAN_TAG¶
Defines which Atlan tag you want to report on (only assets with this Atlan tag will be included in the report). Note that this only has any effect when FILTER_BY is set to ATLAN_TAG. (Defaults to empty.)
PREFIX¶
Defines the string that all assets' qualifiedNames must start with to be reported. Note that this only has any effect when FILTER_BY is set to PREFIX. (Defaults to empty.)
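Because ATLAN_TAG and PREFIX only take effect for the matching FILTER_BY value, it can help to set the three variables together. The sketch below assumes a POSIX-style shell; the `choose_filter` helper is hypothetical, and clearing the unused variables is a design choice of this sketch rather than something the reporter requires.

```shell
# Hypothetical helper: set the three filter variables consistently.
# Usage: choose_filter GROUP
#        choose_filter ATLAN_TAG <tag name>
#        choose_filter PREFIX <qualifiedName prefix>
choose_filter() {
  case "${1:-GROUP}" in
    GROUP)
      export FILTER_BY=GROUP ATLAN_TAG="" PREFIX=""
      ;;
    ATLAN_TAG)
      export FILTER_BY=ATLAN_TAG ATLAN_TAG="${2:-}" PREFIX=""
      ;;
    PREFIX)
      export FILTER_BY=PREFIX PREFIX="${2:-}" ATLAN_TAG=""
      ;;
    *)
      echo "Unknown filter criteria: $1" >&2
      return 1
      ;;
  esac
}

choose_filter PREFIX "default/snowflake/1234567890"
echo "FILTER_BY=$FILTER_BY PREFIX=$PREFIX"
```

Clearing the variable that does not apply avoids a stale tag or prefix lingering in the shell session from an earlier run.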
INCLUDE_FIELD_LEVEL¶
Specifies whether to include details of field-level assets like columns (true). By default (false), the reporter will only report on container-level objects (such as tables) and summarize field-level details through counts.
DIRECT_ATLAN_TAGS_ONLY¶
Specifies whether to include only direct Atlan tags on each asset (true). By default (false), all Atlan tags, both direct and propagated, are listed for each asset. If you are using this as input for the EnrichmentLoader, you will want to set this to true.
DELIMITER¶
Defines the character to use to separate values in multi-valued cells (like assigned terms, Atlan tags). If you expect the default comma (,) to appear in any of these multi-valued objects' names, you should change this to some other character.
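To illustrate why the delimiter matters, the sketch below counts how many values a consumer would see when splitting a cell on the delimiter. It assumes a POSIX-style shell; the `count_values` helper, the term names, and the choice of `|` as an alternative delimiter are all hypothetical.

```shell
# Hypothetical helper: count the values obtained by splitting a cell on a
# single-character delimiter. $1 = cell text, $2 = delimiter.
count_values() {
  (
    IFS="$2"    # split on the delimiter only
    set -f      # disable filename globbing while splitting
    set -- $1   # intentionally unquoted so that the cell is split on IFS
    echo "$#"
  )
}

# Two terms, one of which has a comma in its name ("Revenue, Net").
count_values 'Revenue, Net,Gross Margin' ','   # comma-delimited cell
count_values 'Revenue, Net|Gross Margin' '|'   # same terms, pipe-delimited cell
```

With the comma delimiter the two terms are split into three values, while the pipe-delimited cell yields the expected two.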
BATCH_SIZE¶
Defines how many records the API calls attempt to retrieve on each request. (Defaults to 50 if not specified.)
FILE_PREFIX¶
Defines what the Excel file's name will start with. (Defaults to enrichment-report if not specified.)
REGION¶
Defines the AWS region of the S3 bucket where you want to write the Excel file output.
BUCKET¶
Defines the S3 bucket where you want the reporter to store the Excel file output. There is no default value: if left blank, the reporter produces a local Excel file, so this must be set when running the utility as a Lambda function for it to produce consumable output.
Details can be loaded using EnrichmentLoader
Note: the details extracted by this reporter can be loaded back into Atlan from the same spreadsheet using the EnrichmentLoader.
Lambda configuration¶
To use this reporter as an AWS Lambda function, you can configure the same variables as above, but send them in a JSON structure:
{
  "FILTER_BY": "GROUP",
  "ATLAN_TAG": "(name of Atlan tag to filter by)",
  "PREFIX": "(qualifiedName prefix to filter by)",
  "INCLUDE_FIELD_LEVEL": "false",
  "DIRECT_ATLAN_TAGS_ONLY": "false",
  "DELIMITER": ",",
  "BATCH_SIZE": "50",
  "FILE_PREFIX": "enrichment-report",
  "REGION": "ap-south-1",
  "BUCKET": ""
}
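As a rough sketch of how such a structure might be prepared, the snippet below writes an event file that selects the PREFIX filter. It assumes a POSIX-style shell with `mktemp` available. The bucket name `my-report-bucket` and the function name `enrichment-reporter` are hypothetical, and the invocation is left commented out because it depends on how (and whether) you have deployed the function.

```shell
# Write the Lambda event to a temporary file. The keys mirror the structure
# above; the values select the PREFIX filter as an illustration.
event_file="$(mktemp)"
cat > "$event_file" <<'EOF'
{
  "FILTER_BY": "PREFIX",
  "ATLAN_TAG": "",
  "PREFIX": "default/snowflake/1234567890",
  "INCLUDE_FIELD_LEVEL": "false",
  "DIRECT_ATLAN_TAGS_ONLY": "false",
  "DELIMITER": ",",
  "BATCH_SIZE": "50",
  "FILE_PREFIX": "enrichment-report",
  "REGION": "ap-south-1",
  "BUCKET": "my-report-bucket"
}
EOF
echo "Event written to $event_file"

# Assumption: the function is deployed under the hypothetical name
# "enrichment-reporter" and AWS CLI v2 is configured. Uncomment to invoke:
# aws lambda invoke --function-name enrichment-reporter \
#   --cli-binary-format raw-in-base64-out \
#   --payload "file://$event_file" response.json
```

BUCKET is set in this sketch because a Lambda run needs it to produce consumable output.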
Logging¶
By default, only messages at INFO and above will be logged (to the console). You can change this level by copying the main/resources/log4j2.xml from the source repo and modifying it to give:
- more detail (DEBUG will log every API request and response, including their full payloads)
- less detail (WARN will only print warnings and errors, ERROR only errors).
Running the reporters locally¶
You can run the reporters locally on any machine with network access to your Atlan tenant.
When run in this mode, the reporter will write the output Excel file locally on the machine running the reporter. If you want to change any of the settings, you can simply prefix the command with the environment variables.
You can run the EnrichmentReporter by downloading the two pre-compiled jar files it requires.
With this approach you only need a JRE installed, as you will be using pre-compiled code.
Requires at least Java 11, though 17+ is recommended
You must have a Java 11 (or higher) JRE installed to use the samples. If possible, we recommend using Java 17 (or higher) for significantly improved performance.
Run EnrichmentReporter from pre-compiled jar files
- Export environment variables for the configuration you want to use. At a minimum:
  - ATLAN_BASE_URL giving the URL of your Atlan tenant
  - ATLAN_API_KEY giving the value of an API token with access to that tenant to extract the assets
  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the reporter in the same way. For example, you could use export FILTER_BY=PREFIX to export assets based on a specific prefix, and export PREFIX=... to specify that prefix.
- (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.
- Within the directory containing the downloaded jar files, run the java command providing at least two arguments:
  - The two jar files, separated by a :.
  - The canonical classname of the EnrichmentReporter.
  Replace the ... with proper version numbers
  Don't forget to replace the ... in the -cp argument's values with proper version numbers, or running this will not work.
Run EnrichmentReporter from pre-compiled jar files (Windows)
- Export environment variables for the configuration you want to use. At a minimum:
  - ATLAN_BASE_URL giving the URL of your Atlan tenant
  - ATLAN_API_KEY giving the value of an API token with access to that tenant to extract the assets
  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the reporter in the same way. For example, you could use $env:FILTER_BY="PREFIX" to export assets based on a specific prefix, and $env:PREFIX="..." to specify that prefix.
- (Optional) If you want to override the logging settings, also add -Dlog4j.configurationFile=./log4j2.xml just before the canonical classname, pointing to your modified version of the log4j2.xml file.
  Note that on Windows you must surround the value of the variable in double-quotes!
- In our experience, it seems that the java -cp command does not always work on Windows. Therefore, ensure you set the $env:CLASSPATH environment variable to include the names of the downloaded jar files, separated by a ;.
  Replace the ... with proper version numbers
  Don't forget to replace the ... in the jar file names with proper version numbers, or running this will not work.
- Within the directory containing the downloaded jar files, run the java command providing the canonical classname of the EnrichmentReporter.
You can also run the EnrichmentReporter from a clone of the samples repository:
If you want to change the sample code provided to include additional logic, this is your best option. You have full control over all of the code this way and can make any changes you like before running it.
Additional requirements
You will need a JDK installed to run this way, not only the JRE, as the code must be compiled locally before it can be executed. The computer on which you run the code must also have network access to be able to configure Gradle itself and download the necessary dependencies of the code from Maven Central.
export FILTER_BY=PREFIX
export PREFIX=default/snowflake/1234567890
./gradlew EnrichmentReporter
- Within the root directory of your local clone of the repo, run the Gradle task for the EnrichmentReporter.
  Overriding the configuration
  Note that you can specify any optional overrides as environment variables before running the reporter. You can either do this in the same line where you run the command, or by exporting them first, for example using export FILTER_BY=PREFIX and export PREFIX=default/snowflake/1234567890.
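The difference between the two override styles can be sketched as follows, assuming a POSIX-style shell. A small child `sh` process stands in for the real Gradle command so that the snippet runs anywhere, and the fallback to GROUP inside it only imitates the documented default.

```shell
# Make sure no earlier export interferes with the demonstration.
unset FILTER_BY

# Stand-in for the real command: a child process that prints the FILTER_BY it
# sees, falling back to the documented default when none is set.
show_filter='echo "${FILTER_BY:-GROUP}"'

# Same-line override: visible to that one command only.
inline="$(FILTER_BY=PREFIX sh -c "$show_filter")"
# A later command no longer sees it.
later="$(sh -c "$show_filter")"

# Exported override: visible to every later command in this shell session.
export FILTER_BY=PREFIX
exported="$(sh -c "$show_filter")"

echo "inline=$inline later=$later exported=$exported"
# prints: inline=PREFIX later=GROUP exported=PREFIX
```

The same-line form suits one-off runs, while exporting is more convenient when several runs should share the same settings.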
18:01:43.172 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - Creating Excel file (streaming)...
18:01:46.194 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - Retrieving 4 glossaries from https://tenant.atlan.com in batches of: 50
18:01:46.502 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - ... processed 0/4 (0%)
18:01:46.715 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - Retrieving 8 terms from https://tenant.atlan.com in batches of: 50
18:01:47.548 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - ... processed 0/8 (0%)
18:01:48.794 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - Retrieving 133 asset details from https://tenant.atlan.com in batches of: 50
18:01:52.731 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - ... processed 0/133 (0%)
18:01:56.402 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - ... processed 50/133 (38%)
18:01:58.325 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - ... processed 100/133 (75%)
18:01:58.751 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - Retrieving 4 categories from https://tenant.atlan.com in batches of: 50
18:01:59.046 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - ... processed 0/4 (0%)
18:01:59.083 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - Writing report to file: enrichment-report-20230906-170144-883.xlsx
More details
- The reporter will always extract glossaries, categories and terms.
- The reporter will extract only those data assets that match your defined FILTER_BY criteria. If there are many assets, you will see multiple lines (one per batch) about the extraction of the data assets.
- The final line will always tell you the name of the Excel file that it produced.
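Since the final line names the output file, a script that wraps the reporter could pick the file name out of the captured log. This is a minimal sketch assuming a POSIX-style shell and the log line format shown in the sample output above; the `report_file_from_log` helper is hypothetical.

```shell
# Hypothetical helper: read reporter log lines on stdin and print the name of
# the report file taken from the last "Writing report to file:" line.
report_file_from_log() {
  sed -n 's/^.*Writing report to file: //p' | tail -n 1
}

# Sample line copied from the log output above.
log_line='18:01:59.083 [main] INFO com.atlan.samples.reporters.EnrichmentReporter - Writing report to file: enrichment-report-20230906-170144-883.xlsx'
report="$(printf '%s\n' "$log_line" | report_file_from_log)"
echo "$report"
# prints: enrichment-report-20230906-170144-883.xlsx
```

In practice you would pipe the reporter's console output into the helper instead of a hard-coded sample line.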