An introductory walkthrough¶
Not sure where to start? Allow us to introduce Atlan development through example. [1]
Setting up¶
We strongly recommend using one of our SDKs to simplify the development process. As a first step, set one up:
The Java SDK is available on Maven Central, ready to be included in your project:
repositories {
    mavenCentral()
}
dependencies {
    implementation("com.atlan:atlan-java:+") // (1)
    testRuntimeOnly("ch.qos.logback:logback-classic:1.2.11") // (2)
}
- Include the latest version of the Java SDK in your project as a dependency. You can also give a specific version instead of the +, if you'd like.
- The Java SDK uses slf4j for logging purposes. You can include logback as a simple binding mechanism to send any logging information out to your console (standard out).
Set two values on the static Atlan class (AtlanLiveTest.java):
- Provide your Atlan tenant URL to the setBaseUrl method. You can also read the value from an environment variable.
- Provide your API token to the setApiToken method. You can also read the value from another environment variable.
- You can then start writing some actual code to run within a static main method. (We'll show some examples of this further below.)
Set up logging for the SDK
You can also check out the advanced configuration section of the SDK to learn how to set up logging.
Don't forget to give permissions
If you want to access existing metadata with an API token, don't forget to assign one or more personas to the API token that grant it access to metadata.
The Python SDK is available on PyPI. You can use pip to install it as follows:
pip install pyatlan
Provide two values to create an Atlan client (atlan_live_test.py):
- Provide your Atlan tenant URL to the base_url parameter. (You can also do this through environment variables.)
- Provide your API token to the api_key parameter. (You can also do this through environment variables.)
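As a rough illustration of the environment-variable route, the sketch below reads both values with the standard library only and fails fast if either is missing. The variable names ATLAN_BASE_URL and ATLAN_API_KEY, the helper load_atlan_settings, and the tenant URL are assumptions made for this example; check the SDK's own configuration reference for the names it actually honors.

```python
import os
from typing import Mapping, Tuple


def load_atlan_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the tenant URL and API token from environment variables."""
    # Variable names are an assumption for this sketch; adjust to your setup.
    base_url = env.get("ATLAN_BASE_URL", "").strip().rstrip("/")
    api_key = env.get("ATLAN_API_KEY", "").strip()
    missing = [
        name
        for name, value in (("ATLAN_BASE_URL", base_url), ("ATLAN_API_KEY", api_key))
        if not value
    ]
    if missing:
        # Fail fast, and never echo the token itself in an error message.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return base_url, api_key


# Usage with an explicit mapping (placeholder values, not real credentials).
base_url, api_key = load_atlan_settings(
    {"ATLAN_BASE_URL": "https://tenant.atlan.com/", "ATLAN_API_KEY": "example-token"}
)
print(base_url)
```

Keeping the token out of source code this way also makes it easier to rotate without touching the script.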
Set up logging for the SDK
You can also check out the advanced configuration section of the SDK to learn how to set up logging.
Don't forget to give permissions
If you want to access existing metadata with an API token, don't forget to assign one or more personas to the API token that grant it access to metadata.
For Kotlin, the Java SDK is available on Maven Central, ready to be included in your project:
repositories {
    mavenCentral()
}
dependencies {
    implementation("com.atlan:atlan-java:+") // (1)
    implementation("io.github.microutils:kotlin-logging-jvm:3.0.5") // (2)
    implementation("org.slf4j:slf4j-simple:2.0.7")
}
- Include the latest version of the Java SDK in your project as a dependency. You can also give a specific version instead of the +, if you'd like.
- The Java SDK uses slf4j for logging purposes. You can include slf4j-simple as a simple binding mechanism to send any logging information out to your console (standard out), along with the kotlin-logging-jvm microutil.
Set two values on the static Atlan class (AtlanLiveTest.kt):
- Provide your Atlan tenant URL to the setBaseUrl method. You can also read the value from an environment variable.
- Provide your API token to the setApiToken method. You can also read the value from another environment variable.
Set up logging for the SDK
You can also check out the advanced configuration section of the SDK to learn how to set up logging.
Don't forget to give permissions
To access existing metadata with an API token, don't forget to assign one or more personas to the token. These are necessary to grant it access to metadata.
Retrieving metadata¶
Now that you have an SDK installed and configured, you are ready to code! Before we jump straight to code, though, let's first introduce some key concepts in Atlan:
What is an asset?¶
In Atlan, we refer to all objects that provide context to your data as assets.
classDiagram
    class Table {
        certificateStatus
        announcementType
        columnCount
        rowCount
        ...
        atlanSchema()
        columns()
    }
    class Column {
        certificateStatus
        announcementType
        dataType
        isNullable
        ...
        table()
    }
    Table *-- Column
Each type of asset in Atlan has a set of:
- Properties, such as:
    - Certificates
    - Announcements
- Relationships to other assets, such as:
    - Schema child tables
    - Table parent schema
    - Table child columns
    - Column parent table
Assets are instances of metadata.
In an object-oriented programming sense, think of an asset as an instance of a class. The structure of an asset (the class itself, in this analogy) is defined by something called a type definition, but that's for another day.
So as you can see:
- There are many different kinds of assets: tables, columns, schemas, databases, business intelligence dashboards, reports, and so on.
- Assets inter-relate with each other: a table has a parent schema and child columns, a schema has a parent database and child tables, and so on.
- Different kinds of assets have some common properties (like certificates) and other properties that are unique to that kind of asset (like a columnCount that only exists on tables, not on schemas or databases).
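To make the class-and-instance analogy concrete, here is a minimal sketch using plain Python dataclasses. The class names mirror the diagram above, but these are hypothetical stand-ins written for illustration, not the SDK's own asset classes, and the add_column helper is invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    """Properties shared by every kind of asset (hypothetical model)."""
    name: str
    certificate_status: Optional[str] = None
    announcement_type: Optional[str] = None


@dataclass
class Column(Asset):
    """Properties that only make sense on a column."""
    data_type: Optional[str] = None
    is_nullable: Optional[bool] = None
    # Relationship: the parent table (excluded from repr/eq to avoid cycles).
    table: Optional["Table"] = field(default=None, repr=False, compare=False)


@dataclass
class Table(Asset):
    """Properties that only make sense on a table."""
    column_count: int = 0
    row_count: Optional[int] = None
    # Relationship: the child columns.
    columns: List[Column] = field(default_factory=list)

    def add_column(self, column: Column) -> None:
        # Keep both sides of the relationship consistent.
        column.table = self
        self.columns.append(column)
        self.column_count = len(self.columns)


# An asset is an instance: one specific table with one specific column.
orders = Table(name="ORDERS", certificate_status="VERIFIED")
orders.add_column(Column(name="ORDER_ID", data_type="NUMBER", is_nullable=False))
print(orders.column_count, orders.columns[0].table.name)
```

Both classes inherit the common properties (such as the certificate), while column_count exists only on Table and data_type only on Column.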
When you know the asset¶
When you already know which asset you want to retrieve, you can read it from Atlan using one of its identifiers. We'll discuss these in more detail as part of updates, but for now you can think of them as:
- guid is a primary key for an asset: completely unique, but meaningless by itself
- qualifiedName is a business key for an asset: unique for a given kind of asset, and interpretable
Retrieve an asset (AtlanLiveTest.java):
- You can retrieve an asset using the static get() method on any asset type, providing either the asset's GUID or qualifiedName. (Each asset type is its own unique class in the SDK.)
Retrieve an asset (atlan_live_test.py):
- You can retrieve an asset using the asset.get_by_guid() method on the Atlan client, providing both the type of asset you expect to retrieve and its GUID. (Each asset type is its own unique class in the SDK.)
- You can also retrieve an asset using the asset.get_by_qualified_name() method on the Atlan client, providing the type of asset you expect to retrieve and its qualified_name. (Each asset type is its own unique class in the SDK.)
Retrieve an asset (AtlanLiveTest.kt):
- You can retrieve an asset using the static get() method on any asset type, providing either the asset's GUID or qualifiedName. (Each asset type is its own unique class in the SDK.)
Note that the response is strongly typed:
- If you are retrieving a table, you will get a table back (as long as it exists).
- You do not need to figure out what properties or relationships exist on a table: the Table class defines them for you already.
In any modern IDE, this means you have type-ahead support for retrieving the properties and relationships from the table variable. You can also refer to the types reference in this portal for full details of every kind of asset.
Retrieval by identifier can be more costly than you might expect
Even though you are retrieving an asset by an identifier, this can be more costly than you might expect. Retrieving an asset in this way will:
- Retrieve all its properties and their values
- Retrieve all its relationships
Imagine the asset you are retrieving has hundreds or thousands of these. If you only care about its certificate and any owners, you will be retrieving far more information than you need.
When you need to find it first¶
What if you don't know the asset's identifier? Or what if you want to retrieve many assets with some common set of characteristics? In that case, you can search for the asset(s).
For example, imagine you want to find all tables named MY_TABLE:
Search for an asset (AtlanLiveTest.java):
- You can search all active assets of a given type using the select() static method.
- Chain onto this method any conditions you want to apply to the search, in this example a where clause that will match any table whose name equals MY_TABLE.
- You can then stream the results from this search and process them as any standard Java stream: filter them, limit them, apply an action to each one, and so on. The results of the search are automatically paged and each page is lazily fetched.
Search for an asset (atlan_live_test.py):
- You can search all active assets of a given type by creating a FluentSearch() object and chaining two where clauses:
    - FluentSearch.asset_type to limit to a particular kind of asset
    - FluentSearch.active_assets() to limit to only active assets of that kind
- Chain onto this method any conditions you want to apply to the search, in this example a where clause that will match any table whose name equals MY_TABLE.
- You can then convert this object into a search request using the to_request() method.
- Run the request using the asset.search() method on the Atlan client, and you can directly iterate through the search results. The results of the search are automatically paged and each page is lazily fetched.
Search for an asset (AtlanLiveTest.kt):
- You can search all active assets of a given type using the select() static method.
- Chain onto this method any conditions you want to apply to the search, in this example a where clause that will match any table whose name equals MY_TABLE.
- You can then stream the results from this search and process them as any standard Kotlin stream: filter them, limit them, apply an action to each one, and so on. The results of the search are automatically paged and each page is lazily fetched.
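The phrase "automatically paged and lazily fetched" can be pictured with a small generator. This is only a sketch of the general technique, not the SDK's implementation: iterate_results, fake_fetch, and the sample records are all hypothetical, and the fetch function stands in for a remote search call.

```python
from typing import Callable, Dict, Iterator, List

Page = List[Dict[str, str]]


def iterate_results(
    fetch_page: Callable[[int, int], Page], page_size: int = 2
) -> Iterator[Dict[str, str]]:
    """Yield results one at a time, fetching the next page only when needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        offset += page_size


# Hypothetical stand-in for a remote search endpoint holding five matches.
_ALL_TABLES = [
    {"name": "MY_TABLE", "qualified_name": f"default/snowflake/1234567890/DB/SCHEMA_{i}/MY_TABLE"}
    for i in range(5)
]
fetch_calls: List[int] = []


def fake_fetch(offset: int, size: int) -> Page:
    fetch_calls.append(offset)  # record each "network" round trip
    return _ALL_TABLES[offset:offset + size]


results = iterate_results(fake_fetch, page_size=2)
first = next(results)                  # only the first page has been fetched
calls_after_first = list(fetch_calls)
remaining = list(results)              # consuming the rest fetches later pages
print(calls_after_first, fetch_calls)
```

The practical consequence is that stopping early (for example, after the first match) avoids fetching pages you never look at.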
By default, the search only returns minimal information about each asset (only its identifiers). However, you can also specify what information you want.
For example, if you want to know the certificate of the asset, you only need to tack that onto the query:
Search for an asset (AtlanLiveTest.java):
- Only one line differs from the original query: the includeOnResults call. You can chain as many includeOnResults calls as you want to specify the properties and relationships you want to retrieve for matching assets.
Search for an asset (atlan_live_test.py):
- Only one line differs from the original query: the include_on_results call. You can chain as many include_on_results calls as you want to specify the properties and relationships you want to retrieve for matching assets.
Search for an asset (AtlanLiveTest.kt):
- Only one line differs from the original query: the includeOnResults call. You can chain as many includeOnResults calls as you want to specify the properties and relationships you want to retrieve for matching assets.
Also gives the best performance
Searching not only allows you to find an asset without knowing its identifier, it also improves retrieval performance. You no longer retrieve information you don't need — you can specify precisely the properties and relationships you want.
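The difference between a full retrieval and a search that names the attributes it wants can be sketched as a simple projection. The record layout, the IDENTIFIERS tuple, and the project function below are hypothetical, chosen only to mirror the behavior described above (identifiers by default, plus whatever you ask for).

```python
from typing import Any, Dict, Iterable

# Identifiers that come back on every result in this sketch.
IDENTIFIERS = ("guid", "qualified_name")


def project(asset: Dict[str, Any], include: Iterable[str] = ()) -> Dict[str, Any]:
    """Return only the identifiers plus the explicitly requested attributes."""
    wanted = set(IDENTIFIERS) | set(include)
    return {key: value for key, value in asset.items() if key in wanted}


# Hypothetical stored record with more detail than most callers need.
stored = {
    "guid": "17f0356e-75f6-4e0b-8b05-32cebe8cd953",
    "qualified_name": "default/snowflake/1234567890/DB/SCHEMA/MY_TABLE",
    "name": "MY_TABLE",
    "certificate_status": "VERIFIED",
    "row_count": 1_000_000,
    "column_count": 42,
}

minimal = project(stored)
with_certificate = project(stored, include=["certificate_status"])
print(sorted(minimal), sorted(with_certificate))
```

Each extra attribute you request widens the projection by exactly that attribute, which is why asking only for what you need keeps responses small.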
Updating metadata¶
If all you want to do is check or report on metadata, you should have a starting point from the information above.
Or, now that you've found an asset of interest, maybe you want to update the asset with additional metadata?
Once again, before we jump to code, let's first understand some key concepts about how Atlan handles updates:
Importance of identifiers¶
Most operations on assets are upserts, that is, they could either create (insert) or update a given asset.
How do you know which is going to happen?
To answer this question, you need to understand how Atlan uniquely identifies each asset.
Recall that we discussed an asset's different identifiers earlier. Every asset in Atlan has at least the following two unique identifiers. Both are mandatory, so no asset can exist without them:
GUID¶
Atlan uses globally-unique identifiers (GUIDs) to uniquely identify each asset, globally. They look something like this:
17f0356e-75f6-4e0b-8b05-32cebe8cd953
As the name implies, GUIDs are:
- Globally unique (across all systems).
They are:
- Generated in a way that makes it nearly impossible for anything else to ever generate that same ID. [2]
Note that this means the GUID itself is not:
- Meaningful or capable of being interpreted in any way
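For a feel of what such identifiers look like, the sketch below uses Python's standard uuid module. It assumes only that GUIDs have the shape of random (version 4) UUIDs, which the example value above is consistent with. Atlan assigns GUIDs itself, so you would not normally generate them in client code.

```python
import uuid

# The example GUID from the text above parses as a standard UUID.
example = uuid.UUID("17f0356e-75f6-4e0b-8b05-32cebe8cd953")
print(example.version)

# Generating a value of the same shape needs no central registry:
# it is drawn from random bits, so it carries no interpretable meaning.
new_guid = str(uuid.uuid4())
print(new_guid)
```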
qualifiedName¶
Atlan uses qualifiedNames to uniquely identify assets based on their characteristics. They look something like this:
default/snowflake/1234567890/DB/SCHEMA
Qualified names are not:
- Globally unique (across all systems).
Instead, they are:
- Consistently constructed in a meaningful way, making it possible for them to be reconstructed.
Note that this means the qualifiedName is:
- Meaningful and capable of being interpreted
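Because a qualifiedName is built from an asset's characteristics, it can be reconstructed and taken apart again. The sketch below assumes the layout suggested by the example above (a connection prefix followed by database and schema names); the helper functions are hypothetical, and other kinds of assets may follow different patterns.

```python
def build_qualified_name(connection_qualified_name: str, *path: str) -> str:
    """Join a connection prefix and a path of names with '/' separators."""
    return "/".join([connection_qualified_name, *path])


def parent_qualified_name(qualified_name: str) -> str:
    """Drop the last segment to obtain the parent's qualifiedName."""
    return qualified_name.rsplit("/", 1)[0]


# Assumed layout: <connection prefix>/<database>/<schema>/<table>
CONNECTION = "default/snowflake/1234567890"

schema_qn = build_qualified_name(CONNECTION, "DB", "SCHEMA")
table_qn = build_qualified_name(CONNECTION, "DB", "SCHEMA", "MY_TABLE")
print(schema_qn)
print(parent_qualified_name(table_qn) == schema_qn)
```

Any system that knows the same characteristics can arrive at the same string independently, which is not possible with a randomly generated GUID.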
How these impact updates
Since they are truly unique, operations that include a GUID will only update an asset, not create one. Conversely, operations that take a qualifiedName can:
- Create an asset, if no exactly-matching qualifiedName is found in Atlan.
- Update an asset, if an exact match for the qualifiedName is found in Atlan.
These operations also require a typeName, so that if creation does occur the correct type of asset is created.
Unintended consequences of this behavior
Be careful when using operations with only the qualifiedName. You may end up creating assets when you were only expecting them to be updated or to fail if they did not already exist. This is particularly true when you do not give the exact, case-sensitive qualifiedName of an asset. a/b/c/d is not the same as a/B/c/d when it comes to qualifiedNames.
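The upsert behavior and its case-sensitivity pitfall can be simulated with a tiny in-memory store. InMemoryCatalog and its methods are hypothetical and exist only to illustrate the semantics described above; they are not part of any SDK.

```python
import uuid
from typing import Any, Dict, Tuple


class InMemoryCatalog:
    """Toy store keyed by (typeName, qualifiedName), matched exactly."""

    def __init__(self) -> None:
        self._assets: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def upsert(self, type_name: str, qualified_name: str, **attrs: Any) -> str:
        key = (type_name, qualified_name)  # exact, case-sensitive match
        if key in self._assets:
            self._assets[key].update(attrs)
            return "UPDATED"
        # No exact match: a new asset is created with a fresh GUID.
        self._assets[key] = {"guid": str(uuid.uuid4()), **attrs}
        return "CREATED"

    def update_by_guid(self, guid: str, **attrs: Any) -> str:
        for asset in self._assets.values():
            if asset["guid"] == guid:
                asset.update(attrs)
                return "UPDATED"
        # A GUID never creates anything: unknown GUIDs are an error.
        raise KeyError(f"No asset with GUID {guid}")

    def count(self) -> int:
        return len(self._assets)


catalog = InMemoryCatalog()
outcomes = [
    catalog.upsert("Table", "a/b/c/d", description="first"),
    catalog.upsert("Table", "a/b/c/d", description="second"),  # same key
    catalog.upsert("Table", "a/B/c/d", description="typo"),     # different case
]
print(outcomes, catalog.count())
```

The third call was presumably meant as an update, yet it silently creates a second asset, which is exactly the unintended consequence to watch for.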
Perhaps this leaves you wondering: why have a qualifiedName at all?
The qualifiedName's purpose is to identify what is a unique asset. Many different tools might all have information about that asset. Having a common "identity" means that many different systems can each independently construct its identifier the same way.
- If a crawler gets table details from Snowflake it can upsert based on those identity characteristics in Atlan. The crawler will not create duplicate tables every time it runs. This gives idempotency.
- Looker knows the same identity characteristics for the Snowflake tables and columns. So if you get details from Looker about the tables it uses for reporting, you can link them together in lineage. (Looker can construct the same identifier for the table as Snowflake itself.)
These characteristics are not possible using GUIDs alone.
Limit to changes only¶
Now that you understand the nuances of identifiers, let's look at how you can update metadata in Atlan.
In general, you only need to send changes to Atlan. You do not need to send an entire asset each time you want to make changes to it. For example, imagine you want to mark a table as certified but do not want to change anything else (its name, description, owner details, and so on):
Update an asset (AtlanLiveTest.java):
- You can update an asset without first looking the asset up, if you know (can construct) its identifying qualifiedName. Using the updater() static method on any asset type, you pass in (typically) the qualifiedName and name of the asset. This returns a builder onto which you can then chain any updates.
- You can then chain onto the returned builder as many updates as you want. In this example, we change the certificate status to VERIFIED.
- At the end of your chain of updates, you need to build the builder (into an object, in-memory).
- And then, finally, you need to .save() that object to persist those changes in Atlan. The response will contain details of the change: whether the asset was created, updated, or nothing happened because the asset already had those changes.
Update an asset (atlan_live_test.py):
- You can update an asset without first looking the asset up, if you know (can construct) its identifying qualified_name. Using the updater() class method on any asset type, you pass in (typically) the qualified_name and name of the asset.
- You can then add onto the returned object as many updates as you want. In this example, we change the certificate status to VERIFIED.
- And then, finally, you need to client.asset.save() that object to persist those changes in Atlan. The response will contain details of the change: whether the asset was created, updated, or nothing happened because the asset already had those changes.
Update an asset (AtlanLiveTest.kt):
- You can update an asset without first looking the asset up, if you know (can construct) its identifying qualifiedName. Using the updater() static method on any asset type, you pass in (typically) the qualifiedName and name of the asset. This returns a builder onto which you can then chain any updates.
- You can then chain onto the returned builder as many updates as you want. In this example, we change the certificate status to VERIFIED.
- At the end of your chain of updates, you need to build the builder (into an object, in-memory).
- And then, finally, you need to .save() that object to persist those changes in Atlan. The response will contain details of the change: whether the asset was created, updated, or nothing happened because the asset already had those changes.
Atlan will handle idempotency
By sending only the changes you want to apply, Atlan can make idempotent updates.
- Atlan will only attempt to update the asset with the changes you send.
- Atlan leaves any existing metadata on the asset as-is.
- If the asset already has the metadata values you are sending, Atlan does nothing. It will not even update audit details like the last update timestamp, and is thus idempotent.
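The three points above can be mimicked with a small merge function. This is a sketch of the idea only: apply_changes and the revision counter (a stand-in for audit details such as the last update timestamp) are hypothetical and say nothing about how Atlan implements this internally.

```python
from typing import Any, Dict


def apply_changes(existing: Dict[str, Any], changes: Dict[str, Any]) -> str:
    """Apply only the attributes that differ; report what happened."""
    delta = {
        key: value
        for key, value in changes.items()
        if key not in existing or existing[key] != value
    }
    if not delta:
        return "UNCHANGED"  # audit fields stay exactly as they were
    existing.update(delta)      # everything not mentioned is left as-is
    existing["revision"] += 1   # stand-in for audit details like a timestamp
    return "UPDATED"


table = {"name": "MY_TABLE", "description": "Orders", "owner": "jsmith", "revision": 0}

first = apply_changes(table, {"certificate_status": "VERIFIED"})
second = apply_changes(table, {"certificate_status": "VERIFIED"})  # same change again
print(first, second, table["revision"])
```

Sending the same change twice leaves the record in the same state as sending it once, which is what makes re-running a script safe.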
Bulk changes¶
What if you want to make changes to many assets, as efficiently as possible?
In that case, your best option is to combine three pieces of SDK functionality: search, trim, and batch.
Bulk changes (AtlanLiveTest.java):
- Start by initializing a batch. Through this batch, we can automatically queue up and bulk-upsert assets, in this example 20 at a time.
- Then use the search pattern we discussed earlier to find all the assets you want to update.
- Be sure to include any details you might need to make a decision about whether to update the asset or not (and what to update it with).
- It is a good idea to set the page size for search results to match the asset batch size, for maximal efficiency.
- When you stream the results of the search, you can send an optional boolean parameter. If set to true, this will stream the pages of results in parallel (across multiple threads), improving throughput.
- When you then operate on each search result, you can add() any updates directly into the batch you created earlier. The batch itself will handle saving these to Atlan when a sufficient number have been queued up (20, in this example).
- To make an update to a search result, first call trimToRequired() against the result. This will pare down the asset to its minimal required attributes and return a builder. You can then chain as many updates onto this builder as you want, keeping to the pattern we discussed above: sending only changes.
- You must flush() the batch outside of any loop where you've added assets into it. This ensures any final remaining elements in the batch are still sent to Atlan, even if the batch is not "full".
- Finally, from the batch you can retrieve the minimal details about any assets it created or updated.
Bulk changes (atlan_live_test.py):
- Start by initializing a batch. Through this batch, we can automatically queue up and bulk-upsert assets, in this example 20 at a time.
- Then use the search pattern we discussed earlier to find all the assets you want to update.
- Be sure to include any details you might need to make a decision about whether to update the asset or not (and what to update it with).
- It is a good idea to set the page size for search results to match the asset batch size, for maximal efficiency.
- When you then operate on each search result, first call trim_to_required() against the result. This will pare down the asset to its minimal required attributes. You can then add as many updates onto this object as you want, keeping to the pattern we discussed above: sending only changes.
- You can then add() any updated objects directly into the batch you created earlier. The batch itself will handle saving these to Atlan when a sufficient number have been queued up (20, in this example).
- You must flush() the batch outside of any loop where you've added assets into it. This ensures any final remaining elements in the batch are still sent to Atlan, even if the batch is not "full".
- Finally, from the batch you can retrieve the minimal details about any assets it created or updated.
Bulk changes (AtlanLiveTest.kt):
- Start by initializing a batch. Through this batch, we can automatically queue up and bulk-upsert assets, in this example 20 at a time.
- Then use the search pattern we discussed earlier to find all the assets you want to update.
- Be sure to include any details you might need to make a decision about whether to update the asset or not (and what to update it with).
- It is a good idea to set the page size for search results to match the asset batch size, for maximal efficiency.
- When you stream the results of the search, you can send an optional boolean parameter. If set to true, this will stream the pages of results in parallel (across multiple threads), improving throughput.
- When you then operate on each search result, you can add() any updates directly into the batch you created earlier. The batch itself will handle saving these to Atlan when a sufficient number have been queued up (20, in this example).
- To make an update to a search result, first call trimToRequired() against the result. This will pare down the asset to its minimal required attributes and return a builder. You can then chain as many updates onto this builder as you want, keeping to the pattern we discussed above: sending only changes.
- You must flush() the batch outside of any loop where you've added assets into it. This ensures any final remaining elements in the batch are still sent to Atlan, even if the batch is not "full".
- Finally, from the batch you can retrieve the minimal details about any assets it created or updated.
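To see why the final flush() matters, consider this minimal sketch of a size-triggered batch. The Batch class here is a hypothetical stand-in written for illustration, with a save callback in place of a real call to Atlan; it is not the SDK's batch class.

```python
from typing import Callable, Dict, List

Asset = Dict[str, str]


class Batch:
    """Queue assets and hand them to `save` in groups of `max_size`."""

    def __init__(self, save: Callable[[List[Asset]], None], max_size: int = 20) -> None:
        self._save = save
        self._max_size = max_size
        self._queue: List[Asset] = []

    def add(self, asset: Asset) -> None:
        self._queue.append(asset)
        if len(self._queue) >= self._max_size:
            self.flush()  # a full batch is sent automatically

    def flush(self) -> None:
        if self._queue:
            self._save(list(self._queue))
            self._queue.clear()


saved_sizes: List[int] = []
batch = Batch(save=lambda assets: saved_sizes.append(len(assets)), max_size=20)

for i in range(45):
    # Send only the identifying fields plus the change itself.
    batch.add({
        "qualified_name": f"default/snowflake/1234567890/DB/SCHEMA/T{i}",
        "certificate_status": "VERIFIED",
    })

sizes_before_flush = list(saved_sizes)
batch.flush()  # outside the loop: sends the final, partially filled batch
print(sizes_before_flush, saved_sizes)
```

With 45 assets and a batch size of 20, two full batches are sent from inside the loop, and the remaining five would be lost without the final flush.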
Where to go from here¶
Now that you know the basics, it's up to you to delve further into whichever areas you like. You can search (upper-right) or use the top-level menu:
- Common tasks: Common operations on assets that are available across all assets.
- Asset-specific: Operations that are specific to certain assets.
- Governance structures: Operations dealing with governance structures, rather than assets.
- Samples: Real code samples our customers use to solve particular use cases.
- Searching: Delve deep into searching and aggregating metadata.
- Events: Delve deep into the details of the events Atlan triggers.
[1] Note that this is intentionally kept as simple as possible. The walkthrough is not intended to be exhaustive. Where possible, we have cross-referenced other detailed examples elsewhere in the site.
[2] There are orders of magnitude lower chances of GUIDs conflicting with each other than there are grains of sand on the planet. (And generating them does not rely on a central ID-assigning registry.)