Assets
In Atlan, we refer to all objects that provide context to your data as assets.
classDiagram
class Table {
certificateStatus
announcementType
columnCount
rowCount
...
atlanSchema()
columns()
}
class Column {
certificateStatus
announcementType
dataType
isNullable
...
table()
}
Table *-- Column
Each type of asset in Atlan has a set of:
- Properties, such as:
  - Certificates
  - Announcements
- Relationships to other assets, such as:
  - Schema child tables
  - Table parent schema
  - Table child columns
  - Column parent table
Assets are instances of metadata.
In an object-oriented programming sense, think of an asset as an instance of a class.
Type definitions
Type definitions (or typedefs for short) describe the properties and relationships that each different type of asset can have in Atlan.
Type definitions are the structure for metadata
In an object-oriented programming sense, think of a type definition as the class itself. Type definitions describe the underlying data model of Atlan.
For example:
- The model for database tables in Atlan is defined by the Table typedef.
- The Table typedef describes characteristics unique to database tables, such as column counts and row counts.
- The Table typedef inherits from an Asset typedef. (As do most other objects in Atlan.)
- The Asset typedef describes characteristics that apply to all of these objects, such as certificates and announcements.
classDiagram
class Asset {
<<abstract>>
name
qualifiedName
certificateStatus
certificateStatusMessage
announcementType
announcementTitle
announcementMessage
...
assignedTerms()
}
class Table {
columnCount
rowCount
atlanSchema()
columns()
}
class Column {
dataType
isNullable
table()
}
Asset <|-- Table : extends
Asset <|-- Column : extends
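To make the class analogy concrete, here is a minimal Python sketch of the same model. It is not Atlan's SDK: the class and field names simply mirror the diagram (in snake_case), and the sample values, including the table and column qualifiedNames, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    """Stand-in for the Asset typedef: properties shared by every asset."""
    name: str
    qualified_name: str
    certificate_status: Optional[str] = None
    announcement_type: Optional[str] = None


@dataclass
class Column(Asset):
    """Stand-in for the Column typedef: adds column-specific properties."""
    data_type: Optional[str] = None
    is_nullable: Optional[bool] = None
    # Relationship to the parent table (kept out of repr/eq to avoid cycles).
    table: Optional["Table"] = field(default=None, repr=False, compare=False)


@dataclass
class Table(Asset):
    """Stand-in for the Table typedef: adds table-specific properties."""
    column_count: Optional[int] = None
    row_count: Optional[int] = None
    # Relationship to child columns.
    columns: List[Column] = field(default_factory=list)


# Assets are instances of these classes (sample values are made up).
orders = Table(
    name="ORDERS",
    qualified_name="default/snowflake/1234567890/DB/SCHEMA/ORDERS",
    certificate_status="VERIFIED",
)
order_id = Column(
    name="ORDER_ID",
    qualified_name=orders.qualified_name + "/ORDER_ID",
    data_type="NUMBER",
    is_nullable=False,
    table=orders,
)
orders.columns.append(order_id)
orders.column_count = len(orders.columns)
```

Both `orders` and `order_id` carry the certificate and announcement properties because they inherit them from `Asset`, while `column_count` exists only on the table.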
Special assets
While all assets follow the above principles, there are two types of assets to be aware of that have further specific uses in Atlan.
Connections
classDiagram
class Connection {
...
connector[Name|Type]
...
}
Connections play several important roles:
- They form the basis of Atlan's access control policies.
- Their connectorName property (renamed connectorType in some SDKs) decides the icon Atlan will use for assets contained within the connection.
Processes
Processes form the basis for Atlan's data lineage. They define how data inputs (sources) are translated into data outputs (targets).
Without a process asset to link these upstream and downstream assets, you cannot have data lineage in Atlan.
graph LR
s1[(Source 1)]
s2[(Source 2)]
s3[(Source 3)]
t1[(Target 1)]
t2[(Target 2)]
p([Process])
s1 & s2 & s3-->|upstream|p-->|downstream|t1 & t2
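The idea that lineage exists only where a process links inputs to outputs can be sketched in a few lines of Python. This is a toy model rather than Atlan's API: the `Process` class and the `downstream_of` helper are hypothetical names, and assets are represented by plain strings taken from the diagram.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Process:
    """Toy process: links upstream inputs to downstream outputs."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def downstream_of(asset: str, processes: List[Process]) -> Set[str]:
    """Collect every asset reachable from `asset` by following processes."""
    found: Set[str] = set()
    frontier = [asset]
    while frontier:
        current = frontier.pop()
        for process in processes:
            if current in process.inputs:
                for output in process.outputs:
                    if output not in found:
                        found.add(output)
                        frontier.append(output)
    return found


processes = [
    Process(
        name="Process",
        inputs=("Source 1", "Source 2", "Source 3"),
        outputs=("Target 1", "Target 2"),
    )
]

print(sorted(downstream_of("Source 1", processes)))  # ['Target 1', 'Target 2']
print(sorted(downstream_of("Source 1", [])))         # [] (no process, no lineage)
```

With the process removed, the sources and targets still exist but nothing connects them, which is why the lookup comes back empty.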
Identifiers
Most operations dealing with assets operate as upserts, that is, they could either create (insert) or update a given asset.
How do you know which is going to happen?
To answer this question, you need to understand how Atlan uniquely identifies each asset.
Every asset in Atlan has at least the following two unique identifiers. Both are mandatory, so no asset can exist without them:
GUID
Atlan uses globally-unique identifiers (GUIDs) to uniquely identify each asset, globally. They look something like this:
17f0356e-75f6-4e0b-8b05-32cebe8cd953
As the name implies, GUIDs are:
- Globally unique (across all systems).
They are:
- Generated in a way that makes it nearly impossible for anything else to ever generate that same ID.[1]
Note that this means the GUID itself is not:
- Meaningful or capable of being interpreted in any way
Used strictly for updates
Since they are truly unique, operations that include a GUID will only update an asset, not create one.
When creating an asset, do not provide a GUID. Atlan will generate a GUID for your new asset automatically.
More details
GUIDs guarantee uniqueness full stop (even across systems, applications and vendors). As a key, a GUID remains unique even if that asset goes into another system, or even another instance of Atlan.
But it isn't very human-readable or meaningful in any way. And it wouldn't be useful for integrating systems that each have some knowledge about an asset, because neither could know in advance how the other would uniquely refer to that asset. (If both used GUIDs as the key of the asset, each system would generate a different GUID, so the GUID wouldn't provide a way to match or "join" the metadata.)
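The matching problem described above can be seen with Python's standard `uuid` module. This is only an illustration of the identifier format: the example GUID shown earlier happens to parse as a version 4 (random) UUID, but whether Atlan always generates that version is an assumption here, and in practice you leave GUID generation to Atlan.

```python
import uuid

# The example GUID from above parses as a standard UUID.
example = uuid.UUID("17f0356e-75f6-4e0b-8b05-32cebe8cd953")
print(example.version)    # 4
print(len(str(example)))  # 36

# Two systems that each mint their own ID for the same table
# end up with different values, so the IDs cannot be used to match.
id_in_system_a = uuid.uuid4()
id_in_system_b = uuid.uuid4()
print(id_in_system_a == id_in_system_b)  # False
```

Nothing in the identifier itself says which table, schema or database it belongs to, which is the gap that qualifiedName fills.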
qualifiedName
Atlan uses qualifiedNames to uniquely identify assets based on their characteristics. They look something like this:
default/snowflake/1234567890/DB/SCHEMA
Qualified names are not:
- Globally unique (across all systems).
Instead, they are:
- Consistently constructed in a meaningful way, making it possible for them to be reconstructed.
Note that this means the qualifiedName is:
- Meaningful and capable of being interpreted
Enable an upsert semantic
Operations that take a qualifiedName can:
- Create an asset, if no (exactly-)matching qualifiedName is found in Atlan.
- Update an asset, if an exact match for the qualifiedName is found in Atlan.
These operations also require a typeName to accompany the qualifiedName, so that if a creation does need to occur the correct type of asset is created.
Unintended consequences of this behavior
Be careful when using operations with only the qualifiedName: you may end up creating assets when you were only expecting them to be updated (or expecting the operation to fail). This is particularly true when you do not give the exact, case-sensitive qualifiedName of an asset. a/b/c/d is not the same as a/B/c/d when it comes to qualifiedNames.
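The create-or-update behavior, and the case-sensitivity pitfall, can be sketched with a small in-memory stand-in. `InMemoryCatalog` and its `save` method are hypothetical and are not Atlan's API; keying the store on the pair of typeName and qualifiedName is an assumption made for the sketch.

```python
import uuid
from typing import Dict, Tuple


class InMemoryCatalog:
    """Toy stand-in for an asset store with upsert semantics."""

    def __init__(self) -> None:
        self._assets: Dict[Tuple[str, str], dict] = {}

    def save(self, type_name: str, qualified_name: str, **attrs) -> str:
        """Update on an exact, case-sensitive match; otherwise create."""
        key = (type_name, qualified_name)
        if key in self._assets:
            self._assets[key].update(attrs)
            return "updated"
        # No GUID is supplied by the caller: the store generates one.
        self._assets[key] = {"guid": str(uuid.uuid4()), **attrs}
        return "created"

    def count(self) -> int:
        return len(self._assets)


catalog = InMemoryCatalog()
print(catalog.save("Table", "a/b/c/d", rowCount=10))  # created
print(catalog.save("Table", "a/b/c/d", rowCount=12))  # updated
print(catalog.save("Table", "a/B/c/d", rowCount=12))  # created
print(catalog.count())                                # 2
```

The third call was probably meant as an update, but the different case makes it a new key, so a second asset appears instead.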
More details
The qualifiedName's purpose is to identify what is a unique asset, but many different tools might all have info about that unique asset. So having a common "identity" that can be constructed for it (like database name, schema, table) means that asset can still be uniquely referred to by many different systems in the same way: each system knows the combination of characteristics that uniquely identify the asset.
Hence if you get column details from Snowflake you can upsert based on those identity characteristics in Atlan (not create duplicate columns every time a crawler runs).
And if you get model details from Looker, you can link those in lineage — because Looker knows those same identity characteristics for the Snowflake tables and columns.
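As a rough sketch of why a reconstructable identity helps, two independent tools that know the same characteristics can arrive at the same key without talking to each other. The helper below is hypothetical, and splitting the earlier example into a connection prefix followed by database and schema names is an assumption about its structure, not a documented rule.

```python
def build_qualified_name(connection_prefix: str, *path: str) -> str:
    """Join a connection prefix and object names with '/' (assumed layout)."""
    return "/".join([connection_prefix, *path])


# A crawler and a BI tool each build the name from what they know.
from_crawler = build_qualified_name("default/snowflake/1234567890", "DB", "SCHEMA")
from_bi_tool = build_qualified_name("default/snowflake/1234567890", "DB", "SCHEMA")

print(from_crawler)                  # default/snowflake/1234567890/DB/SCHEMA
print(from_crawler == from_bi_tool)  # True
```

Because both sides derive the same string, a repeated crawl maps onto the existing asset and a lineage link from another tool lands on the same asset.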
[1] There are orders of magnitude lower chances of GUIDs conflicting with each other than there are grains of sand on the planet. (And generating them does not rely on a central ID-assigning registry.)