
Upserts and unique identifiers

Most operations that deal with assets are upserts: they either create (insert) or update a given asset.

How do you know which is going to happen?

To answer this question, you need to understand how Atlan uniquely identifies each asset.

GUID

Atlan uses globally unique identifiers (GUIDs) to identify each asset. They look something like this: 17f0356e-75f6-4e0b-8b05-32cebe8cd953.

As the name implies, these are globally unique:

  • They are generated in a way that makes it nearly impossible for anything else to ever generate the same ID.[1]
  • They do not rely on a central ID-assigning registry.

By definition, this also means that a GUID is not:

  • Some hashed form of another identifier
  • Meaningful or capable of being interpreted in any way
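As an illustration, Python's standard `uuid` module can mint an identifier with the same 8-4-4-4-12 hexadecimal shape as the example above. This sketch assumes random (version 4) UUIDs as the generation technique; this section does not say exactly how Atlan produces its GUIDs. Note that the function takes no input at all, which is why the result is neither a hash of another identifier nor meaningful in itself.

```python
import uuid


def new_guid() -> str:
    """Mint a random (version 4) UUID and return it as a string.

    No central registry is consulted, and no asset property is used:
    randomness alone makes it vanishingly unlikely that any other
    generator ever produces the same value.
    """
    return str(uuid.uuid4())


guid = new_guid()
# Prints a new value on every run, shaped like xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
print(guid)
print(uuid.UUID(guid).version)  # 4
```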

Since they are truly unique, operations that include a GUID will update that asset, not create one.

More details

GUIDs guarantee uniqueness, full stop: even across systems, applications, and vendors. As a key, a GUID remains unique even if the asset moves into another system, or into another instance of Atlan.

However, a GUID is not human-readable or meaningful in any way. Nor is it useful for integrating systems that each hold some knowledge about an asset, because neither system can know in advance how the other will uniquely refer to that asset. (If both systems used GUIDs as the asset's key, each would generate a different GUID, so the GUIDs would provide no way to match, or "join", the metadata.)
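The matching problem can be sketched in a few lines of Python. The two records below are hypothetical stand-ins for two systems that both know about the same Snowflake table; each independently mints its own GUID for it, so the keys share nothing that would let you join them.

```python
import uuid

# Two hypothetical systems describe the same table, DB.SCHEMA.TABLE,
# and each keys its own record with a freshly generated GUID.
crawler_record = {"guid": str(uuid.uuid4()), "name": "TABLE", "row_count": 1000}
bi_tool_record = {"guid": str(uuid.uuid4()), "name": "TABLE", "dashboards": 3}

# Joining on the GUID finds no match: the keys were generated
# independently, so they (almost certainly) differ.
same_asset = crawler_record["guid"] == bi_tool_record["guid"]
print(same_asset)  # False
```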

qualifiedName

In contrast, Atlan uses qualifiedNames to uniquely identify assets based on their intrinsic characteristics. They look something like this: default/snowflake/1234567890/DB/SCHEMA/TABLE

They are not:

  • Globally unique (across all systems).

Instead, they are:

  • Consistently constructed in a meaningful way, capable of being interpreted
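Because a qualifiedName is built from an asset's characteristics, any system that knows those characteristics can rebuild it. The helper below is hypothetical: it simply joins the parts of the example above with `/`. The exact layout varies by asset type and connector, so treat this as a sketch of the idea rather than a specification of Atlan's format.

```python
def table_qualified_name(connection_qn: str, database: str, schema: str, table: str) -> str:
    """Hypothetical helper: derive a table's qualifiedName from its characteristics."""
    return "/".join([connection_qn, database, schema, table])


# The connection prefix is taken from the example qualifiedName above.
qn = table_qualified_name("default/snowflake/1234567890", "DB", "SCHEMA", "TABLE")
print(qn)  # default/snowflake/1234567890/DB/SCHEMA/TABLE

# Unlike a GUID, the same inputs always produce the same key.
print(qn == table_qualified_name("default/snowflake/1234567890", "DB", "SCHEMA", "TABLE"))  # True
```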

Operations that take a qualifiedName, without a GUID, will therefore:

  • Create an asset, if no (exactly-)matching qualifiedName is found in Atlan.
  • Update an asset, if an exact-match for the qualifiedName is found in Atlan.

These operations also require a typeName to accompany the qualifiedName, so that if a creation does need to occur the correct type of asset is created.
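The decision rules above can be modeled with a toy in-memory store. Everything here (`AssetStore`, `upsert`, the dictionary layout) is hypothetical and only mirrors the rules described in this section; it is not Atlan's API. One further assumption: the sketch raises an error for an unknown GUID, since the section only says that a GUID-based operation updates rather than creates.

```python
import uuid
from typing import Optional


class AssetStore:
    """Hypothetical in-memory store that applies the upsert rules above."""

    def __init__(self) -> None:
        self.by_guid: dict[str, dict] = {}
        # Index of exact (case-sensitive) qualifiedName -> GUID.
        self.guid_by_qn: dict[str, str] = {}

    def upsert(self, attrs: dict, type_name: Optional[str] = None,
               qualified_name: Optional[str] = None,
               guid: Optional[str] = None) -> tuple[str, str]:
        """Return (action, guid), where action is "created" or "updated"."""
        if guid is not None:
            # A GUID identifies an existing asset: update it, never create.
            if guid not in self.by_guid:
                raise KeyError(f"no asset with GUID {guid}")  # assumption of this sketch
            self.by_guid[guid].update(attrs)
            return "updated", guid
        if qualified_name is None or type_name is None:
            raise ValueError("need a GUID, or a typeName plus a qualifiedName")
        existing = self.guid_by_qn.get(qualified_name)
        if existing is not None:
            # Exact match on qualifiedName: update the existing asset.
            self.by_guid[existing].update(attrs)
            return "updated", existing
        # No exact match: create a new asset of the requested type.
        new_guid = str(uuid.uuid4())
        self.by_guid[new_guid] = {"typeName": type_name,
                                  "qualifiedName": qualified_name, **attrs}
        self.guid_by_qn[qualified_name] = new_guid
        return "created", new_guid


store = AssetStore()
qn = "default/snowflake/1234567890/DB/SCHEMA/TABLE"
action, guid = store.upsert({"description": "v1"}, type_name="Table", qualified_name=qn)
print(action)  # created
action, _ = store.upsert({"description": "v2"}, type_name="Table", qualified_name=qn)
print(action)  # updated
action, _ = store.upsert({"description": "v3"}, guid=guid)
print(action)  # updated
```

The typeName is only consulted on the creation path, which matches the reason given above for requiring it.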

Be careful with unintended consequences of this behavior

Be careful when using operations with only the qualifiedName: you may end up creating assets when you expected them only to be updated (or the operation to fail). This is particularly true when you do not give the exact, case-sensitive qualifiedName of an asset. For qualifiedNames, a/b/c/d is not the same as a/B/c/d.
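A minimal sketch of the pitfall, assuming exact, case-sensitive matching as described above. The `existing` dictionary stands in for the qualifiedNames already in Atlan, and both helpers are hypothetical. The second helper shows one possible guard (not an Atlan feature): flag existing qualifiedNames that differ from yours only by case before you upsert.

```python
# Hypothetical index of assets already in Atlan: qualifiedName -> GUID.
existing = {"a/b/c/d": "17f0356e-75f6-4e0b-8b05-32cebe8cd953"}


def would_create(qualified_name: str) -> bool:
    """An upsert creates a new asset unless the qualifiedName matches exactly."""
    return qualified_name not in existing


def case_mismatches(qualified_name: str) -> list[str]:
    """List existing qualifiedNames that differ from the input only by case."""
    folded = qualified_name.casefold()
    return [qn for qn in existing if qn.casefold() == folded and qn != qualified_name]


print(would_create("a/b/c/d"))     # False: exact match, so the asset is updated
print(would_create("a/B/c/d"))     # True: one letter's case differs, so a new asset is created
print(case_mismatches("a/B/c/d"))  # ['a/b/c/d']
```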

More details

A qualifiedName's purpose is to identify a unique asset that many different tools might all hold information about. Having a common "identity" that each tool can construct for it (such as database name, schema, and table) means every system can refer to that asset uniquely and in the same way, because each system knows the combination of characteristics that uniquely identifies it.

For example, if you get column details from Snowflake, you can upsert them in Atlan based on those identity characteristics, instead of creating duplicate columns every time a crawler runs.

And if you get model details from Looker, you can link them in lineage, because Looker knows the same identity characteristics for the Snowflake tables and columns.
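To make the last two paragraphs concrete, here is a toy sketch. The two functions stand in for a Snowflake crawler and a Looker integration; their names, the `/`-joined layout, and the model name `orders_model` are all hypothetical. Because both sides derive the same qualifiedName from the same database, schema, and table names, re-running the crawler updates the asset instead of duplicating it, and the Looker side can find the table without ever having seen its GUID.

```python
import uuid

CONNECTION_QN = "default/snowflake/1234567890"  # example connection prefix from above
assets: dict[str, dict] = {}  # hypothetical catalog: qualifiedName -> asset


def qualified_name(db: str, schema: str, table: str) -> str:
    """Hypothetical: build a table's qualifiedName from its identity characteristics."""
    return f"{CONNECTION_QN}/{db}/{schema}/{table}"


def crawl_snowflake(tables: list[tuple[str, str, str]]) -> None:
    """Upsert each crawled table by qualifiedName."""
    for db, schema, table in tables:
        qn = qualified_name(db, schema, table)
        if qn not in assets:  # no exact match: create
            assets[qn] = {"guid": str(uuid.uuid4()), "typeName": "Table", "crawls": 0}
        assets[qn]["crawls"] += 1  # match found (or just created): update


def link_looker_model(model: str, db: str, schema: str, table: str) -> dict:
    """Resolve a model's upstream table from the characteristics Looker knows."""
    qn = qualified_name(db, schema, table)
    return {"model": model, "upstream_qualifiedName": qn, "upstream_guid": assets[qn]["guid"]}


crawl_snowflake([("DB", "SCHEMA", "TABLE")])
crawl_snowflake([("DB", "SCHEMA", "TABLE")])  # second crawler run
print(len(assets))  # 1
edge = link_looker_model("orders_model", "DB", "SCHEMA", "TABLE")
print(edge["upstream_qualifiedName"])  # default/snowflake/1234567890/DB/SCHEMA/TABLE
```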


  1. The chance of two GUIDs conflicting is vanishingly small: there are orders of magnitude more possible GUIDs than there are grains of sand on the planet.