Searching for assets
Searching is a very flexible operation in Atlan, which also makes it more complex to understand than the other operations. To encapsulate the full flexibility of Atlan's search, the SDK provides a dedicated IndexSearchRequest object and a FluentSearch class for configuring such a request using a fluent builder pattern.
More details on the power and flexibility of searching
See the dedicated Searching section of this site for more details on Atlan's search. This covers the various kinds of searches you can run, and the detailed attributes you can search against.
Build the query
To run a search in Atlan, you need to define the query using Elastic's structures. While you can always use Elastic's own structures to make use of its full power, in the vast majority of cases you may find it easier to use the helpers built into the SDK:
Build the query (Java)
- You can start building a query across all assets using the select() method on the assets member of any client. You can chain as many conditions as you want:
    - where() is mandatory inclusion
    - whereNot() is mandatory exclusion
    - whereSome() is for conditions where some of them must match
- This helper provides a query that ensures results are active (not archived) assets.
  Equivalent Elastic query:
      Query beActive = TermQuery.of(m -> m
              .field("__state")
              .value(AtlanStatus.ACTIVE.getValue()))
          ._toQuery();
- This condition provides a query that restricts results to a specific type of assets (glossary terms in this example).
  Equivalent Elastic query:
      Query beTerm = TermQuery.of(m -> m
              .field("__typeName.keyword")
              .value(GlossaryTerm.TYPE_NAME))
          ._toQuery();
Build the query (Python)
- You can start building a query using a FluentSearch object. You can have as many mandatory (where()) conditions, mandatory exclusion (where_not()) conditions, and sets of conditions of which some must match (where_some()) as you want.
- This helper provides a query that ensures results are active (not archived) assets.
- This helper provides a query that restricts results to a specific type of assets (glossary terms in this example).
Build the query (Kotlin)
- You can start building a query across all assets using the select() method on the assets member of any client. You can chain as many mandatory (where()) conditions, mandatory exclusion (whereNot()) conditions, and sets of conditions of which some must match (whereSome()) as you want.
- This helper provides a query that ensures results are active (not archived) assets.
  Equivalent Elastic query:
      // Kotlin lambdas use braces rather than Java's arrow-in-parentheses form
      val beActive = TermQuery.of { m -> m
              .field("__state")
              .value(AtlanStatus.ACTIVE.getValue()) }
          ._toQuery()
- This helper provides a query that restricts results to a specific type of assets (glossary terms in this example).
  Equivalent Elastic query:
      val beTerm = TermQuery.of { m -> m
              .field("__typeName.keyword")
              .value(GlossaryTerm.TYPE_NAME) }
          ._toQuery()
Query contents (raw REST API)
- A bool query combines multiple conditions.
- A filter clause exactly matches all of the conditions, without scoring (so it can be slightly faster than scoring-based combination mechanisms).
- Term queries are generally used to exactly match values.
- The __state field will match the status of an asset in Atlan.
- So in this example you will only match assets that are currently ACTIVE (not archived or soft-deleted).
- You will also only match assets that are of a specific type, since __typeName.keyword will match the type of asset.
  Note these names do not exactly match attribute names
  These names are field names in the search index, and may vary from the attribute names of the assets in Atlan. To find the appropriate field name and how it relates to an attribute name, use the full model reference.
- In this example, you will only match terms.
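As a rough illustration of the query body these notes describe, the following Python sketch assembles a bool query with a filter clause of two term queries as a plain dictionary. The field names `__state` and `__typeName.keyword` come from the text above; the literal values `ACTIVE` and `AtlasGlossaryTerm` are assumptions about what the status and type-name constants resolve to, and the helper names `term` and `build_query` are hypothetical.

```python
import json


def term(field: str, value: str) -> dict:
    """Build a term query, which matches a field's value exactly."""
    return {"term": {field: {"value": value}}}


def build_query(status: str, type_name: str) -> dict:
    """Combine both conditions in the filter clause of a bool query (no scoring)."""
    return {
        "bool": {
            "filter": [
                term("__state", status),
                term("__typeName.keyword", type_name),
            ]
        }
    }


# Assumed literal values for the status and type-name constants.
query = build_query("ACTIVE", "AtlasGlossaryTerm")
print(json.dumps(query, indent=2))
```

Keeping both conditions in `filter` reflects the point above: every condition must match, and no relevance score is calculated.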
Build the request
Once the query is defined, we can then build up the search request. The request includes not only the query, but also parameters like paging and which attributes to include in the response:
Build the request (Java)
- You can then chain additional parameters onto the fluent search. (You could of course do this all directly as part of the same chain; you do not need to store the interim builder variable.)
- The number of results to include (per page).
- You can chain as many attributes as you want to include in each result. In this case we will return the anchor attribute for terms, which gives the relationship from the term to its parent glossary.
- You can chain as many attributes as you want to include on each asset related to each result. Since we are returning anchor relationships, this will ensure that the certificateStatus of those related glossaries is also included in each result.
- You can now build all of this search configuration into a request.
Build the request (Python)
- You can then chain additional parameters onto the fluent search. (You could of course do this all directly as part of the same chain; you do not need to store the interim builder variable.)
- The number of results to include (per page).
- You can chain as many attributes as you want to include in each result. In this case we will return the anchor attribute for terms, which gives the relationship from the term to its parent glossary.
- You can chain as many attributes as you want to include on each asset related to each result. Since we are returning anchor relationships, this will ensure that the certificate_status of those related glossaries is also included in each result.
- You can now build all of this search configuration into a request.
Build the request (Kotlin)
- You can then chain additional parameters onto the fluent search. (You could of course do this all directly as part of the same chain; you do not need to store the interim builder variable.)
- The number of results to include (per page).
- You can chain as many attributes as you want to include in each result. In this case we will return the anchor attribute for terms, which gives the relationship from the term to its parent glossary.
- You can chain as many attributes as you want to include on each asset related to each result. Since we are returning anchor relationships, this will ensure that the certificateStatus of those related glossaries is also included in each result.
- You can now build all of this search configuration into a request.
POST /api/meta/search/indexsearch
- A query should always be defined within the dsl portion of the request.
- In addition to the query, you can specify from and size parameters for pagination.
- The query itself should be provided within the query portion of the dsl. Here you would use the query body provided in the earlier step.
- You must set track_total_hits to true if you want an exact count of the number of results (in particular for pagination).
- The list of attributes to include in each result. In this case we will return the anchor attribute for terms, which gives the relationship from the term to its parent glossary.
- The list of attributes to include on each relationship that is included in each result. Since we are returning anchor relationships, this will ensure that the certificateStatus of those related glossaries is also included in each result.
- You can also choose whether to include other related information, such as the terms and Atlan tags assigned to each result. In general, only include the information you require, since this will provide the best performance.
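A minimal sketch of how such a request body might be assembled in Python, using only the keys named in the notes above (`dsl`, `from`, `size`, `query`, `track_total_hits`, `attributes`, `relationAttributes`). The `build_request` helper, the `match_all` placeholder query, and the page size of 50 are assumptions for illustration; in practice you would pass the query body from the earlier step.

```python
import json


def build_request(query: dict, offset: int = 0, page_size: int = 50) -> dict:
    """Wrap a query in a search request body with paging and result attributes."""
    return {
        "dsl": {
            "from": offset,            # starting point of this page
            "size": page_size,         # number of results per page
            "query": query,            # the query body itself
            "track_total_hits": True,  # needed for an exact total count
        },
        # Attribute to include in each result (term -> parent glossary).
        "attributes": ["anchor"],
        # Attribute to include on each related asset that is returned.
        "relationAttributes": ["certificateStatus"],
    }


# Placeholder query (assumption); substitute the query built earlier.
placeholder_query = {"match_all": {}}
request = build_request(placeholder_query)
body = json.dumps(request)
print(body)
```

The serialized `body` is what would be sent as the payload of the POST request.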
Run the search
To now run the search, we call the search() method against our request object:
Run the search (Java)
- The getApproximateCount() method gives the total number of results overall (not restricted by page).
Run the search (Python)
- The count property gives the total number of results overall (not restricted by page).
Run the search (Kotlin)
- The .approximateCount member gives the total number of results overall (not restricted by page).
Implicit in the previous step
For the raw REST API, running the search is implicit in the previous step: sending the POST request is what runs the search.
Iterate through results
One page of results
To iterate through one page of results, loop through the list of assets:
Iterate through one page of results (Java)
- The page of results itself can be accessed through the getAssets() method on the response.
- You can then iterate through these results from a single page.
- Remember that each result is a generic Asset. In our example we searched for a specific type, but another example may search for any asset with a given name (or Atlan tag), so each result could be a different type. We should therefore check and cast the results as needed.
Iterate through one page of results (Python)
- You can iterate through the results from a single page.
- Remember that, per the type hints, each result is a generic Asset. In our example we searched for a specific type, but another example may search for any asset with a given name (or Atlan tags), so each result could be a different type. If we want an IDE to provide better code completion, we need to include an if isinstance(asset, asset_type) check, where asset_type is the type of asset we want the IDE to know about. Inside the if block, the IDE will know the object is of the specified type. It is also good practice, since it prevents run-time errors if an asset is not of the expected type.
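The isinstance check described above can be sketched as follows. The `Asset`, `GlossaryTerm`, and `Table` classes here are hypothetical stand-ins defined only so the example runs on its own; they are not the SDK's classes, and the `anchor` field is simplified to a plain string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    """Hypothetical stand-in for the SDK's generic asset type."""
    name: str


@dataclass
class GlossaryTerm(Asset):
    """Hypothetical stand-in for a glossary term with a parent glossary."""
    anchor: str = ""


@dataclass
class Table(Asset):
    """Hypothetical stand-in for some other asset type."""


def describe_terms(page: List[Asset]) -> List[str]:
    """Return a description of each glossary term, skipping other asset types."""
    described = []
    for asset in page:
        # Inside this block, type checkers and IDEs treat asset as a GlossaryTerm,
        # so term-specific attributes such as anchor are safe to access.
        if isinstance(asset, GlossaryTerm):
            described.append(f"{asset.name} (in {asset.anchor})")
    return described


page = [GlossaryTerm("Revenue", anchor="Finance"), Table("orders")]
print(describe_terms(page))  # ['Revenue (in Finance)']
```

Without the check, accessing `anchor` on the `Table` stand-in would raise an `AttributeError` at run time.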
Iterate through one page of results (Kotlin)
- The page of results itself can be accessed through the .assets member on the response.
- You can then iterate through these results from a single page.
- Remember that each result is a generic Asset. In our example we searched for a specific type, but another example may search for any asset with a given name (or Atlan tag), so each result could be a different type. We should therefore check and cast the results as needed.
Each object in entities is a matching asset
Each item in the entities array of the response will give details about a matching asset.
Multiple pages of results
To iterate through multiple pages of results:
Iterate through multiple pages of results (Java)
- You can simply iterate over the response itself. This will lazily load and loop through each page of results until the loop finishes or you break out of it. (You could also use response.forEach(), which uses the same iterable-based implementation behind the scenes.)
Iterate through multiple pages of results, streaming (Java)
- Alternatively, you can stream the results directly from the response. This will also lazily load and loop through each page of results.
  Can be chained without creating a request in-between
  You can chain the stream() method directly onto the end of your query and request construction, without creating a request or response object in-between.
- With streaming, you can apply your own limits to the maximum number of results you want to process.
  Independent of page size
  This limit is independent of page size. You could page through results 50 at a time, but only process a maximum of 100 total results this way. Since the results are lazily loaded when streaming, only the first two pages of results would be retrieved in such a scenario.
- You can also apply your own logical filters to the results.
  Push down as much as you can to the query
  You should push down as many of the filters as you can to the query itself, but if you have a particularly complex check that cannot be encoded in the query, this can be a useful secondary filter over the results.
- The forEach() on the resulting stream will then apply whatever actions you want to the results that come through.
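To make the interaction between page size and a streaming limit concrete, here is a small Python sketch of the same idea using a generator. The `fetch_page` function is a hypothetical stand-in for one search call against the backend (it returns integers rather than assets and records which offsets were requested), and the total of 230 results is an arbitrary assumption.

```python
from itertools import islice
from typing import Iterator, List

TOTAL = 230  # assumed number of matching results in this simulation
fetched_offsets: List[int] = []


def fetch_page(offset: int, size: int) -> List[int]:
    """Hypothetical stand-in for one search call; records each offset requested."""
    fetched_offsets.append(offset)
    return list(range(offset, min(offset + size, TOTAL)))


def stream_results(size: int) -> Iterator[int]:
    """Lazily yield results, fetching the next page only when it is needed."""
    offset = 0
    while True:
        page = fetch_page(offset, size)
        if not page:
            return
        yield from page
        offset += size


# Page through results 50 at a time, but process at most 100 in total.
processed = list(islice(stream_results(size=50), 100))
print(len(processed), fetched_offsets)  # 100 [0, 50]
```

Only the pages starting at offsets 0 and 50 are fetched, because the generator is never asked for a result beyond the hundredth.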
Iterate through multiple pages of results one page at a time (Python)
- The current_page() method returns a list of the assets for the current page. If there are none, an empty list will be returned.
- Iterate through the assets in the current page.
- Remember that, per the type hints, each result is a generic Asset. In our example we searched for a specific type, but another example may search for any asset with a given name (or Atlan tags), so each result could be a different type. If we want an IDE to provide better code completion, we need to include an if isinstance(asset, asset_type) check, where asset_type is the type of asset we want the IDE to know about. Inside the if block, the IDE will know the object is of the specified type. It is also good practice, since it prevents run-time errors if an asset is not of the expected type.
- The next_page() method retrieves the next page of results and returns True if more assets are available and False if they are not.
- Break out of the while loop if no more assets are available.
Alternatively, iterate through all the pages of results (Python)
- This will iterate through all the results without the need to be concerned with pages.
  Iterating over results produces a Generator
  This means that results are retrieved from the backend a page at a time. This also means that you can only iterate over the results once.
- Remember that each result is a generic Asset. In our example we searched for a specific type, but another example may search for any asset with a given name (or Atlan tag), so each result could be a different type. We should therefore check and cast the results as needed.
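The single-pass behavior of a generator can be seen in a short, self-contained Python sketch. The `iterate_all` function and its in-memory `pages` argument are hypothetical stand-ins for results fetched from the backend page by page.

```python
from typing import Iterator, List


def iterate_all(pages: List[List[str]]) -> Iterator[str]:
    """Yield results page by page; pages stands in for successive backend calls."""
    for page in pages:
        yield from page


results = iterate_all([["term-a", "term-b"], ["term-c"]])
first_pass = list(results)   # consumes every page
second_pass = list(results)  # the generator is already exhausted
print(first_pass, second_pass)  # ['term-a', 'term-b', 'term-c'] []
```

If you need to go over the results more than once, collect them into a list on the first pass or run the search again.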
Iterate through multiple pages of results (Kotlin)
- You can simply iterate over the response itself. This will lazily load and loop through each page of results until the loop finishes or you break out of it. (You could also use response.forEach{ }, which uses the same iterable-based implementation behind the scenes.)
Iterate through multiple pages of results, streaming (Kotlin)
- Alternatively, you can stream the results directly from the response. This will also lazily load and loop through each page of results.
  Can be chained without creating a request in-between
  You can chain the stream() method directly onto the end of your query and request construction, without creating a request or response object in-between.
- With streaming, you can apply your own limits to the maximum number of results you want to process.
  Independent of page size
  This limit is independent of page size. You could page through results 50 at a time, but only process a maximum of 100 total results this way. Since the results are lazily loaded when streaming, only the first two pages of results would be retrieved in such a scenario.
- You can also apply your own logical filters to the results.
  Push down as much as you can to the query
  You should push down as many of the filters as you can to the query itself, but if you have a particularly complex check that cannot be encoded in the query, this can be a useful secondary filter over the results.
- The forEach{ } on the resulting stream will then apply whatever actions you want to the results that come through.
Use the searchParameters.query of the response
Each search response includes a searchParameters object with a nested query string. This query string gives the details of the query that was run to produce the response, so to get the next page you can:
- Use this query string from the response to start building a new query using the same logic.
- Add the page size to the from parameter embedded in that query string, to give the starting point for the next page of results.
- Re-include any attributes or relationAttributes from the query string into the new query.
- Send this new query to retrieve the next page of results.
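These steps could be sketched roughly as follows in Python. The response shape used here is an assumption made for illustration: it treats searchParameters.query as a JSON string holding from, size, and query, and it reads attributes and relationAttributes from alongside it in searchParameters. Check an actual response before relying on this layout; the `next_page_request` helper is hypothetical.

```python
import json


def next_page_request(response: dict) -> dict:
    """Build the request for the next page from a previous response."""
    params = response["searchParameters"]
    dsl = json.loads(params["query"])  # the query that produced this response
    # Advance the starting point by one page.
    dsl["from"] = dsl.get("from", 0) + dsl["size"]
    return {
        "dsl": dsl,
        # Re-include the attributes requested originally (assumed location).
        "attributes": params.get("attributes", []),
        "relationAttributes": params.get("relationAttributes", []),
    }


# Assumed, simplified response for a first page of 50 results.
response = {
    "searchParameters": {
        "query": json.dumps(
            {"from": 0, "size": 50, "query": {"match_all": {}}, "track_total_hits": True}
        ),
        "attributes": ["anchor"],
        "relationAttributes": ["certificateStatus"],
    }
}

request = next_page_request(response)
print(request["dsl"]["from"])  # 50
```

The resulting `request` would then be sent to the same search endpoint to retrieve the next page.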