Aggregating search results

You can aggregate information about your search results in a few ways.

Currently only the following are implemented through the SDKs, though Elasticsearch itself supports many additional scenarios¹.

Bucket aggregation

You can group results together based on a field using bucket aggregation. With this, you can answer questions like:

Which kinds of assets most frequently match my search criteria?

1.4.0 1.1.0

For example:

Java Python Raw REST API

Build a bucket aggregation
IndexSearchRequest request = Atlan.getDefaultClient().assets.select() // (1)
    .aggregate("type", Asset.TYPE_NAME.bucketBy()) // (2)
    .sort(Asset.CREATE_TIME.order(SortOrder.Desc))
    .toRequest(); // (3)
IndexSearchResponse response = request.search(); // (4)

1. Start building a query from a client, using the select() method of its assets member.
2. Add an aggregation by chaining one or more aggregate() methods, passing:
   - Any arbitrary key you want, which you'll use to look up the results of the aggregation in the response. You can add as many aggregations as you want, but each must have a unique key so that its results can be looked up.
   - The field you want to aggregate, along with the kind of aggregation you want to do on that field. This example buckets the results by the distinct types of assets (tables, columns, etc).
3. You can then turn these criteria into a search request using the toRequest() helper.
4. Once you have a request, you can run the search.

Do something with the results
Map<String, AggregationResult> aggregates = response.getAggregations(); // (1)
AggregationBucketResult result = (AggregationBucketResult) aggregates.get("type"); // (2)
List<AggregationBucketDetails> buckets = result.getBuckets(); // (3)
for (AggregationBucketDetails detail : buckets) { // (4)
    detail.getKey(); // (5)
    detail.getDocCount(); // (6)
}

1. From the search response you can retrieve not only the results (as in previous examples) but also, when an aggregation was requested, the aggregation results.
2. Since multiple aggregations can be requested, you retrieve a specific aggregation result by name. (You would probably want to type-check this before the explicit cast.)
3. If the request produced aggregation buckets, the result contains bucket-specific details.
4. You can iterate through these details...
5. ...to retrieve the key of the bucket (in this example, the type of asset: table, column, etc).
6. ...to retrieve the number of results that match that bucket key (in this example, how many tables, columns, etc there are in the results).

Build a bucket aggregation
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.assets import Asset
from pyatlan.model.fluent_search import FluentSearch

client = AtlanClient()
request = (
    FluentSearch.select() # (1)
    .aggregate("type", Asset.TYPE_NAME.bucket_by()) # (2)
    .sort(Asset.CREATE_TIME.order())
).to_request() # (3)
results = client.asset.search(criteria=request) # (4)

1. Start building a query from a FluentSearch, using its select() method.
2. Add an aggregation by chaining one or more aggregate() methods, passing:
   - Any arbitrary key you want, which you'll use to look up the results of the aggregation in the response. You can add as many aggregations as you want, but each must have a unique key so that its results can be looked up.
   - The field you want to aggregate, along with the kind of aggregation you want to do on that field. This example buckets the results by the distinct types of assets (tables, columns, etc).
3. You can then turn these criteria into a search request using the to_request() helper.
4. Once you have a request, you can run the search.
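To make the effect of a bucket aggregation concrete, the following sketch reproduces the same kind of grouping locally with the standard library. It only illustrates what is being computed; the real grouping happens server-side in the search index. The sample records and the bucket_by_field helper are hypothetical and are not part of the SDK.

```python
from collections import Counter

# Hypothetical sample of search results; only the bucketed field matters here.
sample_results = [
    {"name": "orders", "type_name": "Table"},
    {"name": "order_id", "type_name": "Column"},
    {"name": "customers", "type_name": "Table"},
    {"name": "customer_id", "type_name": "Column"},
    {"name": "email", "type_name": "Column"},
]


def bucket_by_field(results, field):
    """Count how many results share each distinct value of `field`."""
    return Counter(r[field] for r in results if field in r)


type_buckets = bucket_by_field(sample_results, "type_name")
# Each (key, count) pair plays the role of one bucket in the aggregation result.
for key, doc_count in type_buckets.most_common():
    print(key, doc_count)
```

Each distinct value becomes one bucket key, and the count is the number of results that fall into that bucket.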

Do something with the results
result = results.aggregations["type"] # (1)
buckets = result.buckets # (2)
for detail in buckets: # (3)
    detail.key # (4)
    detail.doc_count # (5)

1. Since multiple aggregations can be requested, you retrieve a specific aggregation result by name.
2. If the request produced aggregation buckets, the result contains bucket-specific details.
3. You can iterate through these details...
4. ...to retrieve the key of the bucket (in this example, the type of asset: table, column, etc).
5. ...to retrieve the number of results that match that bucket key (in this example, how many tables, columns, etc there are in the results).
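The loop above only reads each bucket. To answer the question posed at the start of this section (which kinds of assets match most often), the buckets can be ranked by their counts. The sketch below uses a stand-in Bucket dataclass that mirrors only the key and doc_count attributes shown above, so it runs without a tenant. The class, the top_buckets helper, and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Stand-in for a bucket detail object (hypothetical, not the SDK class)."""
    key: str
    doc_count: int


def top_buckets(buckets, limit=3):
    """Return up to `limit` buckets, largest doc_count first."""
    return sorted(buckets, key=lambda b: b.doc_count, reverse=True)[:limit]


# Made-up counts standing in for the buckets of the "type" aggregation.
buckets = [
    Bucket("Table", 120),
    Bucket("Column", 2400),
    Bucket("View", 35),
    Bucket("Schema", 8),
]
for b in top_buckets(buckets):
    print(f"{b.key}: {b.doc_count}")
```

Elasticsearch's terms aggregation orders buckets by count by default, so with real results the explicit sort mostly serves as a safeguard.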

POST /api/meta/search/indexsearch
{
  "dsl": {
    "aggregations": { // (1)
      "type": {
        "terms": {
          "field": "__typeName.keyword"
        }
      }
    },
    "query": {
      "bool": {
        "filter": [
          { "term": { "__typeName.keyword": "Table" }}
        ]
      }
    },
    "sort": [
      { "__modificationTimestamp": { "order": "desc" }}
    ]
  }
}

Add an aggregation to your search. You can add multiple aggregations to a single search, but each must have a unique name (type in this example).
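When calling the REST API directly there is no SDK object to unwrap the aggregation, so the JSON has to be read by hand. The sketch below assumes the response carries Elasticsearch's standard terms-aggregation shape under an aggregations key. That assumption, the read_buckets helper, and the sample payload are illustrative rather than taken from this page.

```python
import json

# Made-up response fragment, assuming Elasticsearch's terms-aggregation shape.
raw_response = """
{
  "aggregations": {
    "type": {
      "buckets": [
        {"key": "Column", "doc_count": 2400},
        {"key": "Table", "doc_count": 120}
      ]
    }
  }
}
"""


def read_buckets(response_text, name):
    """Return {bucket key: doc count} for the aggregation called `name`."""
    aggregations = json.loads(response_text).get("aggregations", {})
    buckets = aggregations.get(name, {}).get("buckets", [])
    return {b["key"]: b["doc_count"] for b in buckets}


counts = read_buckets(raw_response, "type")
print(counts)
```

Looking the aggregation up by the same name used in the request ("type" here) is what ties a result back to the aggregation that produced it.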

Metric aggregation

You can also calculate metrics about your search results. With this, you can answer questions like:

What is the average number of columns I have in tables and views in a particular schema?

1.4.0 1.1.0

For example:

Java Python Raw REST API

Build a metric aggregation
IndexSearchRequest request = Atlan.getDefaultClient().assets.select() // (1)
    .aggregate("avg_columns", Table.COLUMN_COUNT.avg()) // (2)
    .sort(Asset.CREATE_TIME.order(SortOrder.Desc))
    .toRequest(); // (3)
IndexSearchResponse response = request.search(); // (4)

1. Start building a query from a client, using the select() method of its assets member.
2. Add an aggregation by chaining one or more aggregate() methods, passing:
   - Any arbitrary key you want, which you'll use to look up the results of the aggregation in the response. You can add as many aggregations as you want, but each must have a unique key so that its results can be looked up.
   - The field you want to aggregate, along with the kind of aggregation you want to do on that field. This example calculates an average of numeric values across the results (in this case, column counts on tables).
3. You can then turn these criteria into a search request using the toRequest() helper.
4. Once you have a request, you can run the search.

Do something with the results
Map<String, AggregationResult> aggregates = response.getAggregations(); // (1)
AggregationMetricResult result = (AggregationMetricResult) aggregates.get("avg_columns"); // (2)
result.getValue(); // (3)

1. From the search response you can retrieve not only the results (as in previous examples) but also, when an aggregation was requested, the aggregation results.
2. Since multiple aggregations can be requested, you retrieve a specific aggregation result by name. (You would probably want to type-check this before the explicit cast.)
3. If the request produced an aggregation metric, you can retrieve the value of that calculated metric directly.

Build a metric aggregation
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.assets import Asset, Table
from pyatlan.model.fluent_search import FluentSearch

client = AtlanClient()
request = (
    FluentSearch
    .select() # (1)
    .aggregate("avg_columns", Table.COLUMN_COUNT.avg()) # (2)
    .sort(Asset.CREATE_TIME.order())
).to_request() # (3)
results = client.asset.search(criteria=request) # (4)

1. Start building a query from a FluentSearch, using its select() method.
2. Add an aggregation by chaining one or more aggregate() methods, passing:
   - Any arbitrary key you want, which you'll use to look up the results of the aggregation in the response. You can add as many aggregations as you want, but each must have a unique key so that its results can be looked up.
   - The field you want to aggregate, along with the kind of aggregation you want to do on that field. This example calculates an average of numeric values across the results (in this case, column counts on tables).
3. You can then turn these criteria into a search request using the to_request() helper.
4. Once you have a request, you can run the search.

Do something with the results
result = results.aggregations['avg_columns'] # (1)
result.value # (2)

1. Since multiple aggregations can be requested, you retrieve a specific aggregation result by name.
2. If the request produced an aggregation metric, you can retrieve the value of that calculated metric directly.
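To make concrete what the calculated value represents, the sketch below computes the same kind of average locally. It assumes the behavior of Elasticsearch's avg aggregation, which by default skips documents that lack the field and yields no value when nothing contributes. The sample records and the average_of helper are hypothetical.

```python
from statistics import mean

# Hypothetical results: tables carry a column count, other assets do not.
sample_results = [
    {"name": "orders", "column_count": 12},
    {"name": "customers", "column_count": 8},
    {"name": "payments", "column_count": 10},
    {"name": "order_id"},  # no column count, so it is skipped
]


def average_of(results, field):
    """Average `field` over the results that have it; None if none do."""
    values = [r[field] for r in results if r.get(field) is not None]
    return mean(values) if values else None


avg_columns = average_of(sample_results, "column_count")
print(avg_columns)
```

Because the metric can be empty when no matching asset carries the field, it is worth checking the value for None before using it in further arithmetic.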

POST /api/meta/search/indexsearch
{
  "dsl": {
    "aggregations": { // (1)
      "avg_columns": {
        "avg": {
          "field": "columnCount"
        }
      }
    },
    "query": {
      "bool": {
        "filter": [
          { "term": { "__typeName.keyword": "Table" }}
        ]
      }
    },
    "sort": [
      { "__modificationTimestamp": { "order": "desc" }}
    ]
  }
}

Add an aggregation to your search. You can add multiple aggregations to a single search, but each must have a unique name (avg_columns in this example).
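As a sketch of combining aggregations in one request, the body can be assembled as a Python dictionary and serialized, which avoids hand-written JSON slips such as a missing comma. The field names and structure are copied from the REST examples on this page; building the payload this way is just one possible approach and is not an SDK feature.

```python
import json

# Each aggregation sits under its own unique name; dict keys cannot repeat.
aggregations = {
    "type": {"terms": {"field": "__typeName.keyword"}},
    "avg_columns": {"avg": {"field": "columnCount"}},
}

body = {
    "dsl": {
        "aggregations": aggregations,
        "query": {
            "bool": {"filter": [{"term": {"__typeName.keyword": "Table"}}]}
        },
        "sort": [{"__modificationTimestamp": {"order": "desc"}}],
    }
}

# json.dumps produces well-formed JSON suitable for the POST body.
payload = json.dumps(body, indent=2)
print(payload)
```

The resulting string would be sent as the body of the POST request shown above, and each aggregation's result would then be looked up under its own name.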

¹ This page is a summary of the details in the Elasticsearch Guide's aggregation guide.