
Paging search results

Automatically (via SDK)

Our SDKs are designed to simplify paging, so you do not need to worry about the underlying details. You can simply iterate through a search response and the SDK will automatically fetch the next page(s) when it needs to (lazily).

The SDKs also add a default sort by GUID to ensure stable results across pages, even when you do not provide any sorting criteria yourself.

Automatic paging (Java)
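client.assets.select() // Start a fluent search for assets
    .pageSize(50) // Fetch up to 50 results per page
    .stream() // Stream the results; further pages are fetched lazily as needed
    .limit(100) // Stop after the first 100 results
    .filter(a -> !(a instanceof ILineageProcess)) // Skip lineage processes
    .forEach(a -> log.info("Do something with each result: {}", a)); // Process each remaining result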
Build the query (Python)
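from pyatlan.client.atlan import AtlanClient
from pyatlan.model.assets import Asset, Process
from pyatlan.model.fluent_search import CompoundQuery, FluentSearch

client = AtlanClient()
index = (
    FluentSearch()  # Start building a fluent search
    .where(CompoundQuery.active_assets())  # Only include active assets
).to_request()  # Translate the fluent search into a search request
results = client.asset.search(index)  # Run the search
for asset in results:  # Iterating fetches further pages lazily as needed
    if not isinstance(asset, Process):  # Skip lineage processes
        non_process = asset  # Do something with each remaining result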
Automatic paging (Kotlin)
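client.assets.select() // Start a fluent search for assets
    .pageSize(50) // Fetch up to 50 results per page
    .stream() // Stream the results; further pages are fetched lazily as needed
    .limit(100) // Stop after the first 100 results
    .filter { it !is ILineageProcess } // Skip lineage processes
    .forEach { log.info { "Do something with each result: $it" } } // Process each remaining result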

Use an SDK

The SDKs manage making multiple requests and parsing results to make subsequent requests in the most efficient way possible. You will need to make many different API requests if you want to do the same directly via the raw REST APIs.
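To make the idea of lazy paging concrete, here is a minimal sketch in plain Python. It does not use the SDK: `fetch_page` is a hypothetical stand-in for a single search request, and `ASSETS` is an in-memory stand-in for the search index. The SDKs' actual implementations may differ.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical stand-in for one search request: takes a start offset and a
# page size, and returns that page of results (an empty list when exhausted).
FetchPage = Callable[[int, int], List[Dict]]


def iterate_results(fetch_page: FetchPage, page_size: int = 50) -> Iterator[Dict]:
    """Yield results one at a time, requesting the next page only when needed."""
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        start += page_size


# In-memory stand-in for the search index, already sorted by GUID.
ASSETS = [{"guid": f"guid-{i:03d}"} for i in range(120)]
requests_made: List[int] = []  # records the offset of every request issued


def fake_fetch(start: int, size: int) -> List[Dict]:
    requests_made.append(start)
    return ASSETS[start:start + size]


first_60 = []
for asset in iterate_results(fake_fetch, page_size=50):
    first_60.append(asset["guid"])
    if len(first_60) == 60:
        break  # stopping early means the third page is never requested
```

Because the generator only requests a page when the caller asks for a result beyond the current one, stopping after 60 results issues two requests (offsets 0 and 50) rather than three.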

Manually (via Elastic)

For curious minds, though, you can page through search results using a combination of the following properties (see footnote 1):

| Property | Description | Example |
|---|---|---|
| from | Indicates the starting point for the results. | 0 |
| size | Indicates how many results to include per response (page). As a general rule of thumb we would recommend a size from 20-100, making 50 a common starting point. | 50 |
| track_total_hits | Includes an accurate number of total results, if set to true. With its default value on the raw REST APIs (false) the maximum number of results you will see in the approximateCount field in the response is 10000. (Again, the SDKs set this to true by default to avoid this confusion.) | true |
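As a rough illustration of how from and size combine, the following sketch builds one request body per page for a known total. The `page_requests` helper and the `match_all` query are illustrative assumptions rather than part of any SDK; the total of 24631 is borrowed from the sample response in this page.

```python
import math
from typing import Dict, List

# Sort by GUID so that every page is cut from the same ordering.
SORT = [{"__guid": {"order": "asc"}}]


def page_requests(query: Dict, total: int, size: int = 50) -> List[Dict]:
    """Build one DSL body per page needed to cover `total` results."""
    pages = math.ceil(total / size)
    return [
        {"from": page * size, "size": size, "track_total_hits": True,
         "query": query, "sort": SORT}
        for page in range(pages)
    ]


bodies = page_requests({"match_all": {}}, total=24631, size=50)
print(len(bodies), bodies[0]["from"], bodies[-1]["from"])  # prints: 493 0 24600
```

Each page's from is simply the page index multiplied by size, so the third page of 50 starts at from 100.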

Constraints with this approach

For the most consistent results when paging, you must always specify sorting criteria, including at least one criterion that acts as a tie-breaker. (You must also keep those criteria the same for every page.)
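The need for a tie-breaker can be illustrated with a small simulation. This is a sketch under stated assumptions, not Elastic's actual behavior: the hypothetical `search` function shuffles an in-memory list before sorting, to mimic an index that may return equally ranked results in a different order on each request. For simplicity it sorts ascending on both fields.

```python
import random
from typing import Dict, List, Sequence

# 20 assets in which groups of 5 share the same update timestamp.
ASSETS = [{"guid": f"g{i:02d}", "updated": 1000 + i // 5} for i in range(20)]


def search(sort_keys: Sequence[str], start: int, size: int,
           rng: random.Random) -> List[Dict]:
    """Return one page, with tied results in an arbitrary order per request."""
    shuffled = ASSETS[:]
    rng.shuffle(shuffled)  # ties keep this arbitrary order after a stable sort
    ordered = sorted(shuffled, key=lambda a: tuple(a[k] for k in sort_keys))
    return ordered[start:start + size]


def collect(sort_keys: Sequence[str], seed: int, size: int = 4) -> List[str]:
    """Page through everything and return the GUIDs in the order received."""
    rng = random.Random(seed)
    guids: List[str] = []
    for start in range(0, len(ASSETS), size):
        guids += [a["guid"] for a in search(sort_keys, start, size, rng)]
    return guids


with_tie_breaker = collect(["updated", "guid"], seed=1)
without_tie_breaker = collect(["updated"], seed=1)
print(len(set(with_tie_breaker)), len(set(without_tie_breaker)))
```

With the GUID as a tie-breaker, every asset is returned exactly once and in the same order on every run. Sorting by timestamp alone leaves the order within each group of ties undefined, so pages cut from different requests will typically repeat some assets and miss others.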

Furthermore, as the from value grows large (beyond roughly 10,000), Elastic will begin to use significantly more resources to process your paging. To reduce this impact, if you need to page through many results you should implement your own timestamp-based offset mechanism so that the from value is kept consistently low.

(Again, the SDKs do both of these for you automatically.)
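One possible shape for such a timestamp-based offset mechanism is sketched below. It is not the SDKs' implementation. It assumes results are sorted ascending by timestamp and then GUID, that the data does not change while paging, and that a hypothetical `search(min_timestamp, start, size)` function applies a "timestamp at or after" filter. The field names `updated` and `guid` are placeholders.

```python
from typing import Callable, Dict, Iterator, List, Optional, Set

# Hypothetical search call: (min_timestamp, start, size) -> one page of results,
# sorted ascending by timestamp and then by GUID.
Search = Callable[[Optional[int], int, int], List[Dict]]


def iterate_all(search: Search, size: int = 50, max_from: int = 10_000) -> Iterator[Dict]:
    """Page through all results while keeping the offset below max_from."""
    floor: Optional[int] = None    # current lower bound on the timestamp filter
    start = 0                      # current offset (the "from" value)
    last_ts: Optional[int] = None  # timestamp of the most recent result yielded
    seen_at_last: Set[str] = set() # GUIDs already yielded with that timestamp
    while True:
        page = search(floor, start, size)
        for asset in page:
            ts, guid = asset["updated"], asset["guid"]
            if ts == last_ts and guid in seen_at_last:
                continue  # already yielded before the offset was reset
            if ts != last_ts:
                last_ts, seen_at_last = ts, set()
            seen_at_last.add(guid)
            yield asset
        if len(page) < size:
            return  # a short page means there is nothing left
        start += size
        # Reset the offset only if the timestamp has advanced; otherwise a
        # reset would restart from the same point and never make progress.
        if start >= max_from and last_ts != floor:
            floor, start = last_ts, 0


# In-memory stand-in for the index: 1,000 assets, 7 per timestamp.
ASSETS = sorted(
    ({"guid": f"g{i:04d}", "updated": 1_000 + i // 7} for i in range(1_000)),
    key=lambda a: (a["updated"], a["guid"]),
)
offsets_used: List[int] = []


def fake_search(min_ts: Optional[int], start: int, size: int) -> List[Dict]:
    offsets_used.append(start)
    matches = [a for a in ASSETS if min_ts is None or a["updated"] >= min_ts]
    return matches[start:start + size]


guids = [a["guid"] for a in iterate_all(fake_search, size=50, max_from=200)]
```

Here `max_from` is set to 200 only so the reset is exercised on a small data set. Whenever the offset would reach that limit, the filter is moved forward to the last timestamp seen and the offset returns to 0; results that share that timestamp and were already yielded are skipped.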

1.4.0 1.1.0

For example:

Annotated sort options, as you would define them in the Java SDK
SortOptions byUpdate = Asset.UPDATE_TIME.order(SortOrder.Desc); // Most recently updated first
SortOptions byGuid = Asset.GUID.order(SortOrder.Asc); // GUID, to act as a tie-breaker
Build the request
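IndexSearchRequest index = IndexSearchRequest.builder(
  IndexSearchDSL.builder(someQuery) // someQuery is a placeholder for a query defined elsewhere
      .from(100) // Start from offset 100 (the 101st result)
      .size(50) // Return up to 50 results per page
      .trackTotalHits(true) // Request an accurate total count
      .sortOption(byUpdate) // Sort by update time first
      .sortOption(byGuid) // Then by GUID, as a tie-breaker
      .build())
    .build();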
Iterate through multiple pages of results
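IndexSearchResponse response = index.search(client); // Run the search
long totalResults = response.getApproximateCount(); // Total number of results across all pages
for (Asset result : response) { // Iterate through the results; the SDK fetches further pages as needed
    // Do something with each result of the search...
}
response.forEach(a -> log.info("Found asset: {}", a.getGuid())); // Alternatively, use forEach
response.stream() // Or stream the results
    .filter(a -> !(a instanceof ILineageProcess)) // Skip lineage processes
    .limit(100) // Stop after 100 results
    .forEach(a -> log.info("Found asset: {}", a.getGuid())); // Process each remaining result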
Annotated sort options, as you would define them in the Python SDK
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.enums import SortOrder
from pyatlan.model.assets import Referenceable
from pyatlan.model.search import IndexSearchRequest, DSL

by_update = Referenceable.UPDATE_TIME.order(SortOrder.DESCENDING)  # Most recently updated first
by_guid = Referenceable.GUID.order(SortOrder.ASCENDING)  # GUID, to act as a tie-breaker
Build the request
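index = IndexSearchRequest(
    dsl=DSL(
        query=someQuery,  # someQuery is a placeholder for a query defined elsewhere
        from_=100,  # Start from offset 100 (the 101st result)
        size=50,  # Return up to 50 results per page
        track_total_hits=True,  # Request an accurate total count
        sort=[  # Sort by update time first, then by GUID as a tie-breaker
            by_update,
            by_guid
        ],
    )
)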
Iterate through multiple pages of results
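client = AtlanClient()
response = client.asset.search(index)  # Run the search
total_results = response.count  # Total number of results across all pages
for result in response:  # Iterate through the results; the SDK fetches further pages as needed
    pass  # Do something with each result of the search...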
Annotated sort options, as you would define them in Kotlin (using the Java SDK)
val byUpdate = Asset.UPDATE_TIME.order(SortOrder.Desc) // Most recently updated first
val byGuid = Asset.GUID.order(SortOrder.Asc) // GUID, to act as a tie-breaker
Build the request
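val index = IndexSearchRequest.builder(
  IndexSearchDSL.builder(someQuery) // someQuery is a placeholder for a query defined elsewhere
      .from(100) // Start from offset 100 (the 101st result)
      .size(50) // Return up to 50 results per page
      .trackTotalHits(true) // Request an accurate total count
      .sortOption(byUpdate) // Sort by update time first
      .sortOption(byGuid) // Then by GUID, as a tie-breaker
      .build())
    .build()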
Iterate through multiple pages of results
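val response = index.search(client) // Run the search
val totalResults = response.approximateCount // Total number of results across all pages
for (result in response) { // Iterate through the results; the SDK fetches further pages as needed
    // Do something with each result of the search...
}
response.forEach { log.info { "Found asset: ${it.guid}" } } // Alternatively, use forEach
response.stream() // Or stream the results
    .filter { it !is ILineageProcess } // Skip lineage processes
    .limit(100) // Stop after 100 results
    .forEach { log.info { "Found asset: ${it.guid}" } } // Process each remaining result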
POST /api/meta/search/indexsearch
{
  "dsl": {
    "from": 100,
    "size": 50,
    "track_total_hits": true,
    "query": {...},
    "sort": [
      { "__modificationTimestamp": { "order": "desc" }},
      { "__guid": { "order": "asc" }}
    ]
  }
}
Annotated response, in plain JSON
{
  "queryType": "INDEX",
  "searchParameters": {
      "showSearchScore": false,
      "suppressLogs": false,
      "allowDeletedRelations": false,
      "query": "{\"from\":100,\"size\":50,\"track_total_hits\":true,\"query\":{...},\"sort\":[{\"__modificationTimestamp\":\"desc\"},{\"__guid\":\"asc\"}]}"
  },
  "entities": [
    {...},
    {...},
    ...
  ],
  "approximateCount": 24631
}
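When paging manually, the response itself tells you whether another request is worthwhile. The sketch below is a hypothetical helper, not part of any SDK: it derives the next request from the previous one using only the entities and approximateCount fields shown in the response, and it assumes entities may be missing or empty once results run out.

```python
from typing import Dict, Optional


def next_page(dsl: Dict, response: Dict) -> Optional[Dict]:
    """Return the DSL for the following page, or None when paging is complete."""
    entities = response.get("entities") or []
    next_from = dsl["from"] + dsl["size"]
    if not entities or next_from >= response["approximateCount"]:
        return None
    # Keep every other property (query, sort, size) identical between pages.
    return {**dsl, "from": next_from}


dsl = {"from": 100, "size": 50, "track_total_hits": True}
response = {"entities": [{}] * 50, "approximateCount": 24631}
following = next_page(dsl, response)
print(following["from"])  # prints: 150
```

Copying the previous DSL and changing only from keeps the query and sorting criteria the same for every page.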

  1. If you're familiar with Elasticsearch, there are alternative paging options using search_after and point-in-time (PIT) state preservation. (There also used to be scrolling, but this is no longer recommended by Elasticsearch.) We do not currently expose the search_after or PIT approaches through Atlan's search. However, you should still be able to page beyond the first 10,000 results using the approach outlined above.