Querying overview

The query defines what you want to find in your search.

At the highest level, the key thing to understand is the context in which your query runs.

Query context

By default, Elasticsearch sorts results by a relevance score, which measures how well each document matches a query [1]. When a person runs a search through a UI, this is ideal: results come back in a logical order, even when some fuzziness is applied to the search terms.

This works because queries calculate these scores to rank (sort) the results.
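
As a rough illustration, the sketch below builds a scored search as a plain Python dictionary. A clause placed directly under `query` is scored, so each hit comes back with a relevance score. The field name `name` and the search text are assumptions made for illustration, not taken from any particular index.

```python
import json

# Hypothetical field name and search text, for illustration only.
scored_request = {
    "query": {
        # A match clause directly under "query" is scored: Elasticsearch
        # calculates how well each document matches, allowing for
        # some fuzziness in the search terms.
        "match": {"name": {"query": "customer orders", "fuzziness": "AUTO"}}
    }
}

print(json.dumps(scored_request, indent=2))
```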

Queries that calculate these scores run in query context. They answer the question:

"How well does this result match this query clause?"

Filter context

When machines run searches, though, this kind of scoring and ranking is often unnecessary. In most cases, your program only needs to know whether a result matches what you're looking for or not, which is a much more binary decision.

Queries that include or exclude a result as a binary decision run in filter context. They answer the question:

"Does this result match this query clause (yes or no)?"

Because no score needs to be calculated, filter context is faster. In addition, Elasticsearch automatically caches frequently used filters.
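
A minimal sketch of a filter-context request body, again as a plain Python dictionary. The clause sits under `bool.filter`, so it only includes or excludes documents. The `__typeName.keyword` field is the one used in the raw request on this page.

```python
import json

filter_request = {
    "query": {
        "bool": {
            # Clauses under "filter" run in filter context: each document
            # either matches or does not, and no score is calculated.
            "filter": [
                {"term": {"__typeName.keyword": "Table"}}
            ]
        }
    }
}

print(json.dumps(filter_request, indent=2))
```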

Use filter context, unless you need the results to be scored

So unless you have a specific need for your results to be scored, such as ranking results for search terms entered by a person, use filter context for performance.
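
When only part of a search needs scoring, one common pattern is to keep the human-entered text in query context and put every other condition in filter context. The sketch below assumes a hypothetical helper named `build_bool_query` and an illustrative `name` field; neither comes from the SDKs.

```python
from typing import Any


def build_bool_query(
    scored: list[dict[str, Any]],
    unscored: list[dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical helper: combine scored and unscored clauses in one bool query."""
    bool_clause: dict[str, Any] = {}
    if scored:
        # "must" clauses run in query context and contribute to the score.
        bool_clause["must"] = scored
    if unscored:
        # "filter" clauses run in filter context and never affect the score.
        bool_clause["filter"] = unscored
    return {"query": {"bool": bool_clause}}


request = build_bool_query(
    scored=[{"match": {"name": "customer orders"}}],  # text typed by a person
    unscored=[{"term": {"__typeName.keyword": "Table"}}],  # exact, yes-or-no condition
)
```

If nothing needs scoring, passing an empty `scored` list leaves the whole query in filter context.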

What this means in practice:

Build the query and request
IndexSearchRequest index = Table.select() // (1)
    .where(Table.NAME.startsWith("abc")) // (2)
    .sort(Table.UPDATE_TIME.order(SortOrder.Desc)) // (3)
    .toRequest();
  1. Starting with the fluent search's select() helper constructs a query in the background that uses filters to narrow results by type (Table in this example) and to include only active assets.
  2. Any other conditions you chain onto the query (through a .where()) will also be translated to filters.
  3. If you are sorting by some property of the results anyway, like when they were last modified, you probably do not need a score for each result, so filters will be the more performant option.
Build the query and request
from pyatlan.model.assets import Table
from pyatlan.model.enums import SortOrder
from pyatlan.model.fluent_search import CompoundQuery, FluentSearch

index = (
    FluentSearch()  # (1)
    .where(CompoundQuery.asset_type(Table))  # (2)
    .where(CompoundQuery.active_assets())
    .where(Table.NAME.startswith("abc"))
    .sort(Table.UPDATE_TIME.order(SortOrder.DESCENDING))  # (3)
).to_request()
  1. Starting with FluentSearch() constructs a query.
  2. Every chained .where() condition will be translated to a filter in Elasticsearch.
  3. If you are sorting by some property of the results anyway, like when they were last modified, you probably do not need a score for each result, so filters will be the more performant option.
POST /api/meta/search/indexsearch
{
  "dsl": {
    "query": { // (1)
      "bool": {
        "filter": [ // (2)
          { "term": { "__typeName.keyword": "Table" }}
        ]
      }
    },
    "sort": [ // (3)
      { "__modificationTimestamp": { "order": "desc" }}
    ]
  }
}
  1. Although we use a query construct (which we must to get any results)...
  2. ...if we are looking for exact matches only (and don't care about scoring), then we should put our search requirements into a filter.
  3. This is particularly true if we are sorting by some property of the results anyway, like when they were last modified.
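
The raw request above filters by type only, while the SDK examples also restrict results to active assets whose names start with "abc". The sketch below shows how those extra conditions might be added as further filters. The field names `__state` and `name.keyword` are assumptions made for illustration; only `__typeName.keyword` and `__modificationTimestamp` appear in the request above.

```python
import json

# Field names marked "assumed" are illustrative and should be checked
# against the actual index before use.
request_body = {
    "dsl": {
        "query": {
            "bool": {
                # Every condition is a yes-or-no match, so all of them
                # go into filter context and nothing is scored.
                "filter": [
                    {"term": {"__typeName.keyword": "Table"}},
                    {"term": {"__state": "ACTIVE"}},  # assumed field for active assets
                    {"prefix": {"name.keyword": {"value": "abc"}}},  # assumed field for the name
                ]
            }
        },
        # Results are ordered by modification time rather than by score.
        "sort": [{"__modificationTimestamp": {"order": "desc"}}],
    }
}

print(json.dumps(request_body, indent=2))
```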

  1. This page summarizes the details in the Elasticsearch Guide's "Query and filter context" page.