Skip to content

Parse SQL queries

You can parse your own SQL queries to assist with managing lineage.

Unsupported

Unless you have a specific agreement that states otherwise, this feature is provided as-is without any support. Also note that this does not create lineage directly — the results of the SQL parser could be used to programmatically build your own lineage.

To parse a SQL query:

Parse SQL query
1
2
3
4
5
6
7
QueryParserRequest request = QueryParserRequest.creator( // (1)
            "INSERT INTO orders (order_name, customer_id, product_id) VALUES(SELECT 'test_order', id, 21 FROM customers)", // (2)
            QueryParserSourceType.SNOWFLAKE) // (3)
        .defaultDatabase("ORDERS") // (4)
        .defaultSchema("PRODUCTION") // (5)
        .build();
    ParsedQuery parsedQuery = Atlan.getDefaultClient().queryParser.parse(request); // (6)
  1. Use the creator() method to initialize the object with all necessary attributes.
  2. Provide the SQL code itself.
  3. Specify the data store for the SQL code, so it can be appropriately interpreted.
  4. (Optional) Provide a name to use for the database for any objects in the query that are not qualified.
  5. (Optional) Provide a name to use for the schema for any objects in the query that are not qualified.
  6. Call the parse() method to actually parse the query.
Parse SQL query
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.enums import QueryParserSourceType
from pyatlan.model.query import QueryParserRequest

request = QueryParserRequest.create( # (1)
    sql="INSERT INTO orders (order_name, customer_id, product_id)" # (2)
    " VALUES(SELECT 'test_order', id, 21 FROM customers)",
    source=QueryParserSourceType.SNOWFLAKE, # (3)
)
request.default_database = "ORDERS" # (4)
request.default_schema = "PRODUCTION" # (5)
client = AtlanClient()
response = client.parse_query(request) # (6)
  1. Use the create() method to initialize the object with all necessary attributes.
  2. Provide the SQL code itself.
  3. Specify the data store for the SQL code, so it can be appropriately interpreted.
  4. (Optional) Provide a name to use for the database for any objects in the query that are not qualified.
  5. (Optional) Provide a name to use for the schema for any objects in the query that are not qualified.
  6. Call the parse_query() method to actually parse the query.
POST /api/sql/query/parse
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
{
  "sql": "INSERT INTO orders (order_name, customer_id, product_id) VALUES(SELECT 'test_order', id, 21 FROM customers)", // (1)
  "source": "snowflake", // (2)
  "defaultDatabase": "ORDERS", // (3)
  "defaultSchema": "PRODUCTION", // (4)
  "linkOrphanColumnToFirstTable": false,
  "showJoin": true,
  "ignoreRecordSet": true,
  "ignoreCoordinate": true
}
  1. Provide the SQL code itself.
  2. Specify the data store for the SQL code, so it can be appropriately interpreted.
  3. (Optional) Provide a name to use for the database for any objects in the query that are not qualified.
  4. (Optional) Provide a name to use for the schema for any objects in the query that are not qualified.

How to interpret the response

The parser interprets the query and provides back details about column-level processing.

Database objects

Database objects include tables, views, result sets of queries, and processes.

  • For objects like tables and views, the objects list includes the actual columns used.
  • For result sets, the objects list may also include system-generated columns that represent changes in the number of records returned or an aggregate value
  • Every object is assigned a numeric id value, that can be used to cross-reference the flow details in the relationships (below).

Relationships

Relationships returned by the parser indicate the flow of data between the database objects.

Each relationship is composed of:

  • a target, indicating the output column of the flow
  • one or more sources, indicating the input columns used to produce that output
  • a type, which can be one of:
    • fdd indicates a flow of data direct from source column to target column
    • fdr indicates a flow where the source column impacts the target, either:
      • the number of records in the target (for example, a column used in a where clause)
      • the resulting value of the target (for example, an aggregate function)
  • Numeric id, parentId and processId identifiers refer to the same numeric id fields used in the objects (above). So these can be used to lookup the additional object detail.
Interpret the parsed response
8
9
List<ParsedQuery.DatabaseObject> objects = parsedQuery.getObjects(); // (1)
List<ParsedQuery.Relationship> relationships = parsedQuery.getRelationships(); // (2)
  1. You can retrieve the list of objects from the response using getObjects().
  2. You can retrieve the list of relationships (flows) between the objects using getRelationships().
Interpret the parsed response
14
15
objects = response.dbobjs # (1)
relationships = response.relationships # (2)
  1. You can retrieve the list of objects from the response using the dbobjs property.
  2. You can retrieve the list of relationships (flows) between the objects using the relationships property.
POST /api/sql/query/parse
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
{
  "dbobjs": [ // (1)
    {
      "columns": [
        {
          "name": "id",
          "id": "13"
        }
      ],
      "displayName": "ORDERS.PRODUCTION.customers",
      "id": "12",
      "name": "customers",
      "type": "table",
      "database": "ORDERS",
      "schema": "PRODUCTION"
    },
    {
      "columns": [
        {
          "name": "order_name",
          "id": "5"
        },
        {
          "name": "customer_id",
          "id": "6"
        },
        {
          "name": "product_id",
          "id": "7"
        }
      ],
      "displayName": "ORDERS.PRODUCTION.orders",
      "id": "4",
      "name": "orders",
      "type": "table",
      "database": "ORDERS",
      "schema": "PRODUCTION"
    },
    {
      "id": "8",
      "name": "Query Insert-1",
      "procedureName": "batchQueries",
      "queryHashId": "0acc71f4f1b91a589c4e3fc652135c64",
      "type": "process",
      "database": "ORDERS",
      "schema": "PRODUCTION"
    }
  ],
  "relationships": [ // (2)
    {
      "id": "5",
      "type": "fdd",
      "effectType": "insert",
      "target": {
        "id": "5",
        "column": "order_name",
        "parentId": "4",
        "parentName": "orders"
      },
      "sources": [
        {
          "id": "13",
          "column": "id",
          "parentId": "12",
          "parentName": "customers"
        }
      ],
      "processId": "8",
      "processType": "sstinsert"
    }
  ]
}
  1. The objects are returned in the dbobjs list.
  2. The relationships are returned in the relationships list.