
Testing toolkit

Currently only usable by Atlanians

With the testing toolkit, we can guide you to write robust, reusable integration tests for connectors and utilities in Atlan.


Writing tests for non-toolkit based scripts ⚒

You can write integration tests for existing scripts in the marketplace-csa-scripts repository, even if they are not based on package toolkits. These tests help verify script behavior end-to-end in a real Atlan tenant.

We'll begin with a minimal refactoring of the existing script, which is necessary to enable writing integration tests.

Step 1: Rename directory to snake_case

If the script lives in a kebab-case directory, rename it to snake_case.

Do this just after renaming

Update references in mkdocs.yml, delete the old directory, and verify imports/links still work.

For example:

Before:

scripts/
└── designation-based-group-provisioning/
    ├── main.py
    ├── index.md
    └── tests/
        └── test_main.py

After:

scripts/
└── designation_based_group_provisioning/
    ├── main.py
    ├── index.md
    └── tests/
        └── test_main.py

Step 2: Refactor main.py

DO

  • Refactor the script without altering logic or flow.
  • Wrap all logic inside functions.
  • Create a single entry point: main(args: argparse.Namespace)
  • Call helper functions from main(); each should receive only the args or inputs it requires.

DO NOT

  • Rename or restructure existing functions.
  • Change the sequence or logic flow.
  • Modify argument parsing.
  • Add/remove logging unless required for debugging.

For example, main.py:

import argparse
from typing import Any

from pyatlan.client.atlan import AtlanClient
from pyatlan.pkg.utils import get_client, set_package_headers


def load_input_file(file: Any):
    pass

def do_something_with_file(client: AtlanClient, file: Any):
    pass

def main(args: argparse.Namespace):
    client = get_client(impersonate_user_id=args.user_id)
    client = set_package_headers(client)

    file = load_input_file(args.input_file)
    do_something_with_file(client, file)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--user-id", required=True)
    parser.add_argument("--input-file", required=True)
    args = parser.parse_args()
    main(args)
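
This refactor is what makes the script testable: because main() only reads attributes from the namespace it receives, a test can build a types.SimpleNamespace with the same attribute names and call main() directly, bypassing argparse entirely (this is exactly how the full example later in this section invokes the script). A minimal sketch, with a hypothetical module path and argument values:

from types import SimpleNamespace

# Hypothetical snake_case module path; adjust to your script's directory
from scripts.my_script.main import main

# Mirror the attributes that argparse would normally provide
config = SimpleNamespace(
    user_id="some-user-id",        # normally read from an environment variable
    input_file="tests/input.csv",  # hypothetical test input file
)

# main() only accesses args.user_id and args.input_file, so a SimpleNamespace
# works just as well as an argparse.Namespace here
main(config)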

Step 3: Add integration tests

Before writing tests, make sure you've installed the test dependencies in your local environment. You can do that by running the following command:

pip install -e ".[test]"

Alternatively, you can explicitly install the required packages by creating a requirements-test.txt file and installing them using:

requirements-test.txt
pytest
coverage
# pytest plugins (optional) 
pytest-order
pytest-sugar
pytest-timer[termcolor]
pip install -r requirements-test.txt

Test layout for test_main.py

Create a tests/ folder if not already present:

scripts/
└── my_script/
    ├── main.py
    └── tests/
        └── test_main.py

Function            | Purpose
------------------- | -------
test_main_functions | Test small, pure helper functions individually (useful for quick validation of logic)
test_main           | Run the main() function with a config to simulate full script execution (end-to-end)
test_after_main     | (optional) Validate side effects after running the script, such as asset creation, retrieval, audit logs, etc.
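
In skeleton form, a test_main.py that follows this layout might look like the sketch below; the module path and helper names are placeholders, and the collapsible full example at the end of this section shows a complete, real version:

import pytest
from types import SimpleNamespace

# Hypothetical module path and helper; replace with your script's actual names
from scripts.my_script.main import load_input_file, main


@pytest.fixture(scope="module")
def config() -> SimpleNamespace:
    # Mirror the attributes the script's argparse parser would produce
    return SimpleNamespace(user_id=None, input_file="tests/input.csv")


def test_main_functions(config):
    # Exercise small helper functions individually
    assert load_input_file(config.input_file) is not None


def test_main(config):
    # Run the entry point end-to-end with the test config
    main(config)


@pytest.mark.order(after="test_main")
def test_after_main():
    # Validate side effects (asset creation, audit logs, etc.) after main() has run
    ...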

For example, you can refer to the real-world integration test for designation_based_group_provisioning/main.py shown in the full example at the end of this section.

When writing integration tests for scripts in marketplace-csa-scripts, follow these practices to ensure reliable and production-relevant test coverage:

Best practices

  • Avoid using mock or patch on pyatlan clients or any Atlan interactions unless absolutely necessary.
  • Integration tests should interact with a real Atlan tenant to validate actual behavior.

  • Use mocking or patching only for things like:

    • External/third-party API calls
    • Database interactions not managed by Atlan
    • Non-deterministic behavior (e.g., random data, time-based logic)
  • Use environment variables for all secrets and configuration values.

  • Load them via .env files, CI/CD secrets, or shell configs (never hardcode them), as sketched below.
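
For example, a test module can pull everything sensitive from the environment. A minimal sketch, assuming python-dotenv is installed for local .env loading (ATLAN_BASE_URL and ATLAN_API_KEY are the variables pyatlan's AtlanClient() typically reads from the environment, and ATLAN_USER_ID is the one used in the full example below):

import os

from dotenv import load_dotenv  # assumes python-dotenv; optional for local runs

# Locally this picks up a .env file; in CI the variables come from pipeline secrets
load_dotenv()

ATLAN_BASE_URL = os.environ.get("ATLAN_BASE_URL")  # tenant URL, never hardcoded
ATLAN_API_KEY = os.environ.get("ATLAN_API_KEY")    # API token, never hardcoded
ATLAN_USER_ID = os.environ.get("ATLAN_USER_ID")    # user to impersonate in tests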

Things to avoid

  • Hardcoding sensitive values such as API keys, user-specific secrets, or test asset names.
  • Instead, use environment variables and helpers from pyatlan.test_utils such as TestId.make_unique() to generate unique asset names and avoid naming collisions. Create test objects in fixtures so they can be reused across tests and cleaned up safely once the tests complete (see the sketch after this list).

  • Using fake or placeholder data that doesn't reflect the actual structure or behavior of entities in Atlan. Always use data that closely mirrors production data for more meaningful tests.

  • Mocking pyatlan client methods: integration tests must execute real operations against a live Atlan tenant to ensure validity and detect regressions. Mocking undermines the purpose of integration testing.
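
The fixture pattern looks roughly like the sketch below, a condensed version of the group fixture from the full example that follows (names are illustrative, and the client fixture is assumed to exist as in that example):

import pytest
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.group import AtlanGroup
from pyatlan.test_utils import TestId

# Unique per test run, so repeated or parallel runs never collide on names
TEST_GROUP_NAME = TestId.make_unique("csa-example-test")


@pytest.fixture(scope="module")
def group(client: AtlanClient):
    # Create the test object once per module and share it across tests
    to_create = AtlanGroup.create(TEST_GROUP_NAME)
    created = client.group.create(group=to_create)
    yield created
    # Always clean up, even if individual tests fail
    client.group.purge(created.group)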

Full example (expand for details)
test_main.py
  import pytest
  from types import SimpleNamespace
  from pyatlan.pkg.utils import get_client, set_package_headers
  import pandas as pd
  from scripts.designation_based_group_provisioning.main import (
      review_groups,
      get_default_groups,
      get_ungrouped_users,
      map_users_by_designation,
      main,
  )
  from pyatlan.model.group import AtlanGroup, CreateGroupResponse
  from pyatlan.client.atlan import AtlanClient
  from pyatlan.test_utils import TestId
  from typing import Generator
  import os
  from pathlib import Path

  TEST_PATH = Path(__file__).parent
  TEST_GROUP_NAME = TestId.make_unique("csa-dbgp-test")


  @pytest.fixture(scope="module")
  def config() -> SimpleNamespace:
      return SimpleNamespace(
          user_id=os.environ.get("ATLAN_USER_ID"),
          mapping_file=f"{TEST_PATH}/test_mapping.csv",
          missing_groups_handler="SKIP",
          remove_from_default_group="",
          domain_name="mock-tenant.atlan.com",
      )


  @pytest.fixture(scope="module")
  def client(config):
      if config.user_id:
          client = get_client(impersonate_user_id=config.user_id)
      else:
          client = AtlanClient()
      client = set_package_headers(client)
      return client


  @pytest.fixture(scope="module")
  def group(client: AtlanClient) -> Generator[CreateGroupResponse, None, None]:
      to_create = AtlanGroup.create(TEST_GROUP_NAME)
      g = client.group.create(group=to_create)
      # Read the CSV file
      df = pd.read_csv(f"{TEST_PATH}/mapping.csv")
      # Replace values in the 'GROUP_NAME' column with the test group name
      df["GROUP_NAME"] = df["GROUP_NAME"].replace(
          "Data Engineers and Scientists", TEST_GROUP_NAME
      )
      # Save the updated test CSV
      df.to_csv(f"{TEST_PATH}/test_mapping.csv", index=False)
      assert os.path.exists(f"{TEST_PATH}/test_mapping.csv")
      yield g
      client.group.purge(g.group)
      os.remove(f"{TEST_PATH}/test_mapping.csv")


  def test_main_functions(
      config: SimpleNamespace,
      client: AtlanClient,
      group: AtlanGroup,
      caplog: pytest.LogCaptureFixture,
  ):
      # Test configuration validation
      assert config.mapping_file.endswith(".csv")

      # Test group review functionality
      verified_groups = review_groups(
          config.mapping_file, config.missing_groups_handler, client
      )
      assert caplog.records[0].levelname == "INFO"
      assert "-> Source information procured." in caplog.records[0].message
      assert isinstance(verified_groups, set)

      default_groups = get_default_groups(client)
      assert caplog.records[6].levelname == "INFO"
      assert "DEFAULT groups found:" in caplog.records[6].message
      assert isinstance(default_groups, list) and len(default_groups) > 0

      groupless_users = get_ungrouped_users(default_groups=default_groups, client=client)
      assert isinstance(groupless_users, list) and len(groupless_users) > 0

      unmappable_users = map_users_by_designation(
          user_list=groupless_users,
          mapping_file=config.mapping_file,
          verified_groups=verified_groups,
          client=client,
      )
      assert isinstance(unmappable_users, list) and len(unmappable_users) > 0


  def test_main(
      config: SimpleNamespace,
      client: AtlanClient,
      group: AtlanGroup,
      caplog: pytest.LogCaptureFixture,
  ):
      # Test end-to-end main function execution
      main(config)

      # Verify expected log messages
      assert caplog.records[0].levelname == "INFO"
      assert "SDK Client initialized for tenant" in caplog.records[0].message
      assert "Input file path -" in caplog.records[1].message
      assert "-> Source information procured." in caplog.records[2].message
      assert "Total distinct groups in the input:" in caplog.records[3].message


  @pytest.mark.order(after="test_main")
  def test_after_main(client: AtlanClient, group: CreateGroupResponse):
      result = client.group.get_by_name(TEST_GROUP_NAME)
      assert result and len(result) == 1
      test_group = result[0]
      assert test_group.path
      assert test_group.name
      assert test_group.id == group.group
      assert test_group.attributes
      assert not test_group.attributes.description
      # Make sure users are successfully assigned
      # to the test group after running the workflow
      assert test_group.user_count and test_group.user_count >= 1

Writing tests for non-toolkit based scripts using Cursor AI code editor 🤖

You can leverage AI code editors like Cursor to help with refactoring existing scripts and generating integration tests for the marketplace-csa-scripts repository. However, it's important to be aware of the potential issues and risks that may arise.

Step 1: Set up Cursor rules

To ensure the AI agent provides the desired results based on your prompts, you need to set up custom rules for your code editor.

  1. Create a rules file:

    • Create the file .cursor/rules/csa-scripts-tests.mdc in your project directory.
    • You can start by copying the example rule and modifying it to match your needs.
  2. Refine rules over time:

    • As you use AI for refactoring and generating tests, refine the rules. Adding more context (e.g., multiple packages and varied test patterns) makes the AI more effective over time and improves its results.

Step 2: Run the agent with the defined rules

To run the AI agent with the defined rules, follow these steps:

  1. Open the Cursor chat:

    • Press cmd + L to open a new chat in the Cursor IDE.
    • Click on Add Context, then select csa-scripts-tests.mdc to load the rules you defined.
  2. Provide a clear prompt:

    • After loading the rules, provide a clear prompt like the following to refactor your script and add integration tests:
      Refactor `scripts/asset-change-notification/main.py` using the latest Cursor rules and add integration tests in `scripts/asset_change_notification/tests/test_main.py` to ensure functionality and coverage.
      
  3. Review results:

    • Once the AI completes the task, review the generated results carefully. You may need to accept or reject parts of the refactoring based on your preferences and quality standards.

Common issues

  • Low accuracy across models: AI results can be highly inconsistent, even after experimenting with different combinations of rules and prompts. In many cases, only a small fraction of attempts yield satisfactory results.

  • Inconsistent output: Regardless of using detailed or minimal rules, and trying various AI models (Claude 3.7, Sonnet 3.5, Gemini, OpenAI), the output often lacks consistency, leading to unsatisfactory refactorings.

Risks in refactoring

  • Code deletion: AI can unintentionally remove important parts of the original code during refactoring.

  • Unnecessary code addition: AI might add code that changes the behavior of the script, potentially introducing bugs.

  • Flaky or insufficient tests: Generated tests are often overly simplistic or unreliable. AI may also mock components that should not be mocked, leading to incomplete test coverage.

Mocking / patching third-party HTTP interactions 🔌

The need for this comes up most often when building a connector or utility package that interacts with external systems (for example, fetching data from third-party sources).

Since these interactions usually require credentials and environment configurations, it becomes difficult to easily plug them into the existing integration test suite. Running them each time for tests, especially in CI builds, is not ideal and becomes harder to maintain over time as more integrations are added.

During development, we usually have access to the necessary credentials and environments. So instead of hitting the real APIs every time, what if we could save the responses once and reuse them?

This is where the vcrpy library comes into the picture. It captures and saves HTTP interactions during development into files known as "cassettes".

The key idea is:

  • Record real-world API calls once during development.
  • Save the interactions into local "cassette" files (YAML or JSON).
  • Replay the saved interactions during tests without making real HTTP requests.
  • Optionally modify the saved responses to simulate different scenarios.

This sits somewhere between integration tests and unit tests: it uses real API behavior but avoids needing a live environment every time. The result is much easier to maintain, faster to run, and more configurable as the project grows.

Write VCR-based integration tests


For this example, we are using httpbin.org, which provides a simple and fast way to test vcrpy by recording HTTP request and response interactions.

Have you installed test dependencies?

Before writing tests, make sure you've installed the test dependencies in your local environment. You can do that by running the following command:

pip install -e ".[test]"

Alternatively, you can explicitly install the required packages by creating a requirements-test.txt file and installing them using:

requirements-test.txt
pytest
coverage
# pytest plugins (optional) 
pytest-order
pytest-sugar
pytest-timer[termcolor]
pytest-vcr~=1.0.2
# pinned vcrpy to v6.x since vcrpy>=7.0 requires urllib3>=2.0
# which breaks compatibility with Python 3.8
vcrpy~=6.0.2
tests/integration/test_http_bin.py
import os

import pytest
import requests
from pyatlan.test_utils.base_vcr import BaseVCR  # (1)


class TestHTTPBin(BaseVCR):
    """
    Integration tests to demonstrate VCR.py capabilities
    by recording and replaying HTTP interactions using
    HTTPBin (https://httpbin.org) for GET, POST, PUT, and DELETE requests.
    """

    BASE_URL = "https://httpbin.org"

    @pytest.fixture(scope="module")  # (2)
    def vcr_config(self):
        """
        Override the VCR configuration to use JSON serialization across the module.
        """
        config = self._BASE_CONFIG.copy()
        config.update({"serializer": "pretty-json"})
        return config

    @pytest.fixture(scope="module")
    def vcr_cassette_dir(self, request):  # (3)
        """
        Override the directory path for storing VCR cassettes.
        If a custom cassette directory is set in the class, it is used;
        otherwise, the default directory structure is created under "tests/cassettes".
        """
        return self._CASSETTES_DIR or os.path.join(
            "tests/vcr_cassettes", request.module.__name__
        )

    @pytest.mark.vcr()
    def test_httpbin_get(self): # (4)
        """
        Test a simple GET request to httpbin.
        """
        url = f"{self.BASE_URL}/get"
        response = requests.get(url, params={"test": "value"})

        assert response.status_code == 200
        assert response.json()["args"]["test"] == "value"

    @pytest.mark.vcr()
    def test_httpbin_post(self):
        """
        Test a simple POST request to httpbin.
        """
        url = f"{self.BASE_URL}/post"
        payload = {"name": "atlan", "type": "integration-test"}
        response = requests.post(url, json=payload)

        assert response.status_code == 200
        assert response.json()["json"] == payload

    @pytest.mark.vcr()
    def test_httpbin_put(self):
        """
        Test a simple PUT request to httpbin.
        """
        url = f"{self.BASE_URL}/put"
        payload = {"update": "value"}
        response = requests.put(url, json=payload)

        assert response.status_code == 200
        assert response.json()["json"] == payload

    @pytest.mark.vcr()
    def test_httpbin_delete(self):
        """
        Test a simple DELETE request to httpbin.
        """
        url = f"{self.BASE_URL}/delete"
        response = requests.delete(url)

        assert response.status_code == 200
        # HTTPBin returns an empty JSON object for DELETE
        assert response.json()["args"] == {}
  1. Start by importing the BaseVCR class from pyatlan.test_utils.base_vcr, which already includes base/default configurations for VCR-based tests, such as vcr_config, vcr_cassette_dir, and custom serializers like pretty-yaml (default for cassettes) and pretty-json (another cassette format).

  2. (Optional) To override any default vcr_config(), you can redefine the @pytest.fixture -> vcr_config() inside your test class. For example, you can update the serializer to use the custom pretty-json serializer.

  3. (Optional) To override the default cassette directory path, you can redefine the @pytest.fixture -> vcr_cassette_dir() inside your test class.

  4. When writing tests (e.g., test_my_scenario), make sure to add the @pytest.mark.vcr() decorator to mark them as VCR test cases. For each test case, a separate cassette (HTTP recording) will be created inside the tests/vcr_cassettes/ directory.

Once you run all the tests using:

pytest tests/integration/test_http_bin.py

Since this is the first time you're running them, vcrpy will automatically record all the HTTP interactions and save them into the tests/vcr_cassettes/ directory ✅

For example, here's a saved cassette for the TestHTTPBin.test_httpbin_post test:

tests/vcr_cassettes/tests.integration.test_http_bin/TestHTTPBin.test_httpbin_post.yaml
interactions:
- request:
    body: |-
      {
        "name": "atlan",
        "type": "integration-test"
      }
    headers: {}
    method: POST
    uri: https://httpbin.org/post
  response:
    body:
      string: |-
        {
          "args": {},
          "data": "{\"name\": \"atlan\", \"type\": \"integration-test\"}",
          "files": {},
          "form": {},
          "headers": {
            "Accept": "*/*",
            "Accept-Encoding": "gzip, deflate",
            "Content-Length": "45",
            "Content-Type": "application/json",
            "Host": "httpbin.org",
            "User-Agent": "python-requests/2.32.3",
            "X-Amzn-Trace-Id": "Root=1-680f7290-276efa7f015f83d24d9fdfc4"
          },
          "json": {
            "name": "atlan",
            "type": "integration-test"
          },
          "origin": "x.x.x.x",
          "url": "https://httpbin.org/post"
        }
    headers: {}
    status:
      code: 200
      message: OK
version: 1
vcrpy not sufficient for your use case? 🤔

There might be cases where VCR.py's recorded responses are not sufficient for your testing needs, even after applying custom configurations. In such scenarios, you can switch to Python's built-in unittest.mock library (mock objects and patch) for greater flexibility and control over external dependencies. For example:
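
As a minimal illustration (the endpoint and helper function here are hypothetical and not part of any Atlan package), a third-party HTTP call can be patched so the test never leaves the process:

from unittest.mock import patch

import requests


def fetch_ticket_count(base_url: str) -> int:
    # Hypothetical helper that calls a third-party API
    response = requests.get(f"{base_url}/tickets/count", timeout=10)
    response.raise_for_status()
    return response.json()["count"]


def test_fetch_ticket_count_handles_response():
    # Patch requests.get so no real HTTP request is made
    with patch("requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = {"count": 42}

        assert fetch_ticket_count("https://example.invalid") == 42
        mock_get.assert_called_once_with(
            "https://example.invalid/tickets/count", timeout=10
        )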