
Testing toolkit

Currently only usable by Atlanians

The testing toolkit guides you through writing robust, reusable integration tests for connectors and utilities in Atlan.


Writing tests for non-toolkit-based scripts ⚒

You can write integration tests for existing scripts in the marketplace-csa-scripts repository, even if they are not based on package toolkits. These tests help verify script behavior end-to-end in a real Atlan tenant.

We'll begin with a minimal refactor of the existing script, which is necessary before integration tests can be written.

Step 1: Rename directory to snake_case

If the script lives in a kebab-case directory, rename the directory to snake_case.

Do this just after renaming

Update references in mkdocs.yml, delete the old directory, and verify imports/links still work.

For example:

Before:

scripts/
└── designation-based-group-provisioning/
    ├── main.py
    ├── index.md
    └── tests/
        └── test_main.py

After:

scripts/
└── designation_based_group_provisioning/
    ├── main.py
    ├── index.md
    └── tests/
        └── test_main.py
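A minimal sketch of the rename and the follow-up reference check, assuming a git checkout of marketplace-csa-scripts (adjust the paths for your script):

```bash
# Rename the directory while preserving git history
git mv scripts/designation-based-group-provisioning scripts/designation_based_group_provisioning

# Find any stale references to the old kebab-case name (mkdocs.yml, imports, links)
grep -rn "designation-based-group-provisioning" mkdocs.yml scripts/
```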

Step 2: Refactor main.py

DO

  • Refactor the script without altering logic or flow.
  • Wrap all logic inside functions.
  • Create a single entry point: main(args: argparse.Namespace)
  • Call helper functions from main(); each should receive only the required args or inputs.

DO NOT

  • Rename or restructure existing functions.
  • Change the sequence or logic flow.
  • Modify argument parsing.
  • Add/remove logging unless required for debugging.

Example refactored main.py:

main.py
import argparse
from typing import Any
from pyatlan.client.atlan import AtlanClient
from pyatlan.pkg.utils import get_client, set_package_headers


def load_input_file(file_path: str) -> Any:
    """Load and validate the input file."""
    # Your file loading logic here
    pass


def process_data_with_atlan(client: AtlanClient, data: Any) -> None:
    """Process the loaded data using Atlan client."""
    # Your data processing logic here
    pass


def main(args: argparse.Namespace) -> None:
    """Main entry point for the script."""
    # Initialize Atlan client
    client = get_client(impersonate_user_id=args.user_id)
    client = set_package_headers(client)

    # Load and process data
    data = load_input_file(args.input_file)
    process_data_with_atlan(client, data)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Script description")
    parser.add_argument("--user-id", required=True, help="User ID for impersonation")
    parser.add_argument("--input-file", required=True, help="Path to input file")
    args = parser.parse_args()
    main(args)
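The refactor keeps the command-line interface unchanged, so the script can still be run directly. For example (flag values are placeholders):

```bash
python main.py --user-id "<user-guid>" --input-file "./input.csv"
```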

Step 3: Add integration tests

Prerequisites: Install test dependencies

Before writing tests, you need to install the required testing dependencies. Choose one of the following methods:

Option 1: Install from package (recommended if available)

pip install -e ".[test]"

Option 2: Install explicitly with requirements file

  1. Create a requirements-test.txt file:
requirements-test.txt
pytest>=7.4.0
coverage>=7.6.1
# pytest plugins (optional but recommended) 
pytest-order>=1.3.0
pytest-sugar>=1.0.0
pytest-timer[termcolor]>=1.0.0
  2. Install the dependencies:
pip install -r requirements-test.txt

Ready to proceed

Once dependencies are installed, you can proceed to write your integration tests.

Test layout for test_main.py

Create a tests/ folder if not already present:

scripts/
└── my_script/
    ├── main.py
    └── tests/
        └── test_main.py

Structure test_main.py around these functions:

| Function | Purpose |
|----------|---------|
| test_main_functions | Test small, pure helper functions individually (useful for quick validation of logic) |
| test_main | Run the main() function with a config to simulate full script execution (end-to-end) |
| test_after_main (optional) | Validate side effects after running the script, such as asset creation, retrieval, audit logs, etc. |
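As a rough skeleton (names and bodies are illustrative; the full example further below shows the real thing):

```python
# tests/test_main.py - illustrative skeleton only
import pytest
from types import SimpleNamespace


@pytest.fixture(scope="module")
def config() -> SimpleNamespace:
    # Build the same argparse-style namespace that main() expects
    return SimpleNamespace(user_id="...", input_file="...")


def test_main_functions(config):
    # Exercise small, pure helper functions individually
    ...


def test_main(config):
    # Run main(config) end-to-end against a real tenant
    ...


@pytest.mark.order(after="test_main")  # requires the pytest-order plugin
def test_after_main():
    # Validate side effects: created assets, group membership, audit logs, ...
    ...
```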

Example Reference: For a complete real-world example, see the integration test for designation_based_group_provisioning/main.py.

When writing integration tests for scripts in marketplace-csa-scripts, follow these practices to ensure reliable and production-relevant test coverage:

Best practices

✅ DO:

  • Test against real Atlan tenants - Integration tests should interact with actual Atlan instances to validate real behavior
  • Use environment variables for all secrets and configuration values
  • Load configuration safely via .env files, CI/CD secrets, or shell configs โ€” never hardcode sensitive data

🔄 MOCK ONLY WHEN NECESSARY:

Use mocking or patching sparingly, and only for:

  • External/third-party API calls (non-Atlan services)
  • Database interactions not managed by Atlan
  • Non-deterministic behavior (e.g., random data, time-based logic)

โŒ AVOID:

  • Mocking pyatlan clients or any Atlan interactions unless absolutely necessary

Common pitfalls to avoid

โŒ Don't hardcode sensitive values

  • Never hardcode API keys, user-specific secrets, or test asset names
  • Instead: Use environment variables and pyatlan.test_utils.TestId.make_unique() for unique naming
  • Best practice: Generate test objects in fixtures for reusability and proper cleanup
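For instance, a minimal sketch of this pattern (the group name prefix is illustrative; connection details are read from the environment):

```python
import os

import pytest
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.group import AtlanGroup
from pyatlan.test_utils import TestId

# Unique, collision-free name for anything the test creates
TEST_GROUP_NAME = TestId.make_unique("my-test-group")


@pytest.fixture(scope="module")
def client() -> AtlanClient:
    # Connection details come from the environment (ATLAN_BASE_URL, ATLAN_API_KEY),
    # never from values hardcoded in the test file
    assert os.environ.get("ATLAN_API_KEY"), "set ATLAN_API_KEY before running"
    return AtlanClient()


@pytest.fixture(scope="module")
def group(client: AtlanClient):
    # Create the test object in a fixture so it is always cleaned up
    created = client.group.create(group=AtlanGroup.create(TEST_GROUP_NAME))
    yield created
    client.group.purge(created.group)
```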

โŒ Don't use fake data

  • Avoid placeholder data that doesn't reflect real Atlan entity structures
  • Instead: Use data that closely mirrors production for meaningful tests

โŒ Don't mock Atlan client methods

  • Integration tests must execute real operations against live Atlan tenants
  • Why: Mocking undermines the purpose of integration testing and may miss regressions
  • Remember: You're testing the integration, not the individual components
Full example
test_main.py
import pytest
from types import SimpleNamespace
from pyatlan.pkg.utils import get_client, set_package_headers
import pandas as pd
from scripts.designation_based_group_provisioning.main import (
    review_groups,
    get_default_groups,
    get_ungrouped_users,
    map_users_by_designation,
    main,
)
from pyatlan.model.group import AtlanGroup, CreateGroupResponse
from pyatlan.client.atlan import AtlanClient
from pyatlan.test_utils import TestId
from typing import Generator
import os
from pathlib import Path

TEST_PATH = Path(__file__).parent
TEST_GROUP_NAME = TestId.make_unique("csa-dbgp-test")


@pytest.fixture(scope="module")
def config() -> SimpleNamespace:
    return SimpleNamespace(
        user_id=os.environ.get("ATLAN_USER_ID"),
        mapping_file=f"{TEST_PATH}/test_mapping.csv",
        missing_groups_handler="SKIP",
        remove_from_default_group="",
        domain_name="mock-tenant.atlan.com",
    )


@pytest.fixture(scope="module")
def client(config):
    if config.user_id:
        client = get_client(impersonate_user_id=config.user_id)
    else:
        client = AtlanClient()
    client = set_package_headers(client)
    return client


@pytest.fixture(scope="module")
def group(client: AtlanClient) -> Generator[CreateGroupResponse, None, None]:
    to_create = AtlanGroup.create(TEST_GROUP_NAME)
    g = client.group.create(group=to_create)
    # Read the CSV file
    df = pd.read_csv(f"{TEST_PATH}/mapping.csv")
    # Replace values in the 'GROUP_NAME' column with the test group name
    df["GROUP_NAME"] = df["GROUP_NAME"].replace(
        "Data Engineers and Scientists", TEST_GROUP_NAME
    )
    # Save the updated test CSV
    df.to_csv(f"{TEST_PATH}/test_mapping.csv", index=False)
    assert os.path.exists(f"{TEST_PATH}/test_mapping.csv")
    yield g
    client.group.purge(g.group)
    os.remove(f"{TEST_PATH}/test_mapping.csv")


def test_main_functions(
    config: SimpleNamespace,
    client: AtlanClient,
    group: AtlanGroup,
    caplog: pytest.LogCaptureFixture,
):
    # Test configuration validation
    assert config.mapping_file.endswith(".csv")

    # Test group review functionality
    verified_groups = review_groups(
        config.mapping_file, config.missing_groups_handler, client
    )
    assert caplog.records[0].levelname == "INFO"
    assert "-> Source information procured." in caplog.records[0].message
    assert isinstance(verified_groups, set)

    default_groups = get_default_groups(client)
    assert caplog.records[6].levelname == "INFO"
    assert "DEFAULT groups found:" in caplog.records[6].message
    assert isinstance(default_groups, list) and len(default_groups) > 0

    groupless_users = get_ungrouped_users(default_groups=default_groups, client=client)
    assert isinstance(groupless_users, list) and len(groupless_users) > 0

    unmappable_users = map_users_by_designation(
        user_list=groupless_users,
        mapping_file=config.mapping_file,
        verified_groups=verified_groups,
        client=client,
    )
    assert isinstance(unmappable_users, list) and len(unmappable_users) > 0


def test_main(
    config: SimpleNamespace,
    client: AtlanClient,
    group: AtlanGroup,
    caplog: pytest.LogCaptureFixture,
):
    # Test end-to-end main function execution
    main(config)

    # Verify expected log messages
    assert caplog.records[0].levelname == "INFO"
    assert "SDK Client initialized for tenant" in caplog.records[0].message
    assert "Input file path -" in caplog.records[1].message
    assert "-> Source information procured." in caplog.records[2].message
    assert "Total distinct groups in the input:" in caplog.records[3].message


@pytest.mark.order(after="test_main")
def test_after_main(client: AtlanClient, group: CreateGroupResponse):
    result = client.group.get_by_name(TEST_GROUP_NAME)
    assert result and len(result) == 1
    test_group = result[0]
    assert test_group.path
    assert test_group.name
    assert test_group.id == group.group
    assert test_group.attributes
    assert not test_group.attributes.description
    # Make sure users are successfully assigned
    # to the test group after running the workflow
    assert test_group.user_count and test_group.user_count >= 1
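To run this module locally, point pyatlan at your tenant via environment variables and invoke pytest from the repository root. A sketch (values are placeholders; ATLAN_USER_ID is the variable read by the config fixture above):

```bash
export ATLAN_BASE_URL="https://your-tenant.atlan.com"
export ATLAN_API_KEY="<api-token>"
export ATLAN_USER_ID="<user-guid-to-impersonate>"   # optional, see the config fixture

pytest scripts/designation_based_group_provisioning/tests/ -s
```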

Writing tests for non-toolkit-based scripts using the Cursor AI code editor 🤖

You can leverage AI code editors like Cursor to help with refactoring existing scripts and generating integration tests for the marketplace-csa-scripts repository. However, it's important to be aware of the potential issues and risks that may arise.

Step 1: Setup Cursor rules

To ensure the AI agent provides the desired results based on your prompts, you need to set up custom rules for your code editor.

  1. Create a rules file:

    • Create the file .cursor/rules/csa-scripts-tests.mdc in your project directory.
    • You can start by copying the example rule and modifying it to match your needs.
  2. Refine rules over time:

    • As you use AI for refactoring and generating tests, refine the rules. Adding more context (e.g., multiple packages and varied test patterns) makes the AI more effective over time.

Step 2: Run the agent with the defined rules

To run the AI agent with the defined rules, follow these steps:

  1. Open the Cursor chat:

    • Press cmd + L to open a new chat in the Cursor IDE.
    • Click on Add Context, then select csa-scripts-tests.mdc to load the rules you defined.
  2. Provide a clear prompt:

    • After loading the rules, provide a clear prompt like the following to refactor your script and add integration tests:
      Refactor `scripts/asset-change-notification/main.py` using the latest Cursor rules and add integration tests in `scripts/asset_change_notification/tests/test_main.py` to ensure functionality and coverage.
      
  3. Review results:

    • Once the AI completes the task, review the generated results carefully. You may need to accept or reject parts of the refactoring based on your preferences and quality standards.

Common Issues

  • Low accuracy across models: AI results can be highly inconsistent, even after experimenting with different combinations of rules and prompts. In many cases, only a small fraction of attempts yield satisfactory results.

  • Inconsistent output: Regardless of using detailed or minimal rules, and trying various AI models (Claude 3.7, Sonnet 3.5, Gemini, OpenAI), the output often lacks consistency, leading to unsatisfactory refactorings.

Risks in refactoring

  • Code deletion: AI can unintentionally remove important parts of the original code during refactoring.

  • Unnecessary code addition: AI might add code that changes the behavior of the script, potentially introducing bugs.

  • Flaky or insufficient tests: Generated tests are often overly simplistic or unreliable. AI may also mock components that should not be mocked, leading to incomplete test coverage.

Mocking / patching third-party HTTP interactions 🔌

When do you need this?

This approach is essential when building connectors or utility packages that interact with external systems, such as:

  • Fetching data from third-party APIs
  • Integrating with external databases
  • Calling web services that require authentication

The problem with real API calls in tests

โŒ Challenges with direct API testing: - Requires credentials and environment configurations - Difficult to integrate into automated test suites - Slow execution times, especially in CI/CD pipelines - Hard to maintain as more integrations are added - External service availability can break tests

The solution: VCR (Video Cassette Recorder)

✅ Benefits of using VCR:

  • Record real API interactions once during development
  • Replay saved responses in tests without network calls
  • Fast, reliable, and reproducible tests
  • Works offline and in CI environments

The vcrpy library captures and saves HTTP interactions in files called "cassettes" during development.

How VCR works

The workflow:

  1. Record → Run tests once with real API calls to record interactions
  2. Save → Store responses in local "cassette" files (YAML or JSON)
  3. Replay → Future test runs use saved responses instead of real HTTP requests
  4. Customize → Optionally modify saved responses to simulate different scenarios

The benefits:

  • 🚀 Faster tests - No network latency
  • 🔒 Reliable - No dependency on external service availability
  • 🔄 Reproducible - Same responses every time
  • 🛠️ Configurable - Easy to simulate edge cases and error conditions

Hybrid approach

VCR sits between integration and unit tests: it uses real API behavior but avoids needing a live environment every time. This makes tests easier to maintain, faster to run, and more configurable as your project grows.

Write VCR-based integration tests


For this example, we are using httpbin.org, which provides a simple and fast way to test vcrpy by recording HTTP request and response interactions.

Have you installed test dependencies?

Before writing tests, make sure you've installed the test dependencies in your local environment. You can do that by running the following command:

pip install -e ".[test]"

Alternatively, you can explicitly install the required packages by creating a requirements-test.txt file and installing them using:

requirements-test.txt
pytest>=7.4.0
coverage>=7.6.1
# pytest plugins (optional but recommended) 
pytest-order>=1.3.0
pytest-sugar>=1.0.0
pytest-timer[termcolor]>=1.0.0
pytest-vcr~=1.0.2
# pinned vcrpy to v6.x since vcrpy>=7.0 requires urllib3>=2.0
# which breaks compatibility with Python 3.8
vcrpy~=6.0.2
tests/integration/test_http_bin.py
import pytest
import requests
import os
from pyatlan.test_utils.base_vcr import BaseVCR  # (1)


class TestHTTPBin(BaseVCR):
    """
    Integration tests to demonstrate VCR.py capabilities
    by recording and replaying HTTP interactions using
    HTTPBin (https://httpbin.org) for GET, POST, PUT, and DELETE requests.
    """

    BASE_URL = "https://httpbin.org"

    @pytest.fixture(scope="module")  # (2)
    def vcr_config(self):
        """
        Override the VCR configuration to use JSON serialization across the module.
        """
        config = self._BASE_CONFIG.copy()
        config.update({"serializer": "pretty-json"})
        return config

    @pytest.fixture(scope="module")
    def vcr_cassette_dir(self, request):  # (3)
        """
        Override the directory path for storing VCR cassettes.
        If a custom cassette directory is set in the class, it is used;
        otherwise, the default directory structure is created under "tests/cassettes".
        """
        return self._CASSETTES_DIR or os.path.join(
            "tests/vcr_cassettes", request.module.__name__
        )

    @pytest.mark.vcr()
    def test_httpbin_get(self): # (4)
        """
        Test a simple GET request to httpbin.
        """
        url = f"{self.BASE_URL}/get"
        response = requests.get(url, params={"test": "value"})

        assert response.status_code == 200
        assert response.json()["args"]["test"] == "value"

    @pytest.mark.vcr()
    def test_httpbin_post(self):
        """
        Test a simple POST request to httpbin.
        """
        url = f"{self.BASE_URL}/post"
        payload = {"name": "atlan", "type": "integration-test"}
        response = requests.post(url, json=payload)

        assert response.status_code == 200
        assert response.json()["json"] == payload

    @pytest.mark.vcr()
    def test_httpbin_put(self):
        """
        Test a simple PUT request to httpbin.
        """
        url = f"{self.BASE_URL}/put"
        payload = {"update": "value"}
        response = requests.put(url, json=payload)

        assert response.status_code == 200
        assert response.json()["json"] == payload

    @pytest.mark.vcr()
    def test_httpbin_delete(self):
        """
        Test a simple DELETE request to httpbin.
        """
        url = f"{self.BASE_URL}/delete"
        response = requests.delete(url)

        assert response.status_code == 200
        # HTTPBin returns an empty JSON object for DELETE
        assert response.json()["args"] == {}
  1. Start by importing the BaseVCR class from pyatlan.test_utils.base_vcr, which already includes base/default configurations for VCR-based tests, such as vcr_config, vcr_cassette_dir, and custom serializers like pretty-yaml (default for cassettes) and pretty-json (another cassette format).

  2. (Optional) To override any default vcr_config(), you can redefine the @pytest.fixture -> vcr_config() inside your test class. For example, you can update the serializer to use the custom pretty-json serializer.

  3. (Optional) To override the default cassette directory path, you can redefine the @pytest.fixture -> vcr_cassette_dir() inside your test class.

  4. When writing tests (e.g. test_my_scenario), make sure to add the @pytest.mark.vcr() decorator to mark them as VCR test cases. For each test case, a separate cassette (HTTP recording) will be created inside the tests/vcr_cassettes/ directory.

Once you run all the tests using:

pytest tests/integration/test_http_bin.py

Since this is the first time running them, vcrpy will record all the HTTP interactions automatically and save them into the tests/vcr_cassettes/ directory ✅

For example, here's a saved cassette for the TestHTTPBin.test_httpbin_post test:

tests/vcr_cassettes/tests.integration.test_http_bin/TestHTTPBin.test_httpbin_post.yaml
interactions:
- request:
    body: |-
      {
        "name": "atlan",
        "type": "integration-test"
      }
    headers: {}
    method: POST
    uri: https://httpbin.org/post
  response:
    body:
      string: |-
        {
          "args": {},
          "data": "{\"name\": \"atlan\", \"type\": \"integration-test\"}",
          "files": {},
          "form": {},
          "headers": {
            "Accept": "*/*",
            "Accept-Encoding": "gzip, deflate",
            "Content-Length": "45",
            "Content-Type": "application/json",
            "Host": "httpbin.org",
            "User-Agent": "python-requests/2.32.3",
            "X-Amzn-Trace-Id": "Root=1-680f7290-276efa7f015f83d24d9fdfc4"
          },
          "json": {
            "name": "atlan",
            "type": "integration-test"
          },
          "origin": "x.x.x.x",
          "url": "https://httpbin.org/post"
        }
    headers: {}
    status:
      code: 200
      message: OK
version: 1
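If the upstream API changes and you need a fresh recording, the simplest approach (assuming the default record mode) is to delete the affected cassette and re-run that test so vcrpy records it again. For example, for the cassette shown above:

```bash
rm tests/vcr_cassettes/tests.integration.test_http_bin/TestHTTPBin.test_httpbin_post.yaml
pytest tests/integration/test_http_bin.py::TestHTTPBin::test_httpbin_post
```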
vcrpy not sufficient for your use case? 🤔

There might be cases where VCR.py's recorded responses are not sufficient for your testing needs, even after applying custom configurations. In such scenarios, you can switch to using Python's built-in mock/patch object library for greater flexibility and control over external dependencies.
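For example, a minimal sketch using unittest.mock to simulate a failure that would be awkward to record with VCR (the helper, URL, and error are purely illustrative):

```python
from unittest.mock import patch

import pytest
import requests


def notify_external_service(payload: dict) -> int:
    # Illustrative helper that calls a non-Atlan, third-party endpoint
    response = requests.post("https://example.com/webhook", json=payload, timeout=10)
    response.raise_for_status()
    return response.status_code


def test_notify_external_service_handles_timeout():
    # Patch only the third-party HTTP call; Atlan interactions stay real
    with patch("requests.post", side_effect=requests.Timeout("simulated timeout")):
        with pytest.raises(requests.Timeout):
            notify_external_service({"event": "asset-updated"})
```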

Containerizing marketplace scripts 📦

Overview

When your script is ready for production deployment, you'll need to create package-specific Docker images for reliable and consistent execution across different environments.

Why containerize?

  • ✅ Consistent execution across all environments
  • ✅ Proper versioning and rollback capability
  • ✅ Isolated dependencies prevent conflicts
  • ✅ Automated deployment via CI/CD pipelines


Prerequisites

Complete these steps first

Before containerizing your script, ensure you have:

  • ✅ Completed script refactoring from the Writing tests for non-toolkit-based scripts section
  • ✅ Working integration tests that validate your script's functionality
  • ✅ Script directory renamed to snake_case format (if applicable)

Required files for containerization

File checklist

For each package script (e.g. scripts/designation_based_group_provisioning/), you need to complete the following 5 items:

  • 📝 version.txt - Semantic versioning
  • 🐳 Dockerfile - Container image definition
  • 📦 requirements.txt - Package dependencies
  • 🧪 requirements-test.txt - Testing dependencies
  • 🔒 Vulnerability scan (using the snyk CLI)

Let's create each file step by step:

1. version.txt - semantic versioning

Create a version file to track your package releases:

version.txt
1.0.0dev

Semantic versioning guidelines

You should use the dev suffix for development versions, as shown above ⬆

Follow semantic versioning principles:

  • MAJOR version: incompatible API changes
  • MINOR version: backwards-compatible functionality additions
  • PATCH version: backwards-compatible bug fixes
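For example, a package would typically move from 1.0.0dev (development builds) to 1.0.0 (first GA release), then to 1.0.1 for a backwards-compatible bug fix and 1.1.0 when backwards-compatible functionality is added.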

2. Dockerfile - package-specific image

Create a production-ready Docker image for your script:

Dockerfile
# Use the latest pyatlan-wolfi-base image
FROM ghcr.io/atlanhq/pyatlan-wolfi-base:8.0.1-3.13

# Build arguments
ARG PKG_DIR
ARG APP_DIR=/app/designation_based_group_provisioning

# Container metadata
LABEL org.opencontainers.image.vendor="Atlan Pte. Ltd." \
      org.opencontainers.image.source="https://github.com/atlanhq/marketplace-csa-scripts" \
      org.opencontainers.image.description="Atlan image for designation_based_group_provisioning custom package." \
      org.opencontainers.image.licenses="Apache-2.0"

# Switch to root for package installation
USER root

# Copy and install package requirements
COPY ${PKG_DIR}/requirements.txt requirements.txt

# Install additional requirements system-wide with caching
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system -r requirements.txt && \
    rm requirements.txt

# Copy application code and utilities
COPY ${PKG_DIR} ${APP_DIR}/
COPY utils /app/scripts/utils/

# Switch back to nonroot user for security
USER nonroot

# Set working directory
WORKDIR /app
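To verify the image builds before pushing a branch, you can run a local build from the repository root. A sketch (the tag is illustrative; BuildKit is required for the cache mount):

```bash
DOCKER_BUILDKIT=1 docker build \
  --build-arg PKG_DIR=scripts/designation_based_group_provisioning \
  -f scripts/designation_based_group_provisioning/Dockerfile \
  -t designation-based-group-provisioning:1.0.0dev \
  .
```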

About pyatlan-wolfi-base

Use pyatlan-wolfi-base images for package scripts. The image is built on top of the Chainguard Wolfi base image with pyatlan pre-installed. We use it because it is a vulnerability-free open-source image, and a new image is auto-published to ghcr.io on every pyatlan release (the tag suffix encodes pyatlan_version-python_version, e.g. 8.0.1-3.13). If you want a custom pyatlan-wolfi-base for development (with a different pyatlan version, pyatlan branch, or Python version), you can build one by manually triggering the GitHub workflow. The inputs for that workflow are:


  1. Navigate to Build Pyatlan Wolfi Base Image workflow
  2. Click "Run workflow" and provide the following inputs:
| Input | Description | Example | Required |
|-------|-------------|---------|----------|
| Branch | Use workflow from | main | ✅ |
| Build type | Build type (dev uses amd64 only, release uses amd64+arm64) | dev | ✅ |
| Python version | Python version (leave empty for 3.13) | 3.11 | ❌ |
| Pyatlan version | Published pyatlan version (pulled from PyPI; leave empty to use version.txt, i.e. latest) | 7.2.0 | ❌ |
| Pyatlan git branch | Pyatlan git branch (overrides version; installs from git://github.com/atlanhq/atlan-python.git@branch) | APP-1234 | ❌ |

3. requirements.txt - package dependencies

Generate your package dependencies using pipreqs and include required OTEL logging dependencies:

requirements.txt
# Package-specific dependencies
# Generated via: pipreqs /path/to/pkg --force
pyatlan>=8.0.0
pandas>=2.0.0
# Add your specific dependencies here...

# Required for OpenTelemetry logging
opentelemetry-api~=1.29.0
opentelemetry-sdk~=1.29.0
opentelemetry-instrumentation-logging~=0.50b0
opentelemetry-exporter-otlp~=1.29.0

Generating requirements automatically

Use pipreqs to automatically detect and generate your package dependencies:

```bash
# Install pipreqs if not already installed
pip install pipreqs

# Generate requirements for your package
pipreqs /path/to/your/package --force

# Example for a specific script
pipreqs scripts/designation_based_group_provisioning --force
```

4. requirements-test.txt - testing dependencies

Create testing-specific dependencies for CI/CD and local development:

requirements-test.txt
# Minimal required for testing
coverage~=7.6.1
pytest>=7.4.0
pytest-order~=1.3.0
pytest-timer[termcolor]~=1.0.0
pytest-sugar~=1.0.0

# Add VCR support if using HTTP mocking
pytest-vcr~=1.0.2
vcrpy~=6.0.2

5. Run a snyk vulnerability scan

We also recommend running a snyk vulnerability scan on your requirements so that any issues can be fixed before doing a GA release.

Step-by-step security scanning:

  1. Authenticate with Snyk CLI:

    snyk auth
    
    Follow the prompts to login via SSO and grant app access

  2. Scan project dependencies:

    # Ensure your virtual environment is active and dependencies are installed
    snyk test
    

  3. Scan Docker image (optional):

    # After building your Docker image locally
    snyk container test ghcr.io/atlanhq/designation_based_group_provisioning:1.0.0dev-0d35a91 --file=Dockerfile
    

  4. Create exceptions policy (if needed):

If there are vulnerabilities that don't impact your project, create a .snyk policy file:

# designation_based_group_provisioning/.snyk
# Snyk (https://snyk.io) policy file, patches or ignores known issues.
version: v1.0.0
# ignores vulnerabilities until expiry date; change duration by modifying expiry date
ignore:
  'snyk:lic:pip:certifi:MPL-2.0':
    - '*':
        reason: 'MPL-2.0 license is acceptable for this project - certifi is a widely used certificate bundle'

Development workflow

Testing your containerized package

Use the Build Package Test Image workflow for rapid development and testing:

Steps:

  1. Navigate to the workflow: Go to Build Package Test Image

  2. Trigger the build: Click "Run workflow" and provide the required inputs:

| Input | Description | Example | Required |
|-------|-------------|---------|----------|
| Branch | Select your development branch from the dropdown | APP-001-containerize-dbgp | ✅ |
| Package Directory | Name of the package directory | designation_based_group_provisioning | ✅ |
| Package Name | Image name (defaults to kebab-case of the directory) | designation-based-group-provisioning | ❌ |
| Version Tag | Custom version tag (defaults to version.txt-GITHASH) | 1.0.0-dev | ❌ |

The workflow will build a dev image with tag format:

ghcr.io/atlanhq/designation-based-group-provisioning:1.0.0-dev-8799072
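You can then pull the dev image and smoke-test it locally before wiring it into an Argo template. A sketch using the example tag above, assuming the script kept the argparse entry point from the refactor:

```bash
docker pull ghcr.io/atlanhq/designation-based-group-provisioning:1.0.0-dev-8799072
docker run --rm ghcr.io/atlanhq/designation-based-group-provisioning:1.0.0-dev-8799072 \
  python -m designation_based_group_provisioning.main --help
```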

Benefits of development testing

🚀 Rapid iteration - Test containerized changes without affecting production

🔄 Environment consistency - Same container environment as production

✅ Integration validation - Verify your script works in a containerized context

Production release workflow

Step 1: Prepare for GA release

Before creating your pull request:

  1. Update version.txt: Ensure the version reflects your changes (final GA version)

    version.txt
    1.0.0
    

  2. Update HISTORY.md: Document all changes in this release

    HISTORY.md
    ## 1.0.0 (July 1, 2025)
    
    ### Features
    ...
    ### Bug Fixes
    ...
    ### Breaking Changes
    ...
    ### QOL Improvements
    
    - Migrated package to build specific docker image.
    

  3. Verify integration tests: Ensure all tests pass locally

    pytest tests/ -s
    

    OR run tests with coverage:

    coverage run -m pytest tests && coverage report 
    

Step 2: Create pull request

  1. Create PR with your containerization changes:

    • Include all required files (Dockerfile, version.txt, requirements.txt, etc.)
    • Add or update integration tests following the testing guidelines
    • Update documentation if needed
  2. PR validation: The automated CI pipeline will:

    • Run unit and integration tests
    • Validate Docker build process
    • Check code quality and coverage
    • Verify all required files are present

Integration tests required

If your package doesn't have integration tests, this is the perfect time to add them following the testing toolkit guidelines. The CI pipeline expects comprehensive test coverage for production releases.

Step 3: Merge and deploy

  1. Review and approval: Get your PR reviewed and approved
  2. Merge to main: Once merged, this automatically triggers:

    • GA image build: Creates the production image with a semantic version tag
    • Registry publication: Publishes the image to the GitHub Container Registry
    • Deployment preparation: The image becomes available for Argo template updates

  3. Final GA image: Your production image will be tagged as:

    ghcr.io/atlanhq/designation-based-group-provisioning:1.0.0
    

Step 4: Update Argo templates

After the GA image is built, you need to update your package's Argo workflow template to use the new containerized image. This involves two main changes:

  1. Remove the git repository artifact (scripts are now embedded in the Docker image)
  2. Update the container configuration to use the new image and module path

Example PR: marketplace-packages/pull/18043

Key changes required:
Remove scripts repository pull
inputs:
  artifacts:
-   - name: scripts
-     path: "/tmp/marketplace-csa-scripts"
-     git:
-       repo: git@github.com:atlanhq/marketplace-csa-scripts
-       insecureIgnoreHostKey: true
-       singleBranch: true
-       branch: "main"
-       revision: "main"
-       sshPrivateKeySecret:
-         name: "git-ssh"
-         key: "private-key"
    - name: config
      path: "/tmp/config"
      # ... other artifacts remain unchanged
Update container image and module path
container:
+ image: ghcr.io/atlanhq/designation-based-group-provisioning:1.0.0
  imagePullPolicy: IfNotPresent
  env:
    - name: OAUTHLIB_INSECURE_TRANSPORT
      value: "1"
    # ... other env vars remain unchanged
- workingDir: "/tmp/marketplace-csa-scripts"
  command: [ "python" ]
  args:
    - "-m"
-   - "scripts.designation_based_group_provisioning.main"
+   - "designation_based_group_provisioning.main"
Why these changes are needed:
  • No more git clone: Scripts are now embedded in the Docker image, eliminating the need to clone the repository at runtime
  • Simplified module path: Direct import from the package directory instead of the nested scripts.* module path
  • Cleaner execution: Container starts directly in the appropriate working directory (/app)
  • Better security: No SSH keys needed for git access during workflow execution

Once merged, this will automatically deploy your containerized script across all Atlan tenants via the atlan-update workflow 🎉

Production deployment complete

Your script is now fully containerized and ready for production deployment across all Atlan tenants with:

  • ✅ Consistent execution environment
  • ✅ Proper versioning and rollback capability
  • ✅ Comprehensive testing coverage
  • ✅ Automated CI/CD pipeline integration

Best practices for containerized scripts

Development practices

| Practice | Description |
|----------|-------------|
| 📝 Version management | Always update version.txt before creating PRs |
| 🔒 Dependency pinning | Use specific version ranges in requirements.txt for stability |
| 🧪 Comprehensive testing | Ensure integration tests cover containerized execution paths |
| 📚 Documentation | Keep HISTORY.md updated with meaningful change descriptions |

Security considerations

| Security Area | Best Practice |
|---------------|---------------|
| 🔄 Base image updates | Regularly update your base Python image for security patches |
| 🔍 Dependency scanning | Monitor for security vulnerabilities in your dependencies |
| 🔐 Secret management | Never hardcode secrets in Docker images; use environment variables |
| 🔍 Image scanning | Enable container scanning in your CI/CD pipeline |