# CI/CD Pipeline Guide

datason uses a modern, multi-pipeline CI/CD architecture designed for speed, clarity, and parallel execution. This guide explains our complete CI/CD setup.

๐Ÿ—๏ธ Architecture Overview

```mermaid
graph TD
    A[Git Push/PR] --> B{Change Type?}
    B -->|Any Code| C[🔍 Quality Pipeline]
    B -->|Non-docs| D[🧪 Main CI Pipeline]
    B -->|Docs Only| E[📚 Docs Pipeline]
    B -->|Release Tag| F[📦 Publish Pipeline]

    C --> G[✅ Code Quality & Security]
    D --> H[✅ Tests & Build]
    E --> I[✅ Documentation]
    F --> J[✅ PyPI Release]
```

## 🔄 Pipeline Details

### 🧪 Main CI Pipeline (ci.yml)

Triggers:

- Push to main, develop
- Pull requests to main
- Excludes: docs-only changes

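Concretely, these triggers might be declared like this in `ci.yml` (a sketch using standard GitHub Actions syntax; the real workflow file may differ):

```yaml
# ci.yml - trigger section (illustrative sketch)
on:
  push:
    branches: [main, develop]
    paths-ignore:        # skip docs-only changes
      - 'docs/**'
      - '*.md'
      - 'mkdocs.yml'
  pull_request:
    branches: [main]
```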
What it does:

```yaml
jobs:
  test:
    - 📥 Checkout code
    - 🐍 Setup Python 3.11
    - 💾 Cache pip dependencies
    - 📦 Install dev dependencies
    - 🧪 Run core tests with plugin matrix:
        • minimal: tests/unit/ (no optional deps)
        • with-numpy: tests/unit/ + ML features (numpy)
        • with-pandas: tests/unit/ + data features (pandas)
        • with-ml-deps: tests/unit/ + tests/integration/ (full ML stack)
        • full: tests/unit/ + tests/edge_cases/ + tests/integration/
    - 📊 Upload coverage to Codecov
    - 🔒 Security scan (bandit)
    - 📤 Upload security report

  build:
    - 🏗️ Build package (wheel + sdist)
    - ✅ Check package integrity
    - 📤 Upload build artifacts
```

Performance: ~2-3 minutes with caching

๐Ÿ” Code Quality & Security Pipeline (ruff.yml)

Triggers: - All pushes to main, develop - All pull requests to main, develop

What it does:

```yaml
jobs:
  quality-and-security:
    - 📥 Checkout code
    - 🐍 Setup Python 3.11
    - 💾 Cache pip dependencies (quality)
    - 🛠️ Install ruff + bandit
    - 🧹 Run ruff linter
    - 🎨 Run ruff formatter check
    - 🛡️ Run bandit security scan
    - 📊 Generate quality report
```

Performance: ~30-60 seconds (15-30s with cache)

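The steps above could map to a job roughly like this (action versions and the `bandit -r datason/` target are assumptions, not copied from the actual ruff.yml):

```yaml
jobs:
  quality-and-security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff bandit
      - run: ruff check .           # linter
      - run: ruff format --check .  # formatting check only, no changes
      - run: bandit -r datason/     # security scan
```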
### 📚 Documentation Pipeline (docs.yml)

Triggers:

- Changes to docs/**, mkdocs.yml, README.md

What it does:

```yaml
jobs:
  build-docs:
    - 📥 Checkout code
    - 🐍 Setup Python 3.11
    - 💾 Cache pip dependencies (docs)
    - 💾 Cache MkDocs build
    - 📦 Install docs dependencies
    - 🏗️ Build documentation
    - 📤 Upload docs artifact

  deploy-github-pages:
    - 🚀 Deploy to GitHub Pages (main only)
```

Performance: ~1-2 minutes (very fast with cache)

### 📦 Publish Pipeline (publish.yml)

Triggers:

- GitHub releases (automatic)
- Manual workflow dispatch

What it does:

```yaml
jobs:
  build:
    - 🏗️ Build package
    - ✅ Verify package integrity
    - 📤 Upload build artifacts

  test-pypi:
    - 🧪 Publish to TestPyPI
    - ✅ Verify upload

  pypi:
    - 🚀 Publish to PyPI (releases only)
    - 📢 Create release summary
```

Performance: ~2-3 minutes

## ⚡ Performance Optimizations

### Intelligent Caching Strategy

Each pipeline has optimized caching:

```yaml
# Main CI - General development cache
key: ${{ runner.os }}-pip-${{ hashFiles('**/pyproject.toml') }}

# Quality - Specialized for linting tools
key: ${{ runner.os }}-quality-pip-${{ hashFiles('**/pyproject.toml') }}

# Docs - Documentation-specific cache + MkDocs
key: ${{ runner.os }}-docs-pip-${{ hashFiles('**/pyproject.toml') }}
key: ${{ runner.os }}-mkdocs-${{ hashFiles('mkdocs.yml') }}-${{ hashFiles('docs/**') }}
```

Cache Benefits:

- First run: Full dependency installation
- Subsequent runs: 2-5x faster execution
- Cross-pipeline sharing: Quality cache falls back to main cache

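The cross-pipeline fallback can be expressed with `restore-keys` in `actions/cache`, which lets a cache miss fall back to a broader key prefix (a sketch; the exact keys here mirror this guide, not the real workflow):

```yaml
- name: Cache pip dependencies (quality)
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-quality-pip-${{ hashFiles('**/pyproject.toml') }}
    # On a miss, fall back to the main CI cache prefix:
    restore-keys: |
      ${{ runner.os }}-pip-
```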
### Smart Triggering

Path-based triggers prevent unnecessary runs:

```yaml
# Main CI skips docs-only changes
paths-ignore:
  - 'docs/**'
  - '*.md'
  - 'mkdocs.yml'

# Docs pipeline only runs for docs changes
paths: ['docs/**', 'mkdocs.yml', 'README.md']
```

Result: Docs changes don't trigger expensive test runs

## 🎯 Pipeline Responsibilities

| Pipeline | Purpose | Speed | When |
|----------|---------|-------|------|
| Main CI | Core functionality validation | ~2-3 min | Code changes |
| Quality | Code quality & security | ~30-60s | All changes |
| Docs | Documentation generation | ~1-2 min | Docs changes |
| Performance | Performance regression tracking (informational) | ~3-5 min | Performance changes, weekly |
| Publish | Package distribution | ~2-3 min | Releases |

๐Ÿ—๏ธ Test Structure & CI Matrix Alignment

New Organized Test Structure

Our test suite is organized into logical directories for optimal CI performance:

```
tests/
├── core/           # Fast core functionality tests (~7 seconds)
│   ├── test_core.py                    # Basic serialization
│   ├── test_security.py                # Security features
│   ├── test_circular_references.py     # Circular reference handling
│   ├── test_edge_cases.py              # Edge cases and error handling
│   ├── test_converters.py              # Type converters
│   ├── test_deserializers.py           # Deserialization functionality
│   └── test_dataframe_orientation_regression.py
│
├── features/       # Feature-specific tests (~10-20 seconds)
│   ├── test_ml_serializers.py          # ML library integrations
│   ├── test_chunked_streaming.py       # Streaming/chunking features
│   ├── test_auto_detection_and_metadata.py  # Auto-detection
│   └── test_template_deserialization.py     # Template deserialization
│
├── integration/    # Integration tests (~5-15 seconds)
│   ├── test_config_and_type_handlers.py     # Configuration integration
│   ├── test_optional_dependencies.py        # Dependency integrations
│   └── test_pickle_bridge.py                # Pickle bridge functionality
│
├── coverage/       # Coverage boost tests (~10-30 seconds)
│   └── test_*_coverage_boost.py             # Targeted coverage improvement
│
└── benchmarks/     # Performance tests (separate pipeline, ~60-120 seconds)
    └── test_*_benchmarks.py                 # Performance measurements
```

### CI Test Matrix Mapping

| CI Job | Test Directories | Dependencies | Purpose | Speed |
|--------|------------------|--------------|---------|-------|
| minimal | tests/unit/ | None | Core functionality only | ~7s |
| with-numpy | tests/unit/ + ML features | numpy | Basic array support | ~15s |
| with-pandas | tests/unit/ + data features | pandas | DataFrame support | ~25s |
| with-ml-deps | tests/unit/ + tests/integration/ | numpy, pandas, sklearn | Full ML stack | ~45s |
| full | tests/unit/ + tests/edge_cases/ + tests/integration/ | All dependencies | Complete test suite | ~60s |
| Performance | tests/benchmarks/ | All dependencies | Performance tracking | ~120s (separate) |

### Plugin Testing Strategy

Our CI implements plugin-style testing where the core package has zero required dependencies but gains functionality when optional dependencies are available:

  • Core Tests (tests/unit/): Always run, no optional dependencies
  • Feature Tests (tests/features/): Run only when relevant dependencies are available
  • Integration Tests (tests/integration/): Test cross-component functionality
  • Coverage Tests (tests/coverage/): Improve coverage metrics for edge cases

This ensures:

- ✅ Zero dependency install works: `pip install datason`
- ✅ Enhanced features work: `pip install datason[ml]`
- ✅ Cross-platform compatibility: Tests run on multiple Python versions
- ✅ Fast feedback: Core tests complete in ~7 seconds

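One way to wire this plugin-style matrix in GitHub Actions (a sketch; the install commands, matrix names, and directory lists are illustrative, echoing the matrix above rather than copied from the real ci.yml):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # let the other dependency combinations finish
      matrix:
        include:
          - name: minimal
            install: "pip install -e ."
            tests: "tests/unit/"
          - name: with-numpy
            install: "pip install -e . numpy"
            tests: "tests/unit/ tests/features/"
          - name: full
            install: "pip install -e '.[dev]'"
            tests: "tests/unit/ tests/edge_cases/ tests/integration/"
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: ${{ matrix.install }}
      - run: pytest ${{ matrix.tests }}
```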
๐Ÿ” Quality Gates

Automatic Checks

Every commit is validated by:

- ✅ Ruff linting (1000+ rules)
- ✅ Ruff formatting (consistent style)
- ✅ Bandit security scanning
- ✅ Test suite with coverage
- ✅ Type checking (mypy)
- ✅ Package build integrity

### Required Checks

For PR merging, these must pass:

- [ ] Code Quality & Security pipeline
- [ ] Main CI pipeline (if code changed)
- [ ] Documentation pipeline (if docs changed)

### Optional Checks

Non-blocking but monitored:

- Coverage reports (Codecov)
- Security reports (GitHub Security tab)
- Performance benchmarks

## 📊 Monitoring & Reports

### GitHub Actions Dashboard

  • โœ… Green: All checks passing
  • โŒ Red: Issues found, PR blocked
  • ๐ŸŸก Yellow: In progress

### Detailed Reports

Quality Pipeline generates rich reports:

```markdown
## Code Quality & Security Report

### 🔍 Ruff Linting
✅ No issues found

### 🎨 Code Formatting
✅ All files properly formatted

### 🛡️ Security Scan Results
**Scanned**: 2,547 lines of code
**High Severity**: 0
**Medium Severity**: 0
**Low Severity**: 1
✅ **Security Status**: PASSED
```

### Artifacts Available

  • ๐Ÿ“Š Coverage reports (HTML, XML)
  • ๐Ÿ›ก๏ธ Security reports (JSON)
  • ๐Ÿ“ฆ Build artifacts (wheels, sdist)
  • ๐Ÿ“š Documentation (static site)

## 🚀 Developer Workflow

### Local Development

```bash
# Pre-commit handles local quality:
# git commit → pre-commit runs → quality checks pass → commit succeeds

# Manual quality check
ruff check --fix .
ruff format .
pytest --cov=datason
```

### Push to GitHub

```bash
git push origin feature-branch
```

What happens:

1. Quality Pipeline runs immediately (~30s)
2. Main CI runs in parallel (~2-3 min)
3. PR status checks update in real time

### Documentation Changes

```bash
# Edit docs/something.md
git push origin docs-update
```

What happens:

1. Docs Pipeline runs (~1-2 min)
2. Quality Pipeline runs (for any .md files)
3. Main CI skipped (docs-only change)

### Release Process

```bash
# Create GitHub release
git tag v1.0.0
git push origin v1.0.0
# Create release in GitHub UI
```

What happens:

1. Publish Pipeline triggers automatically
2. Package built and verified
3. Published to PyPI with OIDC (secure, no tokens)

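OIDC ("trusted publishing") avoids storing long-lived API tokens as secrets. A minimal sketch of such a publish job, assuming the standard `pypa/gh-action-pypi-publish` action (the actual publish.yml may be structured differently):

```yaml
jobs:
  pypi:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for OIDC trusted publishing
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      - uses: pypa/gh-action-pypi-publish@release/v1
```

This also requires registering the repository as a trusted publisher in the PyPI project settings.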
๐Ÿ› ๏ธ Configuration Files

Workflow Locations

```
.github/workflows/
├── ci.yml              # Main CI pipeline
├── ruff.yml            # Code quality & security
├── docs.yml            # Documentation
└── publish.yml         # PyPI publishing
```

### Key Configuration

```toml
# pyproject.toml - Tool configuration
[tool.ruff]                 # Linting rules
[tool.pytest.ini_options]   # Test configuration
[tool.coverage]             # Coverage requirements
[tool.bandit]               # Security scanning
```

```yaml
# .pre-commit-config.yaml - Local hooks
repos:
  - ruff              # Local quality checks
  - bandit            # Local security
```

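The `repos:` entries above are shorthand; a working `.pre-commit-config.yaml` using the official ruff hooks might look like this (the `rev` pins are placeholders to update, not taken from this repo):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4          # placeholder; pin to the current release
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.8           # placeholder
    hooks:
      - id: bandit
```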
## 🔧 Customization

### Adding New Checks

To Quality Pipeline:

```yaml
- name: New Quality Check
  run: |
    new-tool check datason/
```

To Main CI:

```yaml
- name: New Test Type
  run: |
    pytest tests/test_new_feature.py
```

### Performance Tuning

Cache Optimization:

```yaml
# Add new cache for expensive tool
- name: Cache expensive-tool
  uses: actions/cache@v4
  with:
    path: ~/.cache/expensive-tool
    key: ${{ runner.os }}-expensive-${{ hashFiles('config') }}
```

Parallel Jobs:

```yaml
strategy:
  matrix:
    # Quote the versions: unquoted 3.10 is parsed by YAML as the float 3.1
    python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
```

## 🎯 Best Practices

### Pipeline Design

  • โœ… Fast feedback - Quality checks run first (30s)
  • โœ… Parallel execution - Independent pipelines don't block each other
  • โœ… Intelligent triggers - Only run what's needed
  • โœ… Comprehensive caching - 2-5x speedup on repeated runs

### Security

  • โœ… OIDC Publishing - No API tokens needed
  • โœ… Multi-tool scanning - Bandit + Safety + pip-audit
  • โœ… Artifact signing - GPG-signed commits
  • โœ… Dependency monitoring - Dependabot + security advisories

### Developer Experience

  • โœ… Clear status - GitHub status checks show exactly what failed
  • โœ… Rich reports - Detailed summaries in GitHub UI
  • โœ… Local consistency - pre-commit matches CI exactly
  • โœ… Fast iteration - Quick feedback for common issues

## 📈 Metrics & Performance

### Pipeline Performance

| Pipeline | Cached | Uncached | Frequency |
|----------|--------|----------|-----------|
| Quality | ~30s | ~60s | Every push/PR |
| Main CI | ~2 min | ~3 min | Most pushes/PRs |
| Docs | ~1 min | ~2 min | Docs changes only |
| Publish | ~3 min | ~3 min | Releases only |

### Success Rates

  • Quality Pipeline: 95%+ pass rate (fast feedback catches most issues)
  • Main CI: 90%+ pass rate (comprehensive testing)
  • Docs: 99%+ pass rate (simple build process)

## 📊 Performance Pipeline Strategy

Why Performance Tests Don't Block CI:

The performance pipeline is informational only and doesn't block CI for these important reasons:

  1. Environment Variability: GitHub Actions runners have inconsistent performance
  2. Micro-benchmark Noise: Tests measuring <1ms are highly sensitive to environment
  3. Hardware Differences: Local vs CI environments produce different baseline measurements
  4. False Positive Prevention: Avoid blocking legitimate code changes due to infrastructure noise

Environment-Aware Thresholds:

| Environment | Threshold | Reasoning |
|-------------|-----------|-----------|
| Local Development | 5% | Stable environment, consistent hardware |
| CI (Same Environment) | 25% | Account for runner variability |
| CI (Cross Environment) | 25%+ | Local baseline vs CI execution |

How to Interpret Performance Results:

✅ Safe to Ignore:

- Changes <50% (likely environment noise)
- First run after environment change
- Micro-benchmarks showing high variance

⚠️ Worth Investigating:

- Consistent patterns across multiple tests
- Changes >100% without code explanation
- Memory usage regressions

🔥 Action Required:

- Changes >200% with clear code correlation
- New algorithmic complexity introduced
- Memory leaks or resource issues

Performance Baseline Management:

```bash
# Create new CI baseline (if needed)
gh workflow run performance.yml -f save_baseline=true

# Local performance testing
cd benchmarks
python ci_performance_tracker.py

# Test with different threshold
PERFORMANCE_REGRESSION_THRESHOLD=15 python ci_performance_tracker.py
```

๐Ÿ” Troubleshooting

Common Issues

Quality Pipeline Fails:

```bash
# Local fix
ruff check --fix .
ruff format .
git commit --amend
```

Main CI Test Failures:

```bash
# Local debugging
pytest tests/test_failing.py -v
pytest --cov=datason --cov-report=html
# Open htmlcov/index.html
```

Cache Issues:

```bash
# Clear cache via GitHub Actions UI
# Or update cache key in workflow
```

### Performance Issues

Slow Pipeline:

- Check cache hit rates in Actions logs
- Verify cache keys are stable
- Consider splitting large jobs

Resource Limits:

- Use `timeout-minutes` for runaway processes
- Monitor memory usage in logs
- Consider matrix builds for heavy testing

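A job-level timeout is a one-line guard against runaway processes (the job name and the 15-minute value here are illustrative):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15   # cancel the job if it exceeds 15 minutes
    steps:
      - uses: actions/checkout@v4
      - run: pytest --cov=datason
```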

This architecture provides:

- 🚀 Fast feedback (30s for quality issues)
- 🔄 Parallel execution (quality + tests run simultaneously)
- 💾 Intelligent caching (2-5x speedup)
- 🎯 Smart triggering (only run what's needed)
- 📊 Rich reporting (detailed GitHub summaries)
- 🔒 Strong security (multi-tool scanning + OIDC)

The result is a production-ready CI/CD pipeline that scales with your team while maintaining developer velocity! 🎉