
CI Performance & Caching Strategy

🚀 Optimization Goals

Our CI pipeline is optimized for:

  • ⚡ Speed: Parallel execution with smart caching
  • 💰 Cost Efficiency: Minimize GitHub Actions minutes
  • 🔄 Reliability: Robust dependency management
  • 🧪 Comprehensive Testing: Multiple plugin scenarios

📊 Performance Metrics

Before Optimization (Simple CI)

Single Job: ~8-12 minutes
- Install all dependencies: ~3-4 min
- Run all tests: ~4-6 min  
- Build & upload: ~1-2 min
Total: 8-12 minutes per run

After Optimization (Matrix + Caching)

Cache Warm Job: ~2-3 minutes (first run)
                ~30 seconds (subsequent runs)

5 Parallel Test Jobs: ~3-5 minutes each
- Restore cache: ~10-20 seconds
- Install specific deps: ~30-60 seconds  
- Run tests: ~2-3 minutes
- Upload results: ~10-20 seconds

Quality Job: ~2-3 minutes (parallel)
Build Job: ~1-2 minutes (parallel)

Total Pipeline: ~5-7 minutes (vs 8-12 minutes)
Speed Up: ~40-50% faster

🔧 Caching Strategy

3-Layer Cache System

1. Base Dependencies Cache

Key: base-ubuntu-py3.11-{pyproject.toml-hash}
Path: ~/.cache/pip + ~/.local/lib/python3.11/site-packages
Contains: pip, setuptools, wheel, pytest, pytest-cov, build, twine, bandit
Shared by: All jobs

2. Dependency-Specific Cache

Key: deps-ubuntu-{scenario}-{pyproject.toml-hash}
Path: ~/.cache/pip
Contains: numpy, pandas, sklearn, etc.
Per scenario: minimal, numpy, pandas, ml, dev
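
In workflow terms, layers 1 and 2 map onto two actions/cache steps. The sketch below is illustrative only, assuming the key formats and paths described above; the step names and the matrix.dependency-set variable are assumptions, not taken from the actual workflow.

# Layer 1: shared base tooling, keyed on the pyproject.toml hash
- name: Cache base dependencies
  uses: actions/cache@v4
  with:
    path: |
      ~/.cache/pip
      ~/.local/lib/python3.11/site-packages
    key: base-ubuntu-py3.11-${{ hashFiles('pyproject.toml') }}

# Layer 2: per-scenario packages, keyed on the scenario name as well
- name: Cache scenario dependencies
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: deps-ubuntu-${{ matrix.dependency-set }}-${{ hashFiles('pyproject.toml') }}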

3. Virtual Environment Cache (Local)

# For local testing script
Path: ./test-environments/venv-{scenario}/
Cached: Entire virtual environment per scenario
Marker files: .pip-upgraded, .numpy-installed, etc.

๐Ÿƒโ€โ™‚๏ธ Parallel Execution Strategy

Job Dependencies Graph

graph TD
    A[cache-base] --> B[test: minimal]
    A --> C[test: with-numpy]
    A --> D[test: with-pandas]
    A --> E[test: with-ml-deps]
    A --> F[test: full]
    A --> G[quality]
    A --> H[build]

    B --> I[test-summary]
    C --> I
    D --> I
    E --> I
    F --> I
    G --> I
    H --> I

Execution Flow

  1. cache-base runs first (2-3 min)
  2. All other jobs start simultaneously after cache is ready
  3. test-summary waits for all jobs to complete
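
Expressed in GitHub Actions terms, this graph is just a set of needs: relationships. The sketch below is a simplified, hypothetical workflow outline (job bodies reduced to placeholder steps), not the project's actual workflow file:

jobs:
  cache-base:
    runs-on: ubuntu-latest
    steps:
      - run: echo "warm the shared pip / base-dependency caches here"

  test:
    needs: cache-base                # every test job waits for the warm cache
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      max-parallel: 5
      matrix:
        dependency-set: [minimal, with-numpy, with-pandas, with-ml-deps, full]
    steps:
      - run: echo "restore caches, install deps, run pytest for ${{ matrix.dependency-set }}"

  quality:
    needs: cache-base
    runs-on: ubuntu-latest
    steps:
      - run: echo "run lint / security checks here"

  build:
    needs: cache-base
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and upload the package here"

  test-summary:
    needs: [test, quality, build]    # fan-in: waits for everything above
    if: always()                     # report even when an upstream job failed
    runs-on: ubuntu-latest
    steps:
      - run: echo "aggregate and publish results here"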

Parallel Configuration

strategy:
  fail-fast: false          # Don't stop on first failure
  max-parallel: 5           # Run up to 5 jobs simultaneously

📈 Cache Hit Ratios

Expected Cache Performance

Cache Type    | First Run      | Subsequent Runs | Savings
------------- | -------------- | --------------- | --------
Base deps     | Miss (3 min)   | Hit (20 sec)    | ~85%
Specific deps | Miss (1-2 min) | Hit (10 sec)    | ~90%
pip cache     | Miss (varies)  | Hit (varies)    | ~60-80%

Cache Invalidation Triggers

  • pyproject.toml changes → All caches invalidate
  • New dependencies → Specific cache invalidates
  • Python version change → All caches invalidate
  • Weekly automatic cleanup → Base cache invalidates
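
Each of these rules can be encoded directly in the cache key. A hypothetical key layout that behaves this way (the week-stamp step and the matrix.python-version reference are illustrative assumptions, not the real workflow):

# pyproject.toml or Python version changes produce a new key, i.e. a cache miss;
# the week stamp forces the base cache to be rebuilt at least once per week.
- name: Compute week stamp
  id: week
  run: echo "stamp=$(date +%G-W%V)" >> "$GITHUB_OUTPUT"

- name: Cache base dependencies
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: base-ubuntu-py${{ matrix.python-version }}-${{ steps.week.outputs.stamp }}-${{ hashFiles('pyproject.toml') }}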

🎯 Resource Optimization

GitHub Actions Minutes Usage

Before (Single Job):

Per run: 8-12 minutes × 1 job = 8-12 minutes
Per day (10 runs): ~100-120 minutes
Per month: ~3000-3600 minutes

After (Optimized Matrix):

Per run:
- cache-base: 2-3 min
- 5 test jobs: 3-5 min each (parallel)
- quality: 2-3 min (parallel)  
- build: 1-2 min (parallel)
- summary: 30 sec
Total: 5-7 minutes (wall clock)
Total minutes used: ~15-20 minutes (sum of all jobs)

But saves time with:
- Smart caching (faster subsequent runs)
- Parallel execution (faster wall clock time)
- Early failure detection (stop fast if build fails)

Cost Analysis

Minute Cost Trade-off:
+ More jobs running in parallel = more minutes used
+ But much faster feedback to developers  
+ Better caching = faster subsequent runs
+ Comprehensive testing = fewer bugs

Net Result: ~20-30% more minutes used, but 40-50% faster feedback

๐Ÿ” Monitoring & Debugging

Performance Monitoring

# Add as the first and last steps of a job for timing analysis
- name: 📊 Job start time
  run: echo "Job started: $(date)"

# ... the job's real steps run in between ...

- name: 📊 Job end time
  run: echo "Job completed: $(date)"

Cache Debugging

- name: ๐Ÿ” Cache status
  run: |
    echo "Cache hit: ${{ steps.cache.outputs.cache-hit }}"
    ls -la ~/.cache/pip/
    du -sh ~/.cache/pip/

Dependency Debugging

- name: 📋 Show installed packages
  run: pip list

๐Ÿ› ๏ธ Local Testing

Run Plugin Matrix Locally

# Test all scenarios with caching
./scripts/test-plugin-matrix.sh

# Clean up test environments
rm -rf ./test-environments/

# Test specific scenario
python -m venv test-minimal
source test-minimal/bin/activate
pip install -e .
pytest tests/test_core.py -v
deactivate

โš™๏ธ Fine-tuning Options

For Faster Development

# Skip heavy tests on draft PRs
if: github.event.pull_request.draft == false

# Run test jobs only when relevant paths changed
- uses: dorny/paths-filter@v2
  id: changes
  with:
    filters: |
      tests:
        - 'tests/**'
        - 'datason/**'
# Then gate later steps on: steps.changes.outputs.tests == 'true'

For Resource Conservation

# Reduce matrix size for PRs
strategy:
  matrix:
    dependency-set:
      - minimal  # Only test core on PRs
      - full     # Test full on push to main

For Speed Priority

# Increase parallelism
max-parallel: 10  # Use more runners

# Skip slow tests in PR
pytest -m "not slow" tests/

🎉 Benefits Summary

✅ 40-50% faster pipeline execution
✅ Parallel test execution
✅ Smart multi-layer caching
✅ Comprehensive plugin testing
✅ Early failure detection
✅ Local testing scripts
✅ Detailed performance monitoring

This optimized CI ensures our plugin architecture is thoroughly tested while keeping development velocity high!