Skip to content

Modern Python Tooling Guide

Overview

datason uses best-in-class modern Python tooling for development, testing, security, and documentation. This guide explains our choices and how to use them effectively.

๐Ÿš€ Tool Stack Summary

Category Tool Replaces Why Better
Linting & Formatting Ruff Black + isort + flake8 + pyupgrade 10-100x faster, single tool
Type Checking mypy - Industry standard, strict typing
Testing pytest + coverage unittest Better DX, plugins, fixtures
Security Bandit + Safety + pip-audit Manual reviews Automated vulnerability detection
Documentation MkDocs Material Sphinx Modern, beautiful, fast
Dependency Management pip-tools + Dependabot Manual updates Automated, deterministic
Git Hooks pre-commit Manual checks Automatic quality gates
Performance pytest-benchmark Manual timing Statistical benchmarking

๐Ÿ”ง Core Tooling Decisions

โšก Ruff (Linting & Formatting)

Why Ruff over Black + isort + flake8?

# Old approach (slow, multiple tools)
black datason/ tests/     # ~2 seconds
isort datason/ tests/     # ~1 second  
flake8 datason/ tests/    # ~3 seconds
pyupgrade **/*.py           # ~2 seconds
# Total: ~8 seconds

# New approach (fast, single tool)
ruff check --fix datason/ tests/    # ~0.1 seconds
ruff format datason/ tests/          # ~0.1 seconds  
# Total: ~0.2 seconds (40x faster!)

Ruff Configuration:

[tool.ruff]
target-version = "py38"
line-length = 88

[tool.ruff.lint]
select = [
    "E", "W",    # pycodestyle
    "F",         # pyflakes  
    "I",         # isort
    "N",         # pep8-naming
    "D",         # pydocstyle
    "UP",        # pyupgrade
    "S",         # bandit security
    "B",         # bugbear
    "C4",        # comprehensions
    "SIM",       # simplify
    "RUF",       # ruff-specific
]

Usage:

# Check and fix issues
ruff check --fix .

# Format code  
ruff format .

# Check specific rule
ruff check --select E501 .  # Line too long

๐Ÿ”’ Security Tools

Multi-layered security approach:

  1. Bandit - Static code analysis for security issues
  2. Safety - Known vulnerability database checking
  3. pip-audit - Dependency vulnerability scanning
  4. Semgrep - Advanced pattern-based security scanning
# Run all security checks
bandit -c pyproject.toml -r datason/
safety check --json
pip-audit --format=json
semgrep --config=auto datason/

Common security issues caught: - Hardcoded passwords/secrets - SQL injection patterns - Use of eval() or exec() - Weak random number generation - Insecure hash functions

๐Ÿ“š Documentation with MkDocs Material

Why MkDocs over Sphinx?

# Modern, beautiful, fast
mkdocs serve    # Live reload dev server
mkdocs build    # Static site generation
mkdocs gh-deploy # Deploy to GitHub Pages

Features: - ๐ŸŽจ Beautiful UI - Material Design - ๐Ÿ” Search - Full-text search built-in - ๐Ÿ“ฑ Mobile - Responsive design - ๐ŸŒ™ Dark mode - Automatic theme switching - ๐Ÿ“Š Diagrams - Mermaid support - ๐Ÿ”— Cross-refs - Automatic API linking

๐Ÿงช Testing Excellence

Comprehensive testing setup:

# Run all tests with coverage
pytest --cov=datason --cov-report=html

# Parallel testing (faster)
pytest -n auto

# Performance benchmarks
pytest tests/test_performance.py --benchmark-only

# Specific test markers
pytest -m "not slow"        # Skip slow tests
pytest -m "ml"              # Only ML-related tests
pytest -m "integration"     # Integration tests only

Test markers configured:

# In pyproject.toml
markers = [
    "slow: marks tests as slow",
    "integration: marks tests as integration tests",
    "benchmark: marks tests as benchmarks",
    "ml: marks tests that require ML libraries",
]

๐Ÿ”„ Pre-commit Hooks

Automatic quality gates:

# .pre-commit-config.yaml highlights
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    hooks:
      - id: ruff           # Linting
      - id: ruff-format    # Formatting

  - repo: https://github.com/PyCQA/bandit
    hooks:
      - id: bandit         # Security scanning

Benefits: - โœ… Prevents bad commits - Catches issues before they reach CI - โœ… Consistent formatting - Everyone uses same style - โœ… Security checks - Vulnerabilities caught early - โœ… Fast feedback - Issues fixed immediately

๐Ÿ› ๏ธ Development Workflow

Initial Setup

# 1. Install development dependencies
pip install -e ".[dev]"

# 2. Install pre-commit hooks
pre-commit install

# 3. Run initial checks
pre-commit run --all-files

Daily Development

# Before committing (automatic via pre-commit)
ruff check --fix .          # Fix linting issues
ruff format .               # Format code
pytest                      # Run tests
mypy datason/            # Type checking

Release Process

# 1. Security audit
bandit -c pyproject.toml -r datason/
safety check
pip-audit

# 2. Full test suite
pytest --cov=datason --cov-report=term-missing

# 3. Documentation
mkdocs build

# 4. Build and publish
python -m build
twine upload dist/*

๐Ÿ“Š Performance Optimization

Benchmarking with pytest-benchmark

def test_serialization_performance(benchmark):
    """Benchmark serialization performance."""
    data = create_test_data()

    result = benchmark(datason.serialize, data)

    # Assertions on performance
    assert benchmark.stats.median < 0.01  # < 10ms

Profiling

# Profile specific functions
python -m cProfile -o profile.stats benchmark_real_performance.py

# Analyze with snakeviz
pip install snakeviz
snakeviz profile.stats

๐Ÿ”„ Dependency Management

pip-tools for Reproducible Builds

# Create requirements.txt from pyproject.toml
pip-compile pyproject.toml

# Update dependencies
pip-compile --upgrade pyproject.toml

# Install exact versions
pip-sync requirements.txt

Dependabot Configuration

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10

๐Ÿ“ˆ Monitoring & Metrics

Coverage Tracking

# Generate coverage report
pytest --cov=datason --cov-report=html

# Coverage thresholds in pyproject.toml
[tool.coverage.report]
fail_under = 80
show_missing = true

Performance Monitoring

# Benchmark regression testing
@pytest.mark.benchmark
def test_performance_regression(benchmark):
    """Ensure performance doesn't regress."""
    result = benchmark(serialize_large_dataset)
    # CI will fail if significantly slower than baseline

๐ŸŽฏ Best Practices Summary

Code Quality

  • โœ… Ruff for all linting/formatting (replaces 4 tools)
  • โœ… mypy with strict mode for type safety
  • โœ… Pre-commit hooks for automatic quality gates
  • โœ… 100% test coverage for critical paths

Security

  • โœ… Multi-tool scanning (Bandit, Safety, pip-audit)
  • โœ… Regular dependency updates via Dependabot
  • โœ… Secret detection in pre-commit
  • โœ… Security policy with responsible disclosure

Documentation

  • โœ… MkDocs Material for beautiful docs
  • โœ… API reference auto-generated from docstrings
  • โœ… Examples for all major features
  • โœ… Cross-references and search

Performance

  • โœ… Benchmark suite with statistical analysis
  • โœ… Performance regression testing in CI
  • โœ… Profiling tools for optimization
  • โœ… Real-world performance data in docs

๐Ÿš€ Upgrading from Legacy Tools

Migration Guide

From Black + isort + flake8 โ†’ Ruff:

# Remove old tools
pip uninstall black isort flake8

# Install Ruff
pip install ruff

# Update pre-commit
# Replace separate hooks with ruff hooks

# Run once
ruff check --fix .
ruff format .

Benefits: - ๐Ÿš€ 40x faster linting and formatting - ๐Ÿงน Single tool instead of multiple - ๐Ÿ”ง More rules - catches more issues - ๐Ÿ“ฆ Smaller dependency tree

๐ŸŽ‰ Results

This modern tooling setup delivers:

  • โšก 10x faster development workflow
  • ๐Ÿ›ก๏ธ Better security with automated scanning
  • ๐Ÿ“š Professional docs with zero maintenance
  • ๐Ÿงช Reliable testing with performance tracking
  • ๐Ÿค– Automated quality via pre-commit hooks

Developer Experience Score: 10/10 ๐ŸŽฏ

The investment in modern tooling pays off immediately in developer productivity and code quality!