Contributing to datason 🤝

Thank you for your interest in contributing to datason! This guide will help you understand our development process, coding standards, and how to make meaningful contributions.

🎯 Core Development Principles

datason follows three fundamental principles that guide all development decisions:

1. 🪶 Minimal Dependencies

Philosophy: The core functionality should work without any external dependencies, with optional enhancements when libraries are available.

Guidelines:

  • ✅ Core serialization must work with just the Python standard library
  • ✅ Import optional dependencies only when needed, with graceful fallbacks
  • ✅ Use feature detection rather than forcing installations
  • ✅ Document which features require which dependencies

Example:

# Good: Graceful fallback
try:
    import pandas as pd
    HAS_PANDAS = True
except ImportError:
    pd = None
    HAS_PANDAS = False

def serialize_pandas_object(obj):
    if not HAS_PANDAS:
        return str(obj)  # Fallback to string representation
    # Full pandas logic here, e.g. convert a DataFrame to plain records
    if isinstance(obj, pd.DataFrame):
        return obj.to_dict(orient="records")
    return str(obj)

2. ✅ Comprehensive Test Coverage

Target: Maintain 80%+ test coverage across all modules

Testing Strategy:

  • Edge cases first: test error conditions, malformed data, boundary cases
  • Optional dependency matrix: test with and without optional packages (see the sketch below)
  • Performance regression: benchmark critical paths
  • Cross-platform: test on multiple Python versions and platforms
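A minimal sketch of the dependency-matrix pattern, reusing the HAS_PANDAS flag and serialize_pandas_object from the fallback example above (the test names here are illustrative, not the repo's actual tests):

import pytest

# Runs only when pandas is installed
@pytest.mark.skipif(not HAS_PANDAS, reason="pandas not installed")
def test_dataframe_serialization():
    df = pd.DataFrame({"a": [1, 2]})
    result = serialize_pandas_object(df)
    assert isinstance(result, (list, str))

# Runs only when pandas is absent, exercising the fallback path
@pytest.mark.skipif(HAS_PANDAS, reason="exercises the no-pandas fallback")
def test_fallback_without_pandas():
    assert isinstance(serialize_pandas_object(object()), str)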

Coverage Requirements:

  • 🎯 Overall target: 80-85% coverage
  • 🔴 Core modules: 85%+ required (core.py, converters.py)
  • 🟡 Feature modules: 75%+ required (ml_serializers.py, datetime_utils.py)
  • 🟢 Utility modules: 70%+ acceptable (data_utils.py)

Testing Commands:

# Run all tests with coverage
pytest --cov=datason --cov-report=term-missing --cov-report=html

# Test core functionality only
pytest tests/test_core.py tests/test_converters.py tests/test_deserializers.py

# Test with optional dependencies
pytest tests/test_optional_dependencies.py tests/test_ml_serializers.py

# Performance benchmarks
pytest tests/test_performance.py
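
To enforce the 80% floor locally, pytest-cov's --cov-fail-under flag can be added to any of the commands above:

# Fail the run if overall coverage drops below 80%
pytest --cov=datason --cov-fail-under=80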

3. ⚡ Performance First

Performance Standards (based on real benchmarks):

  • 🚀 Simple data: < 1ms for JSON-compatible objects (achieved: 0.6ms)
  • 🚀 Complex data: < 5ms for 500 objects with UUIDs/datetimes (achieved: 2.1ms)
  • 🚀 Large datasets: > 250K items/second throughput (achieved: 272K/sec)
  • 💾 Memory: linear memory usage, no memory leaks
  • 🔄 Round-trip: complete cycle < 2ms (achieved: 1.4ms)
  • 📊 NumPy: > 5M elements/second (achieved: 5.5M/sec)
  • 🐼 Pandas: > 150K rows/second (achieved: 195K/sec)

Optimization Guidelines:

  • Early exits for already-serialized data (see the sketch below)
  • Minimize object creation in hot paths
  • Use efficient algorithms for type detection
  • Benchmark before and after changes with benchmark_real_performance.py
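A minimal sketch of the early-exit idea (illustrative only; datason's actual fast path lives in core.py and may differ):

def serialize_value(obj):
    # Early exit: primitives are already JSON-compatible, so return
    # them immediately without touching the full dispatch machinery.
    if isinstance(obj, (str, int, float, bool, type(None))):
        return obj
    # ... full type detection for containers, datetimes, arrays, etc.
    return str(obj)  # Placeholder for the real dispatch logic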

Performance Testing:

# Always benchmark performance-critical changes
import time

import pytest

from datason import serialize

@pytest.mark.benchmark
def test_large_dataset_performance():
    large_data = create_large_test_dataset()  # test helper that builds a large payload

    start_time = time.time()
    result = serialize(large_data)
    elapsed = time.time() - start_time

    assert elapsed < 1.0  # Must complete in under 1 second

๐Ÿ› ๏ธ Development Setup

Prerequisites

  • Python 3.8+ (we support 3.8-3.13)
  • Git
  • Virtual environment (recommended)

Local Development Setup

# 1. Fork the repository on GitHub, then clone it (use your fork's URL)
git clone https://github.com/danielendler/datason.git
cd datason

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install development dependencies
pip install -e ".[dev,all,ml]"

# 4. Install pre-commit hooks
pre-commit install

# 5. Run initial tests to verify setup
pytest -v

Development Workflow

  1. Create feature branch: git checkout -b feature/your-feature-name
  2. Make changes following the coding standards below
  3. Add tests for new functionality
  4. Run tests: pytest --cov=datason
  5. Check coverage: Ensure coverage stays above 80%
  6. Run linting and formatting: ruff check --fix . && ruff format .
  7. Create pull request with detailed description

📋 Coding Standards

Code Style

  • Linter & Formatter: Ruff (unified tool for linting and formatting)
  • Type hints: Required for all public APIs
  • Docstrings: Google-style for all public functions

File Structure

datason/
├── core.py              # Core serialization logic
├── converters.py        # Safe type converters
├── deserializers.py     # Deserialization and parsing
├── datetime_utils.py    # Date/time handling
├── data_utils.py        # Data processing utilities
├── serializers.py       # Specialized serializers
├── ml_serializers.py    # ML/AI library support
└── __init__.py          # Public API exports

Naming Conventions

  • Functions: snake_case (e.g., serialize_object)
  • Classes: PascalCase (e.g., JSONSerializer)
  • Constants: UPPER_SNAKE_CASE (e.g., DEFAULT_TIMEOUT)
  • Private: _leading_underscore (e.g., _internal_function)
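
Put together (illustrative names only, not real datason APIs):

DEFAULT_TIMEOUT = 30  # Constant: UPPER_SNAKE_CASE

class JSONSerializer:  # Class: PascalCase
    def serialize_object(self, obj):  # Public function: snake_case
        return self._normalize(obj)

    def _normalize(self, obj):  # Private helper: _leading_underscore
        return obj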

Documentation Standards

from typing import Any, Dict

def serialize_complex_object(obj: Any, **kwargs: Any) -> Dict[str, Any]:
    """Serialize a complex object to JSON-compatible format.

    Args:
        obj: The object to serialize (any Python type)
        **kwargs: Additional serialization options

    Returns:
        JSON-compatible dictionary representation

    Raises:
        SerializationError: If object cannot be serialized

    Examples:
        >>> data = {'date': datetime.now(), 'array': np.array([1, 2, 3])}
        >>> result = serialize_complex_object(data)
        >>> isinstance(result, dict)
        True
    """

🧪 Testing Guidelines

Test Organization

tests/
├── test_core.py                    # Core functionality tests
├── test_converters.py              # Type converter tests
├── test_deserializers.py           # Deserialization tests
├── test_optional_dependencies.py   # Pandas/numpy integration
├── test_ml_serializers.py          # ML library tests
├── test_edge_cases.py              # Edge cases and error handling
├── test_performance.py             # Performance benchmarks
└── test_coverage_boost.py          # Coverage improvement tests

Test Types

Unit Tests: Test individual functions in isolation

from datetime import datetime

from datason import serialize

def test_serialize_datetime():
    """Test datetime serialization preserves information."""
    dt = datetime(2023, 1, 1, 12, 0, 0)
    result = serialize(dt)
    assert result.endswith('T12:00:00')

Integration Tests: Test component interaction

import json
import uuid
from datetime import datetime

from datason import deserialize, serialize

def test_round_trip_serialization():
    """Test complete serialize → JSON → deserialize cycle."""
    data = {'uuid': uuid.uuid4(), 'date': datetime.now()}
    json_str = json.dumps(serialize(data))
    restored = deserialize(json.loads(json_str))
    assert isinstance(restored['uuid'], uuid.UUID)

Edge Case Tests: Test error conditions and boundaries

def test_circular_reference_handling():
    """Test graceful handling of circular references."""
    circular = {}
    circular['self'] = circular
    result = serialize(circular)  # Should not raise
    assert isinstance(result, (dict, str))

Performance Tests: Benchmark critical paths

import time

import pytest

from datason import serialize

@pytest.mark.performance
def test_large_dataset_serialization():
    """Benchmark serialization of large datasets."""
    large_data = create_large_test_data(10000)  # test helper

    start = time.time()
    result = serialize(large_data)
    elapsed = time.time() - start

    assert elapsed < 2.0  # Performance requirement

Coverage Requirements

Each new feature must include tests that maintain or improve coverage:

# Check coverage before changes
pytest --cov=datason --cov-report=term-missing

# Identify uncovered lines
# Lines marked in red in HTML report need tests

# Add tests targeting specific lines
pytest tests/test_your_new_feature.py --cov=datason --cov-report=term-missing

🔄 Pull Request Process

Before Submitting

  • All tests pass: pytest
  • Coverage maintained: pytest --cov=datason
  • Code quality: ruff check --fix .
  • Code formatting: ruff format .
  • Type checking: mypy datason/
  • Security scan: bandit -r datason/
  • Documentation updated if needed
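
The whole checklist maps to a short command sequence:

# Full local pre-submission check
pytest --cov=datason
ruff check --fix . && ruff format .
mypy datason/
bandit -r datason/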

CI/CD Pipeline

datason uses a modern multi-pipeline CI/CD architecture. See our CI/CD Pipeline Guide for complete details on:

  • 🔍 Quality Pipeline - Ruff linting, formatting, security scanning (~30s)
  • 🧪 Main CI - testing, coverage, package building (~2-3min)
  • 📚 Docs Pipeline - documentation generation and deployment
  • 📦 Publish Pipeline - automated PyPI releases

All pipelines use intelligent caching for 2-5x speedup on repeat runs.

PR Template

## 🎯 What does this PR do?
Brief description of the changes

## ✅ Testing
- [ ] Added unit tests for new functionality
- [ ] Added integration tests if applicable
- [ ] Performance benchmarks if performance-related
- [ ] Coverage maintained/improved

## 📊 Coverage Impact
Current coverage: X%
After changes: Y%

## 🚀 Performance Impact
[Include benchmarks if performance-related]

## 📚 Documentation
- [ ] Updated README if user-facing changes
- [ ] Added docstrings for new public APIs
- [ ] Updated examples if needed

๐ŸŽ Types of Contributions

๐Ÿ› Bug Fixes

  • Always include a test that reproduces the bug (see the sketch after this list)
  • Ensure fix doesn't break existing functionality
  • Update documentation if behavior changes
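
A hypothetical regression test (the test name and the fixed behavior are illustrative):

from datason import serialize

def test_nested_empty_dict_regression():
    """Regression test for a bug where empty nested dicts were mangled (hypothetical)."""
    data = {"outer": {"inner": {}}}
    result = serialize(data)
    assert result == {"outer": {"inner": {}}}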

✨ New Features

  • Discuss major features in an issue first
  • Follow the minimal dependencies principle
  • Include comprehensive tests
  • Update documentation and examples

📈 Performance Improvements

  • Include before/after benchmarks (see the sketch after this list)
  • Ensure no functionality regressions
  • Add performance tests to prevent future regressions
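
A simple timing harness for before/after comparisons (a sketch; benchmark_real_performance.py remains the canonical tool):

import time

from datason import serialize

def best_time(data, runs=5):
    """Return the best wall-clock time for serialize(data) over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        serialize(data)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Run once on main and once on your branch, then compare the two numbers.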

📚 Documentation

  • Fix typos, improve clarity
  • Add examples for complex use cases
  • Update API documentation

🧪 Testing

  • Improve test coverage
  • Add edge case tests
  • Performance benchmarks

๐Ÿ† Recognition

Contributors are recognized in:

  • CHANGELOG.md for each release
  • The GitHub contributors page
  • Special recognition for significant contributions

โ“ Getting Help

  • Issues: Use GitHub Issues for bugs and feature requests
  • Discussions: Use GitHub Discussions for questions
  • Email: Contact maintainers for security issues

📜 Code of Conduct

We follow the Contributor Covenant Code of Conduct. Please be respectful and inclusive in all interactions.


Thank you for contributing to datason! 🙏 Your contributions help make Python data serialization better for everyone.