Configuration System

The datason configuration system gives you fine-grained control over serialization, both through individual options and through presets tuned for common use cases.

Overview

Instead of a one-size-fits-all approach, datason allows you to configure serialization behavior through the SerializationConfig class and convenient preset functions.

import datason
from datason.config import DateFormat, NanHandling, SerializationConfig, get_ml_config

# Use presets for common scenarios
ml_config = get_ml_config()
result = datason.serialize(data, config=ml_config)

# Or create custom configurations
custom_config = SerializationConfig(
    date_format=DateFormat.UNIX_MS,
    nan_handling=NanHandling.DROP,
    sort_keys=True
)
result = datason.serialize(data, config=custom_config)

🎛️ Configuration Options

Date/Time Formatting

Control how datetime objects are serialized with the date_format option:

from datason.config import DateFormat, SerializationConfig
from datetime import datetime

dt = datetime(2024, 1, 15, 10, 30, 45)

# ISO format (default) - human readable
config = SerializationConfig(date_format=DateFormat.ISO)
# Result: "2024-01-15T10:30:45"

# Unix timestamp - numeric, compact
config = SerializationConfig(date_format=DateFormat.UNIX)
# Result: 1705314645.0 (when dt is treated as UTC)

# Unix milliseconds - JavaScript compatible
config = SerializationConfig(date_format=DateFormat.UNIX_MS)
# Result: 1705314645000 (when dt is treated as UTC)

# String representation - human readable
config = SerializationConfig(date_format=DateFormat.STRING)
# Result: "2024-01-15 10:30:45"

# Custom format - your pattern
config = SerializationConfig(
    date_format=DateFormat.CUSTOM,
    custom_date_format="%Y-%m-%d %H:%M"
)
# Result: "2024-01-15 10:30"
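To make the five output styles concrete, the following standard-library sketch reproduces each of them for one timestamp. The `format_datetime` helper and its string style names are illustrative assumptions, not datason's implementation. A UTC-aware datetime is used because the Unix timestamp of a naive datetime depends on the local timezone.

```python
from datetime import datetime, timezone


def format_datetime(dt, style, pattern=None):
    """Hypothetical helper mirroring the five date_format styles."""
    if style == "iso":
        return dt.isoformat()
    if style == "unix":
        return dt.timestamp()
    if style == "unix_ms":
        return int(dt.timestamp() * 1000)
    if style == "string":
        return str(dt)
    if style == "custom":
        return dt.strftime(pattern)
    raise ValueError(f"unknown style: {style}")


# Timezone-aware value, so the numeric results do not vary by machine
dt = datetime(2024, 1, 15, 10, 30, 45, tzinfo=timezone.utc)

iso_value = format_datetime(dt, "iso")          # "2024-01-15T10:30:45+00:00"
unix_value = format_datetime(dt, "unix")        # 1705314645.0
unix_ms_value = format_datetime(dt, "unix_ms")  # 1705314645000
custom_value = format_datetime(dt, "custom", "%Y-%m-%d %H:%M")  # "2024-01-15 10:30"
```

Note that the ISO string carries a `+00:00` offset here only because the input is timezone-aware; a naive datetime would produce the shorter form shown earlier.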

NaN/Null Handling

Configure how NaN, None, and missing values are processed:

from datason.config import NanHandling, SerializationConfig

data = {"values": [1, 2, float('nan'), 4, None]}

# Convert to JSON null (default)
config = SerializationConfig(nan_handling=NanHandling.NULL)
# Result: {"values": [1, 2, null, 4, null]}

# Convert to string representation
config = SerializationConfig(nan_handling=NanHandling.STRING)
# Result: {"values": [1, 2, "nan", 4, "None"]}

# Keep original values (may not be JSON compatible)
config = SerializationConfig(nan_handling=NanHandling.KEEP)
# Result: {"values": [1, 2, NaN, 4, null]}

# Drop NaN values from collections
config = SerializationConfig(nan_handling=NanHandling.DROP)
# Result: {"values": [1, 2, 4]}
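The four strategies can be sketched on a flat list using only the standard library. `apply_nan_handling` and its mode strings are hypothetical names chosen for illustration; the sketch mirrors the results listed above rather than datason's internals. The last step shows why keeping NaN can be a problem: a strict JSON encoder refuses it.

```python
import json
import math


def apply_nan_handling(values, mode):
    """Hypothetical helper: apply one of four strategies to a flat list."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    if mode == "null":
        return [None if is_missing(v) else v for v in values]
    if mode == "string":
        return [str(v) if is_missing(v) else v for v in values]
    if mode == "drop":
        return [v for v in values if not is_missing(v)]
    if mode == "keep":
        return list(values)
    raise ValueError(f"unknown mode: {mode}")


values = [1, 2, float("nan"), 4, None]
as_null = apply_nan_handling(values, "null")      # [1, 2, None, 4, None]
as_string = apply_nan_handling(values, "string")  # [1, 2, "nan", 4, "None"]
dropped = apply_nan_handling(values, "drop")      # [1, 2, 4]

# "keep" leaves the float NaN in place; a strict JSON encoder rejects it
try:
    json.dumps(apply_nan_handling(values, "keep"), allow_nan=False)
    keep_is_strict_json = True
except ValueError:
    keep_is_strict_json = False
```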

Type Coercion Strategies

Control how aggressively types are converted:

from datason.config import SerializationConfig, TypeCoercion
import decimal
import uuid

data = {
    "amount": decimal.Decimal("123.45"),
    "id": uuid.uuid4(),
    "complex": 3+4j
}

# Strict - preserve all type information
config = SerializationConfig(type_coercion=TypeCoercion.STRICT)
# Result: {"amount": {"_type": "decimal", "value": "123.45", ...}, ...}

# Safe - convert when safe to do so
config = SerializationConfig(type_coercion=TypeCoercion.SAFE)  # Default
# Result: {"amount": 123.45, "id": "uuid-string", "complex": "3+4j"}

# Aggressive - convert to simple types when possible
config = SerializationConfig(type_coercion=TypeCoercion.AGGRESSIVE)
# Result: {"amount": 123.45, "id": "uuid-string", "complex": [3, 4]}
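The trade-off between strict and coerced output is easiest to see with a decimal that a float cannot represent exactly. The sketch below uses only the standard library; the tagged layout and the `encode_strict`, `decode_strict`, and `encode_coerced` helpers are assumptions made for illustration, not datason's wire format.

```python
from decimal import Decimal


def encode_strict(value):
    """Hypothetical tagged encoding that keeps the exact decimal digits."""
    return {"_type": "decimal", "value": str(value)}


def decode_strict(payload):
    """Rebuild the Decimal from the tagged form."""
    return Decimal(payload["value"])


def encode_coerced(value):
    """Hypothetical lossy encoding: convert to a plain float."""
    return float(value)


amount = Decimal("12345678901234567890.12")

# The tagged form round-trips exactly
strict_round_trip = decode_strict(encode_strict(amount))

# The float form is compact but cannot hold all of the digits
coerced = encode_coerced(amount)
precision_lost = Decimal(coerced) != amount

# A complex number reduced to a [real, imag] pair
complex_pair = [(3 + 4j).real, (3 + 4j).imag]
```

For values such as `123.45` the difference is invisible in practice, which is why a middle setting is a reasonable default; the loss only matters for high-precision or very large numbers.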

Pandas DataFrame Orientation

Control how DataFrames are serialized:

from datason.config import DataFrameOrient, SerializationConfig
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Records format (default) - list of row objects
config = SerializationConfig(dataframe_orient=DataFrameOrient.RECORDS)
# Result: [{"A": 1, "B": 3}, {"A": 2, "B": 4}]

# Split format - separate index/columns/data
config = SerializationConfig(dataframe_orient=DataFrameOrient.SPLIT)
# Result: {"index": [0, 1], "columns": ["A", "B"], "data": [[1, 3], [2, 4]]}

# Values format - raw data only
config = SerializationConfig(dataframe_orient=DataFrameOrient.VALUES)
# Result: [[1, 3], [2, 4]]

# Index format - index as keys
config = SerializationConfig(dataframe_orient=DataFrameOrient.INDEX)
# Result: {0: {"A": 1, "B": 3}, 1: {"A": 2, "B": 4}}

# Columns format - columns as keys
config = SerializationConfig(dataframe_orient=DataFrameOrient.COLUMNS)
# Result: {"A": {0: 1, 1: 2}, "B": {0: 3, 1: 4}}

# Table format - complete DataFrame representation
config = SerializationConfig(dataframe_orient=DataFrameOrient.TABLE)
# Result: {"schema": {...}, "data": [...]}
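The orientations are different arrangements of the same cells. The following sketch builds several of them from plain Python lists, without pandas, to show how they relate; the variable names are illustrative only.

```python
# Column-oriented source data, equivalent to the DataFrame above
columns = {"A": [1, 2], "B": [3, 4]}
index = [0, 1]
names = list(columns)

# "values": rows of raw cells
rows = [list(row) for row in zip(*columns.values())]

# "records": one dict per row
records = [dict(zip(names, row)) for row in rows]

# "split": index, column names, and cells kept separately
split = {"index": index, "columns": names, "data": rows}

# "index": row label -> row dict
by_index = dict(zip(index, records))

# "columns": column name -> {row label: cell}
by_column = {name: dict(zip(index, col)) for name, col in columns.items()}
```

The values layout is the smallest but drops column names and index labels, so the consumer must already know the schema; the split layout keeps them once instead of repeating them per row.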

Custom Serializers

Register custom handlers for your own types:

from dataclasses import dataclass

import datason
from datason.config import SerializationConfig

@dataclass
class Person:
    name: str
    age: int

def serialize_person(person):
    return {
        "_type": "Person",
        "name": person.name,
        "age": person.age,
        "is_adult": person.age >= 18
    }

config = SerializationConfig(
    custom_serializers={Person: serialize_person}
)

data = {"user": Person("Alice", 25)}
result = datason.serialize(data, config=config)
# Result: {"user": {"_type": "Person", "name": "Alice", "age": 25, "is_adult": true}}
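One way to picture a type-keyed handler table is as a lookup performed before the normal recursive walk. The sketch below is a minimal stand-in written with the standard library; `serialize_value` is a hypothetical function, and matching with `isinstance` (so subclasses also match) is an assumption rather than documented datason behavior.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def serialize_person(person):
    return {"_type": "Person", "name": person.name, "age": person.age}


def serialize_value(value, custom_serializers):
    """Hypothetical walk that tries type-keyed handlers before recursing."""
    for cls, handler in custom_serializers.items():
        if isinstance(value, cls):
            return handler(value)
    if isinstance(value, dict):
        return {k: serialize_value(v, custom_serializers) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_value(v, custom_serializers) for v in value]
    return value


result = serialize_value(
    {"users": [Person("Alice", 25)]},
    {Person: serialize_person},
)
```

Because the handler runs wherever the type appears, the same registration covers objects nested inside lists and dictionaries.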

Performance and Security Options

Control resource usage and optimization behavior:

config = SerializationConfig(
    # Security limits
    max_depth=500,              # Maximum nesting depth
    max_size=1_000_000,         # Maximum collection size
    max_string_length=100_000,  # Maximum string length

    # Performance options
    sort_keys=True,             # Sort dictionary keys for consistent output
    preserve_decimals=True,     # Keep decimal precision vs convert to float
    preserve_complex=True,      # Preserve complex number metadata

    # Caching options (New in v0.7.0)
    cache_size_limit=10000,     # Maximum cache entries per scope
    cache_metrics_enabled=True, # Enable cache performance monitoring
    cache_warn_on_limit=True,   # Warn when cache size limit reached
)
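To illustrate what a nesting limit protects against, here is a minimal depth guard written with the standard library. `check_depth` and `nest` are hypothetical helpers; the sketch shows the general idea of rejecting deeply nested input early, not how datason enforces `max_depth`.

```python
def check_depth(obj, max_depth, _depth=0):
    """Hypothetical guard: raise if nesting goes deeper than max_depth."""
    if _depth > max_depth:
        raise ValueError(f"nesting exceeds max_depth={max_depth}")
    if isinstance(obj, dict):
        for value in obj.values():
            check_depth(value, max_depth, _depth + 1)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            check_depth(item, max_depth, _depth + 1)


def nest(levels):
    """Build an empty list wrapped in `levels` further lists."""
    obj = []
    for _ in range(levels):
        obj = [obj]
    return obj


check_depth(nest(5), max_depth=10)  # shallow input passes silently

try:
    check_depth(nest(20), max_depth=10)
    rejected = False
except ValueError:
    rejected = True  # deep input is refused before any real work happens
```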

Cache Scope Management

New in v0.7.0: Control caching behavior independently of the serialization configuration:

import datason
from datason import CacheScope

# Set global cache scope
datason.set_cache_scope(CacheScope.REQUEST)  # For web APIs
datason.set_cache_scope(CacheScope.PROCESS)  # For ML training
datason.set_cache_scope(CacheScope.OPERATION)  # For testing (default)

# Use context managers for temporary scope changes
with datason.request_scope():
    # Multiple operations share cache within this block
    result1 = datason.deserialize_fast(data1)
    result2 = datason.deserialize_fast(data1)  # Cache hit!

# Monitor cache performance
metrics = datason.get_cache_metrics()
for scope, stats in metrics.items():
    print(f"{scope}: {stats.hit_rate:.1%} hit rate")

🎯 Preset Configurations

For common use cases, datason provides optimized preset configurations:

ML/AI Configuration

from datason.config import get_ml_config

config = get_ml_config()
# Optimized for machine learning workflows:
# - Unix timestamp dates (numeric, efficient)
# - Aggressive type coercion (simple types)
# - NULL NaN handling (JSON compatibility)
# - DataFrame records format (standard ML format)

API Configuration

from datason.config import get_api_config

config = get_api_config()
# Optimized for REST API responses:
# - ISO date format (human readable)
# - Safe type coercion (predictable)
# - Sorted keys (consistent output)
# - DataFrame records format (web standard)

Strict Configuration

from datason.config import get_strict_config

config = get_strict_config()
# Optimized for type preservation:
# - Strict type coercion (no data loss)
# - Preserve decimals and complex numbers
# - String NaN handling (explicit)
# - DataFrame table format (complete metadata)

Performance Configuration

from datason.config import get_performance_config

config = get_performance_config()
# Optimized for speed:
# - Unix date format (fast)
# - Aggressive coercion (minimal processing)
# - Drop NaN values (smaller output)
# - DataFrame values format (minimal structure)

🔧 Global Configuration

Set a default configuration for all serialization operations:

import datason
from datason.config import get_api_config, get_ml_config

# Set global default
datason.configure(get_ml_config())

# Now all serialize() calls use ML config by default
result = datason.serialize(data)  # Uses ML config

# Override for specific calls
result = datason.serialize(data, config=get_api_config())

⚡ Convenience Functions

Quick configuration without creating config objects:

# Quick configuration with keyword arguments
result = datason.serialize_with_config(
    data,
    date_format='unix_ms',
    nan_handling='drop',
    sort_keys=True
)

# Equivalent to:
config = SerializationConfig(
    date_format=DateFormat.UNIX_MS,
    nan_handling=NanHandling.DROP,
    sort_keys=True
)
result = datason.serialize(data, config=config)
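Accepting plain strings in place of enum members usually comes down to looking the member up by value. The sketch below shows that pattern with a stand-in enum; `DemoDateFormat`, its string values, and `coerce_option` are all assumptions made for illustration and may not match datason's actual enum values.

```python
from enum import Enum


class DemoDateFormat(Enum):
    """Stand-in enum; the string values are assumed, not taken from datason."""
    ISO = "iso"
    UNIX = "unix"
    UNIX_MS = "unix_ms"


def coerce_option(enum_cls, value):
    """Accept either an enum member or its string value."""
    if isinstance(value, enum_cls):
        return value
    return enum_cls(value)  # raises ValueError for an unknown string


from_string = coerce_option(DemoDateFormat, "unix_ms")
from_member = coerce_option(DemoDateFormat, DemoDateFormat.UNIX_MS)
```

A lookup of this kind fails loudly on a misspelled option, which is generally preferable to silently falling back to a default.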

📊 Configuration Examples by Use Case

Data Science Pipeline

# For data analysis and scientific computing
config = SerializationConfig(
    date_format=DateFormat.ISO,         # Human readable dates
    dataframe_orient=DataFrameOrient.TABLE,  # Complete DataFrame info
    nan_handling=NanHandling.STRING,    # Explicit NaN representation
    type_coercion=TypeCoercion.STRICT,  # Preserve all type information
    preserve_decimals=True,             # Keep numeric precision
    preserve_complex=True               # Preserve complex numbers
)

Microservices API

# For service-to-service communication
config = SerializationConfig(
    date_format=DateFormat.UNIX_MS,     # JavaScript compatible
    dataframe_orient=DataFrameOrient.RECORDS,  # Standard format
    nan_handling=NanHandling.NULL,      # JSON compatible
    type_coercion=TypeCoercion.SAFE,    # Predictable conversion
    sort_keys=True,                     # Consistent output
    max_depth=100,                      # Security limit
    max_size=10_000                     # Memory protection
)

ML Model Serving

# For model inference and results
config = SerializationConfig(
    date_format=DateFormat.UNIX,        # Efficient numeric format
    dataframe_orient=DataFrameOrient.VALUES,  # Raw data only
    nan_handling=NanHandling.DROP,      # Clean output
    type_coercion=TypeCoercion.AGGRESSIVE,  # Simple types
    max_depth=50,                       # Fast processing
    custom_serializers={
        ModelResult: custom_model_serializer
    }
)

Development and Debugging

# For development and troubleshooting
config = SerializationConfig(
    date_format=DateFormat.STRING,      # Human readable
    nan_handling=NanHandling.STRING,    # Explicit representation
    type_coercion=TypeCoercion.STRICT,  # No information loss
    sort_keys=True,                     # Consistent for diffs
    preserve_decimals=True,             # Exact values
    preserve_complex=True               # Complete information
)

🔍 Advanced Configuration

Dynamic Configuration

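import os

from datason.config import get_api_config, get_performance_config, get_strict_config

def get_config_for_environment():
    """Pick a preset based on the ENV environment variable."""
    env = os.getenv('ENV')
    if env == 'production':
        return get_performance_config()
    if env == 'development':
        return get_strict_config()
    # Fall back to the API preset when ENV is unset or unrecognized
    return get_api_config()

config = get_config_for_environment()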

Configuration Inheritance

# Start with a base configuration
base_config = get_ml_config()

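# Create variations: merge the overrides into the base fields first, because
# passing **base_config.__dict__ next to the same keyword raises TypeError
debug_config = SerializationConfig(**{
    **base_config.__dict__,
    "nan_handling": NanHandling.STRING,  # Override for debugging
    "sort_keys": True,                   # Add sorting
})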
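If the configuration class is a dataclass (an assumption worth checking against your installed version), `dataclasses.replace` is another way to derive a variant without touching the base object. The sketch uses a stand-in `DemoConfig` class with made-up fields rather than `SerializationConfig` itself.

```python
from dataclasses import dataclass, replace


@dataclass
class DemoConfig:
    """Stand-in for a configuration dataclass; fields are illustrative."""
    date_format: str = "iso"
    nan_handling: str = "null"
    sort_keys: bool = False


base = DemoConfig(date_format="unix", nan_handling="null")

# replace() copies every field and applies only the named overrides
debug = replace(base, nan_handling="string", sort_keys=True)
```

The base object is left unchanged, so several variants can be derived from the same starting point.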

Conditional Custom Serializers

def smart_model_serializer(obj):
    """Serialize models differently based on size."""
    if hasattr(obj, 'coef_') and len(obj.coef_) > 1000:
        # Large model - serialize metadata only
        return {"_type": "large_model", "_class": type(obj).__name__}
    else:
        # Small model - full serialization
        return {"_type": "model", "params": obj.get_params()}

config = SerializationConfig(
    custom_serializers={
        type(my_model): smart_model_serializer
    }
)

📚 Best Practices

1. Choose the Right Preset

Start with a preset configuration that matches your primary use case, then customize as needed.

2. Profile Your Configuration

Use the benchmarking tools to measure performance impact of different configuration options.
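As a starting point for this kind of measurement, the standard library's `timeit` can compare two option sets on a representative payload. The sketch below uses `json.dumps` as a stand-in serializer and `sort_keys` as the option under test; `time_option` is a hypothetical helper, and the same pattern could be pointed at any serializer call you want to compare.

```python
import json
import timeit

# A payload roughly shaped like your real data gives more useful numbers
payload = {f"key_{i}": list(range(20)) for i in range(200)}


def time_option(**dumps_kwargs):
    """Return the seconds taken by 50 runs of json.dumps with these options."""
    return timeit.timeit(lambda: json.dumps(payload, **dumps_kwargs), number=50)


unsorted_seconds = time_option(sort_keys=False)
sorted_seconds = time_option(sort_keys=True)
print(f"sort_keys=False: {unsorted_seconds:.4f}s, sort_keys=True: {sorted_seconds:.4f}s")
```

Timings vary between machines and runs, so compare options on the same machine and repeat the measurement before drawing conclusions.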

3. Document Your Configuration

When using custom configurations in production, document the choices and reasoning.

4. Version Your Configuration

Keep configuration objects in version control and treat them as part of your API contract.

5. Test Edge Cases

Ensure your configuration handles edge cases in your data (NaN, None, circular references, large objects).
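A small probe that feeds each edge case to the serializer and records what happens makes such checks repeatable. The sketch below uses strict `json.dumps` as a stand-in serializer; `probe` is a hypothetical helper, and in practice you would pass in your own serialization call with your chosen configuration.

```python
import json

circular = {"name": "node"}
circular["self"] = circular  # the dict now contains itself

edge_cases = {
    "nan": [float("nan")],
    "none": {"value": None},
    "circular": circular,
    "large": list(range(100_000)),
}


def probe(serializer, cases):
    """Run serializer on each case; record 'ok' or the exception name."""
    outcomes = {}
    for name, case in cases.items():
        try:
            serializer(case)
            outcomes[name] = "ok"
        except Exception as exc:  # report the failure instead of aborting
            outcomes[name] = type(exc).__name__
    return outcomes


# Strict JSON: NaN and circular references are both rejected
outcomes = probe(lambda obj: json.dumps(obj, allow_nan=False), edge_cases)
```

Comparing the recorded outcomes against what you expect for each configuration turns the list of edge cases into a regression test.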

Advanced Types - Supported Python types
Performance Guide - Optimization strategies
ML/AI Integration - ML-specific configuration
Pandas Integration - DataFrame handling options