🎯 Modern API Guide

Welcome to datason's modern API - a collection of intention-revealing functions designed for clarity, progressive complexity, and domain-specific optimization. This guide will help you master the modern API and understand when and how to use each function.

🌟 Why Modern API?

The modern API addresses common pain points with traditional serialization:

  • 🎯 Clear Intent: Function names tell you exactly what they do
  • 📈 Progressive Complexity: Start simple, add complexity as needed
  • 🔧 Domain-Specific: Optimized functions for ML, API, security use cases
  • 🧩 Composable: Mix and match features for your specific needs
  • 🔍 Self-Documenting: Built-in help and discovery

🔹 Serialization Functions (Dump Functions)

dump() - The Universal Function

The dump() function is your Swiss Army knife - it can handle any scenario with composable options:

import datason as ds
import numpy as np
import pandas as pd
import torch
from datetime import datetime

# Basic usage
data = {"values": [1, 2, 3], "timestamp": datetime.now()}
result = ds.dump(data)

# Composable options for specific needs
complex_data = {
    "model": torch.nn.Linear(10, 1),
    "user_data": {"email": "user@example.com", "ssn": "123-45-6789"},
    "large_df": pd.DataFrame(np.random.random((10000, 50)))
}

# Combine multiple features
result = ds.dump(
    complex_data,
    secure=True,     # Enable PII redaction
    ml_mode=True,    # Optimize for ML objects
    chunked=True,    # Memory-efficient processing
    fast_mode=True   # Performance optimization
)

When to use:
  • General-purpose serialization
  • When you need multiple features combined
  • As a starting point before moving to specialized functions

dump_ml() - ML-Optimized

Perfect for machine learning workflows with automatic optimization for ML objects:

import torch
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# ML objects are automatically optimized
ml_data = {
    "pytorch_model": torch.nn.Linear(100, 10),
    "sklearn_model": RandomForestClassifier(),
    "tensor": torch.randn(1000, 100),
    "numpy_features": np.random.random((1000, 50)),
    "training_config": {"lr": 0.001, "epochs": 100}
}

result = ds.dump_ml(ml_data)

Features:
  • Optimized tensor serialization
  • Model state preservation
  • NumPy array compression
  • Training metadata handling

When to use:
  • PyTorch/TensorFlow models
  • NumPy arrays and tensors
  • ML training pipelines
  • Model checkpoints

dump_api() - API-Safe

Clean, web-safe JSON output optimized for REST APIs:

# API response with mixed data
api_response = {
    "status": "success",
    "data": [1, 2, 3],
    "errors": None,           # Will be removed
    "timestamp": datetime.now(),
    "pagination": {
        "page": 1,
        "total": 100,
        "has_more": True
    }
}

clean_result = ds.dump_api(api_response)
# Result: Clean JSON, null values removed, optimized structure
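
The exact output depends on your data, but conceptually the cleaned payload looks something like this (an illustrative sketch, not a literal capture of dump_api's output; None values are dropped and the datetime is rendered as an ISO-8601 string):

{
    "status": "success",
    "data": [1, 2, 3],
    "timestamp": "2024-01-01T12:00:00",
    "pagination": {"page": 1, "total": 100, "has_more": true}
}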

Features:
  • Removes null/None values
  • Optimizes nested structures
  • Ensures JSON compatibility
  • Minimal payload size

When to use:
  • REST API endpoints
  • Web service responses
  • JSON data for frontends
  • Configuration files

dump_secure() - Security-Focused

Automatic PII redaction and security-focused serialization:

# Sensitive user data
user_data = {
    "profile": {
        "name": "John Doe",
        "email": "john@example.com",
        "ssn": "123-45-6789",
        "phone": "+1-555-123-4567"
    },
    "account": {
        "password": "secret123",
        "api_key": "sk-abc123def456",
        "credit_card": "4532-1234-5678-9012"
    }
}

# Automatic PII redaction
secure_result = ds.dump_secure(user_data, redact_pii=True)

# Custom redaction patterns
custom_secure = ds.dump_secure(
    user_data,
    redact_fields=["internal_id", "session_token"],
    redact_patterns=[
        r"\b\d{4}-\d{4}-\d{4}-\d{4}\b",  # Credit cards
        r"sk-[a-zA-Z0-9]{20,}",          # API keys
    ]
)

Features:
  • Automatic PII detection and redaction
  • Custom redaction patterns
  • Field-based redaction
  • Audit trail generation

When to use:
  • User data with PII
  • Financial information
  • Healthcare data
  • Compliance requirements

dump_fast() - Performance-Optimized

High-throughput serialization with minimal overhead:

import random

# Large batch processing
batch_data = []
for i in range(10000):
    batch_data.append({
        "id": i,
        "value": random.random(),
        "category": f"cat_{i % 10}"
    })

# Optimized for speed
fast_result = ds.dump_fast(batch_data)
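
To check whether the fast path helps on your own workload, a quick timing comparison is easy to run (a minimal sketch reusing the batch_data built above; absolute numbers will vary by machine and data):

import time

start = time.perf_counter()
ds.dump(batch_data)
baseline_s = time.perf_counter() - start

start = time.perf_counter()
ds.dump_fast(batch_data)
fast_s = time.perf_counter() - start

print(f"dump(): {baseline_s:.3f}s  dump_fast(): {fast_s:.3f}s")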

Features:
  • Minimal type checking
  • Optimized algorithms
  • Reduced memory allocations
  • Streamlined processing

When to use:
  • High-volume data processing
  • Real-time systems
  • Performance-critical paths
  • Simple data structures

dump_chunked() - Memory-Efficient

Handle very large objects without memory exhaustion:

# Very large dataset
large_data = {
    "images": [np.random.random((512, 512, 3)) for _ in range(1000)],
    "features": pd.DataFrame(np.random.random((100000, 200))),
    "metadata": {"size": "huge", "format": "research"}
}

# Process in memory-efficient chunks
chunked_result = ds.dump_chunked(large_data, chunk_size=1000)

# Chunked result is a generator - process piece by piece
for chunk in chunked_result:
    # Process each chunk independently
    process_chunk(chunk)

Features:
  • Memory-efficient processing
  • Streaming capabilities
  • Configurable chunk size
  • Generator-based output

When to use:
  • Very large datasets
  • Memory-constrained environments
  • Streaming applications
  • ETL pipelines

stream_dump() - File Streaming

Direct streaming to files for extremely large data:

# Massive dataset that won't fit in memory
# (generate_sensor_readings / generate_image_batch are placeholder helpers)
huge_data = {
    "sensor_data": generate_sensor_readings(1000000),
    "images": generate_image_batch(10000),
    "metadata": {"source": "sensors", "duration": "24h"}
}

# Stream directly to file
with open('massive_dataset.json', 'w') as f:
    ds.stream_dump(huge_data, f)

# For compressed output
import gzip
with gzip.open('massive_dataset.json.gz', 'wt') as f:
    ds.stream_dump(huge_data, f)

Features:
  • Direct file output
  • No memory buffering
  • Supports any file-like object
  • Works with compression

When to use:
  • Extremely large datasets
  • Direct file output
  • Memory-constrained systems
  • Archive creation

🔹 Deserialization Functions (Load Functions)

The load functions provide progressive complexity - start with basic exploration and move to production-ready functions as needed.

load_basic() - Fast Exploration (60-70% Success Rate)

Quick and dirty deserialization for data exploration:

# Simple JSON data
json_data = '''
{
    "values": [1, 2, 3, 4, 5],
    "timestamp": "2024-01-01T12:00:00",
    "metadata": {"version": 1.0}
}
'''

# Fast basic loading - minimal processing
basic_result = ds.load_basic(json_data)
print(basic_result)
# Note: timestamp remains as string, minimal type conversion

Features:
  • Fastest loading
  • Basic type conversion
  • Minimal error handling
  • Good for exploration

When to use:
  • Data exploration and debugging
  • Simple JSON structures
  • Quick prototyping
  • Performance-critical loading

load_smart() - Production-Ready (80-90% Success Rate)

Intelligent type detection and restoration for production use:

# Complex serialized data
complex_data = {
    "dataframe": pd.DataFrame({"x": [1, 2, 3], "y": [4.5, 5.5, 6.5]}),
    "timestamp": datetime.now(),
    "array": np.array([1, 2, 3, 4, 5]),
    "config": {"learning_rate": 0.001}
}

serialized = ds.dump(complex_data)

# Smart loading with type restoration
smart_result = ds.load_smart(serialized)
print(type(smart_result["dataframe"]))  # <class 'pandas.core.frame.DataFrame'>
print(type(smart_result["timestamp"]))  # <class 'datetime.datetime'>
print(type(smart_result["array"]))      # <class 'numpy.ndarray'>

Features:
  • Intelligent type detection
  • Good success rate
  • Handles complex types
  • Production-ready

When to use:
  • Production applications
  • Complex data structures
  • General-purpose loading
  • When reliability matters

load_perfect() - 100% Success Rate (Requires Template)

Template-based loading for critical applications requiring 100% reliability:

# Define the expected structure
template = {
    "user_id": int,
    "profile": {
        "name": str,
        "email": str,
        "created": datetime
    },
    "scores": [float],
    "metadata": {
        "version": str,
        "features": [str]
    }
}

json_data = '''
{
    "user_id": 12345,
    "profile": {
        "name": "Alice",
        "email": "alice@example.com",
        "created": "2024-01-01T12:00:00"
    },
    "scores": [95.5, 87.2, 92.1],
    "metadata": {
        "version": "v1.0",
        "features": ["premium", "analytics"]
    }
}
'''

# Perfect restoration using template
perfect_result = ds.load_perfect(json_data, template)
# 100% guaranteed to match template structure

Features:
  • 100% success rate
  • Type validation
  • Structure enforcement
  • Error reporting

When to use:
  • Critical applications
  • Data validation required
  • Schema enforcement
  • API input validation

load_typed() - Metadata-Based (95% Success Rate)

Uses embedded type metadata for high-reliability restoration:

# Serialize with type information
original_data = {
    "model": torch.nn.Linear(10, 1),
    "dataframe": pd.DataFrame({"x": [1, 2, 3]}),
    "timestamp": datetime.now()
}

# Serialize with type metadata
serialized_with_types = ds.dump(original_data, include_type_info=True)

# Load using embedded type information
typed_result = ds.load_typed(serialized_with_types)
# Types are restored using embedded metadata

Features:
  • High success rate (95%)
  • Uses embedded metadata
  • Automatic type restoration
  • No template required

When to use:
  • When original data had type metadata
  • High-reliability needs
  • Complex type restoration
  • Self-describing data

🔹 Utility & Discovery Functions

help_api() - Interactive Guidance

Get personalized recommendations for your use case:

# Get interactive help
ds.help_api()

# Example output:
# 🎯 datason Modern API Guide
#
# SERIALIZATION (Dump Functions):
# • dump() - General purpose with composable options
# • dump_ml() - ML models, tensors, NumPy arrays
# • dump_api() - Web APIs, clean JSON output
# • dump_secure() - Sensitive data with PII redaction
# • dump_fast() - High-throughput scenarios
# • dump_chunked() - Large objects, memory efficiency
#
# DESERIALIZATION (Load Functions):
# • load_basic() - 60-70% success, fastest (exploration)
# • load_smart() - 80-90% success, moderate (production)
# • load_perfect() - 100% success, requires template (critical)
# • load_typed() - 95% success, uses metadata
#
# RECOMMENDATIONS:
# • For ML workflows: dump_ml() + load_smart()
# • For APIs: dump_api() + load_smart()
# • For sensitive data: dump_secure() + load_smart()
# • For exploration: dump() + load_basic()
# • For production: dump() + load_smart() or load_typed()

get_api_info() - API Metadata

Programmatic access to API information:

# Get comprehensive API information
api_info = ds.get_api_info()

print("Dump functions:", api_info['dump_functions'])
print("Load functions:", api_info['load_functions'])
print("Features:", api_info['features'])
print("Recommendations:", api_info['recommendations'])

# Use programmatically
if 'dump_ml' in api_info['dump_functions']:
    # ML optimization available
    result = ds.dump_ml(ml_data)

dumps() / loads() - JSON Compatibility

Drop-in replacement for Python's json module:

import datason as ds

# Instead of:
# import json
# json_str = json.dumps(data)  # Fails with datetime, numpy, etc.
# data = json.loads(json_str)

# Use datason:
data = {
    "timestamp": datetime.now(),
    "array": np.array([1, 2, 3]),
    "dataframe": pd.DataFrame({"x": [1, 2, 3]})
}

# Like json.dumps() but handles complex types
json_str = ds.dumps(data)

# Like json.loads() but restores types
restored = ds.loads(json_str)
print(type(restored["timestamp"]))  # <class 'datetime.datetime'>
print(type(restored["array"]))      # <class 'numpy.ndarray'>

🎯 Choosing the Right Function

Quick Decision Tree

📊 SERIALIZATION (What are you saving?)
├── 🤖 ML models/tensors/arrays → dump_ml()
├── 🌐 API/web responses → dump_api()
├── 🔒 Sensitive/PII data → dump_secure()
├── ⚡ High-volume/performance → dump_fast()
├── 💾 Very large data → dump_chunked() or stream_dump()
└── 🎯 General purpose → dump()

📥 DESERIALIZATION (How reliable do you need it?)
├── 🔍 Quick exploration (60-70%) → load_basic()
├── 🏭 Production ready (80-90%) → load_smart()
├── 🎯 Critical/validated (100%) → load_perfect()
└── 📋 With metadata (95%) → load_typed()
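
If you prefer to encode the serialization branch of this tree in code, a small helper works well (a minimal sketch; choose_dump and its flags are illustrative assumptions for this guide, not part of datason):

import datason as ds

def choose_dump(data, *, is_ml=False, for_api=False, sensitive=False,
                high_volume=False, very_large=False):
    """Pick a dump function following the decision tree above (illustrative)."""
    if is_ml:
        return ds.dump_ml(data)
    if for_api:
        return ds.dump_api(data)
    if sensitive:
        return ds.dump_secure(data, redact_pii=True)
    if high_volume:
        return ds.dump_fast(data)
    if very_large:
        return ds.dump_chunked(data)
    return ds.dump(data)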

Usage Patterns by Scenario

Data Science Workflow

# Exploratory analysis
raw_data = ds.load_basic(json_file)  # Quick exploration

# Data processing
processed = process_data(raw_data)
result = ds.dump_ml(processed)  # ML-optimized storage

# Production pipeline
validated_data = ds.load_smart(result)  # Reliable loading

Web API Development

# Example endpoints (Flask shown; any web framework works)
from flask import Flask, request

app = Flask(__name__)

# API endpoint
@app.route('/api/data')
def get_data():
    data = get_database_data()
    return ds.dump_api(data)  # Clean JSON response

# API input validation
@app.route('/api/data', methods=['POST'])
def create_data():
    template = get_validation_template()
    try:
        validated = ds.load_perfect(request.json, template)
        return process_data(validated)
    except TemplateError:
        return {"error": "Invalid data structure"}, 400

Security-Sensitive Applications

# User data processing
user_input = request.json
secure_data = ds.dump_secure(user_input, redact_pii=True)

# Logging (safe for logs)
logger.info(f"Processed user data: {secure_data}")

# Storage (PII-free)
store_in_database(secure_data)

High-Performance Systems

# Batch processing
for batch in data_batches:
    processed = ds.dump_fast(batch)  # High-throughput
    queue.put(processed)

# Real-time systems
result = ds.load_basic(incoming_data)  # Fastest loading
process_realtime(result)

Large Data Systems

# ETL pipeline
with open('large_output.json', 'w') as f:
    ds.stream_dump(massive_dataset, f)  # Direct streaming

# Chunked processing
for chunk in ds.dump_chunked(huge_data, chunk_size=1000):
    process_chunk(chunk)  # Memory-efficient

🔄 Migration from Traditional API

Gradual Migration Strategy

# Phase 1: Add modern API alongside traditional
def serialize_data(data, use_modern=False):
    if use_modern:
        return ds.dump_ml(data)  # Modern approach
    else:
        return ds.serialize(data, config=ds.get_ml_config())  # Traditional

# Phase 2: Feature flags for new functionality
def api_response(data):
    if feature_flags.modern_api:
        return ds.dump_api(data)
    else:
        return ds.serialize(data, config=ds.get_api_config())

# Phase 3: Full migration
def process_ml_data(data):
    return ds.dump_ml(data)  # Modern API only

Equivalent Functions

Traditional API → Modern API:

  • serialize(data, config=get_ml_config()) → dump_ml(data): automatic ML optimization
  • serialize(data, config=get_api_config()) → dump_api(data): clean JSON output
  • serialize(data, config=get_performance_config()) → dump_fast(data): performance optimized
  • serialize_chunked(data) → dump_chunked(data): memory efficient
  • deserialize(data) → load_smart(data): general purpose
  • auto_deserialize(data) → load_basic(data): quick exploration
  • deserialize_with_template(data, template) → load_perfect(data, template): template-based
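
In practice, migrating a call site is usually a one-line change. A sketch based on the mappings above, reusing the earlier ml_data example (and assuming deserialize is exposed at the top level like serialize):

# Before: traditional API with an explicit config object
legacy_result = ds.serialize(ml_data, config=ds.get_ml_config())
restored = ds.deserialize(legacy_result)

# After: modern API, same intent expressed by the function name
modern_result = ds.dump_ml(ml_data)
restored = ds.load_smart(modern_result)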

🛠️ Advanced Patterns

Composable Serialization

# Build up complexity as needed
def adaptive_serialize(data, context):
    options = {}

    if context.has_ml_objects:
        options['ml_mode'] = True

    if context.has_sensitive_data:
        options['secure'] = True

    if context.memory_constrained:
        options['chunked'] = True

    if context.performance_critical:
        options['fast_mode'] = True

    return ds.dump(data, **options)

Progressive Loading

def smart_load(data, reliability_required='medium'):
    """Load data with appropriate reliability level."""

    if reliability_required == 'low':
        return ds.load_basic(data)
    elif reliability_required == 'medium':
        return ds.load_smart(data)
    elif reliability_required == 'high':
        # Requires template
        template = infer_or_get_template(data)
        return ds.load_perfect(data, template)
    else:
        # Use metadata if available
        return ds.load_typed(data)

Error Handling and Fallbacks

def robust_deserialize(data):
    """Try progressively simpler approaches if needed."""

    # Try typed first (highest success rate without template)
    try:
        return ds.load_typed(data)
    except (TypeError, ValueError):
        pass

    # Fall back to smart loading
    try:
        return ds.load_smart(data)
    except (TypeError, ValueError):
        pass

    # Last resort: basic loading
    try:
        return ds.load_basic(data)
    except Exception as e:
        raise DeserializationError(f"All loading methods failed: {e}")

🔍 Debugging and Troubleshooting

Common Issues and Solutions

1. Deserialization Success Rate Lower Than Expected

# Problem: load_smart() not working well
data = ds.load_smart(json_data)  # Only 60% success

# Solution: Check if you need a template
template = create_template_for_data()
data = ds.load_perfect(json_data, template)  # 100% success

2. Performance Issues

# Problem: Slow serialization
result = ds.dump(large_data)  # Too slow

# Solution: Use performance-optimized function
result = ds.dump_fast(large_data)  # Much faster

# Or use chunked processing for memory issues
result = ds.dump_chunked(large_data, chunk_size=1000)

3. Security Concerns

# Problem: PII in serialized data
result = ds.dump(user_data)  # Contains PII

# Solution: Use security-focused function
result = ds.dump_secure(user_data, redact_pii=True)  # PII redacted

Debugging Tools

# Get detailed API information
api_info = ds.get_api_info()
print("Available functions:", api_info['dump_functions'])

# Interactive help
ds.help_api()  # Get recommendations

# Test different loading strategies
for loader in [ds.load_basic, ds.load_smart, ds.load_typed]:
    try:
        result = loader(problematic_data)
        print(f"{loader.__name__}: SUCCESS")
        break
    except Exception as e:
        print(f"{loader.__name__}: FAILED - {e}")

🚀 Best Practices

1. Start Simple, Add Complexity

# Start with basic functions
result = ds.dump(data)
loaded = ds.load_smart(result)

# Add features as needed
result = ds.dump(data, ml_mode=True)  # Add ML optimization
result = ds.dump(data, ml_mode=True, secure=True)  # Add security

2. Choose the Right Tool for the Job

# For each use case, pick the most appropriate function
ml_result = ds.dump_ml(model_data)      # ML objects
api_result = ds.dump_api(response_data) # Web APIs
secure_result = ds.dump_secure(user_data, redact_pii=True)  # PII data

3. Handle Errors Gracefully

def safe_serialize(data, strategy='smart'):
    try:
        if strategy == 'ml':
            return ds.dump_ml(data)
        elif strategy == 'api':
            return ds.dump_api(data)
        else:
            return ds.dump(data)
    except Exception:
        # Fall back to the fast, minimal serializer
        return ds.dump_fast(data)

4. Use Discovery Features

# Let the API guide you
ds.help_api()  # Get recommendations

# Check capabilities programmatically
api_info = ds.get_api_info()
if 'dump_ml' in api_info['dump_functions']:
    use_ml_optimization = True

🎯 Summary

The modern API provides a clear, progressive approach to serialization:

  • 🔹 7 Dump Functions: From general-purpose to highly specialized
  • 🔹 4 Load Functions: Progressive complexity from exploration to production
  • 🔹 3 Utility Functions: Discovery, help, and JSON compatibility
  • 🔹 Composable Design: Mix and match features as needed
  • 🔹 Self-Documenting: Built-in guidance and recommendations

Start with dump() and load_smart() for general use, then specialize as your needs become clearer. The modern API grows with your application - from quick prototypes to production-critical systems.
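
For reference, that default pairing is just a two-call round trip (a minimal sketch):

import datason as ds
from datetime import datetime

payload = {"created": datetime.now(), "values": [1, 2, 3]}

serialized = ds.dump(payload)         # general-purpose serialization
restored = ds.load_smart(serialized)  # production-ready loading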

Ready to explore real-world examples? Check out our Examples Gallery for comprehensive usage patterns!