
πŸ“‹ Core Functions

The main serialization and deserialization functions, covering both the drop-in JSON module replacement and the traditional, configuration-based APIs.

πŸ”„ JSON Module Drop-in Replacement

Zero migration effort - use datason exactly like Python's json module with optional enhanced features.

JSON Compatibility API

# Perfect drop-in replacement for Python's json module
import datason.json as json

# Exact same behavior as stdlib json
data = json.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
# Returns: {'timestamp': '2024-01-01T00:00:00Z', 'value': 42}

output = json.dumps({"key": "value"}, indent=2, sort_keys=True)
# All json.dumps() parameters work exactly the same

Enhanced API with Smart Defaults

# Enhanced features with same simple API
import datason
from datetime import datetime

# Smart datetime parsing automatically enabled
data = datason.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
# Returns: {'timestamp': datetime.datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), 'value': 42}

# Enhanced serialization with dict output
result = datason.dumps({"timestamp": datetime.now(), "data": [1, 2, 3]})
# Returns: dict (not string) with smart type handling

| Function | Purpose | Output Type | Enhanced Features |
|----------|---------|-------------|-------------------|
| datason.loads() | JSON string parsing | dict | βœ… Smart datetime parsing |
| datason.dumps() | Object serialization | dict | βœ… Enhanced type handling |
| datason.loads_json() | JSON compatibility | dict | ❌ Exact stdlib behavior |
| datason.dumps_json() | JSON string output | str | ❌ Exact stdlib behavior |
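
The difference is easiest to see side by side. A minimal sketch contrasting the enhanced and compatibility variants (exact output details may vary by datason version):

import datason

payload = '{"timestamp": "2024-01-01T00:00:00Z", "value": 42}'

# Enhanced parsing: datetime-looking strings become datetime objects
enhanced = datason.loads(payload)

# Compatibility parsing: exact stdlib behavior, the timestamp stays a string
plain = datason.loads_json(payload)

# Enhanced dumps() returns a dict; dumps_json() returns a JSON string
as_dict = datason.dumps({"value": 42})
as_text = datason.dumps_json({"value": 42})
print(type(as_dict), type(as_text))  # <class 'dict'> <class 'str'>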

🎯 Traditional API Overview

The traditional core functions provide comprehensive, configuration-based serialization with maximum control and flexibility.

| Function | Purpose | Best For |
|----------|---------|----------|
| serialize() | Main serialization function | Custom configurations |
| deserialize() | Main deserialization function | Structured data restoration |
| auto_deserialize() | Automatic type detection | Quick data exploration |
| safe_deserialize() | Error-resilient deserialization | Untrusted data sources |
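
For orientation, a hedged round-trip sketch using only the functions listed above (note that serialize() emits a DeprecationWarning, as documented below):

import datason as ds
from datetime import datetime
import uuid

original = {"when": datetime(2024, 1, 1, 12, 0), "id": uuid.uuid4(), "values": [1, 2, 3]}

# Serialize to JSON-compatible structures (datetimes and UUIDs become strings)
serialized = ds.serialize(original)

# Restore rich types from the serialized form
restored = ds.deserialize(serialized)
assert isinstance(restored["when"], datetime)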

πŸ“¦ Detailed Function Documentation

serialize()

The primary serialization function with full configuration support.

datason.serialize(obj: Any, config: Any = None, **kwargs: Any) -> Any

Serialize an object (DEPRECATED - use dump/dumps instead).

DEPRECATION WARNING: Direct use of serialize() is discouraged. Use the clearer API functions instead:

- dump(obj, file) - write to file (like json.dump)
- dumps(obj) - convert to string (like json.dumps)
- serialize_enhanced(obj, **options) - enhanced serialization with clear options

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| obj | Any | Object to serialize | required |
| config | Any | Optional configuration | None |
| **kwargs | Any | Additional options | {} |

Returns:

| Type | Description |
|------|-------------|
| Any | Serialized object |

Source code in datason/__init__.py
def serialize(obj: Any, config: Any = None, **kwargs: Any) -> Any:
    """Serialize an object (DEPRECATED - use dump/dumps instead).

    DEPRECATION WARNING: Direct use of serialize() is discouraged.
    Use the clearer API functions instead:
    - dump(obj, file) - write to file (like json.dump)
    - dumps(obj) - convert to string (like json.dumps)
    - serialize_enhanced(obj, **options) - enhanced serialization with clear options

    Args:
        obj: Object to serialize
        config: Optional configuration
        **kwargs: Additional options

    Returns:
        Serialized object
    """
    import warnings

    warnings.warn(
        "serialize() is deprecated. Use dump/dumps for JSON compatibility or "
        "serialize_enhanced() for advanced features. Direct serialize() will be "
        "removed in a future version.",
        DeprecationWarning,
        stacklevel=2,
    )
    return _serialize_core(obj, config, **kwargs)

Configuration Example:

import datason as ds
from datetime import datetime
import pandas as pd

# Basic serialization
data = {"values": [1, 2, 3], "timestamp": datetime.now()}
result = ds.serialize(data)

# With custom configuration
config = ds.SerializationConfig(
    include_type_info=True,
    compress_arrays=True,
    date_format=ds.DateFormat.ISO_8601,
    nan_handling=ds.NanHandling.NULL
)

complex_data = {
    "dataframe": pd.DataFrame({"x": [1, 2, 3]}),
    "timestamp": datetime.now(),
    "metadata": {"version": 1.0}
}

result = ds.serialize(complex_data, config=config)
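
Because serialize() is deprecated, the equivalent calls through the recommended API look roughly like this (a sketch based on the deprecation note above; serialize_enhanced() option names are not shown):

import datason

record = {"values": [1, 2, 3]}

# dumps(): enhanced serialization, returns a dict (see the comparison table above)
as_dict = datason.dumps(record)

# dump(): write to a file, like json.dump
with open("record.json", "w") as fh:
    datason.dump(record, fh)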

deserialize()

The primary deserialization function with configuration support.

datason.deserialize(obj: Any, parse_dates: bool = True, parse_uuids: bool = True) -> Any

Recursively deserialize JSON-compatible data back to Python objects.

Attempts to intelligently restore datetime objects, UUIDs, and other types that were serialized to strings by the serialize function.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| obj | Any | The JSON-compatible object to deserialize | required |
| parse_dates | bool | Whether to attempt parsing ISO datetime strings back to datetime objects | True |
| parse_uuids | bool | Whether to attempt parsing UUID strings back to UUID objects | True |

Returns:

| Type | Description |
|------|-------------|
| Any | Python object with restored types where possible |

Examples:

>>> data = {"date": "2023-01-01T12:00:00", "id": "12345678-1234-5678-9012-123456789abc"}
>>> deserialize(data)
{"date": datetime(2023, 1, 1, 12, 0), "id": UUID('12345678-1234-5678-9012-123456789abc')}
Source code in datason/deserializers_new.py
def deserialize(obj: Any, parse_dates: bool = True, parse_uuids: bool = True) -> Any:
    """Recursively deserialize JSON-compatible data back to Python objects.

    Attempts to intelligently restore datetime objects, UUIDs, and other types
    that were serialized to strings by the serialize function.

    Args:
        obj: The JSON-compatible object to deserialize
        parse_dates: Whether to attempt parsing ISO datetime strings back to datetime objects
        parse_uuids: Whether to attempt parsing UUID strings back to UUID objects

    Returns:
        Python object with restored types where possible

    Examples:
        >>> data = {"date": "2023-01-01T12:00:00", "id": "12345678-1234-5678-9012-123456789abc"}
        >>> deserialize(data)
        {"date": datetime(2023, 1, 1, 12, 0), "id": UUID('12345678-1234-5678-9012-123456789abc')}
    """
    # ==================================================================================
    # IDEMPOTENCY CHECKS: Prevent double deserialization
    # ==================================================================================

    # IDEMPOTENCY CHECK 1: Check if object is already in final deserialized form
    if _is_already_deserialized(obj):
        return obj

    if obj is None:
        return None

    # NEW: Handle type metadata for round-trip serialization
    if isinstance(obj, dict) and TYPE_METADATA_KEY in obj:
        return _deserialize_with_type_metadata(obj)

    # Handle basic types (already in correct format)
    if isinstance(obj, (int, float, bool)):
        return obj

    # Handle strings - attempt intelligent parsing
    if isinstance(obj, str):
        # Try to parse as UUID first (more specific pattern)
        if parse_uuids and _looks_like_uuid(obj):
            try:
                import uuid as uuid_module  # Fresh import to avoid state issues

                return uuid_module.UUID(obj)
            except (ValueError, ImportError):
                # Log parsing failure but continue with string
                warnings.warn(f"Failed to parse UUID string: {obj}", stacklevel=2)

        # Try to parse as datetime if enabled
        if parse_dates and _looks_like_datetime(obj):
            try:
                import sys
                from datetime import datetime as datetime_class  # Fresh import

                # Handle 'Z' timezone suffix for Python < 3.11
                date_str = obj.replace("Z", "+00:00") if obj.endswith("Z") and sys.version_info < (3, 11) else obj
                return datetime_class.fromisoformat(date_str)
            except (ValueError, ImportError):
                # Log parsing failure but continue with string
                warnings.warn(
                    f"Failed to parse datetime string: {obj[:50]}{'...' if len(obj) > 50 else ''}",
                    stacklevel=2,
                )

        # Return as string if no parsing succeeded
        return obj

    # Handle lists
    if isinstance(obj, list):
        return [deserialize(item, parse_dates, parse_uuids) for item in obj]

    # Handle dictionaries
    if isinstance(obj, dict):
        return {k: deserialize(v, parse_dates, parse_uuids) for k, v in obj.items()}

    # For any other type, return as-is
    return obj

Deserialization Example:

# Basic deserialization (serialized_result is the output of serialize() above)
restored_data = ds.deserialize(serialized_result)

# Control which string patterns are restored to rich types
restored_data = ds.deserialize(
    serialized_result,
    parse_dates=True,   # ISO datetime strings -> datetime objects
    parse_uuids=False,  # leave UUID strings as plain strings
)
print(type(restored_data["timestamp"]))  # <class 'datetime.datetime'>
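
The idempotency checks in the source above mean already-restored data can safely pass through deserialize() again; a small illustration:

import datason as ds

payload = {"date": "2023-01-01T12:00:00", "id": "12345678-1234-5678-9012-123456789abc"}

once = ds.deserialize(payload)
twice = ds.deserialize(once)  # already-deserialized objects are returned unchanged
assert once == twice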

auto_deserialize()

Automatic type detection and intelligent deserialization.

datason.auto_deserialize(obj: Any, aggressive: bool = False, config: Optional[SerializationConfig] = None) -> Any

NEW: Intelligent auto-detection deserialization with heuristics.

Uses pattern recognition and heuristics to automatically detect and restore complex data types without explicit configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| obj | Any | JSON-compatible object to deserialize | required |
| aggressive | bool | Whether to use aggressive type detection (may have false positives) | False |
| config | Optional[SerializationConfig] | Configuration object to control deserialization behavior | None |

Returns:

| Type | Description |
|------|-------------|
| Any | Python object with auto-detected types restored |

Examples:

>>> data = {"records": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}
>>> auto_deserialize(data, aggressive=True)
{"records": DataFrame(...)}  # May detect as DataFrame
>>> # API-compatible UUID handling
>>> from datason.config import get_api_config
>>> auto_deserialize("12345678-1234-5678-9012-123456789abc", config=get_api_config())
"12345678-1234-5678-9012-123456789abc"  # Stays as string
Source code in datason/deserializers_new.py
def auto_deserialize(obj: Any, aggressive: bool = False, config: Optional["SerializationConfig"] = None) -> Any:
    """NEW: Intelligent auto-detection deserialization with heuristics.

    Uses pattern recognition and heuristics to automatically detect and restore
    complex data types without explicit configuration.

    Args:
        obj: JSON-compatible object to deserialize
        aggressive: Whether to use aggressive type detection (may have false positives)
        config: Configuration object to control deserialization behavior

    Returns:
        Python object with auto-detected types restored

    Examples:
        >>> data = {"records": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}
        >>> auto_deserialize(data, aggressive=True)
        {"records": DataFrame(...)}  # May detect as DataFrame

        >>> # API-compatible UUID handling
        >>> from datason.config import get_api_config
        >>> auto_deserialize("12345678-1234-5678-9012-123456789abc", config=get_api_config())
        "12345678-1234-5678-9012-123456789abc"  # Stays as string
    """
    # ==================================================================================
    # IDEMPOTENCY CHECKS: Prevent double deserialization
    # ==================================================================================

    # IDEMPOTENCY CHECK 1: Check if object is already in final deserialized form
    if _is_already_deserialized(obj):
        return obj

    if obj is None:
        return None

    # Get default config if none provided
    if config is None and _config_available:
        config = get_default_config()

    # Handle type metadata first
    if isinstance(obj, dict) and TYPE_METADATA_KEY in obj:
        return _deserialize_with_type_metadata(obj)

    # Handle basic types
    if isinstance(obj, (int, float, bool)):
        return obj

    # Handle strings with auto-detection
    if isinstance(obj, str):
        return _auto_detect_string_type(obj, aggressive, config)

    # Handle lists with auto-detection
    if isinstance(obj, list):
        deserialized_list = [auto_deserialize(item, aggressive, config) for item in obj]

        if aggressive and pd is not None and _looks_like_series_data(deserialized_list):
            # Try to detect if this should be a pandas Series or DataFrame
            try:
                return pd.Series(deserialized_list)
            except Exception:  # nosec B110
                pass

        return deserialized_list

    # Handle dictionaries with auto-detection
    if isinstance(obj, dict):
        # Check for pandas DataFrame patterns first
        if aggressive and pd is not None and _looks_like_dataframe_dict(obj):
            try:
                return _reconstruct_dataframe(obj)
            except Exception:  # nosec B110
                pass

        # Check for pandas split format
        if pd is not None and _looks_like_split_format(obj):
            try:
                return _reconstruct_from_split(obj)
            except Exception:  # nosec B110
                pass

        # Standard dictionary deserialization
        return {k: auto_deserialize(v, aggressive, config) for k, v in obj.items()}

    return obj

Auto-Detection Example:

# Automatically detect and restore types from parsed JSON data
import json
import numpy as np

json_data = '{"timestamp": "2024-01-01T12:00:00", "values": [1, 2, 3]}'
parsed = json.loads(json_data)  # auto_deserialize() expects JSON-compatible objects, not raw JSON strings

# Intelligent type detection
auto_restored = ds.auto_deserialize(parsed)
print(type(auto_restored["timestamp"]))  # <class 'datetime.datetime'>

# Works with complex nested structures
complex_json = ds.serialize({
    "df": pd.DataFrame({"x": [1, 2, 3]}),
    "date": datetime.now(),
    "array": np.array([1, 2, 3])
})

auto_complex = ds.auto_deserialize(complex_json)
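
The config parameter controls how individual strings are treated. The API-profile example from the docstring, expanded into a runnable sketch:

import datason as ds
from datason.config import get_api_config

uuid_string = "12345678-1234-5678-9012-123456789abc"

# Default behavior: UUID-looking strings may be converted to uuid.UUID objects
default_result = ds.auto_deserialize(uuid_string)

# API profile: keep UUIDs as plain strings for web-framework compatibility
api_result = ds.auto_deserialize(uuid_string, config=get_api_config())
print(api_result)  # "12345678-1234-5678-9012-123456789abc"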

safe_deserialize()

Error-resilient deserialization for untrusted or malformed data.

datason.safe_deserialize(json_str: str, allow_pickle: bool = False, **kwargs: Any) -> Any

Safely deserialize a JSON string, handling parse errors gracefully.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| json_str | str | JSON string to parse and deserialize | required |
| allow_pickle | bool | Whether to allow deserialization of pickle-serialized objects | False |
| **kwargs | Any | Arguments passed to deserialize() | {} |

Returns:

| Type | Description |
|------|-------------|
| Any | Deserialized Python object, or the original string if parsing fails |

Raises:

| Type | Description |
|------|-------------|
| DeserializationSecurityError | If pickle data is detected and allow_pickle=False |

Source code in datason/deserializers_new.py
def safe_deserialize(json_str: str, allow_pickle: bool = False, **kwargs: Any) -> Any:
    """Safely deserialize a JSON string, handling parse errors gracefully.

    Args:
        json_str: JSON string to parse and deserialize
        allow_pickle: Whether to allow deserialization of pickle-serialized objects
        **kwargs: Arguments passed to deserialize()

    Returns:
        Deserialized Python object, or the original string if parsing fails

    Raises:
        DeserializationSecurityError: If pickle data is detected and allow_pickle=False
    """
    import json

    try:
        parsed = json.loads(json_str)

        # Security check for pickle data
        if not allow_pickle and _contains_pickle_data(parsed):
            raise DeserializationSecurityError(
                "Detected pickle-serialized objects which are unsafe to deserialize. "
                "Set allow_pickle=True to override this security check."
            )

        return deserialize(parsed, **kwargs)
    except (json.JSONDecodeError, TypeError, ValueError):
        return json_str

Safe Processing Example:

# Handle potentially malformed data
untrusted_data = '{"timestamp": "invalid-date", "values": [1, "bad", 3]'  # note: truncated JSON

# safe_deserialize() never raises on parse errors - it returns the original string
safe_result = ds.safe_deserialize(untrusted_data)
print("Safely processed:", safe_result)

# Extra keyword arguments are forwarded to deserialize()
safe_result = ds.safe_deserialize(
    '{"timestamp": "2024-01-01T12:00:00", "id": "12345678-1234-5678-9012-123456789abc"}',
    parse_dates=True,
    parse_uuids=False,
)
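
The pickle check can be exercised explicitly. A sketch, assuming DeserializationSecurityError is exported at the package level (adjust the import if your version exposes it elsewhere):

import json
import datason as ds

payload = json.dumps({"values": [1, 2, 3]})  # stand-in for input from an untrusted source

try:
    result = ds.safe_deserialize(payload)  # allow_pickle defaults to False
except ds.DeserializationSecurityError:  # assumed package-level export
    result = None  # reject payloads carrying pickle-serialized objects

# Opt in only for trusted sources
trusted_result = ds.safe_deserialize(payload, allow_pickle=True)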

πŸ”§ Configuration System Integration

The core functions work seamlessly with datason's configuration system:

Preset Configurations

# Use predefined configurations for common scenarios
ml_config = ds.get_ml_config()
ml_result = ds.serialize(ml_data, config=ml_config)

api_config = ds.get_api_config()
api_result = ds.serialize(api_data, config=api_config)

strict_config = ds.get_strict_config()
strict_result = ds.serialize(data, config=strict_config)

performance_config = ds.get_performance_config()
fast_result = ds.serialize(data, config=performance_config)

Custom Configuration

# Build custom configurations
custom_config = ds.SerializationConfig(
    # Type handling
    include_type_info=True,
    strict_types=False,
    preserve_numpy_arrays=True,

    # Performance
    compress_arrays=True,
    optimize_memory=True,

    # Data handling
    date_format=ds.DateFormat.TIMESTAMP,
    nan_handling=ds.NanHandling.STRING,
    dataframe_orient=ds.DataFrameOrient.RECORDS,

    # Security
    redact_patterns=["ssn", "password"],
    max_depth=100
)

result = ds.serialize(data, config=custom_config)

πŸ”„ Error Handling Patterns

Graceful Degradation

def robust_serialize(data):
    """Serialize with multiple fallback strategies."""
    try:
        # Try with the full ML configuration
        return ds.serialize(data, config=ds.get_ml_config())
    except MemoryError:
        # Fall back to chunked processing for large inputs
        return ds.serialize_chunked(data)
    except ds.SecurityError:
        # Fall back to a restricted configuration
        safe_config = ds.SerializationConfig(secure_mode=True)
        return ds.serialize(data, config=safe_config)
    except Exception:
        # Last resort: default serialization with no custom configuration
        return ds.serialize(data)

Validation and Recovery

def validate_and_deserialize(serialized_data):
    """Validate data before deserialization."""
    try:
        # First attempt: auto deserialization
        result = ds.auto_deserialize(serialized_data)
        return result
    except ValueError:
        # Second attempt: safe deserialization
        return ds.safe_deserialize(serialized_data)

πŸ“Š Performance Considerations

Function Performance Characteristics

| Function | Speed | Reliability | Features |
|----------|-------|-------------|----------|
| serialize() | ⚑⚑ | πŸ›‘οΈπŸ›‘οΈπŸ›‘οΈ | ⭐⭐⭐ |
| deserialize() | ⚑⚑ | πŸ›‘οΈπŸ›‘οΈπŸ›‘οΈ | ⭐⭐⭐ |
| auto_deserialize() | ⚑ | πŸ›‘οΈπŸ›‘οΈ | ⭐⭐ |
| safe_deserialize() | ⚑ | πŸ›‘οΈπŸ›‘οΈπŸ›‘οΈπŸ›‘οΈ | ⭐ |

Optimization Tips

# Reuse configurations for better performance
config = ds.get_ml_config()
for batch in data_batches:
    result = ds.serialize(batch, config=config)

# Use appropriate function for your needs
if data_is_trusted:
    result = ds.deserialize(data)  # Fastest
else:
    result = ds.safe_deserialize(data)  # Most reliable
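
To check which path is fastest for your own payloads, a quick timing sketch (relative numbers depend on data shape and datason version):

import json
import timeit
import datason as ds

payload = {"timestamp": "2024-01-01T12:00:00", "values": list(range(100))}
payload_str = json.dumps(payload)

# deserialize() works on already-parsed objects; safe_deserialize() takes JSON strings
t_fast = timeit.timeit(lambda: ds.deserialize(payload), number=1000)
t_safe = timeit.timeit(lambda: ds.safe_deserialize(payload_str), number=1000)
print(f"deserialize: {t_fast:.3f}s  safe_deserialize: {t_safe:.3f}s")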