
📥 Modern API: Deserialization Functions

Load functions at progressive levels of complexity for different accuracy and performance needs, including file I/O variants.

🎯 Progressive Complexity Approach

| Function | Success Rate | Speed | Best For |
|---|---|---|---|
| load_basic() | 60-70% | ⚡⚡⚡ | Quick exploration |
| load_smart() | 80-90% | ⚡⚡ | Production use |
| load_perfect() | 100% | | Mission-critical |
| load_typed() | 95% | ⚡⚡ | Metadata-driven |
| FILE OPERATIONS | | | |
| load_smart_file() | 80-90% | ⚡⚡ | File-based production |
| load_perfect_file() | 100% | | File-based critical |
| stream_load() | 100% | ⚡⚡⚡ | Large file streaming |
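
All of the snippets on this page refer to datason through the ds alias; a minimal setup sketch (the alias is only a documentation convention, not something the API requires):

import datason as ds  # short alias used throughout the examples below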

🚀 Streaming Deserialization

For handling large files that don't fit in memory, use stream_load() to process data in chunks.

stream_load()

Stream data from a file with memory efficiency, supporting both JSONL and JSON array formats.

datason.stream_load(file_path: Union[str, os.PathLike], format: Optional[str] = None, chunk_size: int = 1000, chunk_processor: Optional[callable] = None, buffer_size: int = 8192, **kwargs: Any) -> StreamingDeserializer

Streaming deserialization from file.

Efficiently deserialize large datasets directly from file without loading everything into memory. Supports both JSON and JSONL formats, with optional gzip compression.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_path | Union[str, PathLike] | Path to input file | required |
| format | Optional[str] | Input format ('jsonl' or 'json') | None |
| chunk_processor | Optional[callable] | Optional function to process each deserialized chunk | None |
| buffer_size | int | Read buffer size in bytes | 8192 |
| **kwargs | Any | Additional configuration options (passed to deserializer) | {} |

Returns:

| Type | Description |
|---|---|
| StreamingDeserializer | StreamingDeserializer context manager that can be iterated over |

Examples:

>>> # Process items one at a time (memory efficient)
>>> with stream_load("large_data.jsonl") as stream:
...     for item in stream:
...         process_item(item)
>>> # Apply custom processing to each item
>>> def process_item(item):
...     return {k: v * 2 for k, v in item.items()}
>>>
>>> with stream_load("data.jsonl", chunk_processor=process_item) as stream:
...     processed_items = list(stream)
>>> # Handle gzipped files automatically
>>> with stream_load("compressed_data.jsonl.gz") as stream:
...     for item in stream:
...         process_item(item)
Source code in datason/api.py
def stream_load(
    file_path: Union[str, os.PathLike],
    format: Optional[str] = None,
    chunk_size: int = 1000,
    chunk_processor: Optional[callable] = None,
    buffer_size: int = 8192,
    **kwargs: Any,
) -> StreamingDeserializer:
    """Streaming deserialization from file.

    Efficiently deserialize large datasets directly from file without
    loading everything into memory. Supports both JSON and JSONL formats,
    with optional gzip compression.

    Args:
        file_path: Path to input file
        format: Input format ('jsonl' or 'json')
        chunk_processor: Optional function to process each deserialized chunk
        buffer_size: Read buffer size in bytes
        **kwargs: Additional configuration options (passed to deserializer)

    Returns:
        StreamingDeserializer context manager that can be iterated over

    Examples:
        >>> # Process items one at a time (memory efficient)
        >>> with stream_load("large_data.jsonl") as stream:
        ...     for item in stream:
        ...         process_item(item)

        >>> # Apply custom processing to each item
        >>> def process_item(item):
        ...     return {k: v * 2 for k, v in item.items()}
        >>>
        >>> with stream_load("data.jsonl", chunk_processor=process_item) as stream:
        ...     processed_items = list(stream)

        >>> # Handle gzipped files automatically
        >>> with stream_load("compressed_data.jsonl.gz") as stream:
        ...     for item in stream:
        ...         process_item(item)
    """
    # Detect format if not provided
    detected_format = _detect_file_format(file_path, format)

    return stream_deserialize(
        file_path=file_path,
        format=detected_format,
        chunk_processor=chunk_processor,
        buffer_size=buffer_size,
        **kwargs,
    )

Features:

  • Memory-efficient processing of large files
  • Supports both JSONL and JSON array formats
  • Automatic gzip decompression (.gz files)
  • Progress tracking with items_yielded
  • Optional chunk processing callback

Basic Example:

# Process a large JSONL file efficiently
with ds.stream_load("large_data.jsonl") as stream:
    for item in stream:
        process_item(item)  # Process one item at a time
    print(f"Processed {stream.items_yielded} items")

Gzipped JSON Example:

# Process a gzipped JSON file with a chunk processor
def process_chunk(chunk):
    chunk["processed"] = True
    return chunk

with ds.stream_load("data.json.gz", format="json", chunk_processor=process_chunk) as stream:
    results = list(stream)  # Process all items with chunk processing

Performance Considerations:

  • Uses buffered I/O for efficient file reading
  • Processes one item at a time to minimize memory usage
  • Automatically detects and handles gzip compression
  • Supports both string paths and pathlib.Path objects

Example Script:

For a complete working example that demonstrates different ways to use stream_load(), see streaming_example.py in the examples directory. You can run it with:

python examples/streaming_example.py

This example shows:

  • Basic streaming from JSONL files
  • Streaming with chunk processing
  • Handling gzipped files
  • Progress tracking and item counting
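
If the examples directory is not at hand, the condensed, self-contained sketch below covers the same workflow. The file name, item structure, and the doubling processor are illustrative only, not part of the datason API.

import json

import datason as ds

# Create a small JSONL input file (one JSON object per line)
with open("demo_items.jsonl", "w") as f:
    for i in range(5):
        f.write(json.dumps({"id": i, "value": i * 10}) + "\n")

# Define a per-item processor (purely illustrative)
def double_value(item):
    item["value"] *= 2
    return item

# Stream the file back, applying the processor to each item
with ds.stream_load("demo_items.jsonl", chunk_processor=double_value) as stream:
    for item in stream:
        print(item)
    print(f"Processed {stream.items_yielded} items")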

📦 Detailed Function Documentation

load_basic()

Fast, basic deserialization for exploration and testing.

datason.load_basic(data: Any) -> Any

Parse JSON string or deserialize parsed data to Python object.

Primary API expected by datason-benchmark for measuring deserialization performance. Also supports already-parsed data for API compatibility.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | Any | JSON string to parse or already-parsed data to deserialize | required |

Returns:

| Type | Description |
|---|---|
| Any | Parsed Python object |

Example

>>> obj = datason.load_basic('{"key": "value"}')
>>> assert obj == {"key": "value"}
>>> obj = datason.load_basic({"key": "value"})  # Also works
>>> assert obj == {"key": "value"}

Source code in datason/__init__.py
def load_basic(data: Any) -> Any:
    """Parse JSON string or deserialize parsed data to Python object.

    Primary API expected by datason-benchmark for measuring deserialization performance.
    Also supports already-parsed data for API compatibility.

    Args:
        data: JSON string to parse or already-parsed data to deserialize

    Returns:
        Parsed Python object

    Example:
        >>> obj = datason.load_basic('{"key": "value"}')
        >>> assert obj == {"key": "value"}
        >>> obj = datason.load_basic({"key": "value"})  # Also works
        >>> assert obj == {"key": "value"}
    """
    # Use loads_json and deserialize with profiling support
    from ._profiling import profile_run, stage
    from .api import loads_json
    from .deserializers_new import deserialize

    with profile_run():
        if isinstance(data, (str, bytes)):
            with stage("load_basic_json"):
                return loads_json(data)
        else:
            # For already-parsed data, use deserialize
            with stage("load_basic_deserialize"):
                return deserialize(data)

Quick Exploration Example:

# Fast loading for data exploration
json_data = '{"values": [1, 2, 3], "timestamp": "2024-01-01T12:00:00"}'
basic_data = ds.load_basic(json_data)
# Basic types only, minimal processing
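
To get a rough feel for the speed difference against load_smart(), the micro-benchmark sketch below compares the two on an already-parsed payload; timings vary by machine and payload, so treat the printed numbers as illustrative only.

import timeit

payload = {"values": [1, 2, 3], "timestamp": "2024-01-01T12:00:00"}

t_basic = timeit.timeit(lambda: ds.load_basic(payload), number=10_000)
t_smart = timeit.timeit(lambda: ds.load_smart(payload), number=10_000)
print(f"load_basic: {t_basic:.3f}s   load_smart: {t_smart:.3f}s")
# load_basic skips type reconstruction, so it is typically the faster of the two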

load_smart()

Intelligent deserialization with good accuracy for production use.

datason.load_smart(data: Any, config: Optional[SerializationConfig] = None, **kwargs: Any) -> Any

Smart deserialization with auto-detection and heuristics.

Combines automatic type detection with heuristic fallbacks. Good balance of accuracy and performance for most use cases.

Success rate: ~80-90% for complex objects
Speed: Moderate
Use case: General purpose, production data processing

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | Any | Serialized data to deserialize | required |
| config | Optional[SerializationConfig] | Configuration for deserialization behavior | None |
| **kwargs | Any | Additional options | {} |

Returns:

| Type | Description |
|---|---|
| Any | Deserialized Python object with improved type fidelity |

Example

>>> serialized = dump_api(complex_object)
>>> result = load_smart(serialized)
>>> # Better type reconstruction than load_basic

Source code in datason/api.py
def load_smart(data: Any, config: Optional[SerializationConfig] = None, **kwargs: Any) -> Any:
    """Smart deserialization with auto-detection and heuristics.

    Combines automatic type detection with heuristic fallbacks.
    Good balance of accuracy and performance for most use cases.

    Success rate: ~80-90% for complex objects
    Speed: Moderate
    Use case: General purpose, production data processing

    Args:
        data: Serialized data to deserialize
        config: Configuration for deserialization behavior
        **kwargs: Additional options

    Returns:
        Deserialized Python object with improved type fidelity

    Example:
        >>> serialized = dump_api(complex_object)
        >>> result = load_smart(serialized)
        >>> # Better type reconstruction than load_basic
    """
    if config is None:
        config = SerializationConfig(auto_detect_types=True)
    return deserialize_fast(data, config=config, **kwargs)

Production Example:

# Intelligent type detection for production
smart_data = ds.load_smart(json_data)
print(type(smart_data["timestamp"]))  # <class 'datetime.datetime'>
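
As the source above shows, load_smart() builds a SerializationConfig(auto_detect_types=True) when no config is given. You can also construct and reuse one explicitly; a sketch, assuming SerializationConfig is exported at the package top level:

from datason import SerializationConfig  # adjust the import if your version exposes it elsewhere

config = SerializationConfig(auto_detect_types=True)
smart_data = ds.load_smart({"timestamp": "2024-01-01T12:00:00"}, config=config)
# the ISO-formatted string should be auto-detected and reconstructed as a datetime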

load_perfect()

Perfect accuracy deserialization using templates for mission-critical applications.

datason.load_perfect(data: Any, template: Any, **kwargs: Any) -> Any

Perfect deserialization using template matching.

Uses a template object to achieve 100% accurate reconstruction. Requires you to provide the structure/type information but guarantees perfect fidelity.

Success rate: 100% when template matches data
Speed: Fast (direct template matching)
Use case: Critical applications, ML model loading, exact reconstruction

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | Any | Serialized data to deserialize | required |
| template | Any | Template object showing expected structure/types | required |
| **kwargs | Any | Additional options | {} |

Returns:

| Type | Description |
|---|---|
| Any | Perfectly reconstructed Python object matching template |

Example

>>> original = MyComplexClass(...)
>>> serialized = dump_ml(original)
>>> template = MyComplexClass.get_template()  # or original itself
>>> result = load_perfect(serialized, template)
>>> # Guaranteed perfect reconstruction

Source code in datason/api.py
def load_perfect(data: Any, template: Any, **kwargs: Any) -> Any:
    """Perfect deserialization using template matching.

    Uses a template object to achieve 100% accurate reconstruction.
    Requires you to provide the structure/type information but
    guarantees perfect fidelity.

    Success rate: 100% when template matches data
    Speed: Fast (direct template matching)
    Use case: Critical applications, ML model loading, exact reconstruction

    Args:
        data: Serialized data to deserialize
        template: Template object showing expected structure/types
        **kwargs: Additional options

    Returns:
        Perfectly reconstructed Python object matching template

    Example:
        >>> original = MyComplexClass(...)
        >>> serialized = dump_ml(original)
        >>> template = MyComplexClass.get_template()  # or original itself
        >>> result = load_perfect(serialized, template)
        >>> # Guaranteed perfect reconstruction
    """
    return deserialize_with_template(data, template, **kwargs)

Mission-Critical Example:

from datetime import datetime

# Define expected structure
template = {
    "values": [int],
    "timestamp": datetime,
    "metadata": {"version": float}
}

# 100% reliable restoration
perfect_data = ds.load_perfect(json_data, template)
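
The following round-trip sketch ties the template above to data produced by one of the dump functions. dump_api() is used here because the docstring examples reference it; the record values are illustrative.

record = {
    "values": [1, 2, 3],
    "timestamp": datetime(2024, 1, 1, 12, 0),
    "metadata": {"version": 1.0},
}

serialized = ds.dump_api(record)                   # JSON-safe representation
restored = ds.load_perfect(serialized, template)   # template defined above
assert isinstance(restored["timestamp"], datetime)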

load_typed()

High-accuracy deserialization using embedded type metadata.

datason.load_typed(data: Any, config: Optional[SerializationConfig] = None, **kwargs: Any) -> Any

Metadata-based type reconstruction.

Uses embedded type metadata from serialization to reconstruct objects. Requires data was serialized with type information preserved.

Success rate: ~95% when metadata available
Speed: Fast (direct metadata lookup)
Use case: When you control both serialization and deserialization

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | Any | Serialized data with embedded type metadata | required |
| config | Optional[SerializationConfig] | Configuration for type reconstruction | None |
| **kwargs | Any | Additional options | {} |

Returns:

| Type | Description |
|---|---|
| Any | Type-accurate deserialized Python object |

Example

>>> # Works best with datason-serialized data
>>> serialized = dump(original_object)  # Preserves type info
>>> result = load_typed(serialized)
>>> # High fidelity reconstruction using embedded metadata

Source code in datason/api.py
def load_typed(data: Any, config: Optional[SerializationConfig] = None, **kwargs: Any) -> Any:
    """Metadata-based type reconstruction.

    Uses embedded type metadata from serialization to reconstruct objects.
    Requires data was serialized with type information preserved.

    Success rate: ~95% when metadata available
    Speed: Fast (direct metadata lookup)
    Use case: When you control both serialization and deserialization

    Args:
        data: Serialized data with embedded type metadata
        config: Configuration for type reconstruction
        **kwargs: Additional options

    Returns:
        Type-accurate deserialized Python object

    Example:
        >>> # Works best with datason-serialized data
        >>> serialized = dump(original_object)  # Preserves type info
        >>> result = load_typed(serialized)
        >>> # High fidelity reconstruction using embedded metadata
    """
    if config is None:
        config = get_strict_config()  # Use strict config for best type preservation
    return deserialize_fast(data, config=config, **kwargs)

Metadata-Driven Example:

# Use embedded type information
typed_data = ds.load_typed(data_with_types)
# Uses metadata for accurate restoration
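
Mirroring the docstring example, a hedged end-to-end sketch: dump() is used as shown there, and whether the datetime round-trips depends on type metadata actually being embedded during serialization.

from datetime import datetime

original = {"created": datetime(2024, 1, 1, 12, 0), "scores": [0.5, 0.9]}
serialized = ds.dump(original)      # per the docstring above, dump() preserves type info
typed_data = ds.load_typed(serialized)
# typed_data["created"] should come back as a datetime when the metadata is present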

🗃️ File Operations Functions

load_smart_file()

Smart file loading with automatic format detection and good accuracy.

datason.load_smart_file(path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> Iterator[Any]

Load data from JSON/JSONL file using smart deserialization.

Good balance of accuracy and performance for most use cases. Success rate: ~80-90% for complex objects.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[str, Path] | Input file path | required |
| format | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected if None | None |
| **kwargs | Any | Additional deserialization options | {} |

Returns:

| Type | Description |
|---|---|
| Iterator[Any] | Iterator of deserialized objects |

Examples:

>>> # Load from JSONL (yields each line)
>>> for item in load_smart_file("data.jsonl"):
...     process(item)
>>>
>>> # Load from JSON (yields each item in array, or single item)
>>> for item in load_smart_file("data.json"):
...     process(item)
>>>
>>> # Or load all at once
>>> data = list(load_smart_file("data.jsonl"))
Source code in datason/api.py
def load_smart_file(path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> Iterator[Any]:
    """Load data from JSON/JSONL file using smart deserialization.

    Good balance of accuracy and performance for most use cases.
    Success rate: ~80-90% for complex objects.

    Args:
        path: Input file path
        format: Explicit format ('json' or 'jsonl'), auto-detected if None
        **kwargs: Additional deserialization options

    Returns:
        Iterator of deserialized objects

    Examples:
        >>> # Load from JSONL (yields each line)
        >>> for item in load_smart_file("data.jsonl"):
        ...     process(item)
        >>>
        >>> # Load from JSON (yields each item in array, or single item)
        >>> for item in load_smart_file("data.json"):
        ...     process(item)
        >>>
        >>> # Or load all at once
        >>> data = list(load_smart_file("data.jsonl"))
    """
    # Check for large files (10MB threshold)
    try:
        file_size = os.path.getsize(path)
        if file_size > 10 * 1024 * 1024:  # 10MB
            warnings.warn(
                f"Loading large file ({file_size / 1024 / 1024:.1f}MB). "
                "Consider using stream_load() for better memory efficiency.",
                ResourceWarning,
                stacklevel=2,
            )
    except (OSError, TypeError):
        pass  # Skip size check if we can't determine file size

    config = SerializationConfig(auto_detect_types=True)
    for raw_item in _load_from_file(path, config, format):
        yield load_smart(raw_item, config, **kwargs)

File-Based Production Example:

# Automatic format detection (.json, .jsonl, .gz)
data = list(ds.load_smart_file("experiment.json"))
jsonl_data = list(ds.load_smart_file("training_logs.jsonl"))
compressed_data = list(ds.load_smart_file("model.json.gz"))

# Smart type reconstruction for production use (the function yields items, so materialize or iterate)
ml_data = list(ds.load_smart_file("model_checkpoint.json"))
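
When a file does not carry a .json or .jsonl extension, auto-detection can be bypassed with the format argument; the file name below is illustrative.

# Explicit format for files with non-standard extensions
for item in ds.load_smart_file("export_2024.dat", format="jsonl"):
    process_item(item)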

load_perfect_file()

Perfect file loading using templates for mission-critical applications.

datason.load_perfect_file(path: Union[str, Path], template: Any, *, format: Optional[str] = None, **kwargs: Any) -> Iterator[Any]

Load data from JSON/JSONL file using perfect template-based deserialization.

Uses template for 100% accurate reconstruction. Success rate: 100% when template matches data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[str, Path] | Input file path | required |
| template | Any | Template object showing expected structure/types | required |
| format | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected if None | None |
| **kwargs | Any | Additional template options | {} |

Returns:

| Type | Description |
|---|---|
| Iterator[Any] | Iterator of perfectly reconstructed objects |

Examples:

>>> template = {"weights": np.array([0.0]), "epoch": 0}
>>>
>>> # Perfect loading from JSONL
>>> for item in load_perfect_file("training.jsonl", template):
...     assert isinstance(item["weights"], np.ndarray)
>>>
>>> # Perfect loading from JSON
>>> for item in load_perfect_file("training.json", template):
...     assert isinstance(item["weights"], np.ndarray)
Source code in datason/api.py
def load_perfect_file(
    path: Union[str, Path], template: Any, *, format: Optional[str] = None, **kwargs: Any
) -> Iterator[Any]:
    """Load data from JSON/JSONL file using perfect template-based deserialization.

    Uses template for 100% accurate reconstruction.
    Success rate: 100% when template matches data.

    Args:
        path: Input file path
        template: Template object showing expected structure/types
        format: Explicit format ('json' or 'jsonl'), auto-detected if None
        **kwargs: Additional template options

    Returns:
        Iterator of perfectly reconstructed objects

    Examples:
        >>> template = {"weights": np.array([0.0]), "epoch": 0}
        >>>
        >>> # Perfect loading from JSONL
        >>> for item in load_perfect_file("training.jsonl", template):
        ...     assert isinstance(item["weights"], np.ndarray)
        >>>
        >>> # Perfect loading from JSON
        >>> for item in load_perfect_file("training.json", template):
        ...     assert isinstance(item["weights"], np.ndarray)
    """
    for raw_item in _load_from_file(path, format=format):
        yield load_perfect(raw_item, template, **kwargs)

File-Based Critical Example:

import torch
import numpy as np

# Define expected ML structure
ml_template = {
    "model": torch.nn.Linear(10, 1),
    "weights": torch.randn(100, 50),
    "features": np.random.random((1000, 20)),
    "metadata": {"accuracy": 0.0}
}

# 100% reliable ML reconstruction from file (materialize the iterator)
perfect_ml = list(ds.load_perfect_file("experiment.json", ml_template))

# Works with JSONL files too
for item in ds.load_perfect_file("training_log.jsonl", ml_template):
    process_item(item)

🔄 Choosing the Right Load Function

Decision Matrix

# Choose based on your needs:

# Exploration phase - speed matters most
data = ds.load_basic(json_string)

# Development/testing - good balance
data = ds.load_smart(json_string)  

# Production - reliability critical
data = ds.load_perfect(json_string, template)

# Has embedded types - leverage metadata
data = ds.load_typed(json_string)
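
One way to combine these functions (a pattern sketch, not part of the datason API) is to prefer template-based loading when a template is available and fall back to heuristics otherwise:

def load_with_fallback(payload, template=None):
    """Use perfect loading when a template is available; otherwise fall back to smart loading."""
    if template is not None:
        try:
            return ds.load_perfect(payload, template)
        except Exception:
            pass  # template did not match the payload; fall through to heuristics
    return ds.load_smart(payload)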