# 📥 Modern API: Deserialization Functions

Load functions arranged by progressive complexity, covering different accuracy and performance trade-offs, including file I/O variants.

## 🎯 Progressive Complexity Approach
| Function | Success Rate | Speed | Best For |
|---|---|---|---|
| `load_basic()` | 60-70% | ⚡⚡⚡ | Quick exploration |
| `load_smart()` | 80-90% | ⚡⚡ | Production use |
| `load_perfect()` | 100% | ⚡ | Mission-critical |
| `load_typed()` | 95% | ⚡⚡ | Metadata-driven |
| **File Operations** | | | |
| `load_smart_file()` | 80-90% | ⚡⚡ | File-based production |
| `load_perfect_file()` | 100% | ⚡ | File-based critical |
| `stream_load()` | 100% | ⚡⚡⚡ | Large file streaming |
## 🚀 Streaming Deserialization

For handling large files that don't fit in memory, use `stream_load()` to process data in chunks.

### stream_load()

Stream data from a file with memory efficiency, supporting both JSONL and JSON array formats.

```python
datason.stream_load(file_path: Union[str, os.PathLike], format: Optional[str] = None, chunk_size: int = 1000, chunk_processor: Optional[callable] = None, buffer_size: int = 8192, **kwargs: Any) -> StreamingDeserializer
```

Streaming deserialization from file.

Efficiently deserialize large datasets directly from file without loading everything into memory. Supports both JSON and JSONL formats, with optional gzip compression.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `Union[str, PathLike]` | Path to input file | *required* |
| `format` | `Optional[str]` | Input format (`'jsonl'` or `'json'`) | `None` |
| `chunk_processor` | `Optional[callable]` | Optional function to process each deserialized chunk | `None` |
| `buffer_size` | `int` | Read buffer size in bytes | `8192` |
| `**kwargs` | `Any` | Additional configuration options (passed to deserializer) | `{}` |

Returns:

| Type | Description |
|---|---|
| `StreamingDeserializer` | A `StreamingDeserializer` context manager that can be iterated over |
Examples:

```python
>>> # Process items one at a time (memory efficient)
>>> with stream_load("large_data.jsonl") as stream:
...     for item in stream:
...         process_item(item)

>>> # Apply custom processing to each item
>>> def process_item(item):
...     return {k: v * 2 for k, v in item.items()}
>>>
>>> with stream_load("data.jsonl", chunk_processor=process_item) as stream:
...     processed_items = list(stream)

>>> # Handle gzipped files automatically
>>> with stream_load("compressed_data.jsonl.gz") as stream:
...     for item in stream:
...         process_item(item)
```
Features:

- Memory-efficient processing of large files
- Supports both JSONL and JSON array formats
- Automatic gzip decompression (`.gz` files)
- Progress tracking via `items_yielded`
- Optional chunk-processing callback

Basic Example:

```python
import datason as ds

# Process a large JSONL file efficiently
with ds.stream_load("large_data.jsonl") as stream:
    for item in stream:
        process_item(item)  # Process one item at a time

print(f"Processed {stream.items_yielded} items")
```
Gzipped JSON Example:

```python
# Process a gzipped JSON file with a chunk processor
def process_chunk(chunk):
    chunk["processed"] = True
    return chunk

with ds.stream_load("data.json.gz", format="json", chunk_processor=process_chunk) as stream:
    results = list(stream)  # Process all items with chunk processing
```
Performance Considerations:

- Uses buffered I/O for efficient file reading
- Processes one item at a time to minimize memory usage
- Automatically detects and handles gzip compression
- Supports both string paths and `pathlib.Path` objects (see the sketch below)
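A small sketch of these points, using the signature documented above (the file name is hypothetical):

```python
from pathlib import Path
import datason as ds

# Both str and pathlib.Path inputs are accepted; buffer_size tunes the read
# buffer (the documented default is 8192 bytes).
with ds.stream_load(Path("large_data.jsonl"), buffer_size=64 * 1024) as stream:
    for item in stream:
        ...  # handle each item here
```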
Example Script:

For a complete working example that demonstrates different ways to use `stream_load()`, see `streaming_example.py` in the examples directory (a self-contained sketch along the same lines follows below). The example shows:

- Basic streaming from JSONL files
- Streaming with chunk processing
- Handling gzipped files
- Progress tracking and item counting
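In the same spirit, here is a minimal, self-contained sketch; the file name, record fields, and `tag()` helper are illustrative only, not part of the library:

```python
import gzip
import json
import datason as ds

# Write a small gzipped JSONL file so the example can run on its own.
with gzip.open("demo.jsonl.gz", "wt", encoding="utf-8") as f:
    for i in range(3):
        f.write(json.dumps({"id": i, "value": i * 10}) + "\n")

# Tag each record as it streams past.
def tag(record):
    record["processed"] = True
    return record

with ds.stream_load("demo.jsonl.gz", chunk_processor=tag) as stream:
    results = list(stream)

print(f"Loaded {len(results)} processed items")
```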
## 📦 Detailed Function Documentation

### load_basic()

Fast, basic deserialization for exploration and testing.

```python
datason.load_basic(data: Any) -> Any
```

Parse a JSON string, or deserialize already-parsed data, into a Python object.
Primary API expected by datason-benchmark for measuring deserialization performance. Also supports already-parsed data for API compatibility.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | JSON string to parse or already-parsed data to deserialize | *required* |

Returns:

| Type | Description |
|---|---|
| `Any` | Parsed Python object |
Example:

```python
obj = datason.load_basic('{"key": "value"}')
assert obj == {"key": "value"}

obj = datason.load_basic({"key": "value"})  # Also works
assert obj == {"key": "value"}
```
Quick Exploration Example:

```python
# Fast loading for data exploration
json_data = '{"values": [1, 2, 3], "timestamp": "2024-01-01T12:00:00"}'
basic_data = ds.load_basic(json_data)
# Basic types only, minimal processing
```
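For contrast with `load_smart()` below, a quick check; the assumption (consistent with the minimal-processing note above) is that `load_basic()` leaves ISO-8601 strings untouched:

```python
# With minimal processing, the timestamp should remain a plain string
print(type(basic_data["timestamp"]))  # expected: <class 'str'>
```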
### load_smart()

Intelligent deserialization with good accuracy for production use.

```python
datason.load_smart(data: Any, config: Optional[SerializationConfig] = None, **kwargs: Any) -> Any
```

Smart deserialization with auto-detection and heuristics.

Combines automatic type detection with heuristic fallbacks. Good balance of accuracy and performance for most use cases.

- Success rate: ~80-90% for complex objects
- Speed: Moderate
- Use case: General purpose, production data processing
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | Serialized data to deserialize | *required* |
| `config` | `Optional[SerializationConfig]` | Configuration for deserialization behavior | `None` |
| `**kwargs` | `Any` | Additional options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | Deserialized Python object with improved type fidelity |
Example:

```python
serialized = dump_api(complex_object)
result = load_smart(serialized)
# Better type reconstruction than load_basic
```
Production Example:

```python
# Intelligent type detection for production
smart_data = ds.load_smart(json_data)
print(type(smart_data["timestamp"]))  # <class 'datetime.datetime'>
```
### load_perfect()

Perfect-accuracy deserialization using templates for mission-critical applications.

```python
datason.load_perfect(data: Any, template: Any, **kwargs: Any) -> Any
```

Perfect deserialization using template matching.

Uses a template object to achieve 100% accurate reconstruction. Requires you to provide the structure/type information but guarantees perfect fidelity.

- Success rate: 100% when template matches data
- Speed: Fast (direct template matching)
- Use case: Critical applications, ML model loading, exact reconstruction
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | Serialized data to deserialize | *required* |
| `template` | `Any` | Template object showing expected structure/types | *required* |
| `**kwargs` | `Any` | Additional options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | Perfectly reconstructed Python object matching template |
Example:

```python
original = MyComplexClass(...)
serialized = dump_ml(original)
template = MyComplexClass.get_template()  # or original itself
result = load_perfect(serialized, template)
# Guaranteed perfect reconstruction
```
Mission-Critical Example:

```python
from datetime import datetime

# Define expected structure
template = {
    "values": [int],
    "timestamp": datetime,
    "metadata": {"version": float},
}

# 100% reliable restoration
perfect_data = ds.load_perfect(json_data, template)
```
### load_typed()

High-accuracy deserialization using embedded type metadata.

```python
datason.load_typed(data: Any, config: Optional[SerializationConfig] = None, **kwargs: Any) -> Any
```

Metadata-based type reconstruction.

Uses embedded type metadata from serialization to reconstruct objects. Requires that the data was serialized with type information preserved.

- Success rate: ~95% when metadata available
- Speed: Fast (direct metadata lookup)
- Use case: When you control both serialization and deserialization
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | Serialized data with embedded type metadata | *required* |
| `config` | `Optional[SerializationConfig]` | Configuration for type reconstruction | `None` |
| `**kwargs` | `Any` | Additional options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | Type-accurate deserialized Python object |
Example:

```python
# Works best with datason-serialized data
serialized = dump(original_object)  # Preserves type info
result = load_typed(serialized)
# High fidelity reconstruction using embedded metadata
```
Metadata-Driven Example:

```python
# Use embedded type information
typed_data = ds.load_typed(data_with_types)
# Uses metadata for accurate restoration
```
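A minimal round trip, mirroring the docstring example above; it assumes `ds.dump()` returns in-memory serialized data with type metadata embedded, as that example implies:

```python
from datetime import datetime
import datason as ds

original = {"run_id": 7, "started": datetime(2024, 1, 1, 12, 0)}

# dump() is assumed to embed type metadata (per the example above),
# which load_typed() then uses to rebuild the datetime.
serialized = ds.dump(original)
restored = ds.load_typed(serialized)
print(type(restored["started"]))  # expected: <class 'datetime.datetime'>
```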
## 🗃️ File Operations Functions

### load_smart_file()

Smart file loading with automatic format detection and good accuracy.

```python
datason.load_smart_file(path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> Iterator[Any]
```

Load data from a JSON/JSONL file using smart deserialization.

Good balance of accuracy and performance for most use cases. Success rate: ~80-90% for complex objects.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Union[str, Path]` | Input file path | *required* |
| `format` | `Optional[str]` | Explicit format (`'json'` or `'jsonl'`), auto-detected if `None` | `None` |
| `**kwargs` | `Any` | Additional deserialization options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Iterator[Any]` | Iterator of deserialized objects |
Examples:

```python
>>> # Load from JSONL (yields each line)
>>> for item in load_smart_file("data.jsonl"):
...     process(item)
>>>
>>> # Load from JSON (yields each item in array, or single item)
>>> for item in load_smart_file("data.json"):
...     process(item)
>>>
>>> # Or load all at once
>>> data = list(load_smart_file("data.jsonl"))
```
File-Based Production Example:

```python
# Automatic format detection (.json, .jsonl, .gz)
data = list(ds.load_smart_file("experiment.json"))
jsonl_data = list(ds.load_smart_file("training_logs.jsonl"))
compressed_data = list(ds.load_smart_file("model.json.gz"))

# Smart type reconstruction for production use
ml_data = list(ds.load_smart_file("model_checkpoint.json"))
```
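If a file's extension does not reveal its format, the `format` parameter documented above can force it (the file name here is hypothetical):

```python
# Force JSONL parsing for a file whose extension gives no hint
records = list(ds.load_smart_file("export.log", format="jsonl"))
```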
### load_perfect_file()

Perfect file loading using templates for mission-critical applications.

```python
datason.load_perfect_file(path: Union[str, Path], template: Any, *, format: Optional[str] = None, **kwargs: Any) -> Iterator[Any]
```

Load data from a JSON/JSONL file using perfect, template-based deserialization.

Uses a template for 100% accurate reconstruction. Success rate: 100% when the template matches the data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Union[str, Path]` | Input file path | *required* |
| `template` | `Any` | Template object showing expected structure/types | *required* |
| `format` | `Optional[str]` | Explicit format (`'json'` or `'jsonl'`), auto-detected if `None` | `None` |
| `**kwargs` | `Any` | Additional template options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Iterator[Any]` | Iterator of perfectly reconstructed objects |
Examples:

```python
>>> template = {"weights": np.array([0.0]), "epoch": 0}
>>>
>>> # Perfect loading from JSONL
>>> for item in load_perfect_file("training.jsonl", template):
...     assert isinstance(item["weights"], np.ndarray)
>>>
>>> # Perfect loading from JSON
>>> for item in load_perfect_file("training.json", template):
...     assert isinstance(item["weights"], np.ndarray)
```
File-Based Critical Example:

```python
import torch
import numpy as np

# Define expected ML structure
ml_template = {
    "model": torch.nn.Linear(10, 1),
    "weights": torch.randn(100, 50),
    "features": np.random.random((1000, 20)),
    "metadata": {"accuracy": 0.0},
}

# 100% reliable ML reconstruction from file
perfect_ml = list(ds.load_perfect_file("experiment.json", ml_template))

# Works with JSONL files too
for item in ds.load_perfect_file("training_log.jsonl", ml_template):
    process_item(item)
```
## 🔄 Choosing the Right Load Function

### Decision Matrix

```python
# Choose based on your needs:

# Exploration phase - speed matters most
data = ds.load_basic(json_string)

# Development/testing - good balance
data = ds.load_smart(json_string)

# Production - reliability critical
data = ds.load_perfect(json_string, template)

# Has embedded types - leverage metadata
data = ds.load_typed(json_string)
```
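When the best strategy is not known up front, one reasonable pattern is to try the most accurate loader first and degrade gracefully. The helper below is a sketch, not part of the datason API; `load_with_fallback` and its arguments are hypothetical:

```python
import datason as ds

def load_with_fallback(data, template=None):
    """Try the most accurate strategy available, then fall back."""
    if template is not None:
        try:
            return ds.load_perfect(data, template)  # 100% when template matches
        except Exception:
            pass
    try:
        return ds.load_typed(data)  # uses embedded type metadata when present
    except Exception:
        return ds.load_smart(data)  # heuristic fallback
```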
## 🔗 Related Documentation
- Serialization Functions - Corresponding dump functions
- Template System - Creating templates for perfect loading
- Modern API Overview - Complete modern API guide