Template-Based Deserialization (v0.4.5)¶
Template-based deserialization in datason v0.4.5 provides type-guided reconstruction of complex data structures. This feature ensures consistent data types and structures when deserializing, making it ideal for ML pipelines, API contracts, and data validation scenarios.
Overview¶
Traditional deserialization relies on heuristics to guess data types. Template-based deserialization uses a reference object (template) to guide the process, ensuring:
- Consistent type reconstruction
- Validation of data structure
- ML-optimized round-trip fidelity
- Automatic type coercion with templates
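To make the contrast between guessing and template guidance concrete, here is a minimal standard-library sketch. It is not datason's implementation: `guess` and `coerce_like` are hypothetical helpers, and the digit-only rule is just one example of a heuristic that can lose information.

```python
def guess(value):
    """Heuristic: treat any digit-only string as an int (hypothetical rule)."""
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return value


def coerce_like(value, template):
    """Template-guided: convert the value to the template's type."""
    return type(template)(value)


raw = {"user_id": "123", "zip_code": "02134"}
template = {"user_id": 0, "zip_code": ""}  # int expected, str expected

guessed = {key: guess(value) for key, value in raw.items()}
guided = {key: coerce_like(raw[key], template[key]) for key in template}

print(guessed)  # the zip code loses its leading zero
print(guided)   # the zip code stays a string
```

The heuristic has no way to know that a zip code should stay a string; the template states that intent explicitly.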
🎯 NEW in v0.5.5: Comprehensive Type Support & 4-Mode Testing¶
Enhanced Type Coverage (34 Test Cases)¶
Our template deserializer now supports 100% successful reconstruction for:
Core Data Types¶
- Basic Types: str, int, float, bool, list, dict
- Complex Types: complex, decimal.Decimal, uuid.UUID, pathlib.Path, datetime
Scientific Computing Types¶
- NumPy Types: np.int32, np.float64, np.bool_, np.ndarray (any shape/dtype)
- PyTorch Types: torch.Tensor (any shape/dtype)
- Scikit-learn Types: Fitted models (LogisticRegression, RandomForestClassifier, etc.)
4-Mode Detection Strategy Testing¶
Each type is systematically tested across all 4 detection strategies:
- User Config/Template (100% success target)
- Auto Hints (80-90% success expected)
- Heuristics Only (best effort)
- Hot Path (fast, basic)
Deterministic Behavior Guarantee¶
- Predictable type conversion across all modes
- No randomness in type detection
- Consistent results for the same input across runs
- Mode-specific expectations clearly documented
Key Features¶
1. Template-Guided Deserialization¶
Use existing objects as templates to guide deserialization:
import datason
from datason.deserializers import TemplateDeserializer
from datetime import datetime
# Define template with expected structure and types
template = {
    'user_id': 0,
    'name': '',
    'created': datetime.now(),
    'active': True,
    'score': 0.0
}
# Serialized data with string representations
serialized_data = {
    'user_id': '123',
    'name': 'Alice',
    'created': '2023-12-25T10:30:45',
    'active': 'true',
    'score': '95.5'
}
# Deserialize with template guidance
deserializer = TemplateDeserializer(template)
result = deserializer.deserialize(serialized_data)
# Result has correct types:
# result['user_id'] is int(123)
# result['created'] is datetime object
# result['active'] is bool(True)
# result['score'] is float(95.5)
2. NEW: Scientific Computing Templates¶
Perfect reconstruction for NumPy, PyTorch, and scikit-learn objects:
import datason
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from datason.deserializers import deserialize_with_template
# NumPy template preservation
numpy_template = np.array([1.0, 2.0, 3.0], dtype=np.float32)
serialized = datason.serialize(numpy_template)
reconstructed = deserialize_with_template(serialized, numpy_template)
assert reconstructed.dtype == np.float32 # Exact dtype preserved
assert reconstructed.shape == (3,) # Shape preserved
# PyTorch template preservation
torch_template = torch.tensor([[1.0, 2.0], [3.0, 4.0]], dtype=torch.float64)
serialized = datason.serialize(torch_template)
reconstructed = deserialize_with_template(serialized, torch_template)
assert reconstructed.dtype == torch.float64 # Exact dtype preserved
assert reconstructed.shape == (2, 2) # Shape preserved
# Scikit-learn template preservation
model_template = LogisticRegression(random_state=42)
serialized = datason.serialize(model_template)
reconstructed = deserialize_with_template(serialized, model_template)
assert reconstructed.get_params() == model_template.get_params() # Perfect params match
3. NEW: 4-Mode Detection Strategy¶
Understand exactly how each type behaves across all detection modes:
import datason
import numpy as np
from datason.deserializers import deserialize_with_template
from datason import deserialize, deserialize_fast
# Original NumPy scalar
original = np.int32(42)
serialized = datason.serialize(original)
# Mode 1: User Config/Template (100% success guarantee)
template_result = deserialize_with_template(serialized, original)
assert type(template_result) is np.int32 # Exact type preserved
# Mode 2: Auto Hints (when type metadata is available)
hints_result = deserialize(serialized) # May preserve or convert
# Result depends on metadata availability
# Mode 3: Heuristics (best effort type detection)
heuristics_result = deserialize(serialized)
assert type(heuristics_result) is int  # Deterministic: np.int32 → int
# Mode 4: Hot Path (fast, basic type conversion)
hot_path_result = deserialize_fast(serialized)
assert type(hot_path_result) is int  # Deterministic: np.int32 → int
4. Automatic Template Inference¶
Generate templates from sample data:
from datason.deserializers import TemplateDeserializer, infer_template_from_data
# Sample data for template inference
sample_data = [
    {'name': 'Alice', 'age': 30, 'created': '2023-01-01T10:00:00'},
    {'name': 'Bob', 'age': 25, 'created': '2023-01-02T11:00:00'},
    {'name': 'Charlie', 'age': 35, 'created': '2023-01-03T12:00:00'}
]
# Infer template from samples
template = infer_template_from_data(sample_data)
# Use inferred template for consistent deserialization
deserializer = TemplateDeserializer(template)
new_data = {'name': 'Diana', 'age': '28', 'created': '2023-01-04T13:00:00'}
result = deserializer.deserialize(new_data)
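How inference might work can be sketched with the standard library alone. The following is an assumption-laden illustration, not the logic behind infer_template_from_data: `guess_example` and `infer_flat_template` are hypothetical names, only flat records are handled, and the rule "pick an example of the most common type per key" is one plausible choice among many.

```python
from collections import Counter
from datetime import datetime


def guess_example(value):
    """Turn ISO-formatted strings into datetimes; leave everything else as-is."""
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            return value
    return value


def infer_flat_template(samples):
    """For each key, keep an example value of the most common type seen."""
    seen = {}
    for record in samples:
        for key, value in record.items():
            seen.setdefault(key, []).append(guess_example(value))
    template = {}
    for key, examples in seen.items():
        common_type, _ = Counter(type(e) for e in examples).most_common(1)[0]
        template[key] = next(e for e in examples if type(e) is common_type)
    return template


samples = [
    {"name": "Alice", "age": 30, "created": "2023-01-01T10:00:00"},
    {"name": "Bob", "age": 25, "created": "2023-01-02T11:00:00"},
]
flat_template = infer_flat_template(samples)
print({key: type(value).__name__ for key, value in flat_template.items()})
```

Voting on the most common type makes the result less sensitive to a single malformed record, which is one reason to infer from several samples rather than one.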
5. ML-Optimized Templates¶
Create templates specifically for machine learning workflows:
from datason.deserializers import create_ml_round_trip_template
import pandas as pd
# Training data
training_df = pd.DataFrame({
    'feature1': [1.0, 2.0, 3.0],
    'feature2': [10, 20, 30],
    'target': ['class_a', 'class_b', 'class_a']
})
# Create ML-optimized template
ml_template = create_ml_round_trip_template(training_df)
# Template includes ML-specific metadata:
# - DataFrame structure and dtypes
# - Shape information
# - Index details
# - ML-specific optimization flags
print(ml_template['__ml_template__']) # True
print(ml_template['structure_type']) # 'dataframe'
print(ml_template['dtypes']) # Column type mapping
6. Convenience Functions¶
Simple template-based deserialization:
from datason.deserializers import deserialize_with_template
# One-line template deserialization
template = {'id': 0, 'value': 0.0, 'name': ''}
data = {'id': '42', 'value': '3.14', 'name': 'test'}
result = deserialize_with_template(data, template)
# Automatically applies template and returns typed result
Supported Template Types¶
Basic Data Types¶
template = {
    'int_field': 0,
    'float_field': 0.0,
    'str_field': '',
    'bool_field': True
}
# Handles automatic type coercion
data = {
    'int_field': '42',      # str → int
    'float_field': '3.14',  # str → float
    'str_field': 'hello',   # str → str
    'bool_field': 'true'    # str → bool
}
NEW: Scientific Computing Templates¶
NumPy Templates¶
import numpy as np
from datason.deserializers import deserialize_with_template
# NumPy scalar templates
int32_template = np.int32(0)
float64_template = np.float64(0.0)
bool_template = np.bool_(True)
# NumPy array templates (any shape, any dtype)
array_template = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
# Serialized data (the scalar arrives as a string, the array as nested lists)
data = {
    'scalar': '42',
    'array': [[1.5, 2.5], [3.5, 4.5]]
}
# Template-guided reconstruction preserves exact NumPy types
result = deserialize_with_template(data, {
    'scalar': int32_template,
    'array': array_template
})
assert result['scalar'].dtype == np.int32
assert result['array'].dtype == np.float32
PyTorch Templates¶
import torch
from datason.deserializers import deserialize_with_template
# PyTorch tensor templates
tensor_template = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
# Data to reconstruct
data = {'tensor': [1.5, 2.5, 3.5]}
result = deserialize_with_template(data, {'tensor': tensor_template})
assert result['tensor'].dtype == torch.float64
assert torch.equal(result['tensor'], torch.tensor([1.5, 2.5, 3.5], dtype=torch.float64))
Scikit-learn Templates¶
import datason
from datason.deserializers import deserialize_with_template
from sklearn.linear_model import LogisticRegression
# Model template
model_template = LogisticRegression(random_state=42, max_iter=1000)
# Serialized model data
serialized_model = datason.serialize(model_template)
# Perfect reconstruction
reconstructed = deserialize_with_template(serialized_model, model_template)
assert reconstructed.get_params() == model_template.get_params()
DateTime and UUID Templates¶
from datetime import datetime
from uuid import uuid4
template = {
    'timestamp': datetime.now(),
    'unique_id': uuid4()
}
data = {
    'timestamp': '2023-12-25T10:30:45',
    'unique_id': '12345678-1234-5678-9012-123456789abc'
}
result = deserialize_with_template(data, template)
# result['timestamp'] is datetime object
# result['unique_id'] is UUID object
DataFrame Templates¶
import pandas as pd
# Template DataFrame with specific dtypes
template_df = pd.DataFrame({
    'id': pd.Series([1], dtype='int32'),
    'value': pd.Series([0.0], dtype='float64'),
    'category': pd.Series([''], dtype='category')
})
# Data to deserialize
data_records = [
    {'id': 1, 'value': 10.5, 'category': 'A'},
    {'id': 2, 'value': 20.3, 'category': 'B'}
]
result = deserialize_with_template(data_records, template_df)
# result is DataFrame with template's dtypes preserved
Nested Structure Templates¶
from datetime import datetime
template = {
    'user': {
        'id': 0,
        'profile': {
            'name': '',
            'created': datetime.now()
        }
    },
    'data': [{'key': '', 'value': 0.0}]
}
nested_data = {
    'user': {
        'id': '123',
        'profile': {
            'name': 'Alice',
            'created': '2023-01-01T10:00:00'
        }
    },
    'data': [
        {'key': 'metric1', 'value': '42.5'},
        {'key': 'metric2', 'value': '38.2'}
    ]
}
result = deserialize_with_template(nested_data, template)
# All nested types are correctly reconstructed
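Nested reconstruction is naturally recursive. The sketch below shows the idea using only the standard library; `apply_template` is a hypothetical function, not part of datason, and it assumes that a one-element list template describes every item of the corresponding list.

```python
from datetime import datetime


def apply_template(value, template):
    """Recursively coerce value to the shape and types of template (illustrative)."""
    if isinstance(template, dict):
        # Walk the template's keys so every expected field is converted.
        return {key: apply_template(value[key], sub) for key, sub in template.items()}
    if isinstance(template, list):
        if not template:
            return list(value)
        # Assumption: the first template element describes every list item.
        return [apply_template(item, template[0]) for item in value]
    if isinstance(template, datetime):
        return value if isinstance(value, datetime) else datetime.fromisoformat(value)
    # Plain scalars: call the template's type. Note that bool("false") is True,
    # so real code needs explicit boolean parsing.
    return type(template)(value)


sketch_template = {
    "user": {"id": 0, "profile": {"name": "", "created": datetime(2000, 1, 1)}},
    "data": [{"key": "", "value": 0.0}],
}
sketch_data = {
    "user": {"id": "123", "profile": {"name": "Alice", "created": "2023-01-01T10:00:00"}},
    "data": [{"key": "metric1", "value": "42.5"}, {"key": "metric2", "value": "38.2"}],
}
sketch_result = apply_template(sketch_data, sketch_template)
print(sketch_result["user"]["profile"]["created"])
```

A missing key raises KeyError here, which is roughly what a strict mode would want; a more forgiving variant could skip absent keys instead.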
Testing & Quality Assurance 🧪¶
Comprehensive Test Coverage¶
- 34 integration tests covering all supported types
- 4-mode behavior verification for each type
- 100% user config success rate guaranteed
- Deterministic behavior across all detection strategies
Test Results Summary¶
TEMPLATE DESERIALIZER INTEGRATION TEST COVERAGE
============================================================
Basic Types: 6 types (100% expected success in user config)
Complex Types: 5 types (100% expected success in user config)
NumPy Types: 4 types (NEW: 100% user config!)
PyTorch Types: 1 types (NEW: 100% user config!)
Sklearn Types: 1 types (NEW: 100% user config!)
Total Coverage: 17+ types with systematic 4-mode testing
🎯 USER CONFIG ACHIEVEMENT: 100% success rate verified!
⚡ All 4 detection modes tested with realistic expectations
Deterministic behavior verified across all modes
============================================================
Performance Characteristics¶
Mode-Specific Performance¶
- User Config/Template: Highest accuracy, moderate speed
- Auto Hints: Good accuracy, good speed (when metadata available)
- Heuristics: Variable accuracy, good speed
- Hot Path: Basic accuracy, highest speed
Optimization Recommendations¶
- Use templates for critical type preservation (ML pipelines)
- Use auto hints for balanced accuracy/performance
- Use heuristics for general-purpose deserialization
- Use hot path for maximum throughput with basic types
Migration Guide¶
From v0.4.5 to v0.5.5¶
- No breaking changes - all existing template code works
- Enhanced type support - NumPy/PyTorch/sklearn now fully supported
- Better error handling - more descriptive error messages
- Improved performance - faster template matching and type coercion
Best Practices¶
- Always use templates for ML/scientific computing objects
- Test your specific use case across all 4 modes to understand behavior
- Consider caching templates for repeated operations
- Validate results with appropriate equality checks for your data types
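As an illustration of "appropriate equality checks", the sketch below compares a round-tripped value with its original while also checking types, since 1 == 1.0 is true in Python even though the types differ. `same_value` is a hypothetical helper written for this page, not a datason function.

```python
import math


def same_value(a, b, rel_tol=1e-9):
    """Compare two values structurally, requiring identical types."""
    if type(a) is not type(b):
        return False
    if isinstance(a, float):
        # Floats may differ in the last bits after a text round trip.
        return math.isclose(a, b, rel_tol=rel_tol)
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(same_value(a[k], b[k], rel_tol) for k in a)
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(same_value(x, y, rel_tol) for x, y in zip(a, b))
    return a == b


print(same_value({"x": 0.1 + 0.2}, {"x": 0.3}))
print(same_value(1, 1.0))
```

For NumPy arrays and PyTorch tensors, the libraries' own comparisons (for example np.array_equal or torch.equal, together with a dtype check) are the better fit.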
Configuration Options¶
Strict vs Flexible Mode¶
template = {'expected': 0, 'known': ''}
# Strict mode: Only process template fields
strict_deserializer = TemplateDeserializer(template, strict=True)
# Flexible mode: Allow extra fields
flexible_deserializer = TemplateDeserializer(template, strict=False)
data_with_extra = {
    'expected': '42',
    'known': 'value',
    'extra_field': 'unexpected'
}
# Flexible mode preserves extra fields
result = flexible_deserializer.deserialize(data_with_extra)
# result includes 'extra_field'
Fallback Auto-Detection¶
# Enable automatic type detection for unknown fields
deserializer = TemplateDeserializer(
    template,
    strict=False,
    fallback_auto_detect=True
)
# Auto-detection applied to fields not in template
data = {
    'known_field': '42',
    'datetime_field': '2023-01-01T10:00:00',  # Auto-detected as datetime
    'uuid_field': '12345678-1234-5678-9012-123456789abc'  # Auto-detected as UUID
}
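What auto-detection of an untemplated string field could look like is sketched below with the standard library only. This is a stand-in for illustration, not datason's detection logic: `auto_detect` is a hypothetical name, and the order in which parsers are tried is an assumption.

```python
from datetime import datetime
from uuid import UUID


def auto_detect(value):
    """Try to upgrade a string to a UUID or datetime; otherwise return it unchanged."""
    if not isinstance(value, str):
        return value
    for parser in (UUID, datetime.fromisoformat):
        try:
            return parser(value)
        except ValueError:
            continue  # not this type, try the next parser
    return value


detected = {
    key: auto_detect(value)
    for key, value in {
        "datetime_field": "2023-01-01T10:00:00",
        "uuid_field": "12345678-1234-5678-9012-123456789abc",
        "note": "plain text",
    }.items()
}
print({key: type(value).__name__ for key, value in detected.items()})
```

Because each parser either succeeds or raises ValueError, the outcome depends only on the input string, which is what makes this kind of fallback deterministic.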
Advanced Usage¶
Custom Type Coercion¶
# MyCustomType is a placeholder for your own class
class CustomTemplateDeserializer(TemplateDeserializer):
    def _coerce_to_template_type(self, obj, template):
        # Custom coercion logic
        if isinstance(template, MyCustomType):
            return MyCustomType.from_string(obj)
        return super()._coerce_to_template_type(obj, template)
# Use custom deserializer
custom_template = {'special_field': MyCustomType()}
deserializer = CustomTemplateDeserializer(custom_template)
Template Validation¶
def validate_template_compatibility(data, template):
    """Check if data is compatible with template."""
    try:
        deserializer = TemplateDeserializer(template, strict=True)
        deserializer.deserialize(data)
        return True
    except Exception:
        return False
# Validate before processing
if validate_template_compatibility(user_data, user_template):
    result = deserialize_with_template(user_data, user_template)
else:
    # Handle incompatible data
    result = fallback_processing(user_data)
Real-World Examples¶
API Response Validation¶
from datetime import datetime
from uuid import uuid4
import datason
from datason.deserializers import TemplateDeserializer
# Define API response template
api_response_template = {
    'status': '',
    'data': {
        'user_id': 0,
        'username': '',
        'created_at': datetime.now(),
        'is_active': True
    },
    'metadata': {
        'request_id': uuid4(),
        'timestamp': datetime.now()
    }
}
def process_api_response(response_json):
    """Process API response with template validation."""
    deserializer = TemplateDeserializer(api_response_template)
    try:
        validated_response = deserializer.deserialize(response_json)
        return validated_response
    except Exception as e:
        # Chain the original exception so the root cause stays visible
        raise ValueError(f"Invalid API response format: {e}") from e
# Usage
response_data = fetch_api_response()
validated_data = process_api_response(response_data)
ML Pipeline Data Consistency¶
from datetime import datetime
import datason
from datason.deserializers import (
    create_ml_round_trip_template,
    deserialize_with_template,
)
def create_ml_pipeline(training_data, model):
    """Create ML pipeline with template-based consistency."""
    # Create template from training data structure
    data_template = create_ml_round_trip_template(training_data)
    # Save template with model
    model_artifacts = {
        'model': model,
        'data_template': data_template,
        'training_metadata': {
            'created': datetime.now(),
            'version': '1.0'
        }
    }
    return model_artifacts
def predict_with_template(model_artifacts, new_data):
    """Make predictions with template validation."""
    # Ensure new data matches training data structure
    template = model_artifacts['data_template']
    validated_data = deserialize_with_template(new_data, template)
    # Proceed with prediction using validated data
    predictions = model_artifacts['model'].predict(validated_data)
    return predictions
# Usage
artifacts = create_ml_pipeline(training_df, trained_model)
predictions = predict_with_template(artifacts, new_samples)
Configuration File Processing¶
import json
from pathlib import Path
from datason.deserializers import deserialize_with_template, infer_template_from_data
def process_config_files(config_dir):
    """Process configuration files with template inference."""
    config_files = list(Path(config_dir).glob("*.json"))
    # Load sample configs to infer template
    sample_configs = []
    for config_file in config_files[:5]:  # Sample first 5
        with config_file.open() as f:
            sample_configs.append(json.load(f))
    # Infer common template
    config_template = infer_template_from_data(sample_configs)
    # Process all configs with template
    processed_configs = []
    for config_file in config_files:
        with config_file.open() as f:
            config_data = json.load(f)
        # Apply template for consistency
        processed_config = deserialize_with_template(config_data, config_template)
        processed_configs.append(processed_config)
    return processed_configs
# Usage
configs = process_config_files("./app_configs/")
Time Series Data Normalization¶
from datetime import datetime
from datason.deserializers import deserialize_with_template
def normalize_time_series_data(data_sources):
    """Normalize time series data from multiple sources."""
    # Template for time series records
    time_series_template = {
        'timestamp': datetime.now(),
        'sensor_id': '',
        'value': 0.0,
        'unit': '',
        'quality': 1.0
    }
    normalized_data = []
    for source_data in data_sources:
        # Apply template to ensure consistent structure
        for record in source_data:
            normalized_record = deserialize_with_template(record, time_series_template)
            normalized_data.append(normalized_record)
    return normalized_data
# Usage
sensor_data = [
    [{'timestamp': '2023-01-01T10:00:00', 'sensor_id': 'temp_01', 'value': '23.5'}],
    [{'timestamp': '2023-01-01T10:01:00', 'sensor_id': 'temp_02', 'value': '24.1'}]
]
normalized = normalize_time_series_data(sensor_data)
Performance Considerations¶
Template Caching¶
# Cache template analysis for repeated use
template = {'id': 0, 'name': '', 'value': 0.0}
deserializer = TemplateDeserializer(template) # Analysis done once
# Reuse deserializer for multiple operations
results = []
for data_item in large_dataset:
    result = deserializer.deserialize(data_item)  # Fast subsequent calls
    results.append(result)
Batch Processing¶
def batch_template_deserialize(data_list, template):
    """Efficiently process large batches with templates."""
    deserializer = TemplateDeserializer(template)
    # Process in chunks for memory efficiency
    chunk_size = 1000
    results = []
    for i in range(0, len(data_list), chunk_size):
        chunk = data_list[i:i+chunk_size]
        chunk_results = [deserializer.deserialize(item) for item in chunk]
        results.extend(chunk_results)
    return results
# Usage
large_dataset = load_large_dataset()
template = infer_template_from_data(large_dataset[:10])
processed_data = batch_template_deserialize(large_dataset, template)
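The batch function above still collects every result in one list, so peak memory grows with the dataset. If results can be consumed as they are produced, a generator keeps only one chunk in memory at a time. The sketch below is a generic pattern rather than a datason feature: `iter_chunks` and `convert_lazily` are hypothetical names, and `convert` stands in for a bound method such as deserializer.deserialize.

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield lists of up to chunk_size items from any iterable."""
    iterator = iter(items)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def convert_lazily(items, convert, chunk_size=1000):
    """Apply convert to each item, one chunk at a time, yielding results."""
    for chunk in iter_chunks(items, chunk_size):
        yield from (convert(item) for item in chunk)


chunks = list(iter_chunks(range(5), 2))
converted = list(convert_lazily(["1", "2", "3"], int, chunk_size=2))
print(chunks)
print(converted)
```

Because `iter_chunks` accepts any iterable, it also works with sources that cannot be sliced, such as a file being read line by line.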
Error Handling¶
Template Mismatch Errors¶
import datason
from datason.deserializers import TemplateDeserializationError, deserialize_with_template
try:
    result = deserialize_with_template(incompatible_data, template)
except TemplateDeserializationError as e:
    print(f"Template mismatch: {e}")
    # Handle with fallback processing
    result = datason.auto_deserialize(incompatible_data)
Graceful Degradation¶
def robust_template_deserialize(data, template):
    """Template deserialization with graceful fallback."""
    try:
        # Try template-based deserialization
        return deserialize_with_template(data, template, strict=True)
    except Exception:
        # Fall back to flexible mode
        try:
            return deserialize_with_template(data, template, strict=False)
        except Exception:
            # Final fallback to auto-detection
            return datason.auto_deserialize(data)
# Usage
result = robust_template_deserialize(uncertain_data, expected_template)
Best Practices¶
1. Template Design¶
# Good: Specific templates with example values
good_template = {
    'user_id': 0,               # int expected
    'email': '',                # string expected
    'created': datetime.now(),  # datetime expected
    'active': True              # boolean expected
}
# Avoid: Generic templates
avoid_template = {
    'user_id': None,  # Ambiguous type
    'email': None,    # Ambiguous type
    'created': None,  # Ambiguous type
    'active': None    # Ambiguous type
}
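Why None makes a poor template value can be shown in plain Python: a template entry is useful only if its type can be called to build a value, and NoneType cannot. `can_coerce` below is a hypothetical helper for illustration, not a datason function.

```python
def can_coerce(raw, example):
    """Return True if the example's type can construct a value from raw."""
    try:
        type(example)(raw)
        return True
    except (TypeError, ValueError):
        return False


print(can_coerce("42", 0))     # int("42") works
print(can_coerce("42", None))  # NoneType takes no arguments
print(can_coerce("abc", 0))    # int("abc") raises ValueError
```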
2. Template Inference Sample Size¶
# Use representative sample for inference
sample_size = min(100, len(full_dataset))
sample_data = full_dataset[:sample_size]
template = infer_template_from_data(sample_data)
3. Validation Strategy¶
# Validate critical fields explicitly
def validate_critical_fields(data, template):
    critical_fields = ['user_id', 'timestamp', 'amount']
    for field in critical_fields:
        if field not in data:
            raise ValueError(f"Missing critical field: {field}")
        expected_type = type(template[field])
        if not isinstance(data[field], (str, expected_type)):
            raise ValueError(f"Invalid type for {field}")
# Use before template deserialization
validate_critical_fields(user_data, user_template)
result = deserialize_with_template(user_data, user_template)
4. ML Template Persistence¶
import json
from datetime import datetime
import datason
from datason.deserializers import create_ml_round_trip_template
# Save ML templates with models
def save_model_with_template(model, training_data, model_path):
    """Save model with data template for consistency."""
    # Create ML template
    data_template = create_ml_round_trip_template(training_data)
    # Package model with template
    model_package = {
        'model': model,
        'data_template': data_template,
        'metadata': {
            'created': datetime.now(),
            'datason_version': datason.__version__
        }
    }
    # Save complete package
    with open(model_path, 'w') as f:
        json.dump(datason.serialize(model_package), f)
# Load and use with template
def load_model_with_template(model_path):
    """Load model and apply data template."""
    with open(model_path) as f:
        model_package = datason.deserialize(json.load(f))
    return model_package['model'], model_package['data_template']
API Reference¶
Core Functions¶
deserialize_with_template(obj, template, **kwargs)
- Convenience function for template-based deserialization
- Returns deserialized object matching template structure
infer_template_from_data(data, max_samples=100)
- Infer template from sample data
- Returns template object
create_ml_round_trip_template(ml_object)
- Create ML-optimized template
- Returns template with ML-specific metadata
Classes¶
TemplateDeserializer(template, strict=True, fallback_auto_detect=True)
- Template-based deserializer class
- Methods: deserialize(obj), _analyze_template()
TemplateDeserializationError
- Exception for template deserialization failures
Template Types Supported¶
- Basic types: int, float, str, bool
- Date/time: datetime, date, time
- Identifiers: UUID
- Collections: list, dict, tuple
- DataFrames: pandas.DataFrame, pandas.Series
- NumPy: numpy.ndarray
- Nested structures: recursive templates
See Also¶
- Chunked Processing - Memory-efficient large data handling
- Configuration Guide - Domain-specific configurations
- Core Features - Basic serialization and auto-detection
- ML Integration - Machine learning workflows