📚 Complete API Reference¶
Auto-generated documentation for all datason functions, classes, and constants.
🚀 Modern API Functions¶
Serialization Functions¶
datason.dump(obj: Any, fp: Any, **kwargs: Any) -> None

Enhanced file serialization (DataSON's smart default).
This saves enhanced DataSON serialized data to a file using save_ml(). For stdlib json.dump() compatibility, use datason.json.dump() or dump_json().

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to serialize | required |
fp | Any | File-like object or file path to write to | required |
**kwargs | Any | DataSON configuration options | {} |

Returns:

Type | Description |
---|---|
None | None (writes to file) |

Example

>>> with open('data.json', 'w') as f:
...     dump(data, f)  # Enhanced serialization with smart features

For JSON compatibility:

>>> import datason.json as json
>>> with open('data.json', 'w') as f:
...     json.dump(data, f)  # Exact json.dump() behavior
Source code in datason/api.py
datason.dump_ml(obj: Any, **kwargs: Any) -> Any

ML-optimized serialization for models, tensors, and ML objects.
Automatically configures optimal settings for machine learning objects including NumPy arrays, PyTorch tensors, scikit-learn models, etc.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | ML object to serialize | required |
**kwargs | Any | Additional configuration options | {} |

Returns:

Type | Description |
---|---|
Any | Serialized ML object optimized for reconstruction |

Example

>>> model = sklearn.ensemble.RandomForestClassifier()
>>> serialized = dump_ml(model)
>>> # Optimized for ML round-trip fidelity
Source code in datason/api.py
datason.dump_api(obj: Any, **kwargs: Any) -> Any

API-safe serialization for web responses and APIs.
Produces clean, predictable JSON suitable for API responses. Handles edge cases gracefully and ensures consistent output format.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to serialize for API response | required |
**kwargs | Any | Additional configuration options | {} |

Returns:

Type | Description |
---|---|
Any | API-safe serialized object |

Example

>>> @app.route('/api/data')
... def get_data():
...     return dump_api(complex_data_structure)
Source code in datason/api.py
datason.dump_secure(obj: Any, *, redact_pii: bool = True, redact_fields: Optional[List[str]] = None, redact_patterns: Optional[List[str]] = None, **kwargs: Any) -> Any

Security-focused serialization with PII redaction.
Automatically redacts sensitive information like credit cards, SSNs, emails, and common secret fields.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to serialize securely | required |
redact_pii | bool | Enable automatic PII pattern detection | True |
redact_fields | Optional[List[str]] | Additional field names to redact | None |
redact_patterns | Optional[List[str]] | Additional regex patterns to redact | None |
**kwargs | Any | Additional configuration options | {} |

Returns:

Type | Description |
---|---|
Any | Serialized object with sensitive data redacted |

Example

>>> user_data = {"name": "John", "ssn": "123-45-6789"}
>>> safe_data = dump_secure(user_data)
>>> # SSN will be redacted
Source code in datason/api.py
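The `redact_fields` and `redact_patterns` hooks extend the built-in PII detection. A minimal sketch, using only the parameters documented above; the field name and regex here are illustrative, not part of the library:

```python
import datason

record = {
    "user": "jane",
    "api_key": "sk-live-abc123",          # caught via redact_fields
    "notes": "card 4111-1111-1111-1111",  # caught via redact_patterns
}

safe = datason.dump_secure(
    record,
    redact_fields=["api_key"],                    # extra field names to mask
    redact_patterns=[r"\b(?:\d[ -]?){13,16}\b"],  # extra regexes, e.g. card-like digit runs
)
```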
datason.dump_fast(obj: Any, **kwargs: Any) -> Any

Performance-optimized serialization.
Optimized for speed with minimal type checking and validation. Use when you need maximum performance and can accept some trade-offs in type fidelity.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to serialize quickly | required |
**kwargs | Any | Additional configuration options | {} |

Returns:

Type | Description |
---|---|
Any | Serialized object optimized for speed |

Example

>>> # For high-throughput scenarios
>>> result = dump_fast(large_dataset)
Source code in datason/api.py
datason.dump_chunked(obj: Any, *, chunk_size: int = 1000, **kwargs: Any) -> Any

Chunked serialization for large objects.
Breaks large objects into manageable chunks for memory efficiency and streaming processing.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Large object to serialize in chunks | required |
chunk_size | int | Size of each chunk | 1000 |
**kwargs | Any | Additional configuration options | {} |

Returns:

Type | Description |
---|---|
Any | ChunkedSerializationResult with metadata and chunks |

Example

>>> big_list = list(range(10000))
>>> result = dump_chunked(big_list, chunk_size=1000)
>>> # Returns ChunkedSerializationResult with 10 chunks
Source code in datason/api.py
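The result can be drained in memory or written straight to disk. A sketch using the `to_list()` and `save_to_file()` helpers documented for `ChunkedSerializationResult` and `serialize_chunked` below; note the chunk iterator is single-use, so a fresh result is built before saving:

```python
import datason

result = datason.dump_chunked(list(range(10000)), chunk_size=1000)
chunks = result.to_list()  # materialize all 10 chunks in memory

# The chunk iterator is now consumed; rebuild the result to stream to disk.
result = datason.dump_chunked(list(range(10000)), chunk_size=1000)
result.save_to_file("big.jsonl", format="jsonl")
```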
datason.stream_dump(file_path: str, **kwargs: Any) -> Any

Streaming serialization to file.
Efficiently serialize large datasets directly to file without loading everything into memory.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path to output file | required |
**kwargs | Any | Additional configuration options | {} |

Returns:

Type | Description |
---|---|
Any | StreamingSerializer instance for continued operations |

Example

>>> with stream_dump("output.jsonl") as streamer:
...     for item in large_dataset:
...         streamer.write(item)
Source code in datason/api.py
Deserialization Functions¶
datason.load_basic(data: Any, **kwargs: Any) -> Any

Basic deserialization using heuristics only.
Uses simple heuristics to reconstruct Python objects from serialized data. Fast but with limited type fidelity - suitable for exploration and non-critical applications.

Success rate: ~60-70% for complex objects
Speed: Fastest
Use case: Data exploration, simple objects

Parameters:

Name | Type | Description | Default |
---|---|---|---|
data | Any | Serialized data to deserialize | required |
**kwargs | Any | Additional options (parse_dates, parse_uuids, etc.) | {} |

Returns:

Type | Description |
---|---|
Any | Deserialized Python object |

Example

>>> serialized = {"numbers": [1, 2, 3], "text": "hello"}
>>> result = load_basic(serialized)
>>> # Works well for simple structures
Source code in datason/api.py
datason.load_smart(data: Any, config: Optional[SerializationConfig] = None, **kwargs: Any) -> Any

Smart deserialization with auto-detection and heuristics.
Combines automatic type detection with heuristic fallbacks. Good balance of accuracy and performance for most use cases.

Success rate: ~80-90% for complex objects
Speed: Moderate
Use case: General purpose, production data processing

Parameters:

Name | Type | Description | Default |
---|---|---|---|
data | Any | Serialized data to deserialize | required |
config | Optional[SerializationConfig] | Configuration for deserialization behavior | None |
**kwargs | Any | Additional options | {} |

Returns:

Type | Description |
---|---|
Any | Deserialized Python object with improved type fidelity |

Example

>>> serialized = dump_api(complex_object)
>>> result = load_smart(serialized)
>>> # Better type reconstruction than load_basic
Source code in datason/api.py
datason.load_perfect(data: Any, template: Any, **kwargs: Any) -> Any

Perfect deserialization using template matching.
Uses a template object to achieve 100% accurate reconstruction. Requires you to provide the structure/type information but guarantees perfect fidelity.

Success rate: 100% when template matches data
Speed: Fast (direct template matching)
Use case: Critical applications, ML model loading, exact reconstruction

Parameters:

Name | Type | Description | Default |
---|---|---|---|
data | Any | Serialized data to deserialize | required |
template | Any | Template object showing expected structure/types | required |
**kwargs | Any | Additional options | {} |

Returns:

Type | Description |
---|---|
Any | Perfectly reconstructed Python object matching template |

Example

>>> original = MyComplexClass(...)
>>> serialized = dump_ml(original)
>>> template = MyComplexClass.get_template()  # or original itself
>>> result = load_perfect(serialized, template)
>>> # Guaranteed perfect reconstruction
Source code in datason/api.py
datason.load_typed(data: Any, config: Optional[SerializationConfig] = None, **kwargs: Any) -> Any

Metadata-based type reconstruction.
Uses embedded type metadata from serialization to reconstruct objects. Requires data was serialized with type information preserved.

Success rate: ~95% when metadata available
Speed: Fast (direct metadata lookup)
Use case: When you control both serialization and deserialization

Parameters:

Name | Type | Description | Default |
---|---|---|---|
data | Any | Serialized data with embedded type metadata | required |
config | Optional[SerializationConfig] | Configuration for type reconstruction | None |
**kwargs | Any | Additional options | {} |

Returns:

Type | Description |
---|---|
Any | Type-accurate deserialized Python object |

Example

>>> # Works best with datason-serialized data
>>> serialized = dump(original_object)  # Preserves type info
>>> result = load_typed(serialized)
>>> # High-fidelity reconstruction using embedded metadata
Source code in datason/api.py
Utility Functions¶
datason.dumps(obj: Any, **kwargs: Any) -> Any

Enhanced serialization returning dict (DataSON's smart default).
This is DataSON's enhanced API that returns a dict with smart type handling, datetime parsing, ML support, and other advanced features.
For JSON string output or stdlib compatibility, use datason.json.dumps() or dumps_json().

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to serialize | required |
**kwargs | Any | DataSON configuration options | {} |

Returns:

Type | Description |
---|---|
Any | Serialized dict with enhanced type handling |
Examples:
>>> obj = {"timestamp": datetime.now(), "data": [1, 2, 3]}
>>> result = dumps(obj) # Returns dict with smart datetime handling
>>> # For JSON string compatibility:
>>> import datason.json as json
>>> json_str = json.dumps(obj) # Returns JSON string
Source code in datason/api.py
datason.loads(s: str, **kwargs: Any) -> Any

Enhanced JSON string deserialization (DataSON's smart default).
This provides smart deserialization with datetime parsing, type reconstruction, and other DataSON enhancements. For stdlib json.loads() compatibility, use datason.json.loads() or loads_json().

Parameters:

Name | Type | Description | Default |
---|---|---|---|
s | str | JSON string to deserialize | required |
**kwargs | Any | DataSON configuration options | {} |

Returns:

Type | Description |
---|---|
Any | Deserialized Python object with enhanced type handling |

Example

>>> json_str = '{"timestamp": "2024-01-01T00:00:00Z", "data": [1, 2, 3]}'
>>> result = loads(json_str)  # Smart parsing with datetime handling

For JSON compatibility:

>>> import datason.json as json
>>> result = json.loads(json_str)  # Exact json.loads() behavior
Source code in datason/api.py
datason.help_api() -> Dict[str, Any]

Get help on choosing the right API function.

Returns:

Type | Description |
---|---|
Dict[str, Any] | Dictionary with API guidance and function recommendations |

Example

>>> help_info = help_api()
>>> print(help_info['recommendations'])
Source code in datason/api.py
datason.get_api_info() -> Dict[str, Any]

Get information about the modern API.

Returns:

Type | Description |
---|---|
Dict[str, Any] | Dictionary with API version and feature information |
Source code in datason/api.py
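No example accompanies this function; a trivial sketch (the returned keys are version-dependent, so only their names are printed):

```python
import datason

info = datason.get_api_info()
print(sorted(info))  # API version and feature information, per the table above
```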
📋 Traditional API Functions¶
Core Functions¶
datason.serialize(obj: Any, config: Any = None, **kwargs: Any) -> Any

Serialize an object (DEPRECATED - use dump/dumps instead).

DEPRECATION WARNING: Direct use of serialize() is discouraged. Use the clearer API functions instead:

- dump(obj, file) - write to file (like json.dump)
- dumps(obj) - convert to string (like json.dumps)
- serialize_enhanced(obj, **options) - enhanced serialization with clear options

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to serialize | required |
config | Any | Optional configuration | None |
**kwargs | Any | Additional options | {} |

Returns:

Type | Description |
---|---|
Any | Serialized object |
Source code in datason/__init__.py
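A minimal migration sketch for code still calling the deprecated entry point, using only the replacements named in the warning above (recall that dumps() returns DataSON's enhanced dict):

```python
import datason

payload = {"value": 42}

# Before (deprecated):
legacy = datason.serialize(payload)

# After (preferred):
as_dict = datason.dumps(payload)   # enhanced dict output
with open("payload.json", "w") as f:
    datason.dump(payload, f)       # enhanced file output
```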
datason.deserialize(obj: Any, parse_dates: bool = True, parse_uuids: bool = True) -> Any

Recursively deserialize JSON-compatible data back to Python objects.
Attempts to intelligently restore datetime objects, UUIDs, and other types that were serialized to strings by the serialize function.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | The JSON-compatible object to deserialize | required |
parse_dates | bool | Whether to attempt parsing ISO datetime strings back to datetime objects | True |
parse_uuids | bool | Whether to attempt parsing UUID strings back to UUID objects | True |

Returns:

Type | Description |
---|---|
Any | Python object with restored types where possible |
Examples:
>>> data = {"date": "2023-01-01T12:00:00", "id": "12345678-1234-5678-9012-123456789abc"}
>>> deserialize(data)
{"date": datetime(2023, 1, 1, 12, 0), "id": UUID('12345678-1234-5678-9012-123456789abc')}
Source code in datason/deserializers_new.py
datason.auto_deserialize(obj: Any, aggressive: bool = False, config: Optional[SerializationConfig] = None) -> Any

NEW: Intelligent auto-detection deserialization with heuristics.
Uses pattern recognition and heuristics to automatically detect and restore complex data types without explicit configuration.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | JSON-compatible object to deserialize | required |
aggressive | bool | Whether to use aggressive type detection (may have false positives) | False |
config | Optional[SerializationConfig] | Configuration object to control deserialization behavior | None |

Returns:

Type | Description |
---|---|
Any | Python object with auto-detected types restored |
Examples:
>>> data = {"records": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}
>>> auto_deserialize(data, aggressive=True)
{"records": DataFrame(...)} # May detect as DataFrame
>>> # API-compatible UUID handling
>>> from datason.config import get_api_config
>>> auto_deserialize("12345678-1234-5678-9012-123456789abc", config=get_api_config())
"12345678-1234-5678-9012-123456789abc" # Stays as string
Source code in datason/deserializers_new.py
datason.safe_deserialize(json_str: str, allow_pickle: bool = False, **kwargs: Any) -> Any

Safely deserialize a JSON string, handling parse errors gracefully.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
json_str | str | JSON string to parse and deserialize | required |
allow_pickle | bool | Whether to allow deserialization of pickle-serialized objects | False |
**kwargs | Any | Arguments passed to deserialize() | {} |

Returns:

Type | Description |
---|---|
Any | Deserialized Python object, or the original string if parsing fails |

Raises:

Type | Description |
---|---|
DeserializationSecurityError | If pickle data is detected and allow_pickle=False |
Source code in datason/deserializers_new.py
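No inline example ships with this function; a small sketch of the documented fallback behavior:

```python
import datason

good = datason.safe_deserialize('{"n": 1}')
bad = datason.safe_deserialize("not valid json")

print(good)  # {'n': 1}
print(bad)   # 'not valid json' - the original string, returned instead of raising
```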
Chunked & Streaming¶
datason.serialize_chunked(obj: Any, chunk_size: int = 1000, config: Optional[SerializationConfig] = None, memory_limit_mb: Optional[int] = None) -> ChunkedSerializationResult

Serialize large objects in memory-bounded chunks.
This function breaks large objects (lists, DataFrames, arrays) into smaller chunks to enable processing of datasets larger than available memory.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to serialize (typically list, DataFrame, or array) | required |
chunk_size | int | Number of items per chunk | 1000 |
config | Optional[SerializationConfig] | Serialization configuration | None |
memory_limit_mb | Optional[int] | Optional memory limit in MB (not enforced yet, for future use) | None |

Returns:

Type | Description |
---|---|
ChunkedSerializationResult | ChunkedSerializationResult with iterator of serialized chunks |
Examples:
>>> large_list = list(range(10000))
>>> result = serialize_chunked(large_list, chunk_size=100)
>>> chunks = result.to_list() # Get all chunks
>>> len(chunks) # 100 chunks of 100 items each
100
>>> # Save directly to file without loading all chunks
>>> result = serialize_chunked(large_data, chunk_size=1000)
>>> result.save_to_file("large_data.jsonl", format="jsonl")
Source code in datason/core_new.py
datason.stream_serialize(file_path: Union[str, Path], config: Optional[SerializationConfig] = None, format: str = 'jsonl', buffer_size: int = 8192) -> StreamingSerializer

Create a streaming serializer context manager.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
file_path | Union[str, Path] | Path to output file | required |
config | Optional[SerializationConfig] | Serialization configuration | None |
format | str | Output format ('jsonl' or 'json') | 'jsonl' |
buffer_size | int | Write buffer size in bytes | 8192 |

Returns:

Type | Description |
---|---|
StreamingSerializer | StreamingSerializer context manager |
Examples:
>>> with stream_serialize("large_data.jsonl") as stream:
... for item in large_dataset:
... stream.write(item)
>>> # Or write chunked data
>>> with stream_serialize("massive_data.jsonl") as stream:
... stream.write_chunked(massive_dataframe, chunk_size=1000)
Source code in datason/core_new.py
datason.deserialize_chunked_file(file_path: Union[str, Path], format: str = 'jsonl', chunk_processor: Optional[Callable[[Any], Any]] = None) -> Generator[Any, None, None]

Deserialize a chunked file created with streaming serialization.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
file_path | Union[str, Path] | Path to the chunked file | required |
format | str | File format ('jsonl' or 'json') | 'jsonl' |
chunk_processor | Optional[Callable[[Any], Any]] | Optional function to process each chunk | None |

Yields:

Type | Description |
---|---|
Any | Deserialized chunks from the file |
Examples:
>>> # Process chunks one at a time (memory efficient)
>>> for chunk in deserialize_chunked_file("large_data.jsonl"):
... process_chunk(chunk)
>>> # Apply custom processing to each chunk
>>> def process_chunk(chunk):
... return [item * 2 for item in chunk]
>>>
>>> processed_chunks = list(deserialize_chunked_file(
... "data.jsonl",
... chunk_processor=process_chunk
... ))
Source code in datason/core_new.py
datason.estimate_memory_usage(obj: Any, config: Optional[SerializationConfig] = None) -> Dict[str, Any]

Estimate memory usage for serializing an object.
This is a rough estimation to help users decide on chunking strategies.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to analyze | required |
config | Optional[SerializationConfig] | Serialization configuration | None |

Returns:

Type | Description |
---|---|
Dict[str, Any] | Dictionary with memory usage estimates |
Examples:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': range(10000), 'b': range(10000)})
>>> stats = estimate_memory_usage(df)
>>> print(f"Estimated serialized size: {stats['estimated_serialized_mb']:.1f} MB")
>>> print(f"Recommended chunk size: {stats['recommended_chunk_size']}")
Source code in datason/core_new.py
Configuration Functions¶
datason.get_ml_config() -> SerializationConfig

Get configuration optimized for ML workflows.

Returns:

Type | Description |
---|---|
SerializationConfig | Configuration with aggressive type coercion and tensor-friendly settings |

Source code in datason/config.py

datason.get_api_config() -> SerializationConfig

Get configuration optimized for API responses.

Returns:

Type | Description |
---|---|
SerializationConfig | Configuration with clean, consistent output for web APIs |

Source code in datason/config.py

datason.get_strict_config() -> SerializationConfig

Get configuration with strict type checking.

Returns:

Type | Description |
---|---|
SerializationConfig | Configuration that raises errors on unknown types |

Source code in datason/config.py

datason.get_performance_config() -> SerializationConfig

Get configuration optimized for performance.

Returns:

Type | Description |
---|---|
SerializationConfig | Configuration with minimal processing for maximum speed |
Source code in datason/config.py
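None of the presets ship with inline examples; a minimal sketch showing how a preset plugs into any function that accepts a `config` argument, reusing the `auto_deserialize` behavior documented above:

```python
from datason import auto_deserialize, get_api_config

# With the API preset, UUID-like strings stay strings (see auto_deserialize above).
cfg = get_api_config()
val = auto_deserialize("12345678-1234-5678-9012-123456789abc", config=cfg)
assert val == "12345678-1234-5678-9012-123456789abc"
```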
Template Functions¶
datason.deserialize_with_template(obj: Any, template: Any, **kwargs: Any) -> Any

Convenience function for template-based deserialization.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Serialized object to deserialize | required |
template | Any | Template object to guide deserialization | required |
**kwargs | Any | Additional arguments for TemplateDeserializer | {} |

Returns:

Type | Description |
---|---|
Any | Deserialized object matching template structure |
Examples:
>>> import pandas as pd
>>> template_df = pd.DataFrame({'a': [1], 'b': ['text']})
>>> serialized_data = [{'a': 2, 'b': 'hello'}, {'a': 3, 'b': 'world'}]
>>> result = deserialize_with_template(serialized_data, template_df)
>>> isinstance(result, pd.DataFrame)
True
>>> result.dtypes['a'] # Should match template
int64
Source code in datason/deserializers_new.py
datason.infer_template_from_data(data: Any, max_samples: int = 100) -> Any

Infer a template from sample data.
This function analyzes sample data to create a template that can be used for subsequent template-based deserialization.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
data | Any | Sample data to analyze (list of records, DataFrame, etc.) | required |
max_samples | int | Maximum number of samples to analyze | 100 |

Returns:

Type | Description |
---|---|
Any | Inferred template object |
Examples:
>>> sample_data = [
... {'name': 'Alice', 'age': 30, 'date': '2023-01-01T10:00:00'},
... {'name': 'Bob', 'age': 25, 'date': '2023-01-02T11:00:00'}
... ]
>>> template = infer_template_from_data(sample_data)
>>> # template will be a dict with expected types
Source code in datason/deserializers_new.py
datason.create_ml_round_trip_template(ml_object: Any) -> Dict[str, Any]

Create a template optimized for ML object round-trip serialization.
This function creates templates specifically designed for machine learning workflows where perfect round-trip fidelity is crucial.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
ml_object | Any | ML object (model, dataset, etc.) to create template for | required |

Returns:

Type | Description |
---|---|
Dict[str, Any] | Template dictionary with ML-specific metadata |
Examples:
>>> import sklearn.linear_model
>>> model = sklearn.linear_model.LogisticRegression()
>>> template = create_ml_round_trip_template(model)
>>> # template will include model structure, parameters, etc.
Source code in datason/deserializers_new.py
🏗️ Classes & Data Types¶
Configuration Classes¶
datason.SerializationConfig(date_format: DateFormat = DateFormat.ISO, custom_date_format: Optional[str] = None, uuid_format: str = 'object', parse_uuids: bool = True, dataframe_orient: DataFrameOrient = DataFrameOrient.RECORDS, datetime_output: OutputType = OutputType.JSON_SAFE, series_output: OutputType = OutputType.JSON_SAFE, dataframe_output: OutputType = OutputType.JSON_SAFE, numpy_output: OutputType = OutputType.JSON_SAFE, nan_handling: NanHandling = NanHandling.NULL, type_coercion: TypeCoercion = TypeCoercion.SAFE, preserve_decimals: bool = True, preserve_complex: bool = True, max_depth: int = 50, max_size: int = 100000, max_string_length: int = 1000000, custom_serializers: Optional[Dict[type, Callable[[Any], Any]]] = None, sort_keys: bool = False, ensure_ascii: bool = False, check_if_serialized: bool = False, include_type_hints: bool = False, auto_detect_types: bool = False, redact_fields: Optional[List[str]] = None, redact_patterns: Optional[List[str]] = None, redact_large_objects: bool = False, redaction_replacement: str = '<REDACTED>', include_redaction_summary: bool = False, audit_trail: bool = False, cache_scope: CacheScope = CacheScope.OPERATION, cache_size_limit: int = 1000, cache_warn_on_limit: bool = True, cache_metrics_enabled: bool = False)

dataclass

Configuration for datason serialization behavior.

Attributes:

Name | Type | Description |
---|---|---|
date_format | DateFormat | How to format datetime objects |
custom_date_format | Optional[str] | Custom strftime format when date_format is CUSTOM |
dataframe_orient | DataFrameOrient | Pandas DataFrame orientation |
datetime_output | OutputType | How to output datetime objects |
series_output | OutputType | How to output pandas Series |
dataframe_output | OutputType | How to output pandas DataFrames (overrides orient for object output) |
numpy_output | OutputType | How to output numpy arrays |
nan_handling | NanHandling | How to handle NaN/null values |
type_coercion | TypeCoercion | Type coercion behavior |
preserve_decimals | bool | Whether to preserve decimal.Decimal precision |
preserve_complex | bool | Whether to preserve complex numbers as dict |
max_depth | int | Maximum recursion depth (security) |
max_size | int | Maximum collection size (security) |
max_string_length | int | Maximum string length (security) |
custom_serializers | Optional[Dict[type, Callable[[Any], Any]]] | Dict of type -> serializer function |
sort_keys | bool | Whether to sort dictionary keys in output |
ensure_ascii | bool | Whether to ensure ASCII output only |
check_if_serialized | bool | Skip processing if object is already JSON-safe |
include_type_hints | bool | Include type metadata for perfect round-trip deserialization |
redact_fields | Optional[List[str]] | Field patterns to redact (e.g., ["password", "api_key", "*.secret"]) |
redact_patterns | Optional[List[str]] | Regex patterns to redact (e.g., credit card numbers) |
redact_large_objects | bool | Auto-redact objects >10MB |
redaction_replacement | str | Replacement text for redacted content |
include_redaction_summary | bool | Include summary of what was redacted |
audit_trail | bool | Track all redaction operations for compliance |
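A minimal construction sketch using only fields documented above; the config is then passed to stream_serialize(), whose signature above accepts it:

```python
from datason import SerializationConfig, stream_serialize

cfg = SerializationConfig(
    sort_keys=True,           # deterministic key order in output
    max_depth=20,             # tighten the recursion security limit
    include_type_hints=True,  # embed metadata so load_typed() can reconstruct types
)

with stream_serialize("out.jsonl", config=cfg) as stream:
    stream.write({"b": 2, "a": 1})
```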
datason.ChunkedSerializationResult(chunks: Iterator[Any], metadata: Dict[str, Any])

Result container for chunked serialization operations.

Initialize chunked result.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
chunks | Iterator[Any] | Iterator of serialized chunks | required |
metadata | Dict[str, Any] | Metadata about the chunking operation | required |
Source code in datason/core_new.py
save_to_file(file_path: Union[str, Path], format: str = 'jsonl') -> None

Save chunks to a file.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
file_path | Union[str, Path] | Path to save the chunks | required |
format | str | Format to save ('jsonl' for JSON lines, 'json' for array) | 'jsonl' |
Source code in datason/core_new.py
datason.StreamingSerializer(file_path: Union[str, Path], config: Optional[SerializationConfig] = None, format: str = 'jsonl', buffer_size: int = 8192)

Context manager for streaming serialization to files.
Enables processing of datasets larger than available memory by writing serialized data directly to files without keeping everything in memory.

Initialize streaming serializer.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
file_path | Union[str, Path] | Path to output file | required |
config | Optional[SerializationConfig] | Serialization configuration | None |
format | str | Output format ('jsonl' or 'json') | 'jsonl' |
buffer_size | int | Write buffer size in bytes | 8192 |
Source code in datason/core_new.py
__enter__() -> StreamingSerializer

Enter context manager.

Source code in datason/core_new.py

__exit__(exc_type: Any, exc_val: Any, exc_tb: Any) -> None

Exit context manager.

Source code in datason/core_new.py

write(obj: Any) -> None

Write a single object to the stream.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object to serialize and write | required |

Source code in datason/core_new.py

write_chunked(obj: Any, chunk_size: int = 1000) -> None

Write a large object using chunked serialization.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Large object to chunk and write | required |
chunk_size | int | Size of each chunk | 1000 |
Source code in datason/core_new.py
datason.TemplateDeserializer(template: Any, strict: bool = True, fallback_auto_detect: bool = True)

Template-based deserializer for enhanced type fidelity and round-trip scenarios.
This class allows users to provide a template object that guides the deserialization process, ensuring that the output matches the expected structure and types.

Initialize template deserializer.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
template | Any | Template object to guide deserialization | required |
strict | bool | If True, raise errors when structure doesn't match | True |
fallback_auto_detect | bool | If True, use auto-detection when template doesn't match | True |
Source code in datason/deserializers_new.py
deserialize(obj: Any) -> Any

Deserialize object using template guidance.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Serialized object to deserialize | required |

Returns:

Type | Description |
---|---|
Any | Deserialized object matching template structure |
Source code in datason/deserializers_new.py
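A sketch of reusing one deserializer across records, based on the constructor options above and the deserialize_with_template example earlier:

```python
import pandas as pd
from datason import TemplateDeserializer

template = pd.DataFrame({"a": [1], "b": ["text"]})
deser = TemplateDeserializer(template, strict=False, fallback_auto_detect=True)

data = [{"a": 2, "b": "hello"}, {"a": 3, "b": "world"}]
result = deser.deserialize(data)  # DataFrame with dtypes matching the template
```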
Exception Classes¶
datason.SecurityError

Bases: Exception

Raised when security limits are exceeded during serialization.

datason.TemplateDeserializationError

Bases: Exception

Raised when template-based deserialization fails.
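A defensive-handling sketch around the security limits listed under SerializationConfig, assuming (per the docstring above) that exceeding a limit such as the default max_depth of 50 raises SecurityError:

```python
import datason
from datason import SecurityError

# Build a structure nested far beyond the default max_depth of 50.
obj = "leaf"
for _ in range(100):
    obj = {"child": obj}

try:
    datason.dumps(obj)
except SecurityError as exc:
    print(f"refused to serialize: {exc}")
```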
Enums & Constants¶
datason.DateFormat

Bases: Enum

Supported date/time output formats.

datason.DataFrameOrient

Bases: Enum

Supported pandas DataFrame orientations.
Based on pandas.DataFrame.to_dict() valid orientations.

datason.NanHandling

Bases: Enum

How to handle NaN/null values.

datason.TypeCoercion

Bases: Enum

Type coercion behavior.
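The member names used as defaults in the SerializationConfig signature above (DateFormat.ISO, DataFrameOrient.RECORDS, NanHandling.NULL, TypeCoercion.SAFE) show how these enums are consumed; a short sketch:

```python
from datason import DataFrameOrient, DateFormat, NanHandling, SerializationConfig

cfg = SerializationConfig(
    date_format=DateFormat.ISO,                # ISO-8601 datetime strings
    dataframe_orient=DataFrameOrient.RECORDS,  # list-of-dicts DataFrame layout
    nan_handling=NanHandling.NULL,             # serialize NaN as null
)
```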
🔧 Utility Functions¶
datason.safe_float(value: Any, default: float = 0.0) -> float

Convert value to float, handling NaN, None, and Inf values safely.
This function is particularly useful when working with pandas DataFrames that may contain NaN values or when processing data from external sources that may have None values.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
value | Any | Value to convert to float | required |
default | float | Default value to return if conversion fails or value is NaN/None/Inf | 0.0 |

Returns:

Type | Description |
---|---|
float | Float value or default if conversion fails |
Examples:
>>> safe_float(42.5)
42.5
>>> safe_float(None)
0.0
>>> safe_float(float('nan'))
0.0
>>> safe_float(float('inf'))
0.0
>>> safe_float("invalid", 10.0)
10.0
Source code in datason/converters.py
datason.safe_int(value: Any, default: int = 0) -> int

Convert value to int, handling NaN and None values safely.
This function is particularly useful when working with pandas DataFrames that may contain NaN values or when processing data from external sources that may have None values.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
value | Any | Value to convert to int | required |
default | int | Default value to return if conversion fails or value is NaN/None | 0 |

Returns:

Type | Description |
---|---|
int | Integer value or default if conversion fails |
Examples:
>>> safe_int(42)
42
>>> safe_int(42.7)
42
>>> safe_int(None)
0
>>> safe_int(float('nan'))
0
>>> safe_int("invalid", 10)
10
Source code in datason/converters.py
datason.ensure_timestamp(val: Any) -> Any

Ensure a scalar date value is a pandas Timestamp. Use this for group-level date fields.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
val | Any | A date value (can be pd.Timestamp, datetime, or string) | required |

Returns:

Type | Description |
---|---|
Any | pd.Timestamp or pd.NaT |

Raises:

Type | Description |
---|---|
TypeError | If input is a list, dict, or other non-date-like object |
Source code in datason/datetime_utils.py
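No inline example ships with this helper; a small sketch of the documented behavior:

```python
import pandas as pd
from datason import ensure_timestamp

ts = ensure_timestamp("2023-01-01")                   # -> pd.Timestamp('2023-01-01 00:00:00')
same = ensure_timestamp(pd.Timestamp("2023-01-01"))   # already a Timestamp

try:
    ensure_timestamp({"not": "a date"})
except TypeError:
    print("non-date-like input is rejected, per the Raises table above")
```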
🛡️ Privacy & Security Functions¶
datason.create_financial_redaction_engine() -> RedactionEngine

Create a redaction engine optimized for financial data.

Source code in datason/redaction.py

datason.create_healthcare_redaction_engine() -> RedactionEngine

Create a redaction engine optimized for healthcare data.

Source code in datason/redaction.py

datason.create_minimal_redaction_engine() -> RedactionEngine

Create a minimal redaction engine for basic privacy protection.
Source code in datason/redaction.py
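The RedactionEngine interface itself is not documented in this reference, so the sketch below only shows construction; for most callers the documented high-level route is dump_secure() above:

```python
import datason

engine = datason.create_financial_redaction_engine()  # pre-tuned for financial PII
minimal = datason.create_minimal_redaction_engine()   # lighter-weight defaults

# Documented high-level alternative:
safe = datason.dump_secure({"card": "4111-1111-1111-1111"})
```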
🧠 ML Integration Functions (Optional)¶
These functions are available when ML libraries are installed:
datason.detect_and_serialize_ml_object(obj: Any) -> Optional[Dict[str, Any]]

Detect and serialize ML/AI objects automatically.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
obj | Any | Object that might be from an ML/AI library | required |

Returns:

Type | Description |
---|---|
Optional[Dict[str, Any]] | Serialized object or None if not an ML/AI object |
Source code in datason/ml_serializers.py
📊 Cache & Performance Functions¶
datason.clear_all_caches() -> None

Clear all caches across all scopes (for testing/debugging).
Source code in datason/cache_manager.py
datason.get_cache_metrics(scope: Optional[CacheScope] = None) -> Dict[CacheScope, CacheMetrics]

Get cache metrics for a specific scope or all scopes.
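Neither cache helper ships with an inline example; a sketch of a test-teardown pattern (the fields on CacheMetrics are not documented here, so the entries are only printed):

```python
import datason

metrics = datason.get_cache_metrics()  # all scopes when no scope is passed
for scope, m in metrics.items():
    print(scope, m)

datason.clear_all_caches()  # reset cached state between test cases
```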
📦 Package Information¶
datason.get_version() -> str

Get the datason package version string.

datason.get_info() -> dict

Get information about the datason package.
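A closing sketch for quick diagnostics (the exact keys returned by get_info() are release-dependent):

```python
import datason

print(datason.get_version())       # version string, e.g. '0.x.y'
print(sorted(datason.get_info()))  # package metadata; keys vary by release
```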