📤 Modern API: Serialization Functions¶
Intention-revealing dump functions for different use cases and optimization needs.
🎯 Function Overview¶
Function | Purpose | Best For
---|---|---
dump() | General-purpose with composable options | Flexible workflows
dump_ml() | ML-optimized for models and tensors | Data science
dump_api() | Clean JSON for web APIs | Web development
dump_secure() | Security-focused with PII redaction | Sensitive data
dump_fast() | Performance-optimized | High-throughput
dump_chunked() | Memory-efficient for large data | Big datasets
stream_dump() | Direct file streaming | Very large files
**FILE OPERATIONS** | |
save_ml() | Save ML data to JSON/JSONL files | ML model persistence
save_secure() | Save with PII redaction to files | Secure file storage
save_api() | Save clean data to files | API data export
save_chunked() | Save large data efficiently to files | Big dataset export
📦 Detailed Function Documentation¶
dump()¶
General-purpose serialization with composable options.
datason.dump(obj: Any, fp: Any, **kwargs: Any) -> None
Enhanced file serialization (DataSON's smart default).
This writes enhanced DataSON-serialized data to a file by delegating to save_ml(). For stdlib json.dump() compatibility, use datason.json.dump() or dump_json().
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | Object to serialize | required
fp | Any | File-like object or file path to write to | required
**kwargs | Any | DataSON configuration options | {}

Returns:

Type | Description
---|---
None | None (writes to file)
Example
>>> with open('data.json', 'w') as f:
...     dump(data, f)  # Enhanced serialization with smart features
For JSON compatibility:¶
>>> import datason.json as json
>>> with open('data.json', 'w') as f:
...     json.dump(data, f)  # Exact json.dump() behavior
Source code in datason/api.py
Composable Options Example:
import datason as ds
import torch
import pandas as pd
from datetime import datetime

# Basic usage: write enhanced-serialized data to a file
data = {"values": [1, 2, 3], "timestamp": datetime.now()}
with open("data.json", "w") as f:
    ds.dump(data, f)

# Composable options for specific needs
ml_data = {"model": torch.nn.Linear(10, 1), "df": pd.DataFrame({"x": [1, 2, 3]})}

# Combine security + ML optimization + chunked processing
with open("ml_data.json", "w") as f:
    ds.dump(
        ml_data,
        f,
        secure=True,   # Enable PII redaction
        ml_mode=True,  # Optimize for ML objects
        chunked=True,  # Memory-efficient processing
    )
dump_ml()¶
ML-optimized serialization for models, tensors, and NumPy arrays.
datason.dump_ml(obj: Any, **kwargs: Any) -> Any
ML-optimized serialization for models, tensors, and ML objects.
Automatically configures optimal settings for machine learning objects including NumPy arrays, PyTorch tensors, scikit-learn models, etc.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | ML object to serialize | required
**kwargs | Any | Additional configuration options | {}

Returns:

Type | Description
---|---
Any | Serialized ML object optimized for reconstruction
Example
>>> model = sklearn.ensemble.RandomForestClassifier()
>>> serialized = dump_ml(model)
>>> # Optimized for ML round-trip fidelity
Source code in datason/api.py
ML Workflow Example:
import datason as ds
import torch
import numpy as np
from sklearn.ensemble import RandomForestClassifier
ml_data = {
"pytorch_model": torch.nn.Linear(10, 1),
"sklearn_model": RandomForestClassifier(),
"tensor": torch.randn(100, 10),
"numpy_array": np.random.random((100, 10)),
}
# Automatically optimized for ML objects
result = ds.dump_ml(ml_data)
dump_api()¶
API-safe serialization for clean JSON output.
datason.dump_api(obj: Any, **kwargs: Any) -> Any
API-safe serialization for web responses and APIs.
Produces clean, predictable JSON suitable for API responses. Handles edge cases gracefully and ensures consistent output format.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | Object to serialize for API response | required
**kwargs | Any | Additional configuration options | {}

Returns:

Type | Description
---|---
Any | API-safe serialized object
Example
>>> @app.route('/api/data')
... def get_data():
...     return dump_api(complex_data_structure)
Source code in datason/api.py
Web API Example:
import datason as ds
from datetime import datetime

# Web API response data
api_data = {
"status": "success",
"data": [1, 2, 3],
"errors": None, # Will be removed
"timestamp": datetime.now(),
"metadata": {"version": "1.0"}
}
# Clean JSON output, removes null values
clean_result = ds.dump_api(api_data)
dump_secure()¶
Security-focused serialization with PII redaction.
datason.dump_secure(obj: Any, *, redact_pii: bool = True, redact_fields: Optional[List[str]] = None, redact_patterns: Optional[List[str]] = None, **kwargs: Any) -> Any
Security-focused serialization with PII redaction.
Automatically redacts sensitive information like credit cards, SSNs, emails, and common secret fields.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | Object to serialize securely | required
redact_pii | bool | Enable automatic PII pattern detection | True
redact_fields | Optional[List[str]] | Additional field names to redact | None
redact_patterns | Optional[List[str]] | Additional regex patterns to redact | None
**kwargs | Any | Additional configuration options | {}

Returns:

Type | Description
---|---
Any | Serialized object with sensitive data redacted
Example
>>> user_data = {"name": "John", "ssn": "123-45-6789"}
>>> safe_data = dump_secure(user_data)
>>> # SSN will be redacted
Source code in datason/api.py
Security Example:
import datason as ds

# Sensitive user data
user_data = {
"name": "John Doe",
"email": "john@example.com",
"ssn": "123-45-6789",
"password": "secret123",
"credit_card": "4532-1234-5678-9012"
}
# Automatic PII redaction
secure_result = ds.dump_secure(user_data, redact_pii=True)
# Custom redaction patterns
custom_result = ds.dump_secure(
user_data,
redact_fields=["internal_id"],
redact_patterns=[r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"]
)
dump_fast()¶
Performance-optimized for high-throughput scenarios.
datason.dump_fast(obj: Any, **kwargs: Any) -> Any
Performance-optimized serialization.
Optimized for speed with minimal type checking and validation. Use when you need maximum performance and can accept some trade-offs in type fidelity.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | Object to serialize quickly | required
**kwargs | Any | Additional configuration options | {}

Returns:

Type | Description
---|---
Any | Serialized object optimized for speed
Example
>>> # For high-throughput scenarios
>>> result = dump_fast(large_dataset)
Source code in datason/api.py
High-Throughput Example:
import random
import datason as ds

# Large batch processing
batch_data = [{"id": i, "value": random.random()} for i in range(10000)]
# Minimal overhead, optimized for speed
fast_result = ds.dump_fast(batch_data)
dump_chunked()¶
Memory-efficient chunked serialization for large objects.
datason.dump_chunked(obj: Any, *, chunk_size: int = 1000, **kwargs: Any) -> Any
Chunked serialization for large objects.
Breaks large objects into manageable chunks for memory efficiency and streaming processing.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | Large object to serialize in chunks | required
chunk_size | int | Size of each chunk | 1000
**kwargs | Any | Additional configuration options | {}

Returns:

Type | Description
---|---
Any | ChunkedSerializationResult with metadata and chunks
Example
>>> big_list = list(range(10000))
>>> result = dump_chunked(big_list, chunk_size=1000)
>>> # Returns ChunkedSerializationResult with 10 chunks
Source code in datason/api.py
Large Dataset Example:
import numpy as np
import datason as ds

# Very large dataset
large_data = {
"images": [np.random.random((512, 512, 3)) for _ in range(1000)],
"features": np.random.random((100000, 200))
}
# Process in memory-efficient chunks
chunked_result = ds.dump_chunked(large_data, chunk_size=1000)
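The docstring above says dump_chunked() returns a ChunkedSerializationResult carrying the chunks plus metadata, but this page does not spell out its interface. A hedged sketch, assuming the chunks are reachable through a chunks iterable (treat that attribute name as an assumption and check the datason reference for the actual API):
import json
import datason as ds

big_list = list(range(10_000))
result = ds.dump_chunked(big_list, chunk_size=1000)

# Assumed interface: iterate the chunks lazily and write one JSON line per chunk
with open("chunks.jsonl", "w") as f:
    for chunk in result.chunks:  # 'chunks' attribute name is an assumption
        f.write(json.dumps(chunk) + "\n")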
stream_dump()¶
Direct file streaming for very large data.
datason.stream_dump(file_path: str, **kwargs: Any) -> Any
Streaming serialization to file.
Efficiently serialize large datasets directly to file without loading everything into memory.
Parameters:
Name | Type | Description | Default
---|---|---|---
file_path | str | Path to output file | required
**kwargs | Any | Additional configuration options | {}

Returns:

Type | Description
---|---
Any | StreamingSerializer instance for continued operations
Example
>>> with stream_dump("output.jsonl") as streamer:
...     for item in large_dataset:
...         streamer.write(item)
Source code in datason/api.py
File Streaming Example:
import numpy as np
import datason as ds

# Stream records directly to a file without building the full output in memory
records = ({"id": i, "features": np.random.random(100)} for i in range(1_000_000))
with ds.stream_dump("large_output.jsonl") as streamer:
    for record in records:
        streamer.write(record)
🗃️ File Operations Functions¶
save_ml()¶
ML-optimized file saving with perfect type preservation.
datason.save_ml(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> None
Save ML-optimized data to JSON or JSONL file.
Combines ML-specific serialization with file I/O, preserving ML types like NumPy arrays, PyTorch tensors, etc.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | ML object or data to save | required
path | Union[str, Path] | Output file path (.json for single object, .jsonl for multiple objects) | required
format | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected from extension if None | None
**kwargs | Any | Additional ML configuration options | {}
Examples:
>>> import numpy as np
>>> data = [{"weights": np.array([1, 2, 3]), "epoch": 1}]
>>>
>>> # Save as JSONL (multiple objects, one per line)
>>> save_ml(data, "training.jsonl")
>>> save_ml(data, "training.json", format="jsonl") # Force JSONL
>>>
>>> # Save as JSON (single array object)
>>> save_ml(data, "training.json")
>>> save_ml(data, "training.jsonl", format="json") # Force JSON
Source code in datason/api.py
ML File Workflow Example:
import datason as ds
import torch
import numpy as np
from sklearn.ensemble import RandomForestClassifier
# Complete ML experiment data
experiment = {
"model": RandomForestClassifier(n_estimators=100),
"weights": torch.randn(100, 50),
"features": np.random.random((1000, 20)),
"metadata": {"version": "1.0", "accuracy": 0.95}
}
# Save to JSON file with perfect ML type preservation
ds.save_ml(experiment, "experiment.json")
# Save to JSONL file (each key as separate line)
ds.save_ml(experiment, "experiment.jsonl")
# Automatic compression detection
ds.save_ml(experiment, "experiment.json.gz") # Compressed
save_secure()¶
Secure file saving with PII redaction and integrity verification.
datason.save_secure(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, redact_pii: bool = True, redact_fields: Optional[List[str]] = None, redact_patterns: Optional[List[str]] = None, **kwargs: Any) -> None
Save data to JSON/JSONL file with security features.
Automatically redacts sensitive information before saving.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | Data to save securely | required
path | Union[str, Path] | Output file path | required
format | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected if None | None
redact_pii | bool | Enable automatic PII pattern detection | True
redact_fields | Optional[List[str]] | Additional field names to redact | None
redact_patterns | Optional[List[str]] | Additional regex patterns to redact | None
**kwargs | Any | Additional security options | {}
Examples:
>>> user_data = [{"name": "John", "ssn": "123-45-6789"}]
>>>
>>> # Save as JSONL (auto-detected)
>>> save_secure(user_data, "users.jsonl", redact_pii=True)
>>>
>>> # Save as JSON (auto-detected)
>>> save_secure(user_data, "users.json", redact_pii=True)
Source code in datason/api.py
Secure File Example:
import datason as ds

# Sensitive data with PII
user_data = {
"users": [
{"name": "John Doe", "ssn": "123-45-6789", "email": "john@example.com"},
{"name": "Jane Smith", "ssn": "987-65-4321", "email": "jane@example.com"}
],
"api_key": "sk-1234567890abcdef"
}
# Automatic PII redaction with audit trail
ds.save_secure(user_data, "users.json", redact_pii=True)
# Custom redaction patterns
ds.save_secure(
user_data,
"users_custom.json",
redact_fields=["api_key"],
redact_patterns=[r'\b\d{3}-\d{2}-\d{4}\b'] # SSN pattern
)
save_api()¶
Clean API-safe file saving with null removal and formatting.
datason.save_api(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> None
Save API-safe data to JSON/JSONL file.
Produces clean, predictable output suitable for API data exchange.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | Data to save for API use | required
path | Union[str, Path] | Output file path | required
format | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected if None | None
**kwargs | Any | Additional API configuration options | {}
Examples:
>>> api_data = [{"status": "success", "data": [1, 2, 3]}]
>>>
>>> # Save as single JSON object
>>> save_api(api_data, "responses.json")
>>>
>>> # Save as JSONL (one response per line)
>>> save_api(api_data, "responses.jsonl")
Source code in datason/api.py
API Export Example:
import datason as ds
from datetime import datetime

# API response data with nulls and complex types
api_response = {
"status": "success",
"data": [1, 2, 3],
"errors": None, # Will be removed
"timestamp": datetime.now(),
"pagination": {"page": 1, "total": None} # Null removed
}
# Clean JSON output for API consumption
ds.save_api(api_response, "api_export.json")
# Multiple responses to JSONL
responses = [api_response, api_response, api_response]
ds.save_api(responses, "api_batch.jsonl")
save_chunked()¶
Memory-efficient file saving for large datasets.
datason.save_chunked(obj: Any, path: Union[str, Path], *, chunk_size: int = 1000, format: Optional[str] = None, **kwargs: Any) -> None
Save large data to JSON/JSONL file using chunked serialization.
Memory-efficient saving for large datasets.
Parameters:
Name | Type | Description | Default
---|---|---|---
obj | Any | Large dataset to save | required
path | Union[str, Path] | Output file path | required
chunk_size | int | Size of each chunk | 1000
format | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected if None | None
**kwargs | Any | Additional chunking options | {}
Example
>>> large_dataset = list(range(100000))
>>> save_chunked(large_dataset, "large.jsonl", chunk_size=5000)
>>> save_chunked(large_dataset, "large.json", chunk_size=5000)  # JSON array format
Source code in datason/api.py
Large Dataset File Example:
import numpy as np
import datason as ds

# Large dataset that might not fit in memory
large_data = {
"training_data": [{"features": np.random.random(1000)} for _ in range(10000)],
"metadata": {"size": "10K samples", "version": "1.0"}
}
# Memory-efficient chunked file saving
ds.save_chunked(large_data, "training.json", chunk_size=1000)
# JSONL format for streaming
ds.save_chunked(large_data, "training.jsonl", chunk_size=500)
# Compressed chunked saving
ds.save_chunked(large_data, "training.json.gz", chunk_size=1000)
🔄 Choosing the Right Function¶
Decision Tree¶
- Need security/PII redaction? → Use dump_secure()
- Working with ML models/tensors? → Use dump_ml()
- Building web APIs? → Use dump_api()
- Processing very large data? → Use dump_chunked() or stream_dump()
- Need maximum speed? → Use dump_fast()
- Want flexibility? → Use dump() with options
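To make the branches concrete, here is a small sketch that maps each case to a call, using only functions documented on this page; the sample payloads (user_record, model_blob, api_payload) are hypothetical placeholders.
import datason as ds

# Hypothetical payloads, one per branch of the decision tree above
user_record = {"name": "Jane", "ssn": "987-65-4321"}   # sensitive data
model_blob = {"weights": [0.1, 0.2, 0.3], "epoch": 3}  # ML artifact
api_payload = {"status": "ok", "errors": None}         # web response

redacted = ds.dump_secure(user_record)  # security/PII redaction
ml_ready = ds.dump_ml(model_blob)       # models/tensors
clean = ds.dump_api(api_payload)        # web APIs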
Performance Comparison¶
Function | Speed | Memory Usage | Features
---|---|---|---
dump_fast() | ⚡⚡⚡ | 🧠🧠 | Minimal
dump() | ⚡⚡ | 🧠🧠 | Composable
dump_api() | ⚡⚡ | 🧠🧠 | Clean output
dump_ml() | ⚡ | 🧠🧠🧠 | ML optimized
dump_secure() | ⚡ | 🧠🧠🧠 | Security features
dump_chunked() | ⚡ | 🧠 | Memory efficient
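These rankings are qualitative; if throughput matters, a quick micro-benchmark on your own payload is more informative. A minimal sketch using only the in-memory functions from this page (absolute numbers will vary with machine and data shape):
import timeit
import datason as ds

payload = [{"id": i, "score": i / 3.0} for i in range(10_000)]

# Time the speed-optimized and API-clean variants on the same payload
fast_time = timeit.timeit(lambda: ds.dump_fast(payload), number=10)
api_time = timeit.timeit(lambda: ds.dump_api(payload), number=10)
print(f"dump_fast: {fast_time:.3f}s  dump_api: {api_time:.3f}s")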
🎨 Composable Patterns¶
Combining Features¶
import datason as ds

# Security + ML + Performance
with open("model_data.json", "w") as f:
    ds.dump(
        ml_model_data,
        f,
        secure=True,
        ml_mode=True,
        fast=True,
    )
# API + Security
secure_api = ds.dump_api(api_data, secure=True)
# ML + Chunked for large models
large_ml = ds.dump_ml(huge_model, chunked=True)
🔗 Related Documentation¶
- Deserialization Functions - Load functions
- Utility Functions - Helper functions
- Data Privacy - Security and redaction details
- ML Integration - Machine learning support