📤 Modern API: Serialization Functions¶
Intention-revealing dump functions for different use cases and optimization needs.
🎯 Function Overview¶
Function | Purpose | Best For |
---|---|---|
dump() | General-purpose with composable options | Flexible workflows |
dump_ml() | ML-optimized for models and tensors | Data science |
dump_api() | Clean JSON for web APIs | Web development |
dump_secure() | Security-focused with PII redaction | Sensitive data |
dump_fast() | Performance-optimized | High-throughput |
dump_chunked() | Memory-efficient for large data | Big datasets |
stream_dump() | Direct file streaming | Very large files |
**File operations** | | |
save_ml() | Save ML data to JSON/JSONL files | ML model persistence |
save_secure() | Save with PII redaction to files | Secure file storage |
save_api() | Save clean data to files | API data export |
save_chunked() | Save large data efficiently to files | Big dataset export |
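For orientation, a minimal sketch contrasting two of the in-memory variants above; the sample record and its field values are purely illustrative.

```python
import datason as ds
from datetime import datetime

record = {"user": "alice", "email": "alice@example.com", "created": datetime.now()}

api_ready = ds.dump_api(record)    # clean, JSON-safe structure for a web response
redacted = ds.dump_secure(record)  # same record with PII-style fields redacted
```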
📦 Detailed Function Documentation¶
dump()¶
General-purpose serialization with composable options.
```python
datason.dump(obj: Any, fp: Any, **kwargs: Any) -> None
```
Enhanced file serialization (DataSON's smart default).
This is DataSON's smart file writer with datetime handling, type preservation, and enhanced ML support. For stdlib json.dump() compatibility, use datason.json.dump() or dump_json().
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | Object to serialize | required |
`fp` | Any | File-like object or file path to write to | required |
`**kwargs` | Any | DataSON configuration options | {} |
Returns:

Type | Description |
---|---|
None | None (writes to file) |
Example:

```python
with open('data.json', 'w') as f:
    dump(data, f)  # Smart serialization with datetime handling
```

For JSON compatibility:

```python
import datason.json as json

with open('data.json', 'w') as f:
    json.dump(data, f)  # Exact json.dump() behavior
```
Source code in datason/api.py
Composable Options Example:
```python
import datason as ds
import torch
import pandas as pd
from datetime import datetime

# Basic usage: smart file writing with datetime handling
data = {"values": [1, 2, 3], "timestamp": datetime.now()}
with open("data.json", "w") as f:
    ds.dump(data, f)

# Composable options for specific needs
ml_data = {"model": torch.nn.Linear(10, 1), "df": pd.DataFrame({"x": [1, 2, 3]})}

# Combine security + ML optimization + chunked processing
with open("ml_data.json", "w") as f:
    ds.dump(
        ml_data,
        f,
        secure=True,   # Enable PII redaction
        ml_mode=True,  # Optimize for ML objects
        chunked=True,  # Memory-efficient processing
    )
```
dump_ml()¶
ML-optimized serialization for models, tensors, and NumPy arrays.
```python
datason.dump_ml(obj: Any, **kwargs: Any) -> Any
```
ML-optimized serialization for models, tensors, and ML objects.
Automatically configures optimal settings for machine learning objects including NumPy arrays, PyTorch tensors, scikit-learn models, etc.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | ML object to serialize | required |
`**kwargs` | Any | Additional configuration options | {} |
Returns:

Type | Description |
---|---|
Any | Serialized ML object optimized for reconstruction |
Example:

```python
model = sklearn.ensemble.RandomForestClassifier()
serialized = dump_ml(model)  # Optimized for ML round-trip fidelity
```
Source code in datason/api.py
ML Workflow Example:
```python
import datason as ds
import torch
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ml_data = {
    "pytorch_model": torch.nn.Linear(10, 1),
    "sklearn_model": RandomForestClassifier(),
    "tensor": torch.randn(100, 10),
    "numpy_array": np.random.random((100, 10)),
}

# Automatically optimized for ML objects
result = ds.dump_ml(ml_data)
```
dump_api()¶
API-safe serialization for clean JSON output.
```python
datason.dump_api(obj: Any, **kwargs: Any) -> Any
```
API-safe serialization for web responses and APIs.
Produces clean, predictable JSON suitable for API responses. Handles edge cases gracefully and ensures consistent output format.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | Object to serialize for API response | required |
`**kwargs` | Any | Additional configuration options | {} |
Returns:

Type | Description |
---|---|
Any | API-safe serialized object |
Example:

```python
@app.route('/api/data')
def get_data():
    return dump_api(complex_data_structure)
```
Source code in datason/api.py
Web API Example:
```python
import datason as ds
from datetime import datetime

# Web API response data
api_data = {
    "status": "success",
    "data": [1, 2, 3],
    "errors": None,  # Will be removed
    "timestamp": datetime.now(),
    "metadata": {"version": "1.0"},
}

# Clean JSON output, removes null values
clean_result = ds.dump_api(api_data)
```
dump_secure()¶
Security-focused serialization with PII redaction.
```python
datason.dump_secure(obj: Any, *, redact_pii: bool = True, redact_fields: Optional[List[str]] = None, redact_patterns: Optional[List[str]] = None, **kwargs: Any) -> Any
```
Security-focused serialization with PII redaction.
Automatically redacts sensitive information like credit cards, SSNs, emails, and common secret fields.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | Object to serialize securely | required |
`redact_pii` | bool | Enable automatic PII pattern detection | True |
`redact_fields` | Optional[List[str]] | Additional field names to redact | None |
`redact_patterns` | Optional[List[str]] | Additional regex patterns to redact | None |
`**kwargs` | Any | Additional configuration options | {} |
Returns:

Type | Description |
---|---|
Any | Serialized object with sensitive data redacted |
Example:

```python
user_data = {"name": "John", "ssn": "123-45-6789"}
safe_data = dump_secure(user_data)  # SSN will be redacted
```
Source code in datason/api.py
Security Example:
```python
import datason as ds

# Sensitive user data
user_data = {
    "name": "John Doe",
    "email": "john@example.com",
    "ssn": "123-45-6789",
    "password": "secret123",
    "credit_card": "4532-1234-5678-9012",
}

# Automatic PII redaction
secure_result = ds.dump_secure(user_data, redact_pii=True)

# Custom redaction fields and patterns
custom_result = ds.dump_secure(
    user_data,
    redact_fields=["internal_id"],
    redact_patterns=[r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"],  # Credit card pattern
)
```
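A quick sanity check one can run on the redacted output; the exact redaction marker is implementation-defined, so this sketch only asserts that the raw value no longer appears in the serialized payload.

```python
import json
import datason as ds

user_data = {"name": "John Doe", "ssn": "123-45-6789"}
redacted = ds.dump_secure(user_data)

# The raw SSN string should not survive serialization
assert "123-45-6789" not in json.dumps(redacted, default=str)
```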
dump_fast()¶
Performance-optimized for high-throughput scenarios.
```python
datason.dump_fast(obj: Any, **kwargs: Any) -> Any
```
Performance-optimized serialization.
Optimized for speed with minimal type checking and validation. Use when you need maximum performance and can accept some trade-offs in type fidelity.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | Object to serialize quickly | required |
`**kwargs` | Any | Additional configuration options | {} |
Returns:

Type | Description |
---|---|
Any | Serialized object optimized for speed |
Example:

```python
# For high-throughput scenarios
result = dump_fast(large_dataset)
```
Source code in datason/api.py
High-Throughput Example:
```python
import random
import datason as ds

# Large batch processing
batch_data = [{"id": i, "value": random.random()} for i in range(10000)]

# Minimal overhead, optimized for speed
fast_result = ds.dump_fast(batch_data)
```
dump_chunked()¶
Memory-efficient chunked serialization for large objects.
```python
datason.dump_chunked(obj: Any, *, chunk_size: int = 1000, **kwargs: Any) -> Any
```
Chunked serialization for large objects.
Breaks large objects into manageable chunks for memory efficiency and streaming processing.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | Large object to serialize in chunks | required |
`chunk_size` | int | Size of each chunk | 1000 |
`**kwargs` | Any | Additional configuration options | {} |
Returns:

Type | Description |
---|---|
Any | ChunkedSerializationResult with metadata and chunks |
Example:

```python
big_list = list(range(10000))
result = dump_chunked(big_list, chunk_size=1000)
# Returns ChunkedSerializationResult with 10 chunks
```
Source code in datason/api.py
Large Dataset Example:
```python
import numpy as np
import datason as ds

# Very large dataset
large_data = {
    "images": [np.random.random((512, 512, 3)) for _ in range(1000)],
    "features": np.random.random((100000, 200)),
}

# Process in memory-efficient chunks
chunked_result = ds.dump_chunked(large_data, chunk_size=1000)
```
stream_dump()¶
Direct file streaming for very large data.
```python
datason.stream_dump(file_path: str, **kwargs: Any) -> StreamingSerializer
```
Streaming serialization to file.
Efficiently serialize large datasets directly to file without loading everything into memory.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`file_path` | str | Path to output file | required |
`**kwargs` | Any | Additional configuration options | {} |
Returns:

Type | Description |
---|---|
StreamingSerializer | StreamingSerializer instance for continued operations |
Example:

```python
with stream_dump("output.jsonl") as streamer:
    for item in large_dataset:
        streamer.write(item)
```
Source code in datason/api.py
File Streaming Example:
```python
import numpy as np
import datason as ds

# Stream records to a JSONL file without building the whole payload in memory
with ds.stream_dump("large_output.jsonl") as streamer:
    for i in range(1_000_000):
        streamer.write({"row": i, "features": np.random.random(100)})
```
🗃️ File Operations Functions¶
save_ml()¶
ML-optimized file saving with perfect type preservation.
```python
datason.save_ml(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> None
```
Save ML-optimized data to JSON or JSONL file.
Combines ML-specific serialization with file I/O, preserving ML types like NumPy arrays, PyTorch tensors, etc.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | ML object or data to save | required |
`path` | Union[str, Path] | Output file path (.json for single object, .jsonl for multiple objects) | required |
`format` | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected from extension if None | None |
`**kwargs` | Any | Additional ML configuration options | {} |
Examples:

```python
>>> import numpy as np
>>> data = [{"weights": np.array([1, 2, 3]), "epoch": 1}]
>>>
>>> # Save as JSONL (multiple objects, one per line)
>>> save_ml(data, "training.jsonl")
>>> save_ml(data, "training.json", format="jsonl")  # Force JSONL
>>>
>>> # Save as JSON (single array object)
>>> save_ml(data, "training.json")
>>> save_ml(data, "training.jsonl", format="json")  # Force JSON
```
Source code in datason/api.py
ML File Workflow Example:
```python
import datason as ds
import torch
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Complete ML experiment data
experiment = {
    "model": RandomForestClassifier(n_estimators=100),
    "weights": torch.randn(100, 50),
    "features": np.random.random((1000, 20)),
    "metadata": {"version": "1.0", "accuracy": 0.95},
}

# Save to JSON file with ML type preservation
ds.save_ml(experiment, "experiment.json")

# Save to JSONL file (each key as separate line)
ds.save_ml(experiment, "experiment.jsonl")

# Automatic compression detection from the file extension
ds.save_ml(experiment, "experiment.json.gz")  # Compressed
```
save_secure()¶
Secure file saving with PII redaction and integrity verification.
```python
datason.save_secure(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, redact_pii: bool = True, redact_fields: Optional[List[str]] = None, redact_patterns: Optional[List[str]] = None, **kwargs: Any) -> None
```
Save data to JSON/JSONL file with security features.
Automatically redacts sensitive information before saving.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | Data to save securely | required |
`path` | Union[str, Path] | Output file path | required |
`format` | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected if None | None |
`redact_pii` | bool | Enable automatic PII pattern detection | True |
`redact_fields` | Optional[List[str]] | Additional field names to redact | None |
`redact_patterns` | Optional[List[str]] | Additional regex patterns to redact | None |
`**kwargs` | Any | Additional security options | {} |
Examples:

```python
>>> user_data = [{"name": "John", "ssn": "123-45-6789"}]
>>>
>>> # Save as JSONL (auto-detected)
>>> save_secure(user_data, "users.jsonl", redact_pii=True)
>>>
>>> # Save as JSON (auto-detected)
>>> save_secure(user_data, "users.json", redact_pii=True)
```
Source code in datason/api.py
Secure File Example:
```python
import datason as ds

# Sensitive data with PII
user_data = {
    "users": [
        {"name": "John Doe", "ssn": "123-45-6789", "email": "john@example.com"},
        {"name": "Jane Smith", "ssn": "987-65-4321", "email": "jane@example.com"},
    ],
    "api_key": "sk-1234567890abcdef",
}

# Automatic PII redaction with audit trail
ds.save_secure(user_data, "users.json", redact_pii=True)

# Custom redaction fields and patterns
ds.save_secure(
    user_data,
    "users_custom.json",
    redact_fields=["api_key"],
    redact_patterns=[r"\b\d{3}-\d{2}-\d{4}\b"],  # SSN pattern
)
```
save_api()¶
Clean API-safe file saving with null removal and formatting.
```python
datason.save_api(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> None
```
Save API-safe data to JSON/JSONL file.
Produces clean, predictable output suitable for API data exchange.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | Data to save for API use | required |
`path` | Union[str, Path] | Output file path | required |
`format` | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected if None | None |
`**kwargs` | Any | Additional API configuration options | {} |
Examples:

```python
>>> api_data = [{"status": "success", "data": [1, 2, 3]}]
>>>
>>> # Save as single JSON object
>>> save_api(api_data, "responses.json")
>>>
>>> # Save as JSONL (one response per line)
>>> save_api(api_data, "responses.jsonl")
```
Source code in datason/api.py
API Export Example:
```python
import datason as ds
from datetime import datetime

# API response data with nulls and complex types
api_response = {
    "status": "success",
    "data": [1, 2, 3],
    "errors": None,  # Will be removed
    "timestamp": datetime.now(),
    "pagination": {"page": 1, "total": None},  # Null removed
}

# Clean JSON output for API consumption
ds.save_api(api_response, "api_export.json")

# Multiple responses to JSONL
responses = [api_response, api_response, api_response]
ds.save_api(responses, "api_batch.jsonl")
```
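Because save_api() writes plain JSON/JSONL, the exported files can be read back with the standard library alone. A minimal read-back sketch, assuming the api_batch.jsonl file from the example above exists:

```python
import json

# One JSON object per line in the JSONL export
with open("api_batch.jsonl") as f:
    responses = [json.loads(line) for line in f if line.strip()]

print(len(responses), responses[0]["status"])
```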
save_chunked()¶
Memory-efficient file saving for large datasets.
```python
datason.save_chunked(obj: Any, path: Union[str, Path], *, chunk_size: int = 1000, format: Optional[str] = None, **kwargs: Any) -> None
```
Save large data to JSON/JSONL file using chunked serialization.
Memory-efficient saving for large datasets.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`obj` | Any | Large dataset to save | required |
`path` | Union[str, Path] | Output file path | required |
`chunk_size` | int | Size of each chunk | 1000 |
`format` | Optional[str] | Explicit format ('json' or 'jsonl'), auto-detected if None | None |
`**kwargs` | Any | Additional chunking options | {} |
Example:

```python
large_dataset = list(range(100000))
save_chunked(large_dataset, "large.jsonl", chunk_size=5000)
save_chunked(large_dataset, "large.json", chunk_size=5000)  # JSON array format
```
Source code in datason/api.py
Large Dataset File Example:
```python
import numpy as np
import datason as ds

# Large dataset that might not fit in memory all at once
large_data = {
    "training_data": [{"features": np.random.random(1000)} for _ in range(10000)],
    "metadata": {"size": "10K samples", "version": "1.0"},
}

# Memory-efficient chunked file saving
ds.save_chunked(large_data, "training.json", chunk_size=1000)

# JSONL format for streaming
ds.save_chunked(large_data, "training.jsonl", chunk_size=500)

# Compressed chunked saving
ds.save_chunked(large_data, "training.json.gz", chunk_size=1000)
```
🔄 Choosing the Right Function¶
Decision Tree¶
- Need security/PII redaction? → Use dump_secure()
- Working with ML models/tensors? → Use dump_ml()
- Building web APIs? → Use dump_api()
- Processing very large data? → Use dump_chunked() or stream_dump()
- Need maximum speed? → Use dump_fast()
- Want flexibility? → Use dump() with composable options

The same choices can be captured in a small dispatch helper, sketched below.
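A minimal sketch; the serialize_for() helper and its use-case labels are illustrative conveniences, not part of the datason API.

```python
import datason as ds

def serialize_for(use_case: str, obj, **kwargs):
    """Pick the intention-revealing dump_* variant for a given need."""
    dispatch = {
        "secure": ds.dump_secure,    # PII redaction
        "ml": ds.dump_ml,            # models and tensors
        "api": ds.dump_api,          # clean web responses
        "fast": ds.dump_fast,        # maximum throughput
        "chunked": ds.dump_chunked,  # very large data
    }
    return dispatch[use_case](obj, **kwargs)

payload = serialize_for("api", {"status": "ok", "errors": None})
```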
Performance Comparison¶
Function | Speed | Memory Usage | Features |
---|---|---|---|
dump_fast() | ⚡⚡⚡ | 🧠🧠 | Minimal |
dump() | ⚡⚡ | 🧠🧠 | Composable |
dump_api() | ⚡⚡ | 🧠🧠 | Clean output |
dump_ml() | ⚡ | 🧠🧠🧠 | ML optimized |
dump_secure() | ⚡ | 🧠🧠🧠 | Security features |
dump_chunked() | ⚡ | 🧠 | Memory efficient |
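The relative rankings above depend heavily on the shape of your data, so it is worth timing candidates on a representative sample. An illustrative measurement sketch (the sample payload is arbitrary):

```python
import time
import datason as ds

sample = [{"id": i, "score": i * 0.5} for i in range(10_000)]

# Time each in-memory variant on the same payload
for name, fn in [("dump_fast", ds.dump_fast), ("dump_api", ds.dump_api), ("dump_ml", ds.dump_ml)]:
    start = time.perf_counter()
    fn(sample)
    print(f"{name}: {time.perf_counter() - start:.4f}s")  # timings vary by workload
```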
🎨 Composable Patterns¶
Combining Features¶
```python
import datason as ds

# ml_model_data, api_data, and huge_model are placeholders for your own objects

# Security + ML + performance on the general-purpose file writer
with open("secure_ml_fast.json", "w") as f:
    ds.dump(ml_model_data, f, secure=True, ml_mode=True, fast=True)

# API + security
secure_api = ds.dump_api(api_data, secure=True)

# ML + chunked for large models
large_ml = ds.dump_ml(huge_model, chunked=True)
```
🔗 Related Documentation¶
- Deserialization Functions - Load functions
- Utility Functions - Helper functions
- Data Privacy - Security and redaction details
- ML Integration - Machine learning support