datason Feature Matrix 🔍¶
Library Comparison Matrix¶
| Feature | datason | json (stdlib) | pickle | joblib | ujson | orjson |
|---|---|---|---|---|---|---|
| **Core Features** | | | | | | |
| Basic JSON Types | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Complex Python Objects | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Human Readable Output | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| Cross-Language Compatible | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| **Data Science Support** | | | | | | |
| NumPy Arrays | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Pandas DataFrames | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Pandas Series | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| DateTime Objects | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Timezone Handling | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| **ML/AI Support** | | | | | | |
| PyTorch Tensors | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| TensorFlow Tensors | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Scikit-learn Models | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| JAX Arrays | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| HuggingFace Models | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Special Types** | | | | | | |
| UUID Objects | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| NaN/Infinity Handling | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Complex Numbers | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Bytes/ByteArray | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Performance** | | | | | | |
| Small Objects | ⚡ | ⚡⚡ | ⚡ | ⚡ | ⚡⚡⚡ | ⚡⚡⚡ |
| Large Objects | ⚡⚡ | ⚡ | ⚡⚡ | ⚡⚡⚡ | ⚡⚡ | ⚡⚡⚡ |
| Memory Efficiency | ⚡⚡ | ⚡⚡ | ⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ |
| **Safety** | | | | | | |
| Type Preservation | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Circular Reference Safe | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Error Handling | ✅ | ⚡ | ⚡ | ⚡ | ⚡ | ⚡ |
| **Ecosystem** | | | | | | |
| Zero Dependencies | ❌* | ✅ | ✅ | ❌ | ❌ | ❌ |
| Optional Dependencies | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Python Version Support | 3.8+ | 3.7+ | 3.7+ | 3.8+ | 3.7+ | 3.8+ |
*Core functionality works without dependencies; ML features require optional packages
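In practice this means the core call works with nothing but the package itself, while ML objects are handled when the relevant optional library is present. A minimal sketch (`trained_model` is a placeholder for any fitted scikit-learn estimator):

```python
import datason

# Core functionality: no optional dependencies required
payload = datason.serialize({"user": "alice", "score": 0.95})

# ML objects are handled when the optional library (here scikit-learn)
# is installed; `trained_model` is a placeholder for a fitted estimator
ml_payload = datason.serialize({"model": trained_model})
```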
Use Case Recommendations¶
🎯 When to Use datason¶
Perfect for:

- 🤖 ML/AI Applications: Model serialization, experiment tracking
- 📊 Data Science Workflows: DataFrame processing, scientific computing
- 🌐 API Development: Complex response serialization
- 🔬 Research Projects: Reproducible experiment data
- 📱 Cross-platform Data Exchange: When you need a human-readable format
Example scenarios:
```python
from datetime import datetime

# ✅ datason excels here (trained_sklearn_model, torch_tensor_predictions,
# and pandas_dataframe are placeholders for your own objects)
ml_experiment = {
    'model': trained_sklearn_model,
    'predictions': torch_tensor_predictions,
    'data': pandas_dataframe,
    'timestamp': datetime.now(),
    'metrics': {'accuracy': 0.95}
}
```
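Serializing and restoring this mixed structure is one call each way. A minimal sketch of the round trip, assuming `datason.deserialize` as the inverse of `serialize`:

```python
import json

import datason

# Serialize to JSON-compatible structures, then to text
payload = json.dumps(datason.serialize(ml_experiment))

# Reconstruct rich objects later (or on another machine)
restored = datason.deserialize(json.loads(payload))
```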
🎯 When to Use Alternatives¶
Use `json` (stdlib) when:
- Simple data types only (dict, list, str, int, float, bool)
- Maximum compatibility needed
- Minimal dependencies required
- Working with web APIs expecting standard JSON
Use `pickle` when:
- Python-only environment
- Complex object graphs with references
- Maximum fidelity required (exact object reconstruction)
- Performance is critical and human readability not needed
Use `joblib` when:
- Primarily working with NumPy arrays
- Scientific computing focused
- Need compression
- Working with large arrays/matrices
Use `ujson`/`orjson` when:
- Maximum performance for basic JSON types
- High-throughput applications
- Simple data structures
- Speed is the primary concern
Performance Benchmarks¶
Real benchmarks (Python 3.13.3, macOS):
High-Throughput Scenarios¶
datason performance:

- Large nested datasets: 272,654 items/second
- NumPy array processing: 5.5M elements/second
- Pandas DataFrame rows: 195,242 rows/second
- Round-trip (serialize + JSON + deserialize): 1.4 ms
Memory Usage (Efficient processing)¶
- datason: Optimized for streaming large datasets
- joblib: Good for NumPy arrays with compression
- pickle: High memory usage for complex objects
- json: N/A (can't serialize DataFrames/tensors)
*See `benchmark_real_performance.py` for detailed benchmarking methodology.*
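If you want a rough number on your own machine, a minimal timing sketch (illustrative only, not the project's benchmark script):

```python
import json
import time

import datason

# Illustrative micro-benchmark: serialize 1000 small user objects
records = [{"id": i, "name": f"user_{i}", "score": i * 0.01} for i in range(1000)]

start = time.perf_counter()
payload = json.dumps(datason.serialize(records))
elapsed = time.perf_counter() - start
print(f"Serialized {len(records)} records in {elapsed * 1000:.2f} ms")
```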
Feature Deep Dive¶
🤖 ML/AI Object Support¶
datason provides native support for major ML frameworks:
```python
import datason

# PyTorch
tensor = torch.randn(100, 100)
datason.serialize(tensor)
# Result: {"_type": "torch.Tensor", "_shape": [100, 100], "_data": [...]}

# TensorFlow
tf_tensor = tf.constant([[1, 2], [3, 4]])
datason.serialize(tf_tensor)
# Result: {"_type": "tf.Tensor", "_shape": [2, 2], "_data": [[1, 2], [3, 4]]}

# Scikit-learn
model = RandomForestClassifier()
datason.serialize(model)
# Result: {"_type": "sklearn.model", "_class": "...", "_params": {...}}
```
📊 Data Science Integration¶
Seamless pandas and NumPy support:
```python
from datetime import datetime

import pandas as pd
import datason

# DataFrame with mixed types
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'created': [datetime.now(), datetime.now(), datetime.now()]
})

# datason automatically handles datetime serialization in the DataFrame
serialized = datason.serialize(df)
```
🛡️ Safety Features¶
- NaN/Infinity Handling: Converts to `null` for JSON compatibility
- Circular Reference Detection: Prevents infinite recursion
- Graceful Fallbacks: Falls back to string representation for unknown types
- Type Preservation: Maintains type information for accurate reconstruction
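A quick sketch of the first two behaviors (assuming default settings; exact serialized shapes may differ):

```python
import math

import datason

# NaN/Infinity are converted to JSON-safe values
# instead of producing invalid JSON
datason.serialize({"loss": math.nan, "bound": math.inf})

# Circular references are detected rather than recursing forever
node = {"name": "root"}
node["self"] = node
datason.serialize(node)  # returns safely instead of raising RecursionError
```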
⚡ Performance Optimizations¶
- Early Detection: Skips processing for already-serialized data
- Memory Streaming: Handles large datasets without loading everything into memory
- Smart Caching: Reuses serialization results for repeated objects
- Lazy Loading: Only imports ML libraries when needed
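Lazy loading here is the standard deferred-import pattern: the heavy library is imported only when an object of that type is actually encountered. A generic illustration of the idea (not datason's actual source):

```python
def maybe_serialize_torch(obj):
    """Serialize a torch tensor if torch is available; otherwise skip."""
    try:
        import torch  # deferred: import cost is paid only on this path
    except ImportError:
        return None
    if isinstance(obj, torch.Tensor):
        return {"_type": "torch.Tensor",
                "_shape": list(obj.shape),
                "_data": obj.tolist()}
    return None
```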
Integration Examples¶
Flask/FastAPI APIs¶
```python
from datetime import datetime

from flask import jsonify
import datason

# app, model, and data are defined elsewhere in your application
@app.route('/model/predict')
def predict():
    result = model.predict(data)
    return jsonify(datason.serialize({
        'predictions': result,        # NumPy array
        'model_info': model,          # sklearn model
        'timestamp': datetime.now()
    }))
```
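The FastAPI version is analogous; a minimal sketch (route and names are illustrative, and `model`/`data` are placeholders):

```python
from datetime import datetime

from fastapi import FastAPI
import datason

app = FastAPI()

@app.get('/model/predict')
def predict():
    result = model.predict(data)  # model and data defined elsewhere
    # datason.serialize returns JSON-compatible structures,
    # so FastAPI can encode the response directly
    return datason.serialize({
        'predictions': result,
        'timestamp': datetime.now()
    })
```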
Experiment Tracking¶
```python
import uuid
from datetime import datetime

import datason

# trained_model, validation_dataframe, metrics_dict, and
# predictions_array are placeholders for your own objects
experiment = {
    'id': uuid.uuid4(),
    'model': trained_model,
    'data': validation_dataframe,
    'metrics': metrics_dict,
    'artifacts': {
        'weights': trained_model.coef_,
        'predictions': predictions_array
    }
}
experiment_json = datason.serialize(experiment)
```
Data Pipeline State¶
```python
from datetime import datetime

# the DataFrames, preprocessor, and stats are produced earlier in the pipeline
pipeline_state = {
    'stage': 'feature_engineering',
    'input_data': raw_dataframe,
    'processed_data': processed_dataframe,
    'transformations': sklearn_preprocessor,
    'statistics': summary_stats,
    'timestamp': datetime.now()
}
```
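Checkpointing that state is then an ordinary JSON round trip; a hedged sketch, assuming `datason.deserialize` reverses `serialize`:

```python
import json

import datason

# Save a checkpoint as ordinary JSON
with open('pipeline_state.json', 'w') as f:
    json.dump(datason.serialize(pipeline_state), f)

# Resume later from the same file
with open('pipeline_state.json') as f:
    restored_state = datason.deserialize(json.load(f))
```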
Decision Matrix¶
Use this matrix to choose the right tool:
| Requirement | Recommended Library |
|---|---|
| Basic JSON + High Performance | orjson, ujson |
| Standard JSON only | json (stdlib) |
| Python objects + High Performance | pickle |
| NumPy arrays + Compression | joblib |
| ML/AI data + Human Readable | datason |
| Complex data + Cross-platform | datason |
| API responses with mixed types | datason |
| Data science workflows | datason |
Migration Guide¶
From `json` to datason¶
```python
# Before
try:
    json_str = json.dumps(data)
except TypeError:
    # Handle unsupported types manually
    pass

# After
json_str = json.dumps(datason.serialize(data))  # Just works!
```
From `pickle` to datason¶
```python
# Before (Python-only)
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# After (Cross-platform)
with open('data.json', 'w') as f:
    json.dump(datason.serialize(data), f)
```
From `joblib` to datason¶
```python
# Before (Binary format)
joblib.dump(sklearn_model, 'model.joblib')

# After (Human-readable)
with open('model.json', 'w') as f:
    json.dump(datason.serialize(sklearn_model), f)
```
Key Differentiators Explained¶
🌐 Cross-Language Compatibility¶
datason outputs standard JSON that any programming language can read:
```json
{
  "model_results": {
    "_type": "torch.Tensor",
    "_shape": [3, 3],
    "_data": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
  },
  "timestamp": "2024-01-15T10:30:00Z",
  "accuracy": 0.95
}
```
Real-world scenarios where this matters:
🚀 Microservices Architecture
```python
# Python ML service
ml_results = datason.serialize({
    'predictions': torch_predictions,
    'model_version': '2.1.0',
    'confidence': confidence_scores
})
# Send to JavaScript frontend, Java backend, Go service, etc.
```
🌍 Multi-Language Data Pipelines
```python
# Python data processing
processed_data = datason.serialize({
    'features': numpy_features,
    'labels': pandas_series,
    'metadata': {'processed_at': datetime.now()}
})
# R analytics, JavaScript visualization, Java storage - all can read this
```
🔗 API Integrations
```python
# Your Python API response
@app.route('/api/analysis')
def get_analysis():
    return datason.serialize({
        'dataframe': analysis_df,
        'charts': plot_data,
        'timestamp': datetime.now()
    })
# Any client (React, Vue, mobile apps) can consume this directly
```
Compare with pickle (Python-only):
```python
# Pickle output - binary garbage to other languages
b'\x80\x04\x95\x1a\x00\x00\x00\x00\x00\x00\x00}\x94\x8c\x04name\x94\x8c\x04John\x94s.'

# ❌ JavaScript: "What is this??"
# ❌ Java: "Cannot parse this"
# ❌ R: "Error reading data"
```
👁️ Human-Readable Output¶
datason produces readable JSON you can inspect, debug, and version control:
```json
{
  "experiment_id": "exp_20240115_v1",
  "model": {
    "_type": "sklearn.RandomForestClassifier",
    "_params": {
      "n_estimators": 100,
      "max_depth": 10,
      "random_state": 42
    }
  },
  "results": {
    "accuracy": 0.94,
    "precision": 0.91,
    "recall": 0.88
  },
  "training_data": {
    "_type": "pandas.DataFrame",
    "_shape": [1000, 25],
    "_columns": ["feature_1", "feature_2", "..."],
    "_sample": [
      {"feature_1": 1.2, "feature_2": 0.8, "...": "..."}
    ]
  },
  "timestamp": "2024-01-15T10:30:00Z"
}
```
Practical benefits:
🔍 Easy Debugging
```bash
# You can actually READ what's happening
cat experiment_results.json | jq '.results.accuracy'
# Output: 0.94

# vs pickle debugging (impossible):
cat experiment_results.pkl
# Output: binary gibberish
```
📋 Data Inspection
Open the file in any text editor and see your data structure. Perfect for:

- Checking serialization worked correctly
- Understanding data flow in pipelines
- Sharing results with non-Python colleagues
- Troubleshooting API responses
🔄 Version Control Friendly
```bash
# Git can show meaningful diffs
git diff experiment_results.json
# Shows actual changes to accuracy, parameters, etc.

# vs pickle in git (useless):
git diff experiment_results.pkl
# "Binary files differ" - no useful information
```
📊 Business Stakeholder Sharing
Data scientists can share results with business teams directly: marketing can read the JSON and understand conversion rates, and product managers can see A/B test results, all without special Python tools to view the data.
🛠️ When Each Tool Excels¶
| Scenario | Best Tool | Why |
|---|---|---|
| Python-only ML pipeline | pickle | Fastest, perfect object reconstruction |
| Multi-language microservices | datason | JSON works everywhere |
| API responses | datason | Human-readable, debuggable |
| Data science collaboration | datason | Readable by everyone |
| High-performance NumPy | joblib | Optimized for arrays |
| Web APIs (basic JSON) | orjson/ujson | Maximum speed |
| Debugging complex data | datason | Inspect results easily |
| Cross-team data sharing | datason | No language barriers |
| Scientific reproducibility | datason | Human-readable results |
🎯 Concrete Use Case Examples¶
❌ Problems with pickle:
```python
# Data science team (Python)
with open('results.pkl', 'wb') as f:
    pickle.dump(ml_results, f)
```

Frontend team (JavaScript):

- ❌ "How do we read this .pkl file?"
- ❌ "Can you export to CSV for everything?"
- ❌ "We can't debug API responses"

Business stakeholders:

- ❌ "We need special tools to see our data"
- ❌ "Can't inspect experiment results directly"
✅ Solutions with datason:
```python
# Data science team (Python)
with open('results.json', 'w') as f:
    json.dump(datason.serialize(ml_results), f)
```

Frontend team (JavaScript):

- ✅ `fetch('/api/results').then(r => r.json())` just works
- ✅ `console.log(results.model.accuracy)` makes debugging easy

Business stakeholders:

- ✅ Open results.json in any text editor
- ✅ See experiment results immediately
- ✅ Share data with external partners easily
🚀 Real-World Success Story:
```python
# Before: separate export formats for each team
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)         # For Python team
model_df.to_csv('model.csv')      # For Excel users (model_df: a DataFrame view)
with open('api.json', 'w') as f:
    json.dump(manual_dict, f)     # For frontend
# 3 different formats, data inconsistency, maintenance nightmare

# After: one format for everyone
datason_result = datason.serialize({
    'model': sklearn_model,
    'predictions': numpy_predictions,
    'metadata': {'timestamp': datetime.now()}
})
# ✅ Python team: Deserialize back to original objects
# ✅ Frontend team: Standard JSON consumption
# ✅ Business team: Human-readable results
# ✅ External partners: Universal JSON format
```
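Restoring the original objects on the Python side is the same round trip shown in the migration guide; a minimal sketch, assuming `datason.deserialize` is available as the inverse of `serialize`:

```python
import json

import datason

# The same JSON text every other team reads...
shared_json = json.dumps(datason_result)

# ...restores to rich Python objects on the Python side
restored = datason.deserialize(json.loads(shared_json))
```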