datason AI Usage Guide¶
Overview¶
datason is a universal JSON serialization library for Python that handles complex data structures including ML/AI objects, pandas DataFrames, datetime objects, and more. Perfect for data scientists, ML engineers, and developers working with complex Python objects.
Key Features for AI/ML Workflows¶
🤖 Machine Learning Object Serialization¶
import datason
import torch
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# Serialize PyTorch tensors
tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
serialized = datason.serialize(tensor)
# Output: {"_type": "torch.Tensor", "_shape": [2, 2], "_dtype": "torch.float32", "_data": [[1.0, 2.0], [3.0, 4.0]]}
# Serialize TensorFlow tensors
tf_tensor = tf.constant([[1, 2], [3, 4]])
serialized = datason.serialize(tf_tensor)
# Output: {"_type": "tf.Tensor", "_shape": [2, 2], "_dtype": "int32", "_data": [[1, 2], [3, 4]]}
# Serialize scikit-learn models
model = RandomForestClassifier(n_estimators=10, random_state=42)
serialized = datason.serialize(model)
# Output: {"_type": "sklearn.model", "_class": "sklearn.ensemble._forest.RandomForestClassifier", "_params": {...}}
📊 Data Science Object Support¶
import pandas as pd
import numpy as np
from datetime import datetime
# Serialize pandas DataFrames
df = pd.DataFrame({
'name': ['Alice', 'Bob'],
'age': [25, 30],
'created_at': [datetime.now(), datetime.now()]
})
serialized = datason.serialize(df)
# Automatically converts datetime objects to ISO format strings
# Handle NaN and infinity values
data = {'values': [1, 2, np.nan, np.inf, -np.inf]}
serialized = datason.serialize(data)
# Output: {'values': [1, 2, null, null, null]} # Safe handling of special values
🏗️ Complex Nested Structures¶
complex_data = {
'model_results': {
'predictions': torch.tensor([0.8, 0.2, 0.9]),
'model_info': model,
'metadata': {
'created_at': datetime.now(),
'dataset_size': 1000,
'features': ['feature1', 'feature2']
}
},
'dataframe': df,
'numpy_arrays': [np.array([1, 2, 3]), np.array([4, 5, 6])]
}
# Single function call handles everything
result = datason.serialize(complex_data)
Common AI/ML Use Cases¶
1. Model Experiment Tracking¶
experiment_data = {
'model': trained_model,
'hyperparameters': {'lr': 0.001, 'batch_size': 32},
'metrics': {'accuracy': 0.95, 'loss': 0.05},
'predictions': prediction_tensor,
'timestamp': datetime.now()
}
json_data = datason.serialize(experiment_data)
# Save to file or database for experiment tracking
2. API Response Serialization¶
from flask import Flask, jsonify
app = Flask(__name__)
@app.route('/model/predict')
def predict():
predictions = model.predict(input_data)
response = {
'predictions': predictions, # Can be torch tensor, numpy array, etc.
'confidence': confidence_scores,
'model_version': '1.0.0',
'processed_at': datetime.now()
}
return jsonify(datason.serialize(response))
3. Data Pipeline Serialization¶
pipeline_state = {
'preprocessor': sklearn_preprocessor,
'model': trained_model,
'feature_columns': df.columns.tolist(),
'training_data_sample': df.head(),
'validation_metrics': validation_results
}
serialized_pipeline = datason.serialize(pipeline_state)
Performance Features¶
Real Benchmark Results¶
Measured on Python 3.13.3, macOS with NumPy & Pandas installed
# Simple data (JSON-compatible, 1000 users)
datason.serialize(simple_data) # 0.6ms vs 0.4ms standard JSON
# Overhead: 1.6x (very reasonable for added functionality)
# Complex data (500 sessions with UUIDs/datetimes)
datason.serialize(complex_data) # 2.1ms vs 0.7ms pickle
# 3.2x vs pickle but with JSON output + cross-platform compatibility
# High-throughput scenarios
# - Large nested datasets: 272,654 items/second
# - NumPy array processing: 5.5M elements/second
# - Pandas DataFrame serialization: 195,242 rows/second
# Round-trip performance (real-world API usage)
serialize + JSON.dumps + JSON.loads + deserialize # 1.4ms total
Optimized for Large Data¶
# Efficiently handles large DataFrames
large_df = pd.DataFrame(np.random.randn(5000, 20))
serialized = datason.serialize(large_df) # ~26ms for 5K rows
# Batch processing of ML objects
batch_tensors = [torch.randn(100, 10) for _ in range(100)]
serialized_batch = datason.serialize(batch_tensors)
Memory Management¶
# Automatic memory optimization for tensors
gpu_tensor = torch.randn(1000, 1000).cuda()
serialized = datason.serialize(gpu_tensor) # Automatically moves to CPU for serialization
Full benchmarks available in benchmark_real_performance.py
Cross-Language Integration Examples¶
🌐 One Output, Multiple Languages¶
datason's JSON output works seamlessly across programming languages:
# Python: Generate ML results
import datason
results = datason.serialize({
'model_accuracy': 0.94,
'predictions': numpy_predictions,
'feature_importance': feature_weights,
'training_date': datetime.now(),
'model_id': uuid.uuid4()
})
# Save or send via API
json.dump(results, open('ml_results.json', 'w'))
JavaScript Frontend:
// React/Vue/Angular can directly consume this
fetch('/api/ml-results')
.then(response => response.json())
.then(data => {
console.log(`Model accuracy: ${data.model_accuracy}`);
// Visualize predictions
plotChart(data.predictions);
// Show feature importance
displayFeatures(data.feature_importance);
// Format timestamp for UI
const date = new Date(data.training_date);
});
Java Backend:
// Spring Boot service can process the same data
ObjectMapper mapper = new ObjectMapper();
MLResults results = mapper.readValue(jsonString, MLResults.class);
System.out.println("Accuracy: " + results.getModelAccuracy());
System.out.println("Model ID: " + results.getModelId());
// Store in database, trigger alerts, etc.
if (results.getModelAccuracy() < 0.90) {
alertService.sendModelPerformanceAlert();
}
R Analytics:
# R can read and analyze the same results
library(jsonlite)
results <- fromJSON("ml_results.json")
cat("Model Accuracy:", results$model_accuracy, "\n")
# Statistical analysis on predictions
summary(results$predictions)
hist(results$predictions)
# Compare feature importance
barplot(results$feature_importance,
names.arg = paste("Feature", 1:length(results$feature_importance)))
Go Microservice:
// Go service for model monitoring
type MLResults struct {
ModelAccuracy float64 `json:"model_accuracy"`
Predictions []float64 `json:"predictions"`
FeatureImportance []float64 `json:"feature_importance"`
TrainingDate string `json:"training_date"`
ModelID string `json:"model_id"`
}
var results MLResults
json.Unmarshal(jsonData, &results)
fmt.Printf("Monitoring model %s with %.2f%% accuracy\n",
results.ModelID, results.ModelAccuracy*100)
// Send metrics to monitoring system
metrics.RecordModelAccuracy(results.ModelAccuracy)
🔄 Real-World Multi-Language Workflow¶
graph LR
A[Python ML Training] --> B[datason Serialize]
B --> C[JSON API Response]
C --> D[JavaScript Dashboard]
C --> E[Java Storage Service]
C --> F[R Analytics]
C --> G[Go Monitoring]
D --> H[Business Users]
E --> I[Database]
F --> J[Statistical Reports]
G --> K[Alerts & Metrics]
The Power of Universal JSON:
- ✅ One serialization format for all teams
- ✅ No language-specific converters needed
- ✅ Consistent data structure across services
- ✅ Easy debugging across the entire pipeline
- ✅ Business stakeholders can inspect data directly
Integration Examples¶
With Popular ML Libraries¶
# HuggingFace Transformers
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')
model_config = {
'tokenizer_vocab_size': len(tokenizer.vocab),
'model_config': model.config,
'sample_output': model(**tokenizer("Hello world", return_tensors="pt"))
}
serialized = datason.serialize(model_config)
# JAX/Flax support
import jax.numpy as jnp
jax_array = jnp.array([1, 2, 3, 4])
serialized = datason.serialize(jax_array)
# SciPy sparse matrices
from scipy.sparse import csr_matrix
sparse_matrix = csr_matrix([[1, 0, 2], [0, 0, 3], [4, 5, 6]])
serialized = datason.serialize(sparse_matrix)
Error Handling and Fallbacks¶
# Graceful handling of unsupported objects
class CustomMLModel:
def __init__(self):
self.weights = [1, 2, 3]
custom_model = CustomMLModel()
# Falls back to dict serialization or string representation
serialized = datason.serialize(custom_model)
Installation and Setup¶
pip install datason
# For ML dependencies (optional)
pip install datason[ml] # Includes torch, tensorflow, jax support
pip install datason[all] # Includes all optional dependencies
Why Choose datason?¶
- Zero Configuration: Works out of the box with any Python object
- ML/AI Native: Built specifically for data science and machine learning workflows
- Performance Optimized: Handles large datasets and complex objects efficiently
- Extensible: Easy to add custom serializers for your specific object types
- Safe: Handles edge cases like NaN, infinity, and circular references gracefully
- Type Preservation: Maintains type information for accurate deserialization
Comparison with Alternatives¶
Feature | datason | json | pickle | joblib |
---|---|---|---|---|
ML Objects | ✅ | ❌ | ✅ | ✅ |
Cross-language | ✅ | ✅ | ❌ | ❌ |
Human Readable | ✅ | ✅ | ❌ | ❌ |
Type Safe | ✅ | ❌ | ✅ | ✅ |
Performance | ⚡ | ⚡ | 🐌 | ⚡ |
Perfect for: Data scientists, ML engineers, API developers, research teams, and anyone working with complex Python objects in production systems.