
📤 Modern API: Serialization Functions

Intention-revealing dump functions for different use cases and optimization needs.

🎯 Function Overview

| Function | Purpose | Best For |
|---|---|---|
| `dump()` | General-purpose with composable options | Flexible workflows |
| `dump_ml()` | ML-optimized for models and tensors | Data science |
| `dump_api()` | Clean JSON for web APIs | Web development |
| `dump_secure()` | Security-focused with PII redaction | Sensitive data |
| `dump_fast()` | Performance-optimized | High-throughput |
| `dump_chunked()` | Memory-efficient for large data | Big datasets |
| `stream_dump()` | Direct file streaming | Very large files |
| **File operations** | | |
| `save_ml()` | Save ML data to JSON/JSONL files | ML model persistence |
| `save_secure()` | Save with PII redaction to files | Secure file storage |
| `save_api()` | Save clean data to files | API data export |
| `save_chunked()` | Save large data efficiently to files | Big dataset export |
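
For a quick feel of how these intention-revealing functions relate, here is a minimal sketch; the sample data and file names are placeholders, and it assumes datason is imported as ds, as in the examples below.

import datason as ds

record = {"user": "alice", "scores": [0.91, 0.87]}

api_ready = ds.dump_api(record)        # clean JSON-safe output for a web response
ml_ready = ds.dump_ml(record)          # ML-optimized serialization
redacted = ds.dump_secure(record)      # PII fields/patterns redacted
ds.save_api(record, "record.json")     # same idea, written straight to a file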

📦 Detailed Function Documentation

dump()

General-purpose serialization with composable options.

datason.dump(obj: Any, fp: Any, **kwargs: Any) -> None

Enhanced file serialization (DataSON's smart default).

This saves enhanced DataSON serialized data to a file using save_ml(). For stdlib json.dump() compatibility, use datason.json.dump() or dump_json().

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | Object to serialize | *required* |
| `fp` | `Any` | File-like object or file path to write to | *required* |
| `**kwargs` | `Any` | DataSON configuration options | `{}` |

Returns:

| Type | Description |
|---|---|
| `None` | None (writes to file) |

Example

>>> with open('data.json', 'w') as f:
...     dump(data, f)  # Enhanced serialization with smart features

For JSON compatibility:

>>> import datason.json as json
>>> with open('data.json', 'w') as f:
...     json.dump(data, f)  # Exact json.dump() behavior

Source code in datason/api.py
def dump(obj: Any, fp: Any, **kwargs: Any) -> None:
    """Enhanced file serialization (DataSON's smart default).

    This saves enhanced DataSON serialized data to a file using save_ml().
    For stdlib json.dump() compatibility, use datason.json.dump() or dump_json().

    Args:
        obj: Object to serialize
        fp: File-like object or file path to write to
        **kwargs: DataSON configuration options

    Returns:
        None (writes to file)

    Example:
        >>> with open('data.json', 'w') as f:
        ...     dump(data, f)  # Enhanced serialization with smart features

        >>> # For JSON compatibility:
        >>> import datason.json as json
        >>> with open('data.json', 'w') as f:
        ...     json.dump(data, f)  # Exact json.dump() behavior
    """
    # Use enhanced file saving (supports both file objects and paths)
    if hasattr(fp, "write"):
        # File-like object: serialize to enhanced format and write
        import json

        serialized = _serialize_core(obj, **kwargs)
        json.dump(serialized, fp)
    else:
        # File path: use save_ml for enhanced features
        save_ml(obj, fp, **kwargs)
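
Because dump() branches on whether fp has a write() method (see the source above), you can pass either an open file object or a plain path; a minimal sketch with placeholder file names:

import datason as ds

data = {"values": [1, 2, 3]}

# File-like object: serialized and written directly
with open("data.json", "w") as f:
    ds.dump(data, f)

# File path: routed through save_ml() for the enhanced format
ds.dump(data, "data_enhanced.json")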

Composable Options Example:

import datason as ds
import torch
import pandas as pd
from datetime import datetime

# Basic usage: dump() writes enhanced JSON to a file or path
data = {"values": [1, 2, 3], "timestamp": datetime.now()}
ds.dump(data, "data.json")

# Composable options for specific needs
ml_data = {"model": torch.nn.Linear(10, 1), "df": pd.DataFrame({"x": [1, 2, 3]})}

# Combine security + ML optimization + chunked processing
ds.dump(
    ml_data,
    "ml_data.json",
    secure=True,    # Enable PII redaction
    ml_mode=True,   # Optimize for ML objects
    chunked=True    # Memory-efficient processing
)

dump_ml()

ML-optimized serialization for models, tensors, and NumPy arrays.

datason.dump_ml(obj: Any, **kwargs: Any) -> Any

ML-optimized serialization for models, tensors, and ML objects.

Automatically configures optimal settings for machine learning objects including NumPy arrays, PyTorch tensors, scikit-learn models, etc.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | ML object to serialize | *required* |
| `**kwargs` | `Any` | Additional configuration options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | Serialized ML object optimized for reconstruction |

Example

>>> model = sklearn.ensemble.RandomForestClassifier()
>>> serialized = dump_ml(model)
>>> # Optimized for ML round-trip fidelity

Source code in datason/api.py
def dump_ml(obj: Any, **kwargs: Any) -> Any:
    """ML-optimized serialization for models, tensors, and ML objects.

    Automatically configures optimal settings for machine learning objects
    including NumPy arrays, PyTorch tensors, scikit-learn models, etc.

    Args:
        obj: ML object to serialize
        **kwargs: Additional configuration options

    Returns:
        Serialized ML object optimized for reconstruction

    Example:
        >>> model = sklearn.ensemble.RandomForestClassifier()
        >>> serialized = dump_ml(model)
        >>> # Optimized for ML round-trip fidelity
    """
    # Create a copy of ML-optimized config to avoid modifying shared instances
    base_config = get_ml_config()
    from dataclasses import replace

    config = replace(base_config, **kwargs)

    # Directly call serialize - serializer handles circular references properly
    from .core_new import serialize

    return serialize(obj, config=config)

ML Workflow Example:

import datason as ds
import torch
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ml_data = {
    "pytorch_model": torch.nn.Linear(10, 1),
    "sklearn_model": RandomForestClassifier(),
    "tensor": torch.randn(100, 10),
    "numpy_array": np.random.random((100, 10)),
}

# Automatically optimized for ML objects
result = ds.dump_ml(ml_data)

dump_api()

API-safe serialization for clean JSON output.

datason.dump_api(obj: Any, **kwargs: Any) -> Any

API-safe serialization for web responses and APIs.

Produces clean, predictable JSON suitable for API responses. Handles edge cases gracefully and ensures consistent output format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | Object to serialize for API response | *required* |
| `**kwargs` | `Any` | Additional configuration options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | API-safe serialized object |

Example

>>> @app.route('/api/data')
>>> def get_data():
>>>     return dump_api(complex_data_structure)

Source code in datason/api.py
def dump_api(obj: Any, **kwargs: Any) -> Any:
    """API-safe serialization for web responses and APIs.

    Produces clean, predictable JSON suitable for API responses.
    Handles edge cases gracefully and ensures consistent output format.

    Args:
        obj: Object to serialize for API response
        **kwargs: Additional configuration options

    Returns:
        API-safe serialized object

    Example:
        >>> @app.route('/api/data')
        >>> def get_data():
        >>>     return dump_api(complex_data_structure)
    """
    # Create a copy of API-optimized config to avoid modifying shared instances
    base_config = get_api_config()
    from dataclasses import replace

    config = replace(base_config, **kwargs)

    # Directly call serialize - serializer handles circular references properly
    from .core_new import serialize

    return serialize(obj, config=config)

Web API Example:

import datason as ds
from datetime import datetime

# Web API response data
api_data = {
    "status": "success",
    "data": [1, 2, 3],
    "errors": None,        # Will be removed
    "timestamp": datetime.now(),
    "metadata": {"version": "1.0"}
}

# Clean JSON output, removes null values
clean_result = ds.dump_api(api_data)
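
Because dump_api() returns JSON-safe Python structures, the result can be handed straight to the standard library encoder when you need a string body; a small sketch (the surrounding web framework is left out):

import json
from datetime import datetime

import datason as ds

payload = {"status": "success", "created": datetime.now(), "errors": None}
body = json.dumps(ds.dump_api(payload))  # datetimes already converted, nulls cleaned up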

dump_secure()

Security-focused serialization with PII redaction.

datason.dump_secure(obj: Any, *, redact_pii: bool = True, redact_fields: Optional[List[str]] = None, redact_patterns: Optional[List[str]] = None, **kwargs: Any) -> Any

Security-focused serialization with PII redaction.

Automatically redacts sensitive information like credit cards, SSNs, emails, and common secret fields.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | Object to serialize securely | *required* |
| `redact_pii` | `bool` | Enable automatic PII pattern detection | `True` |
| `redact_fields` | `Optional[List[str]]` | Additional field names to redact | `None` |
| `redact_patterns` | `Optional[List[str]]` | Additional regex patterns to redact | `None` |
| `**kwargs` | `Any` | Additional configuration options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | Serialized object with sensitive data redacted |

Example

>>> user_data = {"name": "John", "ssn": "123-45-6789"}
>>> safe_data = dump_secure(user_data)
>>> # SSN will be redacted: {"name": "John", "ssn": "[REDACTED]"}

Source code in datason/api.py
def dump_secure(
    obj: Any,
    *,
    redact_pii: bool = True,
    redact_fields: Optional[List[str]] = None,
    redact_patterns: Optional[List[str]] = None,
    **kwargs: Any,
) -> Any:
    """Security-focused serialization with PII redaction.

    Automatically redacts sensitive information like credit cards,
    SSNs, emails, and common secret fields.

    Args:
        obj: Object to serialize securely
        redact_pii: Enable automatic PII pattern detection
        redact_fields: Additional field names to redact
        redact_patterns: Additional regex patterns to redact
        **kwargs: Additional configuration options

    Returns:
        Serialized object with sensitive data redacted

    Example:
        >>> user_data = {"name": "John", "ssn": "123-45-6789"}
        >>> safe_data = dump_secure(user_data)
        >>> # SSN will be redacted: {"name": "John", "ssn": "[REDACTED]"}
    """
    # Create secure config with redaction settings
    patterns = []
    fields = []

    if redact_pii:
        patterns.extend(
            [
                r"\b\d{4}-\d{4}-\d{4}-\d{4}\b",  # Credit cards with dashes
                r"\b\d{16}\b",  # Credit cards without dashes
                r"\b\d{3}-\d{2}-\d{4}\b",  # SSN
                r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b",  # Email
            ]
        )
        fields.extend(["password", "api_key", "secret", "token", "ssn", "credit_card"])

    if redact_patterns:
        patterns.extend(redact_patterns)
    if redact_fields:
        fields.extend(redact_fields)

    # Remove include_redaction_summary from kwargs if present to avoid duplicate
    kwargs_clean = {k: v for k, v in kwargs.items() if k != "include_redaction_summary"}

    config = SerializationConfig(
        redact_patterns=patterns,
        redact_fields=fields,
        include_redaction_summary=True,
        # Keep normal max_depth to maintain security
        **kwargs_clean,
    )

    # Directly call serialize - serializer handles circular references properly
    from .core_new import serialize

    return serialize(obj, config=config)

Security Example:

import datason as ds

# Sensitive user data
user_data = {
    "name": "John Doe",
    "email": "john@example.com",
    "ssn": "123-45-6789",
    "password": "secret123",
    "credit_card": "4532-1234-5678-9012"
}

# Automatic PII redaction
secure_result = ds.dump_secure(user_data, redact_pii=True)

# Custom redaction patterns
custom_result = ds.dump_secure(
    user_data,
    redact_fields=["internal_id"],
    redact_patterns=[r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"]
)
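
You can also switch off the built-in PII patterns and rely only on your own field list; a minimal sketch, assuming redact_pii=False simply skips the default patterns and fields described above:

import datason as ds

record = {"name": "John Doe", "internal_id": "X-9981", "note": "call back Friday"}

# Only the explicitly listed field is redacted; default PII detection is skipped
result = ds.dump_secure(record, redact_pii=False, redact_fields=["internal_id"])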

dump_fast()

Performance-optimized for high-throughput scenarios.

datason.dump_fast(obj: Any, **kwargs: Any) -> Any

Performance-optimized serialization.

Optimized for speed with minimal type checking and validation. Use when you need maximum performance and can accept some trade-offs in type fidelity.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | Object to serialize quickly | *required* |
| `**kwargs` | `Any` | Additional configuration options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | Serialized object optimized for speed |

Example

>>> # For high-throughput scenarios
>>> result = dump_fast(large_dataset)

Source code in datason/api.py
def dump_fast(obj: Any, **kwargs: Any) -> Any:
    """Performance-optimized serialization.

    Optimized for speed with minimal type checking and validation.
    Use when you need maximum performance and can accept some trade-offs
    in type fidelity.

    Args:
        obj: Object to serialize quickly
        **kwargs: Additional configuration options

    Returns:
        Serialized object optimized for speed

    Example:
        >>> # For high-throughput scenarios
        >>> result = dump_fast(large_dataset)
    """
    config = get_performance_config()
    return serialize(obj, config=config)

High-Throughput Example:

import random
import datason as ds

# Large batch processing
batch_data = [{"id": i, "value": random.random()} for i in range(10000)]

# Minimal overhead, optimized for speed
fast_result = ds.dump_fast(batch_data)

dump_chunked()

Memory-efficient chunked serialization for large objects.

datason.dump_chunked(obj: Any, *, chunk_size: int = 1000, **kwargs: Any) -> Any

Chunked serialization for large objects.

Breaks large objects into manageable chunks for memory efficiency and streaming processing.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | Large object to serialize in chunks | *required* |
| `chunk_size` | `int` | Size of each chunk | `1000` |
| `**kwargs` | `Any` | Additional configuration options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | ChunkedSerializationResult with metadata and chunks |

Example

>>> big_list = list(range(10000))
>>> result = dump_chunked(big_list, chunk_size=1000)
>>> # Returns ChunkedSerializationResult with 10 chunks

Source code in datason/api.py
def dump_chunked(obj: Any, *, chunk_size: int = 1000, **kwargs: Any) -> Any:
    """Chunked serialization for large objects.

    Breaks large objects into manageable chunks for memory efficiency
    and streaming processing.

    Args:
        obj: Large object to serialize in chunks
        chunk_size: Size of each chunk
        **kwargs: Additional configuration options

    Returns:
        ChunkedSerializationResult with metadata and chunks

    Example:
        >>> big_list = list(range(10000))
        >>> result = dump_chunked(big_list, chunk_size=1000)
        >>> # Returns ChunkedSerializationResult with 10 chunks
    """
    return serialize_chunked(obj, chunk_size=chunk_size, **kwargs)

Large Dataset Example:

import numpy as np
import datason as ds

# Very large dataset
large_data = {
    "images": [np.random.random((512, 512, 3)) for _ in range(1000)],
    "features": np.random.random((100000, 200))
}

# Process in memory-efficient chunks
chunked_result = ds.dump_chunked(large_data, chunk_size=1000)
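
The ChunkedSerializationResult can also be written to disk directly; save_chunked() (documented below) does exactly this through the result's save_to_file() method, so a manual sketch looks like:

import datason as ds

big_list = list(range(10_000))
result = ds.dump_chunked(big_list, chunk_size=1000)
result.save_to_file("big_list.jsonl", format="jsonl")  # same call save_chunked() makes internally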

stream_dump()

Direct file streaming for very large data.

datason.stream_dump(file_path: str, **kwargs: Any) -> Any

Streaming serialization to file.

Efficiently serialize large datasets directly to file without loading everything into memory.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `str` | Path to output file | *required* |
| `**kwargs` | `Any` | Additional configuration options | `{}` |

Returns:

| Type | Description |
|---|---|
| `Any` | StreamingSerializer instance for continued operations |

Example

>>> with stream_dump("output.jsonl") as streamer:
>>>     for item in large_dataset:
>>>         streamer.write(item)

Source code in datason/api.py
def stream_dump(file_path: str, **kwargs: Any) -> Any:
    """Streaming serialization to file.

    Efficiently serialize large datasets directly to file without
    loading everything into memory.

    Args:
        file_path: Path to output file
        **kwargs: Additional configuration options

    Returns:
        StreamingSerializer instance for continued operations

    Example:
        >>> with stream_dump("output.jsonl") as streamer:
        >>>     for item in large_dataset:
        >>>         streamer.write(item)
    """
    return stream_serialize(file_path, **kwargs)

File Streaming Example:

import numpy as np
import datason as ds

# Stream records directly to a file without building the full output in memory
records = ({"row": np.random.random(100)} for _ in range(1_000_000))

with ds.stream_dump("large_output.jsonl") as streamer:
    for record in records:
        streamer.write(record)

🗃️ File Operations Functions

save_ml()

ML-optimized file saving with perfect type preservation.

datason.save_ml(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> None

Save ML-optimized data to JSON or JSONL file.

Combines ML-specific serialization with file I/O, preserving ML types like NumPy arrays, PyTorch tensors, etc.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | ML object or data to save | *required* |
| `path` | `Union[str, Path]` | Output file path (.json for single object, .jsonl for multiple objects) | *required* |
| `format` | `Optional[str]` | Explicit format ('json' or 'jsonl'), auto-detected from extension if None | `None` |
| `**kwargs` | `Any` | Additional ML configuration options | `{}` |

Examples:

>>> import numpy as np
>>> data = [{"weights": np.array([1, 2, 3]), "epoch": 1}]
>>>
>>> # Save as JSONL (multiple objects, one per line)
>>> save_ml(data, "training.jsonl")
>>> save_ml(data, "training.json", format="jsonl")  # Force JSONL
>>>
>>> # Save as JSON (single array object)
>>> save_ml(data, "training.json")
>>> save_ml(data, "training.jsonl", format="json")  # Force JSON
Source code in datason/api.py
def save_ml(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> None:
    """Save ML-optimized data to JSON or JSONL file.

    Combines ML-specific serialization with file I/O, preserving
    ML types like NumPy arrays, PyTorch tensors, etc.

    Args:
        obj: ML object or data to save
        path: Output file path (.json for single object, .jsonl for multiple objects)
        format: Explicit format ('json' or 'jsonl'), auto-detected from extension if None
        **kwargs: Additional ML configuration options

    Examples:
        >>> import numpy as np
        >>> data = [{"weights": np.array([1, 2, 3]), "epoch": 1}]
        >>>
        >>> # Save as JSONL (multiple objects, one per line)
        >>> save_ml(data, "training.jsonl")
        >>> save_ml(data, "training.json", format="jsonl")  # Force JSONL
        >>>
        >>> # Save as JSON (single array object)
        >>> save_ml(data, "training.json")
        >>> save_ml(data, "training.jsonl", format="json")  # Force JSON
    """
    import gzip
    import json
    from pathlib import Path

    # Get ML-optimized config
    config = get_ml_config()

    # Apply any additional config options
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)

    # Detect format
    path_obj = Path(path)
    detected_format = _detect_file_format(path_obj, format)

    # Check for compression
    is_compressed = path_obj.suffix == ".gz" or (len(path_obj.suffixes) > 1 and path_obj.suffixes[-1] == ".gz")

    # Pre-serialize the object - this already applies the ML-specific serialization
    serialized = dump_ml(obj, **kwargs)

    # Open file with appropriate compression
    def open_func(mode):
        if is_compressed:
            return gzip.open(path_obj, mode, encoding="utf-8")
        else:
            return path_obj.open(mode)

    # Write to file in appropriate format, don't re-serialize
    with open_func("wt") as f:
        if detected_format == "jsonl":
            # JSONL: Write each item on a separate line
            if isinstance(serialized, (list, tuple)):
                for item in serialized:
                    json.dump(item, f, ensure_ascii=False)
                    f.write("\n")
            else:
                json.dump(serialized, f, ensure_ascii=False)
                f.write("\n")
        else:
            # JSON: Write as single object
            json.dump(serialized, f, ensure_ascii=False)

ML File Workflow Example:

import datason as ds
import torch
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Complete ML experiment data
experiment = {
    "model": RandomForestClassifier(n_estimators=100),
    "weights": torch.randn(100, 50),
    "features": np.random.random((1000, 20)),
    "metadata": {"version": "1.0", "accuracy": 0.95}
}

# Save to JSON file with perfect ML type preservation
ds.save_ml(experiment, "experiment.json")

# Save to JSONL file (a dict is written as a single JSON line; lists write one item per line)
ds.save_ml(experiment, "experiment.jsonl")

# Automatic compression detection
ds.save_ml(experiment, "experiment.json.gz")  # Compressed

save_secure()

Secure file saving with PII redaction and integrity verification.

datason.save_secure(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, redact_pii: bool = True, redact_fields: Optional[List[str]] = None, redact_patterns: Optional[List[str]] = None, **kwargs: Any) -> None

Save data to JSON/JSONL file with security features.

Automatically redacts sensitive information before saving.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | Data to save securely | *required* |
| `path` | `Union[str, Path]` | Output file path | *required* |
| `format` | `Optional[str]` | Explicit format ('json' or 'jsonl'), auto-detected if None | `None` |
| `redact_pii` | `bool` | Enable automatic PII pattern detection | `True` |
| `redact_fields` | `Optional[List[str]]` | Additional field names to redact | `None` |
| `redact_patterns` | `Optional[List[str]]` | Additional regex patterns to redact | `None` |
| `**kwargs` | `Any` | Additional security options | `{}` |

Examples:

>>> user_data = [{"name": "John", "ssn": "123-45-6789"}]
>>>
>>> # Save as JSONL (auto-detected)
>>> save_secure(user_data, "users.jsonl", redact_pii=True)
>>>
>>> # Save as JSON (auto-detected)
>>> save_secure(user_data, "users.json", redact_pii=True)
Source code in datason/api.py
def save_secure(
    obj: Any,
    path: Union[str, Path],
    *,
    format: Optional[str] = None,
    redact_pii: bool = True,
    redact_fields: Optional[List[str]] = None,
    redact_patterns: Optional[List[str]] = None,
    **kwargs: Any,
) -> None:
    """Save data to JSON/JSONL file with security features.

    Automatically redacts sensitive information before saving.

    Args:
        obj: Data to save securely
        path: Output file path
        format: Explicit format ('json' or 'jsonl'), auto-detected if None
        redact_pii: Enable automatic PII pattern detection
        redact_fields: Additional field names to redact
        redact_patterns: Additional regex patterns to redact
        **kwargs: Additional security options

    Examples:
        >>> user_data = [{"name": "John", "ssn": "123-45-6789"}]
        >>>
        >>> # Save as JSONL (auto-detected)
        >>> save_secure(user_data, "users.jsonl", redact_pii=True)
        >>>
        >>> # Save as JSON (auto-detected)
        >>> save_secure(user_data, "users.json", redact_pii=True)
    """
    import gzip
    import json
    from pathlib import Path

    # Create secure config with redaction settings (same logic as dump_secure)
    patterns = []
    fields = []

    if redact_pii:
        patterns.extend(
            [
                r"\b\d{4}-\d{4}-\d{4}-\d{4}\b",  # Credit cards with dashes
                r"\b\d{16}\b",  # Credit cards without dashes
                r"\b\d{3}-\d{2}-\d{4}\b",  # SSN
                r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b",  # Email
            ]
        )
        fields.extend(["password", "api_key", "secret", "token", "ssn", "credit_card"])

    if redact_patterns:
        patterns.extend(redact_patterns)
    if redact_fields:
        fields.extend(redact_fields)

    # Apply secure configuration
    secure_kwargs = {"redact_patterns": patterns, "redact_fields": fields, "include_redaction_summary": True, **kwargs}

    # Pre-serialize with redaction applied - this already handles the secure serialization
    secure_serialized = dump_secure(obj, **secure_kwargs)

    # Detect format
    path_obj = Path(path)
    detected_format = _detect_file_format(path_obj, format)

    # Check for compression
    is_compressed = path_obj.suffix == ".gz" or (len(path_obj.suffixes) > 1 and path_obj.suffixes[-1] == ".gz")

    # Open file with appropriate compression
    def open_func(mode):
        if is_compressed:
            return gzip.open(path_obj, mode, encoding="utf-8")
        else:
            return path_obj.open(mode)

    # Write to file in appropriate format, don't re-serialize
    with open_func("wt") as f:
        if detected_format == "jsonl":
            # JSONL: Write each item on a separate line
            if isinstance(secure_serialized, (list, tuple)):
                for item in secure_serialized:
                    json.dump(item, f, ensure_ascii=False)
                    f.write("\n")
            else:
                json.dump(secure_serialized, f, ensure_ascii=False)
                f.write("\n")
        else:
            # JSON: Write as single object
            json.dump(secure_serialized, f, ensure_ascii=False)

Secure File Example:

import datason as ds

# Sensitive data with PII
user_data = {
    "users": [
        {"name": "John Doe", "ssn": "123-45-6789", "email": "john@example.com"},
        {"name": "Jane Smith", "ssn": "987-65-4321", "email": "jane@example.com"}
    ],
    "api_key": "sk-1234567890abcdef"
}

# Automatic PII redaction with audit trail
ds.save_secure(user_data, "users.json", redact_pii=True)

# Custom redaction patterns
ds.save_secure(
    user_data,
    "users_custom.json",
    redact_fields=["api_key"],
    redact_patterns=[r'\b\d{3}-\d{2}-\d{4}\b']  # SSN pattern
)
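
Because the file writers check for a .gz suffix before opening the output (see the source above), compressed secure output is just a naming choice; a minimal sketch:

import datason as ds

# Same redaction behavior, but gzip-compressed on disk
ds.save_secure(user_data, "users.json.gz", redact_pii=True)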

save_api()

Clean API-safe file saving with null removal and formatting.

datason.save_api(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> None

Save API-safe data to JSON/JSONL file.

Produces clean, predictable output suitable for API data exchange.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | Data to save for API use | *required* |
| `path` | `Union[str, Path]` | Output file path | *required* |
| `format` | `Optional[str]` | Explicit format ('json' or 'jsonl'), auto-detected if None | `None` |
| `**kwargs` | `Any` | Additional API configuration options | `{}` |

Examples:

>>> api_data = [{"status": "success", "data": [1, 2, 3]}]
>>>
>>> # Save as single JSON object
>>> save_api(api_data, "responses.json")
>>>
>>> # Save as JSONL (one response per line)
>>> save_api(api_data, "responses.jsonl")
Source code in datason/api.py
def save_api(obj: Any, path: Union[str, Path], *, format: Optional[str] = None, **kwargs: Any) -> None:
    """Save API-safe data to JSON/JSONL file.

    Produces clean, predictable output suitable for API data exchange.

    Args:
        obj: Data to save for API use
        path: Output file path
        format: Explicit format ('json' or 'jsonl'), auto-detected if None
        **kwargs: Additional API configuration options

    Examples:
        >>> api_data = [{"status": "success", "data": [1, 2, 3]}]
        >>>
        >>> # Save as single JSON object
        >>> save_api(api_data, "responses.json")
        >>>
        >>> # Save as JSONL (one response per line)
        >>> save_api(api_data, "responses.jsonl")
    """
    import gzip
    import json
    from pathlib import Path

    # Get API-optimized config and serialize the data
    api_serialized = dump_api(obj, **kwargs)

    # Detect format
    path_obj = Path(path)
    detected_format = _detect_file_format(path_obj, format)

    # Check for compression
    is_compressed = path_obj.suffix == ".gz" or (len(path_obj.suffixes) > 1 and path_obj.suffixes[-1] == ".gz")

    # Open file with appropriate compression
    def open_func(mode):
        if is_compressed:
            return gzip.open(path_obj, mode, encoding="utf-8")
        else:
            return path_obj.open(mode)

    # Write to file in appropriate format, don't re-serialize
    with open_func("wt") as f:
        if detected_format == "jsonl":
            # JSONL: Write each item on a separate line
            if isinstance(api_serialized, (list, tuple)):
                for item in api_serialized:
                    json.dump(item, f, ensure_ascii=False)
                    f.write("\n")
            else:
                json.dump(api_serialized, f, ensure_ascii=False)
                f.write("\n")
        else:
            # JSON: Write as single object
            json.dump(api_serialized, f, ensure_ascii=False)

API Export Example:

import datason as ds
from datetime import datetime

# API response data with nulls and complex types
api_response = {
    "status": "success",
    "data": [1, 2, 3],
    "errors": None,  # Will be removed
    "timestamp": datetime.now(),
    "pagination": {"page": 1, "total": None}  # Null removed
}

# Clean JSON output for API consumption
ds.save_api(api_response, "api_export.json")

# Multiple responses to JSONL
responses = [api_response, api_response, api_response]
ds.save_api(responses, "api_batch.jsonl")

save_chunked()

Memory-efficient file saving for large datasets.

datason.save_chunked(obj: Any, path: Union[str, Path], *, chunk_size: int = 1000, format: Optional[str] = None, **kwargs: Any) -> None

Save large data to JSON/JSONL file using chunked serialization.

Memory-efficient saving for large datasets.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Any` | Large dataset to save | *required* |
| `path` | `Union[str, Path]` | Output file path | *required* |
| `chunk_size` | `int` | Size of each chunk | `1000` |
| `format` | `Optional[str]` | Explicit format ('json' or 'jsonl'), auto-detected if None | `None` |
| `**kwargs` | `Any` | Additional chunking options | `{}` |
Example

>>> large_dataset = list(range(100000))
>>> save_chunked(large_dataset, "large.jsonl", chunk_size=5000)
>>> save_chunked(large_dataset, "large.json", chunk_size=5000)  # JSON array format

Source code in datason/api.py
def save_chunked(
    obj: Any, path: Union[str, Path], *, chunk_size: int = 1000, format: Optional[str] = None, **kwargs: Any
) -> None:
    """Save large data to JSON/JSONL file using chunked serialization.

    Memory-efficient saving for large datasets.

    Args:
        obj: Large dataset to save
        path: Output file path
        chunk_size: Size of each chunk
        format: Explicit format ('json' or 'jsonl'), auto-detected if None
        **kwargs: Additional chunking options

    Example:
        >>> large_dataset = list(range(100000))
        >>> save_chunked(large_dataset, "large.jsonl", chunk_size=5000)
        >>> save_chunked(large_dataset, "large.json", chunk_size=5000)  # JSON array format
    """
    detected_format = _detect_file_format(path, format)
    chunked_result = dump_chunked(obj, chunk_size=chunk_size, **kwargs)
    chunked_result.save_to_file(path, format=detected_format)

Large Dataset File Example:

import numpy as np
import datason as ds

# Large dataset that might not fit in memory
large_data = {
    "training_data": [{"features": np.random.random(1000)} for _ in range(10000)],
    "metadata": {"size": "10K samples", "version": "1.0"}
}

# Memory-efficient chunked file saving
ds.save_chunked(large_data, "training.json", chunk_size=1000)

# JSONL format for streaming
ds.save_chunked(large_data, "training.jsonl", chunk_size=500)

# Compressed chunked saving
ds.save_chunked(large_data, "training.json.gz", chunk_size=1000)

🔄 Choosing the Right Function

Decision Tree

  1. Need security/PII redaction? → Use dump_secure()
  2. Working with ML models/tensors? → Use dump_ml()
  3. Building web APIs? → Use dump_api()
  4. Processing very large data? → Use dump_chunked() or stream_dump()
  5. Need maximum speed? → Use dump_fast()
  6. Want flexibility? → Use dump() with options
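
If it helps to see the decision tree as code, here is a purely illustrative dispatcher; choose_dump() is hypothetical and not part of datason:

import datason as ds

def choose_dump(obj, path=None, *, sensitive=False, ml=False, web=False, huge=False, fast=False):
    """Hypothetical helper mirroring the decision tree above."""
    if sensitive:
        return ds.dump_secure(obj)   # 1. security / PII redaction
    if ml:
        return ds.dump_ml(obj)       # 2. models and tensors
    if web:
        return ds.dump_api(obj)      # 3. web APIs
    if huge:
        return ds.dump_chunked(obj)  # 4. very large data
    if fast:
        return ds.dump_fast(obj)     # 5. maximum speed
    # 6. flexible default: dump() writes to a file, so a path is required here
    return ds.dump(obj, path)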

Performance Comparison

| Function | Speed | Memory Usage | Features |
|---|---|---|---|
| `dump_fast()` | ⚡⚡⚡ | 🧠🧠 | Minimal |
| `dump()` | ⚡⚡ | 🧠🧠 | Composable |
| `dump_api()` | ⚡⚡ | 🧠🧠 | Clean output |
| `dump_ml()` | — | 🧠🧠🧠 | ML optimized |
| `dump_secure()` | — | 🧠🧠🧠 | Security features |
| `dump_chunked()` | — | 🧠 | Memory efficient |
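
To sanity-check the speed column for your own payloads, a quick micro-benchmark sketch (absolute timings depend heavily on data shape and machine):

import random
import time

import datason as ds

batch = [{"id": i, "value": random.random()} for i in range(10_000)]

for fn in (ds.dump_fast, ds.dump_api, ds.dump_ml):
    start = time.perf_counter()
    fn(batch)
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")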

🎨 Composable Patterns

Combining Features

# Security + ML + Performance
ds.dump(
    ml_model_data,
    "ml_model_data.json",
    secure=True,
    ml_mode=True,
    fast=True
)

# API + Security
secure_api = ds.dump_api(api_data, secure=True)

# ML + Chunked for large models
large_ml = ds.dump_ml(huge_model, chunked=True)