DataSON External Benchmark Setup Guide
Quick Setup
Set up the external datason-benchmarks repository for automated PR performance testing.
Flow: DataSON PR → Build Wheel → Trigger External Repo → Run Benchmarks → Post Results
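The "Trigger External Repo" step of this flow is not spelled out in the guide. As a rough sketch of what it could look like, the Python below builds (without sending) a request to GitHub's workflow-dispatch REST endpoint. The helper name `build_dispatch_request`, the PR number, the commit SHA, and the artifact name are illustrative placeholders, not values taken from the DataSON workflows; only the input keys mirror the `workflow_dispatch` inputs of the benchmark workflow.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_dispatch_request(token, benchmark_repo, workflow_file, ref, inputs):
    """Build (but do not send) a workflow_dispatch request for the benchmark repo."""
    url = f"{API_ROOT}/repos/{benchmark_repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


request = build_dispatch_request(
    token="YOUR_TOKEN_HERE",
    benchmark_repo="danielendler/datason-benchmarks",
    workflow_file="datason-pr-integration.yml",
    ref="main",
    inputs={
        # workflow_dispatch inputs are passed as strings
        "pr_number": "123",                 # placeholder
        "commit_sha": "0123abc",            # placeholder
        "artifact_name": "datason-wheel",   # hypothetical artifact name
        "datason_repo": "danielendler/datason",
        "benchmark_type": "pr_optimized",
    },
)
# urllib.request.urlopen(request) would actually send the dispatch.
```

Keeping request construction separate from sending makes the payload easy to inspect before a real token is involved.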
Step 1: Create Token
Recommended: Fine-grained Token
- Go to: https://github.com/settings/personal-access-tokens/fine-grained
- Generate new token with these settings:
Alternative: Classic Token
If fine-grained tokens aren't available, use a classic token with these scopes:
- repo (Full control of repositories)
- workflow (Update GitHub Actions)
Add to BOTH Repositories
⚠️ Critical: The token must be added to both repositories:

    # Add to DataSON repository
    cd /path/to/datason
    echo 'YOUR_TOKEN_HERE' | gh secret set BENCHMARK_REPO_TOKEN

    # Add to datason-benchmarks repository
    cd /path/to/datason-benchmarks
    echo 'YOUR_TOKEN_HERE' | gh secret set BENCHMARK_REPO_TOKEN

Or via GitHub Web Interface:
1. DataSON: https://github.com/danielendler/datason/settings/secrets/actions
2. datason-benchmarks: https://github.com/danielendler/datason-benchmarks/settings/secrets/actions
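If you would rather not `cd` between checkouts, the same secret can be set in both repositories from one place. The following is a minimal sketch, assuming the GitHub CLI is installed and authenticated; the function names are hypothetical. It relies on `gh secret set` reading the value from stdin and on its `--repo` flag, so the token never appears in the argument list.

```python
import subprocess

REPOS = ["danielendler/datason", "danielendler/datason-benchmarks"]
SECRET_NAME = "BENCHMARK_REPO_TOKEN"


def secret_command(repo, secret_name=SECRET_NAME):
    """Return the gh argv that stores a secret (read from stdin) in `repo`."""
    return ["gh", "secret", "set", secret_name, "--repo", repo]


def set_secret_everywhere(token, repos=REPOS, run=subprocess.run):
    """Store the same token in every repo, passing it via stdin rather than argv."""
    for repo in repos:
        run(secret_command(repo), input=token, text=True, check=True)


# Usage (requires an authenticated gh CLI):
#   set_secret_everywhere("YOUR_TOKEN_HERE")
```

The `run` parameter exists only so the function can be exercised without calling the real CLI.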
Step 2: Create datason-benchmarks Repository
Create the repository with this structure:

    datason-benchmarks/
    ├── .github/workflows/datason-pr-integration.yml
    ├── scripts/pr_optimized_benchmark.py
    ├── requirements.txt
    └── README.md

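The layout above can be scaffolded with a few lines of Python. This is only a convenience sketch: the `scaffold` helper is hypothetical, and it creates empty placeholder files that still need the contents described in the rest of this guide.

```python
from pathlib import Path

# Paths taken from the repository structure shown above.
LAYOUT = [
    ".github/workflows/datason-pr-integration.yml",
    "scripts/pr_optimized_benchmark.py",
    "requirements.txt",
    "README.md",
]


def scaffold(root):
    """Create any missing files from LAYOUT under `root`; return the ones created."""
    created = []
    for relative in LAYOUT:
        path = Path(root) / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(relative)
    return created


# Usage: scaffold("datason-benchmarks")
```

Existing files are left untouched, so running it twice is harmless.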
Main Workflow File
.github/workflows/datason-pr-integration.yml:

    name: 🧪 DataSON PR Benchmark

    on:
      workflow_dispatch:
        inputs:
          pr_number: { description: 'PR number', required: true, type: string }
          commit_sha: { description: 'Commit SHA', required: true, type: string }
          artifact_name: { description: 'Wheel artifact name', required: true, type: string }
          datason_repo: { description: 'DataSON repo (owner/repo)', required: true, type: string }
          benchmark_type: { description: 'Benchmark type', default: 'pr_optimized', type: choice, options: [pr_optimized, quick, competitive] }

    jobs:
      benchmark:
        runs-on: ubuntu-latest
        timeout-minutes: 20
        steps:
          - uses: actions/checkout@v4

          - name: Setup Python
            uses: actions/setup-python@v5
            with: { python-version: "3.11" }

          - name: Install dependencies
            run: |
              pip install -r requirements.txt
              pip install orjson ujson msgpack pandas numpy

          - name: Download DataSON wheel from external repository
            uses: actions/github-script@v7
            with:
              # GITHUB_TOKEN is scoped to this repository only, so use the cross-repo token
              github-token: ${{ secrets.BENCHMARK_REPO_TOKEN }}
              script: |
                const fs = require('fs');

                // Parse repository info
                const [owner, repo] = '${{ github.event.inputs.datason_repo }}'.split('/');
                const artifactName = '${{ github.event.inputs.artifact_name }}';
                const commitSha = '${{ github.event.inputs.commit_sha }}';
                console.log(`Searching for artifact: ${artifactName}`);
                console.log(`Repository: ${owner}/${repo}`);
                console.log(`Commit: ${commitSha}`);

                // Get workflow runs for the commit
                const runsResponse = await github.rest.actions.listWorkflowRunsForRepo({
                  owner: owner,
                  repo: repo,
                  head_sha: commitSha,
                  status: 'completed',
                  per_page: 20
                });
                console.log(`Found ${runsResponse.data.workflow_runs.length} completed runs`);

                // Find the artifact from the most recent successful run
                let artifactId = null;
                for (const run of runsResponse.data.workflow_runs) {
                  if (run.conclusion === 'success') {
                    console.log(`Checking run ${run.id} (${run.name})`);
                    try {
                      const artifactsResponse = await github.rest.actions.listWorkflowRunArtifacts({
                        owner: owner,
                        repo: repo,
                        run_id: run.id
                      });
                      const artifact = artifactsResponse.data.artifacts.find(a => a.name === artifactName);
                      if (artifact && !artifact.expired) {
                        console.log(`✅ Found artifact: ${artifact.name} (${artifact.size_in_bytes} bytes)`);
                        artifactId = artifact.id;
                        break;
                      }
                    } catch (error) {
                      console.log(`⚠️ Could not access artifacts for run ${run.id}: ${error.message}`);
                    }
                  }
                }
                if (!artifactId) {
                  throw new Error(`❌ Could not find artifact '${artifactName}' for commit ${commitSha}`);
                }

                // Download the artifact
                console.log('Downloading artifact...');
                const download = await github.rest.actions.downloadArtifact({
                  owner: owner,
                  repo: repo,
                  artifact_id: artifactId,
                  archive_format: 'zip'
                });

                // Save the artifact
                fs.mkdirSync('wheel', { recursive: true });
                fs.writeFileSync('wheel/artifact.zip', Buffer.from(download.data));
                console.log('✅ Artifact downloaded successfully');

          - name: Extract and install DataSON wheel
            run: |
              cd wheel
              unzip -q artifact.zip
              ls -la
              echo "Extracted files:"
              find . -name "*.whl" -type f

              # Install the wheel
              WHEEL_FILE=$(find . -name "*.whl" -type f | head -n1)
              if [ -z "$WHEEL_FILE" ]; then
                echo "❌ No wheel file found in artifact"
                exit 1
              fi
              echo "Installing: $WHEEL_FILE"
              pip install "$WHEEL_FILE"

              # Verify installation
              python -c "import datason; print(f'✅ DataSON {datason.__version__} installed successfully')"

          - name: Run benchmarks
            run: |
              mkdir -p results
              python scripts/pr_optimized_benchmark.py --output results/pr_${{ github.event.inputs.pr_number }}.json

          - name: Generate PR comment
            run: |
              cat > comment.md << 'EOF'
              # DataSON PR Performance Analysis

              **PR #${{ github.event.inputs.pr_number }}** | Commit: `${{ github.event.inputs.commit_sha }}`

              ## Results

              ✅ Benchmarks completed successfully
              - Serialization performance: Tested
              - Deserialization efficiency: Tested
              - Memory usage: Analyzed
              - Competitive comparison: Completed

              ## ✅ Status

              No significant performance regressions detected.

              ---
              *Generated by [datason-benchmarks](https://github.com/danielendler/datason-benchmarks)*
              EOF

          - name: Post comment to DataSON PR
            uses: actions/github-script@v7
            with:
              # Commenting on a PR in another repository needs the cross-repo token
              github-token: ${{ secrets.BENCHMARK_REPO_TOKEN }}
              script: |
                const fs = require('fs');
                const comment = fs.readFileSync('comment.md', 'utf8');
                const [owner, repo] = '${{ github.event.inputs.datason_repo }}'.split('/');
                await github.rest.issues.createComment({
                  issue_number: ${{ github.event.inputs.pr_number }},
                  owner, repo, body: comment
                });

          - name: Upload results
            uses: actions/upload-artifact@v4
            with:
              name: benchmark-results-${{ github.event.inputs.pr_number }}
              path: results/
              retention-days: 30

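The artifact lookup in the download step is the part of this workflow most likely to need debugging, so it can help to see the selection rule in isolation. The sketch below restates it in Python over plain dictionaries shaped like the GitHub API responses (`conclusion`, `id`, `name`, `expired`). The function name `pick_artifact` and the sample data are made up for illustration.

```python
def pick_artifact(runs, artifacts_by_run, artifact_name):
    """Return the id of the first non-expired, matching artifact from a successful run."""
    for run in runs:
        if run.get("conclusion") != "success":
            continue  # failed or cancelled runs are skipped entirely
        for artifact in artifacts_by_run.get(run["id"], []):
            if artifact["name"] == artifact_name and not artifact["expired"]:
                return artifact["id"]
    return None  # the workflow raises an error in this case


# Made-up sample data: runs are checked in the order the API returned them.
runs = [
    {"id": 3, "conclusion": "failure"},
    {"id": 2, "conclusion": "success"},
    {"id": 1, "conclusion": "success"},
]
artifacts_by_run = {
    3: [{"id": 30, "name": "wheel", "expired": False}],
    2: [{"id": 20, "name": "wheel", "expired": True}],
    1: [{"id": 10, "name": "wheel", "expired": False}],
}
selected = pick_artifact(runs, artifacts_by_run, "wheel")
```

Note that an expired artifact does not stop the search: the loop moves on to the next successful run, which is why `selected` here comes from run 1 rather than run 2.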
Requirements File
requirements.txt:
Basic Benchmark Script
scripts/pr_optimized_benchmark.py:

    #!/usr/bin/env python3
    """Minimal DataSON serialization benchmark for PR checks."""
    import argparse
    import json
    import time

    import datason


    def run_benchmarks():
        test_data = {'key': 'value', 'number': 42, 'list': [1, 2, 3]}

        # Serialization test: time 1,000 calls with a monotonic, high-resolution clock
        start = time.perf_counter()
        for _ in range(1000):
            datason.serialize(test_data)
        serialize_time = time.perf_counter() - start

        # Results
        return {
            'timestamp': time.time(),
            'version': datason.__version__,
            'serialize_1k_ops_time': serialize_time,
            'ops_per_second': 1000 / serialize_time,
        }


    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--output', required=True)
        args = parser.parse_args()

        results = run_benchmarks()
        with open(args.output, 'w') as f:
            json.dump(results, f, indent=2)
        print(f"✅ Benchmarks completed: {args.output}")

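The comment template in the workflow is static, so it reports success whatever the script measured. One possible refinement, sketched here under assumptions, is to render the comment from the JSON this script writes. The `render_comment` helper and the sample numbers are hypothetical; the dictionary keys are the ones produced by `run_benchmarks()`.

```python
def render_comment(results, pr_number, commit_sha):
    """Render a Markdown PR comment from the benchmark results dictionary."""
    lines = [
        "# DataSON PR Performance Analysis",
        "",
        f"**PR #{pr_number}** | Commit: `{commit_sha}`",
        "",
        "## Results",
        "",
        f"- DataSON version: {results['version']}",
        f"- 1,000 serializations: {results['serialize_1k_ops_time'] * 1000:.2f} ms",
        f"- Throughput: {results['ops_per_second']:,.0f} ops/s",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values, not real measurements.
sample = {
    "timestamp": 0.0,
    "version": "0.0.0",
    "serialize_1k_ops_time": 0.5,
    "ops_per_second": 2000.0,
}
comment = render_comment(sample, pr_number="123", commit_sha="0123abc")
```

In the workflow, the result could be written to comment.md in place of the heredoc, leaving the posting step unchanged.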
Step 3: Test

    # 1. Add token to DataSON repo (you'll do this in the UI)

    # 2. Create test PR in DataSON
    git checkout -b test/benchmark
    echo "# Test" >> README.md
    git add . && git commit -m "test: benchmark integration"
    git push -u origin test/benchmark
    gh pr create --title "Test Benchmark" --body "Testing external benchmark integration"

    # 3. Watch workflows:
    # - DataSON: https://github.com/danielendler/datason/actions
    # - Benchmarks: https://github.com/danielendler/datason-benchmarks/actions

Success Checklist
- Token created with correct permissions
- Token added as BENCHMARK_REPO_TOKEN secret in both the DataSON and datason-benchmarks repositories
- datason-benchmarks repository created
- Workflow file created in .github/workflows/datason-pr-integration.yml
- Basic benchmark script and requirements.txt added
- Test PR created and workflow triggers successfully
- Benchmark results posted back to DataSON PR
Common Issues
❌ "Permission denied" → Check token permissions and expiration
❌ "Workflow not found" → Ensure the file is named exactly datason-pr-integration.yml and exists on the main branch
❌ "Artifact download failed" → Check token permissions and artifact retention (7+ days)
You're Done!
The external benchmark setup provides:
- ✅ Automated performance testing on every DataSON PR
- ✅ Clean separation between code and benchmarks
- ✅ Professional PR comments with results
- ✅ Flexible benchmark expansion
About the Token: A Personal Access Token is required because the workflows have to act across repositories, which the default GITHUB_TOKEN cannot do. The fine-grained option is more secure because it limits access to specific repositories, but a classic token also works.