GitHub: dbos_experiments/exp15
Experiment 15: DBOS Payload Size Performance Analysis
Purpose
This experiment measures the performance impact of different payload sizes in DBOS steps and compares two approaches:
- Multiple small step calls - Calling a step multiple times with incrementally larger payloads
- Single batched step call - Calling a step once that processes all payloads internally
The goal is to understand the overhead of DBOS step serialization, deserialization, and database storage as payload sizes increase from 1 byte to 1 MB.
Experiment Design
Approach 1: Multiple Step Calls (size_workflow)
Calls the size_step() function 7 times with increasing payload sizes:
- Iteration 1: 10^0 = 1 byte
- Iteration 2: 10^1 = 10 bytes
- Iteration 3: 10^2 = 100 bytes
- Iteration 4: 10^3 = 1,000 bytes (1 KB)
- Iteration 5: 10^4 = 10,000 bytes (10 KB)
- Iteration 6: 10^5 = 100,000 bytes (100 KB)
- Iteration 7: 10^6 = 1,000,000 bytes (1 MB)
Total payload: 1,111,111 bytes across 7 separate step calls
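The sizes above follow 10^i for i = 0 through 6. A quick sketch of that arithmetic (the variable names are illustrative and not taken from ex1.py):

```python
# Payload sizes for the seven iterations: 10^0 through 10^6 bytes.
sizes = [10 ** exponent for exponent in range(7)]

# Summing them gives the total payload moved through the workflow.
total_bytes = sum(sizes)

print(sizes)
print(f"Total payload: {total_bytes:,} bytes")
```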
Approach 2: Single Batched Step (batch_size_workflow)
Calls batch_size_step() once, which internally generates all 7 payloads and returns them concatenated.
Total payload: 1,111,111 bytes in 1 step call
Key Observations
Performance Results
Workflow: Starting
Workflow: Iteration 1/7
Step: Size step with payload size 1 bytes
Workflow: Payload size is 1 bytes, took 55.41 ms
Workflow: Iteration 2/7
Step: Size step with payload size 10 bytes
Workflow: Payload size is 10 bytes, took 21.63 ms
Workflow: Iteration 3/7
Step: Size step with payload size 100 bytes
Workflow: Payload size is 100 bytes, took 21.71 ms
Workflow: Iteration 4/7
Step: Size step with payload size 1000 bytes
Workflow: Payload size is 1000 bytes, took 21.81 ms
Workflow: Iteration 5/7
Step: Size step with payload size 10000 bytes
Workflow: Payload size is 10000 bytes, took 23.87 ms
Workflow: Iteration 6/7
Step: Size step with payload size 100000 bytes
Workflow: Payload size is 100000 bytes, took 35.45 ms
Workflow: Iteration 7/7
Step: Size step with payload size 1000000 bytes
Workflow: Payload size is 1000000 bytes, took 120.46 ms
Workflow: Completed successfully in 300.35 ms
----------------------------------------------------
Workflow: Starting
Step: Batch size step iteration 1/7
Step: Batch size step iteration 2/7
Step: Batch size step iteration 3/7
Step: Batch size step iteration 4/7
Step: Batch size step iteration 5/7
Step: Batch size step iteration 6/7
Step: Batch size step iteration 7/7
Workflow: Payload size is 1111111 bytes, took 157.82 ms
Main: Workflow output: True
Analysis
| Approach | Total Time | Number of Steps | Average Time per Step |
|---|---|---|---|
| Multiple small steps | 300.35 ms | 7 | ~42.9 ms |
| Single batched step | 157.82 ms | 1 | 157.82 ms |
Key Findings:
- First step overhead: The first step call takes ~55ms, likely due to initialization overhead
- Small payload consistency: After the first call, steps with payloads from 10 bytes to 10 KB take ~21-24 ms consistently
- Scaling behavior: Step time rises once payloads exceed 10 KB:
  - 100 KB: 35.45 ms
  - 1 MB: 120.46 ms
- Batching advantage: The single batched step takes ~47% less time (157.82 ms vs 300.35 ms), roughly 1.9x faster
  - Eliminates 6 step serialization/deserialization cycles
  - Reduces database writes from 7 to 1
  - Avoids repeated DBOS framework overhead
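The headline figures can be rechecked from the two logged totals. A small sketch, using only the numbers reported in the log output (variable names are illustrative):

```python
# Totals taken from the logged workflow runs.
multi_step_total_ms = 300.35
batched_total_ms = 157.82
step_count = 7

# Average wall-clock time per step call in the multi-step workflow.
average_per_step_ms = multi_step_total_ms / step_count

# Fraction of total time saved by batching, and the equivalent speedup factor.
time_saved_fraction = (multi_step_total_ms - batched_total_ms) / multi_step_total_ms
speedup = multi_step_total_ms / batched_total_ms

print(f"Average per step: {average_per_step_ms:.1f} ms")
print(f"Time saved by batching: {time_saved_fraction:.1%}")
print(f"Speedup: {speedup:.2f}x")
```

Note that "47% less time" and "1.9x faster" describe the same measurement. Both come from a single run, so they are indicative rather than statistically robust.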
Performance Breakdown
| Step Size | Time (ms) | Delta from Previous |
|---|---|---|
| 1 byte | 55.41 | baseline (includes init) |
| 10 bytes | 21.63 | -33.78 ms (steady state) |
| 100 bytes | 21.71 | +0.08 ms |
| 1 KB | 21.81 | +0.10 ms |
| 10 KB | 23.87 | +2.06 ms |
| 100 KB | 35.45 | +11.58 ms |
| 1 MB | 120.46 | +85.01 ms (3.4x the 100 KB time) |
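The "Delta from Previous" column is just the difference between consecutive step timings. A short sketch that reproduces it from the logged values (the list layout is an illustration, not the experiment's code):

```python
# Per-step timings in milliseconds, copied from the workflow log.
timings_ms = [
    ("1 byte", 55.41),
    ("10 bytes", 21.63),
    ("100 bytes", 21.71),
    ("1 KB", 21.81),
    ("10 KB", 23.87),
    ("100 KB", 35.45),
    ("1 MB", 120.46),
]

# Difference between each step and the one before it, rounded to 2 decimals.
deltas = [
    round(current - previous, 2)
    for (_, previous), (_, current) in zip(timings_ms, timings_ms[1:])
]

print(f"{timings_ms[0][0]:<10} {timings_ms[0][1]:>7.2f}  baseline")
for (label, elapsed), delta in zip(timings_ms[1:], deltas):
    print(f"{label:<10} {elapsed:>7.2f}  {delta:+.2f} ms")
```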
Performance Implications
When to Use Multiple Steps
- Better granularity: Individual step recovery and retry
- Better observability: Track progress of each payload size
- Memory efficiency: Process data incrementally
- Acceptable for small payloads (< 10 KB): Overhead is minimal (~21-24ms per step)
When to Use Batched Steps
- Large payloads: Reduces serialization overhead significantly
- High throughput requirements: ~47% less total time for the same data in this run
- Atomic operations: All-or-nothing processing
- Simple workflows: When granular recovery isn’t needed
Code Structure
size_step(payload_size: int) -> bytes
- Generates random bytes of size 10^payload_size
- Returns the payload
- Logs the payload size
size_workflow() -> bool
- Calls size_step() 7 times with increasing sizes
- Measures and logs time for each step call
- Validates payload sizes
- Logs the total execution time and returns True on success
batch_size_step(iterations: int) -> bytes
- Generates all 7 payloads internally
- Concatenates them into a single return value
- Logs progress for each iteration
batch_size_workflow() -> bool
- Calls batch_size_step() once with all iterations
- Measures and logs total time
- Returns True on success
Usage
# Run the experiment
python exp15/ex1.py
Prerequisites
- PostgreSQL database running on localhost:5432
- Database: test with user trustle:trustle
- Python dependencies: pip install dbos
Environment Variables
export DBOS_DATABASE_URL="postgresql://trustle:trustle@localhost:5432/test?sslmode=disable"
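Before running the experiment, it can help to confirm that the connection string parses as expected. A minimal sketch using only the standard library; the fallback value simply mirrors the export above, and whether ex1.py reads this variable directly is an assumption:

```python
import os
from urllib.parse import parse_qs, urlsplit

# Fallback mirrors the documented export; used only if the variable is unset.
DEFAULT_URL = "postgresql://trustle:trustle@localhost:5432/test?sslmode=disable"
database_url = os.environ.get("DBOS_DATABASE_URL", DEFAULT_URL)

# Split the URL into its components for a quick sanity check.
parts = urlsplit(database_url)
query = parse_qs(parts.query)

print("host:", parts.hostname)
print("port:", parts.port)
print("database:", parts.path.lstrip("/"))
print("user:", parts.username)
print("sslmode:", query.get("sslmode", ["(unset)"])[0])
```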
Learning Points
- DBOS step overhead: ~21-24ms baseline per step for small payloads
- Serialization cost: Negligible up to ~10 KB, then grows with payload size (100 KB to 1 MB is 10x the data for ~3.4x the time)
- Database I/O impact: Each step write adds overhead
- Batching benefits: Significant performance gain for high-volume workflows
- Design trade-offs: Granularity vs. performance
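- Scaling behavior: A 1 MB payload takes ~3.4x as long as 100 KB (120.46 ms vs 35.45 ms) and ~5.5x the small-payload baseline (~21.8 ms)
- First call penalty: The initial step takes ~2.5x as long as later small steps (~55 ms vs ~21 ms)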
Recommendations
- Small data (< 10 KB): Use multiple steps for better observability
- Medium data (10-100 KB): Balance between granularity and performance
- Large data (> 100 KB): Consider batching or streaming approaches
- High-throughput: Batch processing can save ~47% execution time
- Critical workflows: Multiple steps provide better recovery granularity
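The size-based recommendations can be expressed as a small helper. This is a hypothetical illustration only: `suggest_approach` and its return strings are not part of the experiment, and the thresholds come from a single run on one machine.

```python
KB = 1_000  # the experiment uses decimal sizes (10 KB = 10,000 bytes)


def suggest_approach(payload_bytes: int) -> str:
    """Map a payload size to the recommendation tiers listed above."""
    if payload_bytes < 10 * KB:
        # Small data: per-step overhead dominates, so granularity is cheap.
        return "multiple steps"
    if payload_bytes <= 100 * KB:
        # Medium data: weigh recovery granularity against step time.
        return "either (balance granularity and performance)"
    # Large data: payload-dependent cost dominates the step time.
    return "batch or stream"


for size in (500, 50 * KB, 1_000 * KB):
    print(f"{size:>9,} bytes -> {suggest_approach(size)}")
```

Workflows that need fine-grained recovery may still prefer multiple steps regardless of size, so a helper like this is a starting point rather than a rule.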
Future Experiments
Potential areas to explore:
- Compression impact on payload serialization
- Streaming large payloads through multiple steps
- Impact of concurrent step execution
- Database storage costs for large payloads
- Memory usage patterns for different approaches
Recent changes
- 2025-10-18 66833e7 added readmes
- 2025-10-14 247bfc9 new approach for the DB
- 2025-10-14 f872897 test dbos step payload size
Categories: experiments, Python
Tags: dbos-experiments