# SpaceTime Benchmark Suite

Standardized benchmarks for measuring and comparing space-time tradeoffs across algorithms and systems.

## Features

- **Standard Benchmarks**: Sorting, searching, graph algorithms, matrix operations
- **Real-World Workloads**: Database queries, ML training, distributed computing
- **Accurate Measurement**: Time, memory (peak/average), cache misses, throughput
- **Statistical Analysis**: Compare strategies with confidence
- **Reproducible Results**: Controlled environment, result validation
- **Visualization**: Automatic plots and analysis

## Installation

```bash
# From sqrtspace-tools root directory
pip install numpy matplotlib psutil

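# Database benchmarks use sqlite3, which ships with Python's standard library
# (it cannot be installed with pip). Verify that it is available:
python -c "import sqlite3; print(sqlite3.sqlite_version)"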
```

## Quick Start

```bash
# Run quick benchmark suite
python spacetime_benchmarks.py --quick

# Run all benchmarks
python spacetime_benchmarks.py

# Run specific suite
python spacetime_benchmarks.py --suite sorting

# Analyze saved results
python spacetime_benchmarks.py --analyze results_20240315_143022.json
```

## Benchmark Categories

### 1. Sorting Algorithms
Compare memory-time tradeoffs in sorting:

```text
# Strategies benchmarked:
- standard: In-memory quicksort/mergesort (O(n) space)
- sqrt_n: External sort with √n buffer (O(√n) space)
- constant: Streaming sort (O(1) space)

# Example results for n=1,000,000:
Standard: 0.125s, 8.0MB memory
√n buffer: 0.187s, 0.3MB memory (96% less memory, 50% slower)
Streaming: 0.543s, 0.01MB memory (99.9% less memory, 4.3x slower)
```
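The `sqrt_n` strategy can be pictured as an external merge sort. The sketch below is an illustration only, not the suite's implementation: the function name `sqrt_n_external_sort` and the temp-file layout are assumptions. It sorts runs of about √n values, spills each run to disk, and merges them lazily, so only about √n values sit in Python lists or the merge heap at any time.

```python
import heapq
import math
import os
import tempfile
from typing import Iterable, Iterator, List


def sqrt_n_external_sort(values: Iterable[int], n: int) -> Iterator[int]:
    """Yield `values` in sorted order using a sort buffer of about sqrt(n) items.

    Hypothetical helper for illustration; `n` is the expected number of values.
    """
    buffer_size = max(1, math.isqrt(n))
    run_paths: List[str] = []
    buffer: List[int] = []

    def flush() -> None:
        # Sort the current buffer and write it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= buffer_size:
            flush()
    if buffer:
        flush()

    # About sqrt(n) runs are open at once; very large n may hit the OS
    # open-file limit and would need a multi-pass merge instead.
    files = [open(p) for p in run_paths]
    try:
        runs = [(int(line) for line in f) for f in files]
        # heapq.merge holds one pending value per run.
        yield from heapq.merge(*runs)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

The result is a generator, so a caller can stream the sorted output (for example `for v in sqrt_n_external_sort(data, len(data)): ...`) without materializing it. File buffers held by the operating system and Python are not counted in the √n figure.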

### 2. Search Data Structures
Compare different index structures:

```text
# Strategies benchmarked:
- hash: Standard hash table (O(n) space)
- btree: B-tree index (O(n) space, cache-friendly)
- external: External index with √n cache

# Example results for n=1,000,000:
Hash table: 0.003s per query, 40MB memory
B-tree: 0.008s per query, 35MB memory
External: 0.025s per query, 2MB memory (95% less)
```
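One way to get an index with roughly √n in-memory entries is a sparse index: keep every k-th key (k ≈ √n) in memory and search a single block per lookup. This is a sketch of that general technique, not necessarily what the `external` strategy does; the class name `SparseIndex` is hypothetical, and the sorted sequence stands in for data that would live on disk.

```python
import bisect
import math
from typing import List, Optional, Sequence


class SparseIndex:
    """Keep every k-th key in memory (k ~ sqrt(n)); search one block per lookup."""

    def __init__(self, sorted_keys: Sequence[int]):
        self._keys = sorted_keys  # stand-in for on-disk data (assumption)
        self._block = max(1, math.isqrt(len(sorted_keys)))
        # First key of each block: about sqrt(n) entries in total.
        self._samples: List[int] = list(sorted_keys[:: self._block])

    def find(self, key: int) -> Optional[int]:
        """Return a position of `key` in the sorted data, or None if absent."""
        b = bisect.bisect_right(self._samples, key) - 1
        if b < 0:
            return None  # key is smaller than every stored key
        start = b * self._block
        block = self._keys[start : start + self._block]  # one "disk read"
        i = bisect.bisect_left(block, key)
        if i < len(block) and block[i] == key:
            return start + i
        return None
```

Each lookup costs one binary search over the samples plus one block read, which is where the extra per-query latency relative to a hash table comes from.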

### 3. Database Operations
Real SQLite database with different cache configurations:

```text
# Strategies benchmarked:
- standard: Default cache size (2000 pages)
- sqrt_n: √n cache pages
- minimal: Minimal cache (10 pages)

# Example results for n=100,000 rows:
Standard: 1000 queries in 0.45s, 16MB cache
√n cache: 1000 queries in 0.52s, 1.2MB cache
Minimal: 1000 queries in 1.83s, 0.08MB cache
```
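SQLite exposes its page cache size through `PRAGMA cache_size`, so a √n configuration can be set per connection. The helper below is a minimal sketch under the assumption that "√n cache pages" means the square root of the row count; `connect_with_sqrt_cache` is a hypothetical name, not part of the suite.

```python
import math
import sqlite3


def connect_with_sqrt_cache(path: str, n_rows: int) -> sqlite3.Connection:
    """Open a SQLite database with a page cache of about sqrt(n_rows) pages."""
    conn = sqlite3.connect(path)
    pages = max(1, math.isqrt(n_rows))
    # A positive cache_size is a number of pages; a negative value is read as KiB.
    conn.execute(f"PRAGMA cache_size = {pages}")
    return conn
```

The setting applies only to the connection that issued it, so each benchmark run needs to set it again after connecting.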

### 4. ML Training
Neural network training with memory optimizations:

```text
# Strategies benchmarked:
- standard: Keep all activations for backprop
- gradient_checkpoint: Recompute activations (√n checkpoints)
- mixed_precision: FP16 compute, FP32 master weights

# Example results for 50,000 samples:
Standard: 2.3s, 195MB peak memory
Checkpointing: 2.8s, 42MB peak memory (78% less)
Mixed precision: 2.1s, 98MB peak memory (50% less)
```
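Gradient checkpointing trades recomputation for memory: the forward pass keeps only about √n layer inputs, and the backward pass recomputes each segment before differentiating it. The toy below illustrates the idea on a chain of scalar `tanh` layers; it is a sketch, not the suite's ML benchmark, and `forward_backward_checkpointed` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple


def forward_backward_checkpointed(
    weights: List[float], x0: float
) -> Tuple[float, List[float]]:
    """Return (y, dy/dweights) for y = tanh(w_n * ... tanh(w_1 * x0))."""
    n = len(weights)
    k = max(1, math.isqrt(n))  # segment length, giving about sqrt(n) checkpoints

    # Forward pass: store only the input of every k-th layer.
    checkpoints: Dict[int, float] = {}
    x = x0
    for i, w in enumerate(weights):
        if i % k == 0:
            checkpoints[i] = x
        x = math.tanh(w * x)
    output = x

    grads = [0.0] * n
    upstream = 1.0  # derivative of the output with respect to itself
    # Backward pass: process segments from last to first.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        # Recompute this segment's layer inputs from its checkpoint.
        inputs = [checkpoints[start]]
        for i in range(start, end - 1):
            inputs.append(math.tanh(weights[i] * inputs[-1]))
        for i in range(end - 1, start - 1, -1):
            a = inputs[i - start]
            local = 1.0 - math.tanh(weights[i] * a) ** 2  # tanh'(w * a)
            grads[i] = upstream * local * a
            upstream = upstream * local * weights[i]
    return output, grads
```

At any moment the function holds about √n checkpoints plus at most √n recomputed inputs, at the cost of roughly one extra forward pass.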

### 5. Graph Algorithms
Graph traversal with memory constraints:

```text
# Strategies benchmarked:
- bfs: Standard breadth-first search
- dfs_iterative: Depth-first with explicit stack
- memory_bounded: Limited queue size (like IDA*)

# Example results for n=50,000 nodes:
BFS: 0.18s, 12MB memory (full frontier)
DFS: 0.15s, 4MB memory (stack only)
Bounded: 0.31s, 0.8MB memory (√n queue)
```
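The memory gap between BFS and iterative DFS comes from what each keeps pending: BFS queues a whole frontier, while DFS keeps only the current path. The sketch below reports the peak size of that container for each traversal. The function names are hypothetical, and both versions still hold a visited set of size O(n), which this illustration does not try to shrink.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Hashable]]


def bfs_peak(graph: Graph, source: Hashable) -> Tuple[int, int]:
    """Return (nodes visited, peak queue length) for breadth-first search."""
    visited = {source}
    queue = deque([source])
    peak = 1
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
        peak = max(peak, len(queue))
    return len(visited), peak


def dfs_peak(graph: Graph, source: Hashable) -> Tuple[int, int]:
    """Return (nodes visited, peak stack depth) for iterative depth-first search."""
    visited = {source}
    # Each stack entry is an iterator over one node's neighbours, so the
    # stack depth equals the length of the current path.
    stack = [iter(graph.get(source, []))]
    peak = 1
    while stack:
        for nxt in stack[-1]:
            if nxt not in visited:
                visited.add(nxt)
                stack.append(iter(graph.get(nxt, [])))
                peak = max(peak, len(stack))
                break
        else:
            stack.pop()  # all neighbours explored; backtrack
    return len(visited), peak
```

On a wide, shallow graph the BFS queue grows with the fan-out while the DFS stack stays near the depth; on a long chain the roles reverse.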

### 6. Matrix Operations
Cache-aware matrix multiplication:

```text
# Strategies benchmarked:
- standard: Naive multiplication
- blocked: Cache-blocked multiplication
- streaming: Row-by-row streaming

# Example results for 2000×2000 matrices:
Standard: 1.2s, 32MB memory
Blocked: 0.8s, 32MB memory (33% faster)
Streaming: 3.5s, 0.5MB memory (98% less memory)
```
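Cache blocking reorders the same multiply-add operations so that each small tile of the inputs is reused while it is still in cache. The pure-Python sketch below shows only the loop structure; it is an illustration with a hypothetical name (`blocked_matmul`) and an assumed default tile size, and it will be far slower than NumPy in practice.

```python
from typing import List

Matrix = List[List[float]]


def blocked_matmul(a: Matrix, b: Matrix, block: int = 32) -> Matrix:
    """Multiply a (n x k) by b (k x m) one block x block tile at a time."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("incompatible or empty matrices")
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                # Work on one tile of a, b and c before moving on.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Blocking changes the access order, not the memory footprint, which matches the example figures above where the blocked variant is faster at the same 32MB.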

## Running Benchmarks

### Command Line Options

```bash
# Run all benchmarks
python spacetime_benchmarks.py

# Quick benchmarks (subset for testing)
python spacetime_benchmarks.py --quick

# Specific suite only
python spacetime_benchmarks.py --suite sorting
python spacetime_benchmarks.py --suite database
python spacetime_benchmarks.py --suite ml

# With automatic plotting
python spacetime_benchmarks.py --plot

# Analyze previous results
python spacetime_benchmarks.py --analyze results_20240315_143022.json
```

### Programmatic Usage

```python
from spacetime_benchmarks import BenchmarkCategory, BenchmarkRunner, benchmark_sorting

runner = BenchmarkRunner()

# Run single benchmark
result = runner.run_benchmark(
    name="Custom Sort",
    category=BenchmarkCategory.SORTING,
    strategy="sqrt_n",
    benchmark_func=benchmark_sorting,
    data_size=1000000
)

print(f"Time: {result.time_seconds:.3f}s")
print(f"Memory: {result.memory_peak_mb:.1f}MB")
print(f"Space-Time Product: {result.space_time_product:.1f}")

# Compare strategies
comparisons = runner.compare_strategies(
    name="Sort Comparison",
    category=BenchmarkCategory.SORTING,
    benchmark_func=benchmark_sorting,
    strategies=["standard", "sqrt_n", "constant"],
    data_sizes=[10000, 100000, 1000000]
)

for comp in comparisons:
    print(f"\n{comp.baseline.strategy} vs {comp.optimized.strategy}:")
    print(f"  Memory reduction: {comp.memory_reduction:.1f}%")
    print(f"  Time overhead: {comp.time_overhead:.1f}%")
    print(f"  Recommendation: {comp.recommendation}")
```

## Custom Benchmarks

Add your own benchmarks:

```python
import numpy as np

from spacetime_benchmarks import BenchmarkCategory, BenchmarkRunner


def benchmark_custom_algorithm(n: int, strategy: str = 'standard', **kwargs) -> int:
    """Custom algorithm with space-time tradeoffs."""
    if strategy == 'standard':
        # O(n) space implementation
        data = list(range(n))
        # ... algorithm ...
        return n  # Return operation count

    elif strategy == 'memory_efficient':
        # O(√n) space implementation
        buffer_size = int(np.sqrt(n))
        # ... algorithm ...
        return n

    # Fail loudly instead of silently returning None for an unknown strategy
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Register and run (BenchmarkCategory.CUSTOM must exist; see "Adding New Categories")
runner = BenchmarkRunner()
runner.compare_strategies(
    "Custom Algorithm",
    BenchmarkCategory.CUSTOM,
    benchmark_custom_algorithm,
    ["standard", "memory_efficient"],
    [1000, 10000, 100000]
)

## Understanding Results

### Key Metrics

1. **Time (seconds)**: Wall-clock execution time
2. **Peak Memory (MB)**: Maximum memory usage during execution
3. **Average Memory (MB)**: Average memory over execution
4. **Throughput (ops/sec)**: Operations completed per second
5. **Space-Time Product**: Peak memory (MB) × time (seconds); lower is better
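To make these metrics concrete, here is a minimal sketch of how they could be collected for one function call using only the standard library. It is not the suite's measurement code: `measure` is a hypothetical helper, and `tracemalloc` counts Python-level allocations rather than process memory, so its numbers will differ from the suite's. Average memory is omitted because it needs periodic sampling.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict


def measure(func: Callable[..., int], *args: Any, **kwargs: Any) -> Dict[str, float]:
    """Run `func` once and report time, peak memory, throughput, space-time product.

    `func` is assumed to return its operation count, as benchmark functions do.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        operations = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    peak_mb = peak_bytes / (1024 * 1024)
    return {
        "time_seconds": elapsed,
        "memory_peak_mb": peak_mb,
        "throughput": operations / elapsed if elapsed > 0 else float("inf"),
        "space_time_product": peak_mb * elapsed,
    }
```

For example, `measure(lambda n: len([0] * n), 200_000)` reports the cost of building a 200,000-element list.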

### Interpreting Comparisons

```
Comparison standard vs sqrt_n:
  Memory reduction: 94.3%      # How much less memory
  Time overhead: 47.2%         # How much slower
  Space-time improvement: 91.6% # Overall efficiency gain
  Recommendation: Use sqrt_n for 94% memory savings
```
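The percentages in such a report can be derived from two runs. The formulas below are an assumption about how the figures relate (each is a relative change against the baseline, with the space-time product taken as memory × time); `compare_runs` is a hypothetical helper, not the suite's API.

```python
from typing import Dict


def compare_runs(
    base_time: float, base_mem: float, opt_time: float, opt_mem: float
) -> Dict[str, float]:
    """Percent changes of an optimized run relative to a baseline run."""
    base_stp = base_time * base_mem  # space-time product of the baseline
    opt_stp = opt_time * opt_mem     # space-time product of the optimized run
    return {
        "memory_reduction": (1 - opt_mem / base_mem) * 100,
        "time_overhead": (opt_time / base_time - 1) * 100,
        "space_time_improvement": (1 - opt_stp / base_stp) * 100,
    }


# Sorting example from this README: standard (0.125s, 8.0MB) vs sqrt_n (0.187s, 0.3MB)
report = compare_runs(0.125, 8.0, 0.187, 0.3)
```

With those sorting figures, `report` shows a memory reduction of about 96.3%, a time overhead of about 49.6%, and a space-time improvement of about 94.4%.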

### When to Use Each Strategy

| Strategy | Use When | Avoid When |
|----------|----------|------------|
| Standard | Memory abundant, speed critical | Memory constrained |
| √n Optimized | Memory limited, moderate slowdown OK | Real-time systems |
| O(log n) | Extreme memory constraints | Random access needed |
| O(1) Space | Streaming data, minimal memory | Need multiple passes |

## Benchmark Output

### Results File Format

```json
{
  "system_info": {
    "cpu_count": 8,
    "memory_gb": 32.0,
    "l3_cache_mb": 12.0
  },
  "results": [
    {
      "name": "Sorting",
      "category": "sorting",
      "strategy": "sqrt_n",
      "data_size": 1000000,
      "time_seconds": 0.187,
      "memory_peak_mb": 8.2,
      "memory_avg_mb": 6.5,
      "throughput": 5347593.5,
      "space_time_product": 1.534,
      "metadata": {
        "success": true,
        "operations": 1000000
      }
    }
  ],
  "timestamp": 1710512345.678
}
```
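Because the results file is plain JSON, it can be post-processed without the suite. The sketch below assumes the field names shown in the example above and picks, for each category and data size, the strategy with the lowest space-time product; `load_results` and `best_by_space_time` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Tuple


def load_results(path: str) -> Dict[str, Any]:
    """Read a results file written by the benchmark run."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def best_by_space_time(doc: Dict[str, Any]) -> Dict[Tuple[str, int], str]:
    """Map (category, data_size) to the strategy with the lowest space-time product."""
    best: Dict[Tuple[str, int], Dict[str, Any]] = {}
    for result in doc["results"]:
        key = (result["category"], result["data_size"])
        current = best.get(key)
        if current is None or result["space_time_product"] < current["space_time_product"]:
            best[key] = result
    return {key: result["strategy"] for key, result in best.items()}
```

A typical call would be `best_by_space_time(load_results("results_20240315_143022.json"))`, using whichever results file your run produced.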

### Visualization

Automatic plots show:
- Time complexity curves
- Memory usage scaling
- Space-time product comparison
- Throughput vs data size

## Performance Tips

1. **System Preparation**:
   ```bash
   # Pin the CPU governor to "performance" to limit frequency scaling
   sudo cpupower frequency-set -g performance
   
   # Clear caches
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   ```

2. **Accurate Memory Measurement**:
   - Results include Python overhead
   - Use `memory_peak_mb` for maximum usage
   - `memory_avg_mb` shows typical usage

3. **Reproducibility**:
   - Run multiple times and average
   - Control background processes
   - Use consistent data sizes
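The "run multiple times and average" advice can be sketched with the standard library. This is an illustration, not part of the suite: `time_repeated` is a hypothetical helper, and the warm-up count and repeat count are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_repeated(
    func: Callable[[], object], repeats: int = 5, warmup: int = 1
) -> Dict[str, float]:
    """Time `func` several times and summarize the samples in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        func()  # warm caches and lazy imports; not recorded
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
    }
```

A large `stdev` relative to `mean` suggests background noise; the `min` is often the most stable figure when comparing two strategies on the same machine.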

## Extending the Suite

### Adding New Categories

```python
from enum import Enum


class BenchmarkCategory(Enum):
    # ... existing categories ...
    CUSTOM = "custom"

def custom_suite(runner: BenchmarkRunner):
    """Run custom benchmarks"""
    strategies = ['approach1', 'approach2']
    data_sizes = [1000, 10000, 100000]
    
    runner.compare_strategies(
        "Custom Workload",
        BenchmarkCategory.CUSTOM,
        benchmark_custom,  # your own benchmark_* function
        strategies,
        data_sizes
    )
```

### Platform-Specific Metrics

```python
import platform


def get_cache_misses():
    """Get L3 cache misses (Linux perf). Placeholder: returns None until implemented."""
    if platform.system() == 'Linux':
        # Use perf_event_open or read from perf
        pass
    return None
```

## Real-World Insights

From our benchmarks:

1. **√n strategies typically save 90-99% memory** with 20-100% time overhead

2. **Cache-aware algorithms can be faster** despite theoretical complexity

3. **Memory bandwidth often dominates** over computational complexity

4. **Optimal strategy depends on**:
   - Data size vs available memory
   - Latency requirements
   - Power/cost constraints

## Troubleshooting

### Memory Measurements Seem Low
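- Python may not return freed memory to the OS immediately, so a later run can reuse it and show little growth
- Call `gc.collect()` before each benchmark
- Check for lazy evaluation (generators allocate only when consumed)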

### High Variance in Results
- Disable CPU throttling
- Close other applications  
- Increase data sizes for stability

### Database Benchmarks Fail
- Ensure write permissions in output directory
- Check SQLite installation
- Verify disk space available

## Contributing

Add new benchmarks following the pattern:

1. Implement `benchmark_*` function
2. Return operation count
3. Handle different strategies
4. Add suite function
5. Update documentation

## See Also

- [SpaceTimeCore](../core/spacetime_core.py): Core calculations
- [Profiler](../profiler/): Profile your applications
- [Visual Explorer](../explorer/): Visualize tradeoffs