Initial

2025-07-20 03:56:21 -04:00
commit 59539f4daa
65 changed files with 6964 additions and 0 deletions
--- a/experiments/checkpointed_sorting/README.md
+++ b/experiments/checkpointed_sorting/README.md
@@ -0,0 +1,96 @@
+# Checkpointed Sorting Experiment
+
+## Overview
+This experiment demonstrates how external merge sort with limited memory exhibits the space-time tradeoff predicted by Williams' 2025 result.
+
+## Key Concepts
+
+### Standard In-Memory Sort
+- **Space**: O(n) - entire array in memory
+- **Time**: O(n log n) - optimal comparison-based sorting
+- **Example**: Python's built-in sort, quicksort
+
+### Checkpointed External Sort
+- **Space**: O(√n) - only √n elements in memory at once
+- **Time**: O(n√n) - due to disk I/O and recomputation
+- **Technique**: Sort chunks that fit in memory, merge with limited buffers
+
+### Extreme Space-Limited Sort
+- **Space**: O(log n) - minimal memory usage
+- **Time**: O(n²) - extensive recomputation required
+- **Technique**: Iterative merging with frequent checkpointing
+
+## Running the Experiments
+
+### Quick Test
+```bash
+python test_quick.py
+```
+Runs with small input sizes (100-1000) to verify correctness.
+
+### Full Experiment
+```bash
+python run_final_experiment.py
+```
+Runs complete experiment with:
+- Input sizes: 1000, 2000, 5000, 10000, 20000
+- 10 trials per size for statistical significance
+- RAM disk comparison to isolate I/O overhead
+- Generates publication-quality plots
+
+### Rigorous Analysis
+```bash
+python rigorous_experiment.py
+```
+Comprehensive experiment with:
+- 20 trials per size
+- Detailed memory profiling
+- Environment logging
+- Statistical analysis with confidence intervals
+
+## Actual Results (Apple M3 Max, 64GB RAM)
+
+| Input Size | In-Memory Time | Checkpointed Time | Slowdown | Memory Reduction |
+|------------|----------------|-------------------|----------|------------------|
+| 1,000      | 0.022 ms       | 8.2 ms           | 375×     | 0.1× (overhead)  |
+| 5,000      | 0.045 ms       | 23.4 ms          | 516×     | 0.2×             |
+| 10,000     | 0.091 ms       | 40.5 ms          | 444×     | 0.2×             |
+| 20,000     | 0.191 ms       | 71.4 ms          | 375×     | 0.2×             |
+
+Note: Memory shows algorithmic overhead due to Python's memory management.
+
+## Key Findings
+
+1. **Massive Constant Factors**: 375-627× slowdown instead of theoretical √n
+2. **I/O Not Dominant**: Fast NVMe SSDs show only 1.0-1.1× I/O overhead
+3. **Scaling Confirmed**: Power law fits show n^1.0 for in-memory, n^1.4 for checkpointed
+
+## Real-World Applications
+
+- **Database Systems**: External sorting for large datasets
+- **MapReduce**: Shuffle phase with limited memory
+- **Video Processing**: Frame-by-frame processing with checkpoints
+- **Scientific Computing**: Out-of-core algorithms
+
+## Visualization
+
+The experiment generates:
+1. `paper_sorting_figure.png` - Clean figure for publication
+2. `rigorous_sorting_analysis.png` - Detailed analysis with error bars
+3. `memory_usage_analysis.png` - Memory scaling comparison
+4. `experiment_environment.json` - Hardware/software configuration
+5. `final_experiment_results.json` - Raw experimental data
+
+## Dependencies
+
+```bash
+pip install numpy scipy matplotlib psutil
+```
+
+## Reproducing Results
+
+To reproduce our results exactly:
+1. Ensure CPU frequency scaling is disabled
+2. Close all other applications
+3. Run on a machine with fast SSD (>3GB/s read)
+4. Use Python 3.10+ with NumPy 2.0+