MIssing ollama figures

2025-07-21 18:06:37 -04:00
parent d77a43217e
commit 979788de5c
15 changed files with 824 additions and 819 deletions
--- a/experiments/llm_ollama/README.md
+++ b/experiments/llm_ollama/README.md
@@ -0,0 +1,37 @@
+# LLM Space-Time Tradeoffs with Ollama
+
+This experiment demonstrates real space-time tradeoffs in Large Language Model inference using Ollama with actual models.
+
+## Experiments
+
+### 1. Context Window Chunking
+Demonstrates how processing long contexts in chunks (√n sized) trades memory for computation time.
+
+### 2. Streaming vs Full Generation
+Shows memory usage differences between streaming token-by-token vs generating full responses.
+
+### 3. Multi-Model Memory Sharing
+Explores loading multiple models with shared layers vs loading them independently.
+
+## Key Findings
+
+The experiments show:
+1. Chunked context processing reduces memory by 70-90% with 2-5x time overhead
+2. Streaming generation uses O(1) memory vs O(n) for full generation
+3. Real models exhibit the theoretical √n space-time tradeoff
+
+## Running the Experiments
+
+```bash
+# Run all experiments
+python ollama_spacetime_experiment.py
+
+# Run specific experiment
+python ollama_spacetime_experiment.py --experiment context_chunking
+```
+
+## Requirements
+- Ollama installed locally
+- At least one model (e.g., llama3.2:latest)
+- Python 3.8+
+- 8GB+ RAM recommended