Missing Ollama figures

2025-07-21 18:06:37 -04:00
parent d77a43217e
commit 979788de5c
15 changed files with 824 additions and 819 deletions

@@ -0,0 +1,37 @@
# LLM Space-Time Tradeoffs with Ollama
This experiment measures space-time tradeoffs in Large Language Model (LLM) inference by running actual models through Ollama.
## Experiments
### 1. Context Window Chunking
Demonstrates how processing a long context of n tokens in chunks of roughly √n tokens lowers peak memory at the cost of extra computation time.
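As a rough illustration of √n chunking, the sketch below splits a token list into chunks of about √n items and keeps only a small summary per chunk. The names `sqrt_chunks`, `process_chunked`, and `summarize` are hypothetical and are not taken from `ollama_spacetime_experiment.py`; in the real experiment each chunk would presumably be sent to a model rather than passed to `len`.

```python
import math


def sqrt_chunks(tokens):
    """Yield consecutive chunks of about sqrt(n) tokens each."""
    n = len(tokens)
    if n == 0:
        return
    size = max(1, math.isqrt(n))  # chunk size ~ sqrt(n)
    for start in range(0, n, size):
        yield tokens[start:start + size]


def process_chunked(tokens, summarize):
    """Process one chunk at a time, keeping only one summary per chunk."""
    return [summarize(chunk) for chunk in sqrt_chunks(tokens)]


# Stand-in for a tokenized long context (assumption: 10,000 tokens).
tokens = list(range(10_000))
summaries = process_chunked(tokens, summarize=len)
print(len(summaries), max(summaries))  # 100 chunks, each of 100 tokens
```

With this split, only one chunk of about √n tokens plus about √n summaries is held at a time, which is where the memory saving would come from; the time overhead comes from issuing many calls instead of one.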
### 2. Streaming vs Full Generation
Compares memory usage when streaming a response token by token against generating the full response before returning it.
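The difference can be sketched on the client side without a running server. The example below assumes the newline-delimited JSON shape that Ollama's `/api/generate` endpoint emits when streaming (objects with a `response` string and a `done` flag); `stream_length`, `full_text`, and `fake_stream` are hypothetical names used only for illustration.

```python
import json


def stream_length(lines):
    """Consume streamed chunks one at a time, keeping only a running count."""
    total = 0
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        total += len(chunk.get("response", ""))  # chunk is discarded afterwards
        if chunk.get("done"):
            break
    return total


def full_text(lines):
    """Buffer every chunk and join them, as a non-streaming client would."""
    parts = []
    for line in lines:
        if line.strip():
            parts.append(json.loads(line).get("response", ""))
    return "".join(parts)


# Hand-written stand-in for a streamed response (assumed format).
fake_stream = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": false}',
    '{"response": "", "done": true}',
]
print(stream_length(fake_stream))
print(full_text(fake_stream))
```

`stream_length` holds one chunk at a time, while `full_text` holds every chunk until the end. This only models the client's buffer; memory used inside the Ollama server is a separate matter.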
### 3. Multi-Model Memory Sharing
Compares loading multiple models that share layers against loading each model independently.
## Key Findings
The experiments show:
1. Chunked context processing reduces memory by 70-90% at the cost of a 2-5x increase in processing time
2. Streaming generation uses O(1) memory, versus O(n) for full generation, where n is the number of generated tokens
3. Real models exhibit the theoretical √n space-time tradeoff
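A minimal sketch of how such a memory and time comparison could be taken in Python is shown below. It is not the experiment's own harness: `measure`, `full_sum`, and `chunked_sum` are hypothetical, the workload is a toy sum of squares rather than model inference, and `tracemalloc` only sees allocations made by the Python process, not memory used by the Ollama server.

```python
import time
import tracemalloc


def measure(fn, *args):
    """Return (result, seconds, peak_bytes) for a single call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def full_sum(n):
    # Materializes all n values at once: O(n) extra memory.
    return sum([i * i for i in range(n)])


def chunked_sum(n, size):
    # Materializes one chunk at a time: O(size) extra memory.
    total = 0
    for start in range(0, n, size):
        total += sum([i * i for i in range(start, min(start + size, n))])
    return total


n = 200_000
full_result, full_time, full_peak = measure(full_sum, n)
chunk_result, chunk_time, chunk_peak = measure(chunked_sum, n, 447)  # 447 ~ sqrt(n)
print(f"full:    {full_peak} bytes peak, {full_time:.4f} s")
print(f"chunked: {chunk_peak} bytes peak, {chunk_time:.4f} s")
```

Both variants return the same result, but the chunked one should report a much lower peak. Timings on a toy workload like this say little about model inference, so the percentages above should be read from the experiment's own measurements.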
## Running the Experiments
```bash
# Run all experiments
python ollama_spacetime_experiment.py
# Run specific experiment
python ollama_spacetime_experiment.py --experiment context_chunking
```
## Requirements
- Ollama installed locally
- At least one model pulled locally (e.g., llama3.2:latest)
- Python 3.8+
- 8GB+ RAM recommended
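Before running, it may help to confirm that Ollama is reachable and has a model available. The sketch below assumes the default local address `http://localhost:11434` and the `/api/tags` endpoint, which lists locally available models; `model_names` and `list_local_models` are hypothetical helpers, not part of the experiment script.

```python
import json
import urllib.error
import urllib.request


def model_names(payload):
    """Extract model names from an /api/tags style response body."""
    return [m["name"] for m in payload.get("models", [])]


def list_local_models(base_url="http://localhost:11434", timeout=2.0):
    """Return local model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, wrong address, or an unexpected response body.
        return None


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama does not appear to be running on the default port.")
    elif not models:
        print("Ollama is running, but no models are installed.")
    else:
        print("Available models:", ", ".join(models))
```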