# SqrtSpace SpaceTime Sample Web API

This sample demonstrates how to build a memory-efficient Web API using the SqrtSpace SpaceTime library. It showcases real-world scenarios where √n space-time tradeoffs can significantly improve application performance and scalability.

## Features Demonstrated

### 1. **Memory-Efficient Data Processing**
- Streaming large datasets without loading everything into memory
- Automatic batching using √n-sized chunks
- External sorting and aggregation for datasets that exceed memory limits

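The √n chunking idea can be sketched in plain C#. `BatchBySqrtN` below is a hypothetical stand-alone helper written for illustration, not the library's actual API:

```csharp
// Sketch: choose a batch size near √n so that n items are processed in
// roughly √n batches of √n items each, bounding working memory at O(√n).
using System;
using System.Collections.Generic;
using System.Linq;

static class SqrtBatching
{
    public static IEnumerable<T[]> BatchBySqrtN<T>(IReadOnlyList<T> items)
    {
        // Batch size is ⌈√n⌉, with a floor of 1 for tiny inputs
        int batchSize = Math.Max(1, (int)Math.Ceiling(Math.Sqrt(items.Count)));
        for (int i = 0; i < items.Count; i += batchSize)
            yield return items.Skip(i).Take(batchSize).ToArray();
    }
}
```

For 10,000 items this yields batches of 100, so at most 100 items are materialized at a time.
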
### 2. **Checkpoint-Enabled Operations**
- Resumable bulk operations that can recover from failures
- Progress tracking for long-running tasks
- Automatic state persistence at optimal intervals

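The resume behavior can be sketched as follows; `ICheckpointStore` and its methods are assumed names for illustration, not SpaceTime's actual abstraction:

```csharp
// Sketch of a resumable bulk operation: persist the last completed offset
// under an operation ID, and on restart skip everything already done.
// ICheckpointStore is hypothetical; the library's real API may differ.
public interface ICheckpointStore
{
    Task<int?> LoadOffsetAsync(string operationId);
    Task SaveOffsetAsync(string operationId, int offset);
    Task ClearAsync(string operationId);
}

public static async Task RunResumableAsync(
    string operationId,
    IReadOnlyList<int> itemIds,
    ICheckpointStore store,
    Func<int, Task> processItem)
{
    // Resume from the last saved offset, or start fresh
    int start = await store.LoadOffsetAsync(operationId) ?? 0;
    int interval = Math.Max(1, (int)Math.Sqrt(itemIds.Count));

    for (int i = start; i < itemIds.Count; i++)
    {
        await processItem(itemIds[i]);
        // Persist progress at √n-sized intervals to bound checkpoint overhead
        if ((i + 1) % interval == 0)
            await store.SaveOffsetAsync(operationId, i + 1);
    }
    await store.ClearAsync(operationId);
}
```

Checkpointing every √n items keeps the total checkpoint cost proportional to √n writes rather than one per item.
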
### 3. **Real-World API Patterns**

#### Products Controller (`/api/products`)
- **Paginated queries** - Basic memory control through pagination
- **Streaming endpoints** - Stream millions of products using NDJSON format
- **Smart search** - Automatically switches to external sorting for large result sets
- **Bulk updates** - Checkpoint-enabled price updates that can resume after failures
- **CSV export** - Stream large exports without memory bloat
- **Statistics** - Calculate aggregates over large datasets efficiently

#### Analytics Controller (`/api/analytics`)
- **Revenue analysis** - External grouping for large-scale aggregations
- **Top customers** - Find top N using external sorting when needed
- **Real-time streaming** - Server-Sent Events for continuous analytics
- **Complex reports** - Multi-stage report generation with checkpointing
- **Pattern analysis** - ML-ready data processing with memory constraints
- **Memory monitoring** - Track how the system manages memory

### 4. **Automatic Memory Management**
- Adapts processing strategy based on data size
- Spills to disk when memory pressure is detected
- Provides memory usage statistics for monitoring

## Running the Sample

1. **Start the API:**

   ```bash
   dotnet run
   ```

2. **Access Swagger UI:**

   Navigate to `https://localhost:5001/swagger` to explore the API.

3. **Generate Test Data:**

   The application automatically seeds the database with:

   - 1,000 customers
   - 10,000 products
   - 50,000 orders

A background service continuously generates new orders to simulate real-time data.

## Key Scenarios to Try

### 1. Stream Large Dataset
```bash
# Stream all products (10,000+) without loading them into memory
curl -N https://localhost:5001/api/products/stream

# The response is newline-delimited JSON (NDJSON)
```

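One way such an endpoint can be written is to serialize each entity followed by a newline, flushing as it goes; a sketch assuming this sample's `Product` entity and an injected EF Core context named `_db`:

```csharp
// Sketch of an NDJSON streaming action: one JSON document per line,
// flushed as it is produced so the client never waits for the full set.
// _db (a DbContext) and Product are assumed names from this sample.
[HttpGet("stream")]
public async Task StreamProducts(CancellationToken ct)
{
    Response.ContentType = "application/x-ndjson";

    await foreach (var product in _db.Products
        .AsNoTracking()
        .AsAsyncEnumerable()
        .WithCancellation(ct))
    {
        await JsonSerializer.SerializeAsync(Response.Body, product, cancellationToken: ct);
        await Response.Body.WriteAsync(new byte[] { (byte)'\n' }, ct);
        await Response.Body.FlushAsync(ct);
    }
}
```

Writing directly to `Response.Body` avoids the default behavior of buffering results into a single JSON array.
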
### 2. Bulk Update with Checkpointing
```bash
# Start a bulk price update
curl -X POST https://localhost:5001/api/products/bulk-update-prices \
  -H "Content-Type: application/json" \
  -H "X-Operation-Id: price-update-123" \
  -d '{"categoryFilter": "Electronics", "priceMultiplier": 1.1}'

# If it fails, resume with the same Operation ID
```

### 3. Generate Complex Report
```bash
# Generate a report with automatic checkpointing
curl -X POST https://localhost:5001/api/analytics/reports/generate \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "2024-01-01",
    "endDate": "2024-12-31",
    "metricsToInclude": ["revenue", "categories", "customers", "products"],
    "includeDetailedBreakdown": true
  }'
```

### 4. Real-Time Analytics Stream
```bash
# Connect to the real-time analytics stream
curl -N https://localhost:5001/api/analytics/real-time/orders

# Streams analytics data every second using Server-Sent Events
```

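A Server-Sent Events endpoint of this kind boils down to setting the `text/event-stream` content type and writing `data:` frames; a minimal sketch (the payload shape is illustrative, not the sample's exact schema):

```csharp
// Sketch of an SSE endpoint: emit one "data: ..." frame per second and
// flush after each event so clients receive updates immediately.
[HttpGet("real-time/orders")]
public async Task StreamOrderAnalytics(CancellationToken ct)
{
    Response.ContentType = "text/event-stream";
    Response.Headers.CacheControl = "no-cache";

    while (!ct.IsCancellationRequested)
    {
        // Placeholder payload; the real sample computes live order metrics
        var payload = JsonSerializer.Serialize(new
        {
            timestamp = DateTime.UtcNow,
            ordersPerSecond = Random.Shared.Next(1, 20)
        });
        await Response.WriteAsync($"data: {payload}\n\n", ct);
        await Response.Body.FlushAsync(ct);
        await Task.Delay(TimeSpan.FromSeconds(1), ct);
    }
}
```

The blank line after each `data:` frame is what delimits events in the SSE protocol.
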
### 5. Export Large Dataset
```bash
# Export all products to CSV (streams the file)
curl https://localhost:5001/api/products/export/csv > products.csv
```

## Memory Efficiency Examples

### Small Dataset (In-Memory Processing)
When working with small datasets (<10,000 items), the API uses standard in-memory processing:
```csharp
// Standard LINQ operations
var results = await query
    .Where(p => p.Category == "Books")
    .OrderBy(p => p.Price)
    .ToListAsync();
```

### Large Dataset (External Processing)
For large datasets (>10,000 items), the API automatically switches to external processing:
```csharp
// Automatic external sorting
if (count > 10000)
{
    query = query.UseExternalSorting();
}

// Process in √n-sized batches
await foreach (var batch in query.BatchBySqrtNAsync())
{
    // Process each batch
}
```

## Configuration

The sample includes configurable memory limits in `appsettings.json`:

```json
{
  "MemoryOptions": {
    "MaxMemoryMB": 512,
    "WarningThresholdPercent": 80
  }
}
```

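These values would typically be surfaced through the standard ASP.NET Core options pattern; a sketch (the `MemoryOptions` class shape here is inferred from the JSON above and may differ from the sample's actual type):

```csharp
// Sketch: bind the "MemoryOptions" configuration section to a typed
// options class so services can read the limits without touching
// IConfiguration directly.
public class MemoryOptions
{
    public int MaxMemoryMB { get; set; } = 512;
    public int WarningThresholdPercent { get; set; } = 80;
}

// In Program.cs:
builder.Services.Configure<MemoryOptions>(
    builder.Configuration.GetSection("MemoryOptions"));

// Consumers inject IOptions<MemoryOptions>:
public class MemoryAwareService
{
    private readonly MemoryOptions _options;

    public MemoryAwareService(IOptions<MemoryOptions> options)
        => _options = options.Value;
}
```
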
## Monitoring

Check memory usage statistics:
```bash
curl https://localhost:5001/api/analytics/memory-stats
```

Response:
```json
{
  "currentMemoryUsageMB": 245,
  "peakMemoryUsageMB": 412,
  "externalSortOperations": 3,
  "checkpointsSaved": 15,
  "dataSpilledToDiskMB": 89,
  "cacheHitRate": 0.87,
  "currentMemoryPressure": "Medium"
}
```

## Architecture Highlights

1. **Service Layer**: Encapsulates business logic and SpaceTime optimizations
2. **Entity Framework Integration**: Seamless integration with EF Core queries
3. **Middleware**: Automatic checkpoint and streaming support
4. **Background Services**: Continuous data generation for testing
5. **Memory Monitoring**: Real-time tracking of memory usage

## Best Practices Demonstrated

1. **Know Your Data Size**: Check the count before choosing a processing strategy
2. **Stream When Possible**: Use `IAsyncEnumerable` for large results
3. **Checkpoint Long Operations**: Enable recovery from failures
4. **Monitor Memory Usage**: Track and respond to memory pressure
5. **Use External Processing**: Let the library handle large datasets efficiently

## Next Steps

- Modify the memory limits and observe behavior changes
- Add your own endpoints using SpaceTime patterns
- Connect to a real database for production scenarios
- Implement caching with hot/cold storage tiers
- Add distributed processing with Redis coordination