# SqrtSpace SpaceTime Sample Web API

This sample demonstrates how to build a memory-efficient Web API using the SqrtSpace SpaceTime library. It showcases real-world scenarios where √n space-time tradeoffs can significantly improve application performance and scalability.

## Features Demonstrated

### 1. **Memory-Efficient Data Processing**
- Streaming large datasets without loading everything into memory
- Automatic batching using √n-sized chunks
- External sorting and aggregation for datasets that exceed memory limits

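The √n chunking idea can be sketched in plain C#. `BatchBySqrtN` below is a hypothetical stand-alone helper written for illustration, not the library's actual API:

```csharp
// Sketch: choose a batch size near √n so that n items are processed in
// roughly √n batches of √n items each, bounding working memory at O(√n).
using System;
using System.Collections.Generic;
using System.Linq;

static class SqrtBatching
{
    public static IEnumerable<T[]> BatchBySqrtN<T>(IReadOnlyList<T> items)
    {
        // Batch size is ⌈√n⌉, with a floor of 1 for tiny inputs
        int batchSize = Math.Max(1, (int)Math.Ceiling(Math.Sqrt(items.Count)));
        for (int i = 0; i < items.Count; i += batchSize)
            yield return items.Skip(i).Take(batchSize).ToArray();
    }
}
```

For 10,000 items this yields batches of 100, so at most 100 items are materialized at a time.
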
### 2. **Checkpoint-Enabled Operations**
- Resumable bulk operations that can recover from failures
- Progress tracking for long-running tasks
- Automatic state persistence at optimal intervals

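The resume behavior can be sketched as follows; `ICheckpointStore` and its methods are assumed names for illustration, not SpaceTime's actual abstraction:

```csharp
// Sketch of a resumable bulk operation: persist the last completed offset
// under an operation ID, and on restart skip everything already done.
// ICheckpointStore is hypothetical; the library's real API may differ.
public interface ICheckpointStore
{
    Task<int?> LoadOffsetAsync(string operationId);
    Task SaveOffsetAsync(string operationId, int offset);
    Task ClearAsync(string operationId);
}

public static async Task RunResumableAsync(
    string operationId,
    IReadOnlyList<int> itemIds,
    ICheckpointStore store,
    Func<int, Task> processItem)
{
    // Resume from the last saved offset, or start fresh
    int start = await store.LoadOffsetAsync(operationId) ?? 0;
    int interval = Math.Max(1, (int)Math.Sqrt(itemIds.Count));

    for (int i = start; i < itemIds.Count; i++)
    {
        await processItem(itemIds[i]);
        // Persist progress at √n-sized intervals to bound checkpoint overhead
        if ((i + 1) % interval == 0)
            await store.SaveOffsetAsync(operationId, i + 1);
    }
    await store.ClearAsync(operationId);
}
```

Checkpointing every √n items keeps the total checkpoint cost proportional to √n writes rather than one per item.
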
### 3. **Real-World API Patterns**

#### Products Controller (`/api/products`)
- **Paginated queries** - Basic memory control through pagination
- **Streaming endpoints** - Stream millions of products using NDJSON format
- **Smart search** - Automatically switches to external sorting for large result sets
- **Bulk updates** - Checkpoint-enabled price updates that can resume after failures
- **CSV export** - Stream large exports without memory bloat
- **Statistics** - Calculate aggregates over large datasets efficiently

#### Analytics Controller (`/api/analytics`)
- **Revenue analysis** - External grouping for large-scale aggregations
- **Top customers** - Find top N using external sorting when needed
- **Real-time streaming** - Server-Sent Events for continuous analytics
- **Complex reports** - Multi-stage report generation with checkpointing
- **Pattern analysis** - ML-ready data processing with memory constraints
- **Memory monitoring** - Track how the system manages memory

### 4. **Automatic Memory Management**
- Adapts processing strategy based on data size
- Spills to disk when memory pressure is detected
- Provides memory usage statistics for monitoring

## Running the Sample

1. **Start the API:**

   ```bash
   dotnet run
   ```

2. **Access Swagger UI:**

   Navigate to `https://localhost:5001/swagger` to explore the API.

3. **Generate Test Data:**

   The application automatically seeds the database with:

   - 1,000 customers
   - 10,000 products
   - 50,000 orders

A background service continuously generates new orders to simulate real-time data.

## Key Scenarios to Try

### 1. Stream Large Dataset
```bash
# Stream all products (10,000+) without loading them into memory
curl -N https://localhost:5001/api/products/stream

# The response is newline-delimited JSON (NDJSON)
```

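One way such an endpoint can be written is to serialize each entity followed by a newline, flushing as it goes; a sketch assuming this sample's `Product` entity and an injected EF Core context named `_db`:

```csharp
// Sketch of an NDJSON streaming action: one JSON document per line,
// flushed as it is produced so the client never waits for the full set.
// _db (a DbContext) and Product are assumed names from this sample.
[HttpGet("stream")]
public async Task StreamProducts(CancellationToken ct)
{
    Response.ContentType = "application/x-ndjson";

    await foreach (var product in _db.Products
        .AsNoTracking()
        .AsAsyncEnumerable()
        .WithCancellation(ct))
    {
        await JsonSerializer.SerializeAsync(Response.Body, product, cancellationToken: ct);
        await Response.Body.WriteAsync(new byte[] { (byte)'\n' }, ct);
        await Response.Body.FlushAsync(ct);
    }
}
```

Writing directly to `Response.Body` avoids the default behavior of buffering results into a single JSON array.
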
### 2. Bulk Update with Checkpointing
```bash
# Start a bulk price update
curl -X POST https://localhost:5001/api/products/bulk-update-prices \
  -H "Content-Type: application/json" \
  -H "X-Operation-Id: price-update-123" \
  -d '{"categoryFilter": "Electronics", "priceMultiplier": 1.1}'

# If it fails, resume with the same Operation ID
```

### 3. Generate Complex Report
```bash
# Generate a report with automatic checkpointing
curl -X POST https://localhost:5001/api/analytics/reports/generate \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "2024-01-01",
    "endDate": "2024-12-31",
    "metricsToInclude": ["revenue", "categories", "customers", "products"],
    "includeDetailedBreakdown": true
  }'
```

### 4. Real-Time Analytics Stream
```bash
# Connect to the real-time analytics stream
curl -N https://localhost:5001/api/analytics/real-time/orders

# Streams analytics data every second using Server-Sent Events
```

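A Server-Sent Events endpoint of this kind boils down to setting the `text/event-stream` content type and writing `data:` frames; a minimal sketch (the payload shape is illustrative, not the sample's exact schema):

```csharp
// Sketch of an SSE endpoint: emit one "data: ..." frame per second and
// flush after each event so clients receive updates immediately.
[HttpGet("real-time/orders")]
public async Task StreamOrderAnalytics(CancellationToken ct)
{
    Response.ContentType = "text/event-stream";
    Response.Headers.CacheControl = "no-cache";

    while (!ct.IsCancellationRequested)
    {
        // Placeholder payload; the real sample computes live order metrics
        var payload = JsonSerializer.Serialize(new
        {
            timestamp = DateTime.UtcNow,
            ordersPerSecond = Random.Shared.Next(1, 20)
        });
        await Response.WriteAsync($"data: {payload}\n\n", ct);
        await Response.Body.FlushAsync(ct);
        await Task.Delay(TimeSpan.FromSeconds(1), ct);
    }
}
```

The blank line after each `data:` frame is what delimits events in the SSE protocol.
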
### 5. Export Large Dataset
```bash
# Export all products to CSV (streams the file)
curl https://localhost:5001/api/products/export/csv > products.csv
```

## Memory Efficiency Examples

### Small Dataset (In-Memory Processing)
When working with small datasets (<10,000 items), the API uses standard in-memory processing:
```csharp
// Standard LINQ operations
var results = await query
    .Where(p => p.Category == "Books")
    .OrderBy(p => p.Price)
    .ToListAsync();
```

### Large Dataset (External Processing)
For large datasets (>10,000 items), the API automatically switches to external processing:
```csharp
// Automatic external sorting
if (count > 10000)
{
    query = query.UseExternalSorting();
}

// Process in √n-sized batches
await foreach (var batch in query.BatchBySqrtNAsync())
{
    // Process each batch
}
```

## Configuration

The sample includes configurable memory limits in `appsettings.json`:

```json
{
  "MemoryOptions": {
    "MaxMemoryMB": 512,
    "WarningThresholdPercent": 80
  }
}
```

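These values would typically be surfaced through the standard ASP.NET Core options pattern; a sketch (the `MemoryOptions` class shape here is inferred from the JSON above and may differ from the sample's actual type):

```csharp
// Sketch: bind the "MemoryOptions" configuration section to a typed
// options class so services can read the limits without touching
// IConfiguration directly.
public class MemoryOptions
{
    public int MaxMemoryMB { get; set; } = 512;
    public int WarningThresholdPercent { get; set; } = 80;
}

// In Program.cs:
builder.Services.Configure<MemoryOptions>(
    builder.Configuration.GetSection("MemoryOptions"));

// Consumers inject IOptions<MemoryOptions>:
public class MemoryAwareService
{
    private readonly MemoryOptions _options;

    public MemoryAwareService(IOptions<MemoryOptions> options)
        => _options = options.Value;
}
```
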
## Monitoring

Check memory usage statistics:
```bash
curl https://localhost:5001/api/analytics/memory-stats
```

Response:
```json
{
  "currentMemoryUsageMB": 245,
  "peakMemoryUsageMB": 412,
  "externalSortOperations": 3,
  "checkpointsSaved": 15,
  "dataSpilledToDiskMB": 89,
  "cacheHitRate": 0.87,
  "currentMemoryPressure": "Medium"
}
```

## Architecture Highlights

1. **Service Layer**: Encapsulates business logic and SpaceTime optimizations
2. **Entity Framework Integration**: Seamless integration with EF Core queries
3. **Middleware**: Automatic checkpoint and streaming support
4. **Background Services**: Continuous data generation for testing
5. **Memory Monitoring**: Real-time tracking of memory usage

## Best Practices Demonstrated

1. **Know Your Data Size**: Check the count before choosing a processing strategy
2. **Stream When Possible**: Use `IAsyncEnumerable` for large results
3. **Checkpoint Long Operations**: Enable recovery from failures
4. **Monitor Memory Usage**: Track and respond to memory pressure
5. **Use External Processing**: Let the library handle large datasets efficiently

## Next Steps

- Modify the memory limits and observe behavior changes
- Add your own endpoints using SpaceTime patterns
- Connect to a real database for production scenarios
- Implement caching with hot/cold storage tiers
- Add distributed processing with Redis coordination