feat: add origin validation and rate limiting for PDF endpoint

- Implemented origin checker middleware to prevent external sites from hotlinking the PDF generation endpoint
- Added rate limiter (3 requests per minute per IP) to protect resource-intensive PDF operations
- Configured allowed origins via ALLOWED_ORIGINS environment variable with localhost defaults for development
juanatsap
2025-11-09 14:00:10 +00:00
parent 5e132e7ec7
commit 24b2401519
4 changed files with 646 additions and 1 deletion
# API Protection & Security
**Protection against external access and DDoS attacks on resource-intensive endpoints.**

---
## Overview
The CV website layers several protections to stop external sites from using the API and to limit the impact of DDoS attacks on resource-intensive endpoints such as PDF generation.
### Protection Layers
1. **Origin Checking** - Only allows requests from your domain
2. **Rate Limiting** - Prevents abuse even from allowed origins
3. **Production Restrictions** - Stricter rules in production environments
---
## 1. Origin Checking
**File:** `internal/middleware/security.go` (`OriginChecker` middleware)
### How It Works
The origin checker examines incoming HTTP requests and validates them against a whitelist of allowed domains. It checks two headers:
1. **Origin Header** - Sent by browsers on cross-origin (CORS) requests and on requests other than GET/HEAD
2. **Referer Header** - Sent by browsers on navigations and subresource requests, subject to the page's Referrer-Policy
### Configuration
**Environment Variable:** `ALLOWED_ORIGINS`
```bash
# Development (default: empty, falls back to localhost and 127.0.0.1)
ALLOWED_ORIGINS=
# Production
ALLOWED_ORIGINS=yourdomain.com,www.yourdomain.com
```
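The parsing rule implied by this variable can be sketched in Go. This is a minimal illustration, not the code in `security.go`: `parseAllowedOrigins` is a hypothetical helper, and falling back to `localhost`/`127.0.0.1` when the value is empty is an assumption based on the development defaults described in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedOrigins turns a comma-separated ALLOWED_ORIGINS value into a set
// of lowercase host names. Assumption: an empty value falls back to the
// localhost defaults used in development.
func parseAllowedOrigins(raw string) map[string]bool {
	allowed := make(map[string]bool)
	for _, part := range strings.Split(raw, ",") {
		// Trim spaces so "a.com, b.com" works, and lowercase the host
		// so later comparisons are case-insensitive.
		host := strings.ToLower(strings.TrimSpace(part))
		if host != "" {
			allowed[host] = true
		}
	}
	if len(allowed) == 0 {
		allowed["localhost"] = true
		allowed["127.0.0.1"] = true
	}
	return allowed
}

func main() {
	fmt.Println(len(parseAllowedOrigins("")))                                            // 2
	fmt.Println(parseAllowedOrigins(" YourDomain.com ,www.yourdomain.com")["yourdomain.com"]) // true
}
```
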
### Behavior
| Environment | No Origin/Referer | Localhost | Your Domain | External Domain |
|-------------|-----------|-----------|-------------|-----------------|
| **Development** | ✅ Allowed | ✅ Allowed | ✅ Allowed | ❌ Blocked |
| **Production** | ❌ Blocked (PDF) | ✅ Allowed | ✅ Allowed | ❌ Blocked |

**Production PDF Endpoint:**
- Requires `Origin` or `Referer` header
- Blocks direct URL access without headers
- Prevents bookmarking and external hotlinking
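
A hedged sketch of the decision in the table above, assuming the middleware prefers `Origin`, falls back to `Referer`, and compares only the host name (port stripped, case-insensitive). `originAllowed` and `hostFromHeader` are hypothetical names; the real `OriginChecker` may differ in details such as how it treats `localhost` in production.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostFromHeader returns the lowercase host name (port stripped) from an
// Origin or Referer value such as "https://yourdomain.com/page".
func hostFromHeader(value string) string {
	u, err := url.Parse(value)
	if err != nil {
		return ""
	}
	return strings.ToLower(u.Hostname())
}

// originAllowed follows the behavior table: Origin is checked first, Referer
// is the fallback, and a request carrying neither header passes only outside
// production. An unparseable value (or Origin: null) yields "" and is blocked.
func originAllowed(origin, referer string, allowed map[string]bool, production bool) bool {
	header := origin
	if header == "" {
		header = referer
	}
	if header == "" {
		return !production
	}
	return allowed[hostFromHeader(header)]
}

func main() {
	allowed := map[string]bool{"localhost": true, "yourdomain.com": true}
	fmt.Println(originAllowed("", "https://yourdomain.com/", allowed, true)) // true
	fmt.Println(originAllowed("", "https://evil.com/", allowed, false))      // false
	fmt.Println(originAllowed("", "", allowed, false))                       // true
	fmt.Println(originAllowed("", "", allowed, true))                        // false
}
```
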
### Example Responses
**Allowed Request:**
```bash
curl -H "Referer: https://yourdomain.com/" -o cv.pdf "http://localhost:1999/export/pdf?lang=en"
# Status: 200 OK
# PDF file downloaded
```
**Blocked Request (External Domain):**
```bash
curl -H "Referer: https://externaldomain.com/" "http://localhost:1999/export/pdf?lang=en"
# Status: 403 Forbidden
# Response: Forbidden: External access not allowed
```
**Blocked Request (Production, No Headers):**
```bash
# In production with GO_ENV=production
curl "http://yourdomain.com/export/pdf?lang=en"
# Status: 403 Forbidden
# Response: Forbidden: Direct access not allowed
```
---
## 2. Rate Limiting
**File:** `internal/middleware/security.go` (`RateLimiter`)
### How It Works
The rate limiter tracks requests per IP address and enforces limits on resource-intensive endpoints.
**Current Configuration:**
- **Limit:** 3 requests per minute per IP
- **Window:** 1 minute
- **Applied to:** `/export/pdf` endpoint only
### Implementation
```go
// Create rate limiter for PDF endpoint
pdfRateLimiter := middleware.NewRateLimiter(3, 1*time.Minute)

// Apply to PDF endpoint: origin check first, then rate limiting
protectedPDFHandler := middleware.OriginChecker(
	pdfRateLimiter.Middleware(
		http.HandlerFunc(cvHandler.ExportPDF),
	),
)
```
### Behavior
| Requests | Status | Response |
|----------|--------|----------|
| 1st request | ✅ 200 OK | PDF generated |
| 2nd request | ✅ 200 OK | PDF generated |
| 3rd request | ✅ 200 OK | PDF generated |
| 4th request (within 1 min) | ❌ 429 Too Many Requests | Rate limit exceeded |
| After 1 minute | ✅ 200 OK | Counter reset |
### Headers
**Rate Limit Exceeded Response:**
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: text/plain; charset=utf-8

Rate limit exceeded. Please try again later.
```
### IP Detection
The rate limiter detects client IP from:
1. `X-Forwarded-For` header (proxy/CDN)
2. `X-Real-IP` header (alternative proxy header)
3. `RemoteAddr` (direct connection)
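
**Supports reverse proxies:** Yes (Nginx, Cloudflare, etc.). These are ordinary request headers, so they are only trustworthy when a proxy you control sets or overwrites them; if clients can reach the app directly, they can send a different `X-Forwarded-For` value on every request and evade the per-IP limit.
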
---
## 3. Combined Protection
The PDF endpoint has **both** origin checking and rate limiting applied:
```
Request → OriginChecker → RateLimiter → PDF Handler
```
**Protection Flow:**
1. **Check Origin/Referer**
   - If external domain → 403 Forbidden
   - If production + no headers → 403 Forbidden
   - Otherwise, continue
2. **Check Rate Limit**
   - If > 3 requests/minute → 429 Too Many Requests
   - Otherwise, continue
3. **Generate PDF**
   - Process request normally
---
## 4. Configuration Examples
### Development Environment
```bash
# .env
PORT=1999
HOST=localhost
GO_ENV=development
ALLOWED_ORIGINS=
```
**Behavior:**
- Allows `localhost` and `127.0.0.1`
- Allows requests without headers
- Rate limit: 3 PDF/min per IP
### Production Environment
```bash
# .env
PORT=1999
HOST=0.0.0.0
GO_ENV=production
ALLOWED_ORIGINS=yourdomain.com,www.yourdomain.com
```
**Behavior:**
- Only allows `yourdomain.com` and `www.yourdomain.com`
- Requires `Origin` or `Referer` header for PDF endpoint
- Rate limit: 3 PDF/min per IP
### Multiple Domains
```bash
# Support multiple domains (e.g., staging + production)
ALLOWED_ORIGINS=yourdomain.com,www.yourdomain.com,staging.yourdomain.com
```
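Since running production without `ALLOWED_ORIGINS` is called out as unsafe, one option is to fail fast at startup. The sketch below assumes only the two variables shown in the `.env` examples; `securityConfig` and `loadSecurityConfig` are hypothetical names, and reading values through a `getenv` function is just a convenience for testing.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// securityConfig holds the two settings shown in the .env examples.
type securityConfig struct {
	Production     bool
	AllowedOrigins []string
}

// loadSecurityConfig reads GO_ENV and ALLOWED_ORIGINS through getenv and
// refuses a production config with no allowed origins.
func loadSecurityConfig(getenv func(string) string) (securityConfig, error) {
	cfg := securityConfig{Production: getenv("GO_ENV") == "production"}
	for _, part := range strings.Split(getenv("ALLOWED_ORIGINS"), ",") {
		if host := strings.TrimSpace(part); host != "" {
			cfg.AllowedOrigins = append(cfg.AllowedOrigins, strings.ToLower(host))
		}
	}
	if cfg.Production && len(cfg.AllowedOrigins) == 0 {
		return cfg, errors.New("ALLOWED_ORIGINS must be set when GO_ENV=production")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadSecurityConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("production=%v origins=%v\n", cfg.Production, cfg.AllowedOrigins)
}
```
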
---
## 5. Testing Protection
### Test Origin Checking
```bash
# ✅ Allowed (localhost in development)
curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:1999/export/pdf?lang=en"
# ✅ Allowed (with referer)
curl -s -o /dev/null -w "%{http_code}\n" -H "Referer: http://localhost:1999/" "http://localhost:1999/export/pdf?lang=en"
# ❌ Blocked (external referer)
curl -s -o /dev/null -w "%{http_code}\n" -H "Referer: https://evil.com/" "http://localhost:1999/export/pdf?lang=en"
# Expected: 403 Forbidden
```
### Test Rate Limiting
```bash
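# Generate 4 PDFs quickly (start from a fresh window: allowed requests
# from the origin tests above also count toward the limit)
for i in {1..4}; do
  printf 'Request %s: ' "$i"
  curl -w "Status: %{http_code}\n" -o /dev/null -s "http://localhost:1999/export/pdf?lang=en"
  sleep 1
done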
# Expected output:
# Request 1: Status: 200
# Request 2: Status: 200
# Request 3: Status: 200
# Request 4: Status: 429
```
### Test Combined Protection
```bash
# Should be blocked by origin checker before rate limiter
for i in {1..5}; do
  curl -H "Referer: https://evil.com/" -w "Status: %{http_code}\n" -o /dev/null -s \
    "http://localhost:1999/export/pdf?lang=en"
done
# Expected: All requests get 403 (origin check fails immediately)
```
---
## 6. Monitoring & Logs
### Log Messages
**Origin Check Failure:** No dedicated log line; the middleware returns 403 without logging. Check the server logs for 403 responses.

**Rate Limit Exceeded:** No dedicated log line; the middleware returns 429 without logging. Monitor for frequent 429 responses.
### Recommended Monitoring
1. **Track 403 responses** - Indicates potential attack attempts
2. **Track 429 responses** - Indicates rate limiting in effect
3. **Monitor PDF generation times** - Detect abuse patterns
4. **Alert on sustained high request rates** - DDoS detection
---
## 7. Customization
### Adjust Rate Limits
**File:** `main.go`
```go
// Current: 3 requests per minute
pdfRateLimiter := middleware.NewRateLimiter(3, 1*time.Minute)

// Alternatives: replace the line above with one of these
// (declaring pdfRateLimiter twice with := will not compile)

// More restrictive: 5 per hour
// pdfRateLimiter := middleware.NewRateLimiter(5, 1*time.Hour)

// Less restrictive: 10 per minute
// pdfRateLimiter := middleware.NewRateLimiter(10, 1*time.Minute)
```
### Apply to Other Endpoints
```go
// Protect /cv endpoint
protectedCVHandler := middleware.OriginChecker(
	http.HandlerFunc(cvHandler.CVContent),
)
mux.Handle("/cv", protectedCVHandler)
```
### Disable Origin Checking (Not Recommended)
```go
// Apply only rate limiting (no origin check)
mux.Handle("/export/pdf", pdfRateLimiter.Middleware(
	http.HandlerFunc(cvHandler.ExportPDF),
))
```
---
## 8. Security Best Practices
### ✅ Recommended
1. **Set ALLOWED_ORIGINS in production** - Never run production without it
2. **Use HTTPS** - Prevents third parties from tampering with requests in transit (it does not stop clients from forging headers)
3. **Monitor 403/429 responses** - Detect attack patterns
4. **Consider Cloudflare** - Additional DDoS protection layer
5. **Log suspicious requests** - For forensic analysis
### ❌ Anti-Patterns
1. **Don't disable protection in production** - Always use origin checking
2. **Don't set rate limits too high** - PDF generation is expensive
3. **Don't trust IP addresses alone** - Use combined protection
4. **Don't expose internal endpoints** - Keep admin routes private
---
## 9. Production Deployment Checklist
Before deploying to production:
- [ ] Set `GO_ENV=production` in environment
- [ ] Configure `ALLOWED_ORIGINS` with your domain(s)
- [ ] Test origin checking with external domain
- [ ] Test rate limiting with rapid requests
- [ ] Verify HTTPS is enabled (protects headers from tampering in transit)
- [ ] Set up monitoring for 403/429 responses
- [ ] Configure log retention for security analysis
- [ ] Test PDF generation under load
- [ ] Verify reverse proxy headers (X-Forwarded-For)
- [ ] Document allowed origins in runbook
---
## 10. Troubleshooting
### Problem: Legitimate users getting 403
**Cause:** `ALLOWED_ORIGINS` not configured correctly

**Solution:**
```bash
# Ensure all your domains are listed
ALLOWED_ORIGINS=yourdomain.com,www.yourdomain.com
# Check for typos: matching is case-insensitive but otherwise exact,
# so www.yourdomain.com must be listed separately from yourdomain.com
```
### Problem: Rate limit too restrictive
**Cause:** Legitimate users hitting limit

**Solution:**
```go
// Raise the per-window limit in main.go (a longer window would make it stricter)
pdfRateLimiter := middleware.NewRateLimiter(5, 1*time.Minute)
```
### Problem: Behind reverse proxy, rate limit not working
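**Cause:** IP detection failing: without forwarded headers the app sees the proxy's address as `RemoteAddr`, so every client shares one counter

**Solution:**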
```nginx
# Ensure Nginx passes correct headers
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-IP $remote_addr;
```
### Problem: Origin header not being sent
**Cause:** Browsers do not send `Origin` on same-origin GET requests or on top-level navigations, such as following a link to the PDF

**Solution:** This is normal. The middleware checks `Referer` as fallback.

---
## 11. Attack Scenarios & Mitigation
### Scenario 1: DDoS via PDF Generation
**Attack:** External site hotlinks to `/export/pdf`, triggering many PDF generations

**Mitigation:**
1. ✅ Origin checker blocks external domains (403)
2. ✅ Rate limiter prevents >3 requests/min per IP (429)
3. ✅ Production mode requires headers (blocks direct access)

**Result:** Attack fails, server protected
### Scenario 2: Header Spoofing
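**Attack:** Attacker forges the `Referer` (or `Origin`) header to bypass the origin check. Browsers do not let a page set these headers, but any non-browser client can, exactly as the `curl -H "Referer: ..."` examples in this document do.

**Mitigation:**
1. ⚠️ HTTPS stops third parties from modifying headers in transit, but it does not stop the client from forging them
2. ✅ Rate limiter still applies (3 req/min limit)
3. ✅ Per-IP tracking means forged headers do not raise the attacker's budget, provided `X-Forwarded-For` is only trusted when set by your own proxy

**Result:** Individual attacker limited to 3 req/min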
### Scenario 3: Distributed Attack
**Attack:** Botnet with many IPs, each generating PDFs

**Mitigation:**
1. ✅ Each IP limited to 3 req/min
2. ✅ Origin checker blocks if no valid referer
3. 🔴 Consider Cloudflare for large-scale DDoS

**Result:** Slowed but not fully blocked (add Cloudflare)

---
## Summary
**Protection Enabled:** ✅ Origin Checking + Rate Limiting

**Endpoints Protected:**
- `/export/pdf` - Full protection (origin + rate limit)

**Endpoints Unprotected:**
- `/` - Public home page
- `/cv` - Public CV content
- `/health` - Public health check
- `/static/*` - Public static files

**Configuration:** Environment-based via `ALLOWED_ORIGINS`

**Production Ready:** Yes (after setting `ALLOWED_ORIGINS`)

---
**For questions or to adjust protection levels, modify:**
- `internal/middleware/security.go` - Origin checking and rate limiting logic
- `main.go` - Apply protection to additional endpoints
- `.env` - Configure ALLOWED_ORIGINS for your domain