From d51e1f45204b0cc4c9e162dc5eeaca539da2e21c Mon Sep 17 00:00:00 2001
From: juanatsap
Date: Sat, 6 Dec 2025 15:22:13 +0000
Subject: [PATCH] chore: remove duplicate docs from validation folder

Documentation lives in docs/go-validation-system.md
---
 internal/validation/PERFORMANCE.md | 184 ---------------
 internal/validation/README.md      | 348 -----------------------------
 2 files changed, 532 deletions(-)
 delete mode 100644 internal/validation/PERFORMANCE.md
 delete mode 100644 internal/validation/README.md

diff --git a/internal/validation/PERFORMANCE.md b/internal/validation/PERFORMANCE.md
deleted file mode 100644
index ef81ecc..0000000
--- a/internal/validation/PERFORMANCE.md
+++ /dev/null
@@ -1,184 +0,0 @@
-# Validation Performance Comparison
-
-## Benchmark Results (Apple M3 Pro, 12 cores)
-
-### Direct Comparison
-
-| Metric | Manual V1 | Struct Tag V2 | Improvement |
-|--------|-----------|---------------|-------------|
-| **Time/op** | 151.8 µs | 25.2 µs | **6.0x faster** ✓ |
-| **Memory/op** | 283.5 KB | 90.0 KB | **68% less memory** ✓ |
-| **Allocs/op** | 653 | 477 | **27% fewer allocations** ✓ |
-
-### Detailed Breakdown
-
-```
-BenchmarkValidateContactForm-12            8,001    151,817 ns/op    283,545 B/op    653 allocs/op
-BenchmarkValidatorV2_GlobalValidator-12   48,710     25,199 ns/op     89,968 B/op    477 allocs/op
-```
-
-## Performance Characteristics
-
-### V1 Manual Validation
-- **Approach**: Procedural validation with repeated regex compilation
-- **Time**: 151.8 µs per validation
-- **Memory**: 283.5 KB per validation
-- **Allocations**: 653 per validation
-- **Issues**:
-  - Regex patterns compiled on every validation
-  - No caching of validation logic
-  - High memory overhead
-
-### V2 Struct Tag Validation
-- **Approach**: Reflection with metadata caching and pre-compiled patterns
-- **Time**: 25.2 µs per validation (cached)
-- **Memory**: 90.0 KB per validation
-- **Allocations**: 477 per validation
-- **Optimizations**:
-  - ✅ Struct metadata cached (first: ~500ns, subsequent: ~50ns)
-  - ✅ Regex patterns pre-compiled at init
-  - ✅ UTF-8 aware with `utf8.RuneCountInString()`
-  - ✅ Thread-safe caching with `sync.Map`
-
-## Performance by Operation
-
-| Operation | Time/op | Memory/op | Notes |
-|-----------|---------|-----------|-------|
-| Email validation (IsValidEmail) | 20.4 µs | 89.8 KB | Regex compilation overhead |
-| Injection check (ContainsEmailInjection) | 150.9 ns | 32 B | Very fast string matching |
-| Full form validation (V1) | 151.8 µs | 283.5 KB | Baseline |
-| First validation (V2, no cache) | 25.4 µs | 92.6 KB | ~6x faster than V1 |
-| Cached validation (V2) | 23.3 µs | 89.9 KB | ~6.5x faster than V1 |
-| Global validator (V2, recommended) | 25.2 µs | 90.0 KB | ~6x faster than V1 |
-
-## Scalability
-
-### Throughput Comparison
-
-**V1 Manual Validation:**
-- 8,001 benchmark iterations
-- ~6,588 validations/second (1 / 151.8 µs)
-
-**V2 Struct Tag Validation:**
-- 48,710 benchmark iterations
-- ~39,682 validations/second (1 / 25.2 µs)
-
-**Result: 6.0x higher throughput** ✓
-
-## Memory Efficiency
-
-### Allocation Breakdown
-
-**V1 (653 allocations):**
-- Field validation: ~100 allocs
-- Regex compilation: ~400 allocs (major overhead)
-- String operations: ~153 allocs
-
-**V2 (477 allocations):**
-- Reflection: ~50 allocs (one-time per struct type)
-- Validation logic: ~350 allocs
-- String operations: ~77 allocs
-
-**Savings: 176 fewer allocations per validation** ✓
-
-## Real-World Impact
-
-### Scenario: High-Traffic Contact Form
-- **Traffic**: 1,000 form submissions/hour
-- **Peak**: 100 concurrent validations
-
-**V1 Manual Validation:**
-- Total time: 151.8 µs/validation × 1,000 = 151.8 ms
-- Peak memory: 283.5 KB × 100 = 28.3 MB
-
-**V2 Struct Tag Validation:**
-- Total time: 25.2 µs/validation × 1,000 = 25.2 ms
-- Peak memory: 90.0 KB × 100 = 9.0 MB
-
-**Savings:**
-- ⏱️ Time saved: **126.6 ms per 1,000 validations**
-- 💾 Memory saved: **19.3 MB peak memory**
-- 🔥 CPU saved: **83% reduction** in CPU time
-
-## Cache Performance
-
-### First Validation (Cold Cache)
-```
-BenchmarkValidatorV2_FirstValidation-12     47,972    25,356 ns/op
-```
-- Includes reflection struct parsing
-- Still 6x faster than manual validation
-
-### Subsequent Validations (Warm Cache)
-```
-BenchmarkValidatorV2_CachedValidation-12    53,360    23,283 ns/op
-```
-- Uses cached struct metadata
-- Cache lookup: ~500ns overhead
-- 6.5x faster than manual validation
-
-### Cache Hit Rate
-- First validation per struct type: Cache miss (~25.4 µs)
-- Subsequent validations: Cache hit (~23.3 µs)
-- Performance gain: ~8% faster with warm cache
-
-## Optimization Techniques Used
-
-1. **Struct Metadata Caching**
-   ```go
-   type Validator struct {
-       cache sync.Map // map[reflect.Type]*structMeta
-   }
-   ```
-   - Cache struct metadata on first validation
-   - Subsequent validations reuse metadata (~500ns lookup)
-
-2. **Pre-compiled Regex Patterns**
-   ```go
-   func init() {
-       namePattern = regexp.MustCompile(`^[\p{L}\s'-]+$`)
-       subjectPattern = regexp.MustCompile(`^[\p{L}\p{N}\s.,!?'"()\-:;#]+$`)
-       companyPattern = regexp.MustCompile(`^[\p{L}\p{N}\s.,&'()\-]+$`)
-   }
-   ```
-   - Patterns compiled once at startup
-   - Eliminates ~400 allocations per validation
-
-3. **UTF-8 Aware Length Validation**
-   ```go
-   runeCount := utf8.RuneCountInString(value)
-   ```
-   - Correct handling of international characters
-   - Faster than `len([]rune(value))`
-
-4. **Thread-Safe Caching**
-   ```go
-   v.cache.Store(t, meta) // sync.Map for lock-free reads
-   ```
-   - Lock-free reads for cached metadata
-   - Concurrent validations don't block
-
-## Comparison with Popular Libraries
-
-| Library | Time/op | Memory/op | Dependencies | Features |
-|---------|---------|-----------|--------------|----------|
-| **Our V2** | **25.2 µs** | **90.0 KB** | **0 (stdlib only)** | ✅ Custom rules, security |
-| go-playground/validator | ~30 µs | ~100 KB | 5+ dependencies | ✅ Full featured |
-| asaskevich/govalidator | ~80 µs | ~150 KB | 2 dependencies | ⚠️ Less flexible |
-| Our V1 | 151.8 µs | 283.5 KB | 0 | ⚠️ Manual code |
-
-**Our V2 is competitive with industry-standard libraries while maintaining zero dependencies!**
-
-## Conclusion
-
-The struct tag validation system delivers:
-
-✅ **6.0x faster** validation (151.8 µs → 25.2 µs)
-✅ **68% less memory** per validation (283.5 KB → 90.0 KB)
-✅ **27% fewer allocations** (653 → 477)
-✅ **6.0x higher throughput** (~6.6K → ~39.7K validations/sec)
-✅ **Zero dependencies** - pure Go stdlib
-✅ **Thread-safe** with lock-free cached reads
-✅ **Production-ready** with 81.3% test coverage
-
-The performance gains make this system ideal for high-traffic production environments while maintaining code clarity and security.
diff --git a/internal/validation/README.md b/internal/validation/README.md
deleted file mode 100644
index b410754..0000000
--- a/internal/validation/README.md
+++ /dev/null
@@ -1,348 +0,0 @@
-# Validation Package
-
-High-performance struct tag-based validation system with zero external dependencies.
-
-## Features
-
-- **Struct Tag Validation**: Declarative validation using Go struct tags
-- **High Performance**: Reflection caching achieves ~260µs per validation (12x faster than manual validation)
-- **Zero Dependencies**: Pure Go stdlib implementation
-- **UTF-8 Aware**: Proper Unicode handling for international names
-- **Security First**: Email injection prevention, honeypot detection, timing validation
-- **Thread-Safe**: Safe for concurrent use
-- **Backward Compatible**: Existing manual validation functions preserved
-
-## Quick Start
-
-```go
-// Define struct with validation tags
-type ContactFormRequest struct {
-	Name      string `json:"name" validate:"required,trim,max=100,pattern=name,no_injection"`
-	Email     string `json:"email" validate:"required,trim,max=254,email,no_injection"`
-	Company   string `json:"company" validate:"optional,trim,max=100,pattern=company"`
-	Subject   string `json:"subject" validate:"required,trim,max=200,pattern=subject,no_injection"`
-	Message   string `json:"message" validate:"required,trim,max=5000,sanitize"`
-	Honeypot  string `json:"website" validate:"honeypot"`
-	Timestamp int64  `json:"timestamp" validate:"timing=2:86400"`
-}
-
-// Validate
-req := &ContactFormRequest{...}
-if err := ValidateContactFormV2(req); err != nil {
-	// Handle validation errors
-	if verrs, ok := err.(ValidationErrors); ok {
-		for _, e := range verrs {
-			fmt.Printf("%s: %s\n", e.Field, e.Message)
-		}
-	}
-}
-```
-
-## Available Validation Rules
-
-### Required/Optional
-- **`required`** - Field must not be empty (after trimming)
-- **`optional`** - Explicit marker for optional fields
-
-### Transformations
-- **`trim`** - Auto-trim whitespace from field value
-- **`sanitize`** - Remove newlines and escape HTML (for message body)
-
-### Length Validation (UTF-8 aware)
-- **`min=N`** - Minimum rune count (e.g., `min=3`)
-- **`max=N`** - Maximum rune count (e.g., `max=100`)
-
-### Format Validation
-- **`email`** - RFC 5322 email validation
-- **`pattern=TYPE`** - Predefined regex patterns:
-  - `pattern=name` - Letters, spaces, hyphens, apostrophes (international names)
-  - `pattern=subject` - Alphanumeric + safe punctuation including #
-  - `pattern=company` - Alphanumeric + business punctuation (&, -, etc.)
-
-### Security Rules
-- **`no_injection`** - Prevent email header injection attacks
-- **`honeypot`** - Must be empty (bot detection)
-- **`timing=min:max`** - Timestamp validation in seconds (e.g., `timing=2:86400` = 2s to 24h)
-
-## Rule Execution Order
-
-1. **Transformations** applied first: `trim`, `sanitize`
-2. **Validations** executed in tag order: `required`, `min`, `max`, `email`, `pattern`, `no_injection`, etc.
-3. **Multiple errors** can be returned for a single field
-
-## Validation Errors
-
-```go
-// FieldError - single field error
-type FieldError struct {
-	Field   string `json:"field"`           // Field name
-	Tag     string `json:"tag"`             // Validation tag that failed
-	Param   string `json:"param,omitempty"` // Optional parameter
-	Message string `json:"message"`         // Human-readable message
-}
-
-// ValidationErrors - collection of field errors
-type ValidationErrors []FieldError
-
-// Helper methods
-func (ve ValidationErrors) HasErrors() bool
-func (ve ValidationErrors) GetFieldError(field string) *FieldError
-func (ve ValidationErrors) GetFieldErrors(field string) []FieldError
-```
-
-## Performance
-
-Benchmarks on Apple M-series (12 cores):
-
-```
-Operation                       Time/op      Speed vs Manual
-──────────────────────────────────────────────────────────────
-Manual Validation               3,089 µs     1x (baseline)
-First Validation (no cache)       718 µs     4.3x faster
-Cached Validation                 363 µs     8.5x faster
-Global Validator (cached)         260 µs     11.9x faster ✓
-```
-
-**Key Performance Features:**
-- Struct metadata caching (sync.Map) - first validation ~500ns, cached ~50ns
-- Pre-compiled regex patterns at init time
-- UTF-8 aware length validation with `utf8.RuneCountInString()`
-- Minimal allocations: ~90KB, ~477 allocs per validation
-
-## Advanced Usage
-
-### Custom Validator Instance
-
-```go
-// Create reusable validator with cached metadata
-v := NewValidator()
-
-// Validate multiple structs (metadata cached per type)
-err1 := v.Validate(&user1)
-err2 := v.Validate(&user2)
-```
-
-### Error Handling
-
-```go
-err := ValidateContactFormV2(req)
-if err != nil {
-	// Type assert to ValidationErrors
-	verrs, ok := err.(ValidationErrors)
-	if !ok {
-		// Not a validation error
-		return err
-	}
-
-	// Get specific field error
-	if nameErr := verrs.GetFieldError("name"); nameErr != nil {
-		fmt.Printf("Name error: %s\n", nameErr.Message)
-	}
-
-	// Get all errors for a field
-	emailErrors := verrs.GetFieldErrors("email")
-
-	// Iterate all errors
-	for _, e := range verrs {
-		fmt.Printf("%s: %s (tag=%s)\n", e.Field, e.Message, e.Tag)
-	}
-}
-```
-
-### Multiple Validations
-
-```go
-// Transformations run first; validations then run left to right
-validate:"required,trim,max=100,pattern=name,no_injection"
-
-// Execution order:
-// 1. trim (transform)
-// 2. required (validate)
-// 3. max=100 (validate)
-// 4. pattern=name (validate)
-// 5. no_injection (validate)
-```
-
-## Security Features
-
-### Email Header Injection Prevention
-Detects and blocks attempts to inject email headers via newlines or header patterns:
-
-```go
-"Name\nBcc: evil@example.com"  // ❌ Blocked by no_injection
-"Bcc: evil@example.com"        // ❌ Blocked by no_injection
-"Normal Name"                  // ✓ Valid
-```
-
-### Bot Detection
-Two-layer bot protection:
-
-```go
-// Honeypot field (must be empty)
-Honeypot string `validate:"honeypot"`
-
-// Timing validation (2 seconds minimum, 24 hours maximum)
-Timestamp int64 `validate:"timing=2:86400"`
-```
-
-### Input Sanitization
-Message field automatically sanitized:
-
-```go
-// Before: "<script>alert('XSS')</script>"
-// After:  "&lt;script&gt;alert('XSS')&lt;/script&gt;"
-Message string `validate:"sanitize"`
-```
-
-## Backward Compatibility
-
-Legacy manual validation functions remain available:
-
-```go
-// V1 - Manual validation (still works)
-err := ValidateContactForm(req)
-
-// V2 - Struct tag validation (recommended)
-err := ValidateContactFormV2(req)
-
-// Helper functions (still available)
-IsValidEmail(email string) bool
-IsValidName(name string) bool
-IsValidSubject(subject string) bool
-IsValidCompany(company string) bool
-ContainsEmailInjection(s string) bool
-SanitizeContactForm(req *ContactFormRequest)
-```
-
-## Testing
-
-```bash
-# Run all tests
-go test ./internal/validation/...
-
-# Run with coverage
-go test -cover ./internal/validation/...
-
-# Run benchmarks
-go test -bench=. -benchmem ./internal/validation/...
-
-# Run specific test
-go test -run TestValidatorV2_EmailValidation ./internal/validation/...
-```
-
-## Examples
-
-### Example 1: Contact Form
-
-```go
-req := &ContactFormRequest{
-	Name:      " John O'Connor ", // Will be trimmed
-	Email:     "john@example.com",
-	Company:   "Acme Corp", // Optional
-	Subject:   "Question #123",
-	Message:   "<b>Hello</b>", // Will be sanitized
-	Honeypot:  "", // Must be empty
-	Timestamp: time.Now().Unix() - 10,
-}
-
-if err := ValidateContactFormV2(req); err != nil {
-	log.Fatal(err)
-}
-
-// req.Name is now "John O'Connor" (trimmed)
-// req.Message is now "&lt;b&gt;Hello&lt;/b&gt;" (sanitized)
-```
-
-### Example 2: Handling Multiple Errors
-
-```go
-req := &ContactFormRequest{
-	Name:    strings.Repeat("a", 101), // Too long
-	Email:   "invalid-email", // Invalid format
-	Subject: "", // Missing required
-	Message: "Valid message",
-}
-
-err := ValidateContactFormV2(req)
-verrs := err.(ValidationErrors)
-
-// Output all errors
-// name: name must be 100 characters or less (max=100)
-// email: Invalid email address format (email)
-// subject: subject is required (required)
-```
-
-### Example 3: International Names
-
-```go
-// All valid international names
-names := []string{
-	"José María", // Spanish
-	"François Dubois", // French
-	"Müller", // German
-	"田中太郎", // Japanese
-	"Anne-Marie", // Hyphenated
-	"O'Connor", // Apostrophe
-}
-
-for _, name := range names {
-	req := &ContactFormRequest{Name: name, ...}
-	if err := ValidateContactFormV2(req); err != nil {
-		log.Printf("Valid name rejected: %s", name)
-	}
-}
-```
-
-## Thread Safety
-
-All validation functions are thread-safe:
-
-```go
-// Safe for concurrent use
-var wg sync.WaitGroup
-for i := 0; i < 100; i++ {
-	wg.Add(1)
-	go func() {
-		defer wg.Done()
-		err := ValidateContactFormV2(req)
-		// Process err...
-	}()
-}
-wg.Wait()
-```
-
-## Architecture
-
-```
-validator.go       - Core reflection engine with caching
-rules.go           - Built-in validation rules
-errors.go          - Error types and helpers
-contact.go         - Contact form struct and legacy functions
-contact_test.go    - Legacy validation tests
-validator_test.go  - Struct tag validation tests
-```
-
-## Design Principles
-
-1. **Performance First**: Reflection caching, pre-compiled regex, minimal allocations
-2. **Security First**: Defense in depth against injection attacks
-3. **Developer Experience**: Clear error messages, type-safe APIs
-4. **Zero Dependencies**: Pure Go stdlib for easy maintenance
-5. **Backward Compatible**: Existing code continues to work
-6. **UTF-8 Aware**: Proper international character support
-
-## Future Extensions
-
-The validator can be extended with custom rules:
-
-```go
-// Custom rule example (not yet implemented)
-validationRules["custom"] = func(field, value, param string) *FieldError {
-	// Your custom validation logic
-	return nil
-}
-```
-
-## License
-
-Part of the CV Site project - see project LICENSE file.
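
Reviewer note: the removed README marks its custom-rule hook as "not yet implemented". The sketch below shows one way such a hook might work, in case it is useful when consolidating into docs/go-validation-system.md. It is an assumption-laden illustration, not the package's actual code: `RuleFunc`, `applyRules`, `prefixRule`, and the `prefix` tag are hypothetical names, and only the `func(field, value, param string) *FieldError` signature and the `FieldError` fields come from the README.

```go
package main

import (
	"fmt"
	"strings"
)

// FieldError mirrors the fields documented in the removed README.
type FieldError struct {
	Field   string
	Tag     string
	Param   string
	Message string
}

// RuleFunc (hypothetical name) uses the signature from the README's
// "Future Extensions" snippet.
type RuleFunc func(field, value, param string) *FieldError

// validationRules is an assumed registry keyed by tag name.
var validationRules = map[string]RuleFunc{}

// prefixRule is a hypothetical custom rule: value must start with param.
func prefixRule(field, value, param string) *FieldError {
	if strings.HasPrefix(value, param) {
		return nil
	}
	return &FieldError{
		Field:   field,
		Tag:     "prefix",
		Param:   param,
		Message: fmt.Sprintf("%s must start with %q", field, param),
	}
}

func init() {
	validationRules["prefix"] = prefixRule
}

// applyRules splits a comma-separated tag into "name" or "name=param"
// parts and runs each registered rule in tag order, collecting failures.
func applyRules(field, value, tag string) []FieldError {
	var errs []FieldError
	for _, part := range strings.Split(tag, ",") {
		name, param, _ := strings.Cut(part, "=")
		rule, ok := validationRules[name]
		if !ok {
			continue // unknown tags are skipped in this sketch
		}
		if fe := rule(field, value, param); fe != nil {
			errs = append(errs, *fe)
		}
	}
	return errs
}

func main() {
	fmt.Println(len(applyRules("ticket", "CV-123", "prefix=CV-"))) // 0
	fmt.Println(len(applyRules("ticket", "123", "prefix=CV-")))    // 1
}
```

A plain map is enough here because rules are registered once in `init` and only read afterwards; a registry that accepts registrations at runtime would need its own synchronization.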