# SEO Implementation Guide
**Project:** CV Interactive Website
**Last Updated:** 2026-04-09
**Status:** Production Ready

---
## Overview
This document describes the SEO (search engine optimization) implementation for the CV website. It covers traditional search engine signals (meta tags, hreflang, structured data) as well as newer measures aimed at LLM crawlers and AI Overviews.

---
## SEO Architecture
### 1. Traditional SEO Elements
#### Meta Tags (`templates/index.html`)
```html
<!-- Primary Meta Tags -->
<title>{{.CV.Personal.Name}} - {{.CV.SEO.PageTitle}}</title>
<meta name="title" content="...">
<meta name="description" content="...">
<meta name="keywords" content="...">
<meta name="author" content="...">
<meta name="robots" content="index, follow">
<link rel="canonical" href="{{.CanonicalURL}}">
```
#### International SEO (Hreflang)
```html
<link rel="alternate" hreflang="en" href="{{.AlternateEN}}">
<link rel="alternate" hreflang="es" href="{{.AlternateES}}">
<link rel="alternate" hreflang="x-default" href="https://juan.andres.morenorub.io/?lang=en">
```
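The template fields `.CanonicalURL`, `.AlternateEN`, and `.AlternateES` have to be filled in before the page is rendered. The following is a minimal sketch of how that could be done, assuming the language is selected with a `?lang=` query parameter as in the `x-default` URL above. `seoLinks`, `langURL`, and `buildSEOLinks` are hypothetical names, and the fallback to English is an assumption rather than the project's confirmed behaviour.

```go
package main

import (
	"fmt"
	"net/url"
)

// seoLinks holds the URLs the head template needs (hypothetical type).
type seoLinks struct {
	CanonicalURL string
	AlternateEN  string
	AlternateES  string
}

// langURL returns base with ?lang=<lang> as its only query parameter.
func langURL(base *url.URL, lang string) string {
	u := *base // copy so the caller's URL is not modified
	q := url.Values{}
	q.Set("lang", lang)
	u.RawQuery = q.Encode()
	return u.String()
}

// buildSEOLinks derives canonical and hreflang URLs for the requested language.
// Unknown languages fall back to English (an assumption for this sketch).
func buildSEOLinks(baseURL, lang string) (seoLinks, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return seoLinks{}, err
	}
	if lang != "en" && lang != "es" {
		lang = "en"
	}
	return seoLinks{
		CanonicalURL: langURL(base, lang),
		AlternateEN:  langURL(base, "en"),
		AlternateES:  langURL(base, "es"),
	}, nil
}

func main() {
	links, err := buildSEOLinks("https://juan.andres.morenorub.io/", "es")
	if err != nil {
		panic(err)
	}
	fmt.Println(links.CanonicalURL)
	fmt.Println(links.AlternateEN)
}
```

Each language version points its canonical URL at itself while listing both alternates, which keeps the hreflang annotations reciprocal.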
#### Social Media Integration
| Platform | Meta Type | Implementation |
|----------|-----------|----------------|
| Facebook | Open Graph | `og:type`, `og:title`, `og:description`, `og:image` |
| Twitter/X | Twitter Cards | `twitter:card`, `twitter:title`, `twitter:description` |
| LinkedIn | Open Graph | Uses same `og:*` tags |
---
### 2. Structured Data (JSON-LD)
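The site emits several Schema.org types as JSON-LD so that crawlers can read the CV as structured entities rather than free text. The snippets below are abridged; `...` marks omitted values.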
#### Person Schema (Primary)
```json
{
"@type": "Person",
"@id": "{{.CV.Personal.Website}}/#person",
"name": "...",
"jobTitle": "...",
"description": "...",
"knowsAbout": [...],
"knowsLanguage": [...],
"worksFor": [...],
"hasOccupation": [...]
}
```
**Fields included:**
- Basic info: name, givenName, familyName, jobTitle
- Contact: email, telephone, url
- Demographics: birthDate, birthPlace, nationality
- Location: address with locality and country
- Social: sameAs (LinkedIn, GitHub, Domestika)
- Education: alumniOf
- Skills: knowsAbout (array of expertise areas)
- Languages: knowsLanguage (with Language type)
- Employment: worksFor (multiple organizations)
- Occupations: hasOccupation (dynamically generated from experience)
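The `hasOccupation` field is described as generated from the experience data. One way this could work is sketched below with `encoding/json`. The `Experience` struct, the helper names, and the rule of de-duplicating by job title are all assumptions made for illustration; the real template may build the node differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Experience is a hypothetical shape for one work-history entry.
type Experience struct {
	Title   string
	Company string
}

// occupation is a minimal Schema.org Occupation node.
type occupation struct {
	Type string `json:"@type"`
	Name string `json:"name"`
}

// person is a reduced Schema.org Person node with only the fields used here.
type person struct {
	Context       string       `json:"@context"`
	Type          string       `json:"@type"`
	ID            string       `json:"@id"`
	Name          string       `json:"name"`
	HasOccupation []occupation `json:"hasOccupation,omitempty"`
}

// occupationsFrom maps experience entries to Occupation nodes,
// skipping empty and repeated job titles so each occupation appears once.
func occupationsFrom(exps []Experience) []occupation {
	seen := make(map[string]bool)
	var out []occupation
	for _, e := range exps {
		if e.Title == "" || seen[e.Title] {
			continue
		}
		seen[e.Title] = true
		out = append(out, occupation{Type: "Occupation", Name: e.Title})
	}
	return out
}

// personJSONLD renders the Person node as indented JSON-LD.
func personJSONLD(website, name string, exps []Experience) (string, error) {
	p := person{
		Context:       "https://schema.org",
		Type:          "Person",
		ID:            website + "/#person",
		Name:          name,
		HasOccupation: occupationsFrom(exps),
	}
	b, err := json.MarshalIndent(p, "", "  ")
	return string(b), err
}

func main() {
	// Placeholder data, not taken from the real CV.
	exps := []Experience{
		{Title: "Senior Technical Consultant", Company: "Example Corp"},
		{Title: "FullStack Developer", Company: "Example Ltd"},
		{Title: "Senior Technical Consultant", Company: "Another Example"},
	}
	out, err := personJSONLD("https://example.com", "Jane Doe", exps)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Marshalling from typed structs rather than concatenating strings in a template guarantees that names containing quotes or non-ASCII characters are escaped correctly.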
#### WebSite Schema
```json
{
"@type": "WebSite",
"name": "... - Professional CV",
"url": "...",
"inLanguage": ["en", "es"],
"potentialAction": { "@type": "SearchAction", ... }
}
```
#### BreadcrumbList Schema
```json
{
"@type": "BreadcrumbList",
"itemListElement": [
{ "position": 1, "name": "Home", "item": "..." },
{ "position": 2, "name": "CV (English/Español)", "item": ".../?lang=..." }
]
}
```
#### ProfilePage Schema
```json
{
"@type": "ProfilePage",
"mainEntity": { "@id": ".../#person" },
"dateCreated": "...",
"dateModified": "...",
"inLanguage": "..."
}
```
#### EducationalOccupationalCredential Schema
Generated dynamically for each education entry:
```json
{
"@type": "EducationalOccupationalCredential",
"name": "{{.Degree}}",
"description": "{{.Field}}",
"educationalLevel": "Bachelor's Degree",
"credentialCategory": "degree",
"recognizedBy": { "@type": "CollegeOrUniversity", ... }
}
```
#### Course Schema
Generated dynamically for each course/certification:
```json
{
"@type": "Course",
"name": "{{.Title}}",
"description": "...",
"provider": { "@type": "Organization", ... },
"hasCourseInstance": { "@type": "CourseInstance", ... },
"timeRequired": "{{.Duration}}"
}
```
---
### 3. AI-Era SEO Optimizations
#### llms.txt File (`static/llms.txt`)
A dedicated file for AI crawlers, following the [llmstxt.org](https://llmstxt.org/) proposal:
```
# llms.txt - AI Crawler Information
name: Juan Andrés Moreno Rubio - Professional CV
description: Interactive curriculum vitae...
## Professional Summary
- Senior Technical Consultant...
## Key Expertise
- SAP Customer Data Cloud...
## Contact
- Website: ...
- LinkedIn: ...
```
**Purpose:** Provides AI systems (ChatGPT, Claude, Perplexity, etc.) with structured, human-readable information about the site content.
#### Plain Text Auto-Detection (`/text` endpoint)
The site detects text-based browsers and CLI tools automatically and serves them a plain text version wrapped at 80 characters:
**Auto-detected clients:**
| Client | Type |
|--------|------|
| curl | CLI tool |
| wget | CLI tool |
| HTTPie | CLI tool |
| Lynx | Text browser |
| w3m | Text browser |
| Links/ELinks | Text browser |
| Browsh | Terminal browser |
| Carbonyl | Terminal browser |
**Usage:**
```bash
# Auto-detected (serves plain text):
curl https://juan.andres.morenorub.io/
# Explicit endpoint:
curl https://juan.andres.morenorub.io/text?lang=en
# With Accept header:
curl -H "Accept: text/plain" https://juan.andres.morenorub.io/
```
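The detection described above can be approximated by inspecting the `User-Agent` and `Accept` request headers. The sketch below is an assumption about how such a check might look, not the project's actual logic: `wantsPlainText` and `textClients` are hypothetical names, and the User-Agent tokens are guesses at what each client sends.

```go
package main

import (
	"fmt"
	"strings"
)

// textClients lists lowercase User-Agent substrings for the clients in the
// table above. The tokens are assumptions; substring matching is deliberately
// loose and could produce false positives (for example "links").
var textClients = []string{
	"curl", "wget", "httpie", "lynx", "w3m", "links", "browsh", "carbonyl",
}

// wantsPlainText reports whether a request should get the plain text version,
// based on its User-Agent and Accept header values.
func wantsPlainText(userAgent, accept string) bool {
	a := strings.ToLower(accept)
	// A client that asks for text/plain and does not accept HTML gets text.
	if strings.Contains(a, "text/plain") && !strings.Contains(a, "text/html") {
		return true
	}
	ua := strings.ToLower(userAgent)
	for _, token := range textClients {
		if strings.Contains(ua, token) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(wantsPlainText("curl/8.4.0", "*/*"))
	fmt.Println(wantsPlainText("Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0", "text/html"))
}
```

In a handler, the two arguments would come from `r.UserAgent()` and `r.Header.Get("Accept")`.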
**Output features:**
- 80-character line wrapping
- ASCII art section headers
- Clean, structured text
- All CV content preserved
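The 80-character wrapping can be illustrated with a small greedy word-wrap function. This is a sketch under the assumption that lines are broken on whitespace only; `wrap` is a hypothetical helper, and the real template may wrap differently. Width is counted in runes rather than bytes so that accented characters such as "é" or "ñ" count as one column.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrap breaks text into lines of at most width characters, splitting on
// whitespace. A single word longer than width is kept whole on its own line.
func wrap(text string, width int) string {
	var lines []string
	var line string
	for _, word := range strings.Fields(text) {
		switch {
		case line == "":
			line = word
		case utf8.RuneCountInString(line)+1+utf8.RuneCountInString(word) <= width:
			line += " " + word
		default:
			lines = append(lines, line)
			line = word
		}
	}
	if line != "" {
		lines = append(lines, line)
	}
	return strings.Join(lines, "\n")
}

func main() {
	summary := "Interactive curriculum vitae served as plain text for terminals, " +
		"text browsers and language models, wrapped so that no line is wider than the limit."
	fmt.Println(wrap(summary, 80))
}
```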
#### Duplicate Content Prevention (April 2026)
**Problem discovered:** Google was indexing `/text` instead of the main HTML page, causing the plain text version to appear as the primary search result.
**Root cause:** The `/text` endpoint served the same CV content as the HTML page but carried no SEO signals (no meta tags, no canonical, no noindex). Google likely favored it because plain text is easy to crawl and dense in keywords.
**Solution implemented:**
1. **`X-Robots-Tag: noindex, nofollow`** HTTP header on `/text` responses
   - Tells search engines not to index the plain text version
   - Does NOT block crawling: LLMs and text browsers can still access it
   - Implementation: `internal/handlers/cv_text.go`
2. **`Link: canonical`** HTTP header on `/text` responses
   - Points to the HTML version: `<https://juan.andres.morenorub.io/?lang=en>; rel="canonical"`
   - Tells search engines which version is the "official" one
3. **robots.txt comment** (not a `Disallow`; `/text` is intentionally crawlable for LLMs)
   - `/text` remains accessible for AI crawlers, curl, and text browsers
   - Only search engine indexing is prevented via the HTTP header
4. **Google Search Console verification**
   - `<meta name="google-site-verification">` tag added to `<head>`
   - Manual re-indexation requested for `/?lang=en` and `/?lang=es`
   - Manual removal of `/text` from search index
**Verification:**
```bash
# Check that /text has noindex header:
curl -sI 'https://juan.andres.morenorub.io/text?lang=en' | grep X-Robots
# → X-Robots-Tag: noindex, nofollow
# Check canonical points to HTML version:
curl -sI 'https://juan.andres.morenorub.io/text?lang=en' | grep Link
# → Link: <https://juan.andres.morenorub.io/?lang=en>; rel="canonical"
```
**Key principle:** The `/text` endpoint is for **consumption** (LLMs, terminals), not for **discovery** (search engines). Search results should always point to the rich HTML version with structured data, icons, and the AI chat agent.

---
#### robots.txt AI Bot Rules (`static/robots.txt`)
Explicit permissions for AI crawlers:
| Bot | Service | Status |
|-----|---------|--------|
| GPTBot | OpenAI/ChatGPT | Allowed |
| ChatGPT-User | OpenAI | Allowed |
| ClaudeBot | Anthropic | Allowed |
| Claude-Web | Anthropic | Allowed |
| anthropic-ai | Anthropic | Allowed |
| Google-Extended | Google AI/Gemini | Allowed |
| PerplexityBot | Perplexity AI | Allowed |
| cohere-ai | Cohere | Allowed |
| CCBot | Common Crawl | Allowed |
| Amazonbot | Amazon/Alexa | Allowed |
| Applebot | Apple/Siri | Allowed |
| Copilot | Microsoft | Allowed |
| YouBot | You.com | Allowed |
| BraveBot | Brave Search | Allowed |
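Because the bot list changes over time, it can help to check that every expected bot still has a `User-agent` entry in `static/robots.txt`. The sketch below shows one possible check; `listedAgents` and `missingAgents` are hypothetical helpers, and the sample robots.txt content is illustrative, not the project's actual file. It only confirms that a bot is named, not that its rules allow access.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// listedAgents returns the lowercase User-agent names declared in a
// robots.txt body.
func listedAgents(robots string) map[string]bool {
	agents := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments.
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		if strings.EqualFold(strings.TrimSpace(key), "user-agent") {
			agents[strings.ToLower(strings.TrimSpace(value))] = true
		}
	}
	return agents
}

// missingAgents reports which expected bots have no User-agent line.
func missingAgents(robots string, expected []string) []string {
	listed := listedAgents(robots)
	var missing []string
	for _, bot := range expected {
		if !listed[strings.ToLower(bot)] {
			missing = append(missing, bot)
		}
	}
	return missing
}

func main() {
	// Sample content for illustration only.
	robots := "User-agent: GPTBot\nAllow: /\n\nUser-agent: ClaudeBot\nAllow: /\n"
	fmt.Println(missingAgents(robots, []string{"GPTBot", "ClaudeBot", "PerplexityBot"}))
}
```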
---
## E-E-A-T Signals
The implementation supports Google's E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) framework:
### Experience
- Detailed work history with responsibilities
- Real project descriptions
- Duration and dates for credibility
### Expertise
- Skills categorized by domain
- Technologies listed per job
- Certifications and courses
### Authority
- Links to LinkedIn, GitHub, portfolio
- Company associations (SAP, Olympic Broadcasting)
- Client count and project metrics in summary
### Trust
- Canonical URLs prevent duplicate content
- HTTPS enforced
- Clear contact information
- Privacy-respecting analytics (Matomo)
---
## Files Overview
| File | Purpose |
|------|---------|
| `templates/partials/layout/head.html` | Meta tags, canonical, hreflang, Google verification |
| `templates/partials/layout/head-structured-data.html` | JSON-LD schemas (Person, WebSite, etc.) |
| `static/robots.txt` | Search engine and AI bot directives |
| `static/llms.txt` | AI crawler information file (llmstxt.org) |
| `static/sitemap.xml` | XML sitemap for search engines |
| `data/cv-en.json` | SEO fields (pageTitle, metaTitle, etc.) |
| `data/cv-es.json` | Spanish SEO fields |
| `internal/handlers/cv_text.go` | Plain text endpoint with noindex + canonical headers |
| `templates/cv-text.txt` | Plain text template |
---
## SEO Data Model
The SEO-specific fields in `data/cv-{lang}.json`:
```json
{
"seo": {
"pageTitle": "Curriculum Vitae",
"metaTitle": "Professional CV",
"metaDescription": "18 years of experience in...",
"ogDescription": "Senior Technical Consultant...",
"keywords": "CV, Resume, FullStack Developer, SAP CDC..."
}
}
```
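On the Go side, the `seo` object maps naturally onto a struct whose fields match the `.CV.SEO.PageTitle` style used in the templates. The sketch below is an assumption about that mapping: the `SEO` and `cvFile` types and `parseSEO` are hypothetical, and treating `pageTitle` and `metaDescription` as required is an illustrative choice, not a documented rule.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SEO mirrors the "seo" object shown above.
type SEO struct {
	PageTitle       string `json:"pageTitle"`
	MetaTitle       string `json:"metaTitle"`
	MetaDescription string `json:"metaDescription"`
	OGDescription   string `json:"ogDescription"`
	Keywords        string `json:"keywords"`
}

// cvFile holds only the part of the CV document this sketch needs.
type cvFile struct {
	SEO SEO `json:"seo"`
}

// parseSEO decodes the seo block and rejects files that lack the fields
// this sketch treats as required.
func parseSEO(data []byte) (SEO, error) {
	var f cvFile
	if err := json.Unmarshal(data, &f); err != nil {
		return SEO{}, err
	}
	if f.SEO.PageTitle == "" || f.SEO.MetaDescription == "" {
		return SEO{}, fmt.Errorf("seo: pageTitle and metaDescription are required")
	}
	return f.SEO, nil
}

func main() {
	data := []byte(`{"seo": {"pageTitle": "Curriculum Vitae", "metaTitle": "Professional CV", "metaDescription": "Example description"}}`)
	seo, err := parseSEO(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(seo.PageTitle, "/", seo.MetaTitle)
}
```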
---
## Validation & Testing
### Schema Validation
Test structured data at:
- [Google Rich Results Test](https://search.google.com/test/rich-results)
- [Schema.org Validator](https://validator.schema.org/)
### Expected Schema Count
The site generates **12+ JSON-LD blocks**; the exact number depends on how many education and course entries the CV data contains:
- 1 Person schema
- 1 WebSite schema
- 1 BreadcrumbList schema
- 1 ProfilePage schema
- N EducationalOccupationalCredential schemas (1 per education)
- N Course schemas (1 per course)
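The expected count can be checked against a rendered page. The sketch below assumes each schema is emitted in its own `<script type="application/ld+json">` element, which the breakdown above suggests but does not state outright; `expectedBlocks` and `countJSONLD` are hypothetical helpers, and the regular expression is a rough match, not an HTML parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// fixedBlocks counts the schemas emitted once per page:
// Person, WebSite, BreadcrumbList and ProfilePage.
const fixedBlocks = 4

// jsonLDScript matches the opening tag of a JSON-LD script element.
var jsonLDScript = regexp.MustCompile(`(?i)<script[^>]*type=["']application/ld\+json["']`)

// expectedBlocks returns the number of JSON-LD blocks the page should contain.
func expectedBlocks(educationEntries, courses int) int {
	return fixedBlocks + educationEntries + courses
}

// countJSONLD counts JSON-LD script elements in rendered HTML.
func countJSONLD(html string) int {
	return len(jsonLDScript.FindAllStringIndex(html, -1))
}

func main() {
	// Tiny sample page; a real check would fetch the rendered HTML.
	page := `<head>
<script type="application/ld+json">{"@type":"Person"}</script>
<script type="application/ld+json">{"@type":"WebSite"}</script>
<script src="/app.js"></script>
</head>`
	fmt.Println(countJSONLD(page), expectedBlocks(3, 5))
}
```

Comparing the two numbers after a content change catches template loops that silently stop emitting a schema.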
### robots.txt Validation
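Test with the robots.txt report in Google Search Console. The standalone [Google Robots.txt Tester](https://www.google.com/webmasters/tools/robots-testing-tool) linked in earlier versions of this guide has been retired by Google.

---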
## Best Practices Implemented
### Content Structure
- [ ] Clear H1-H6 heading hierarchy
- [x] Semantic HTML5 elements (article, section, nav)
- [x] Alt text for images
- [x] Descriptive link text
### Technical SEO
- [x] Mobile-responsive design
- [x] Fast page load (bundled CSS, preload fonts)
- [x] Canonical URLs
- [x] Hreflang for multilingual
- [x] Sitemap.xml
- [x] robots.txt with AI bot rules
### Modern SEO (AI-Era)
- [x] llms.txt file
- [x] Comprehensive JSON-LD schemas
- [x] AI bot permissions in robots.txt
- [x] Clear, parseable content structure
- [x] AI chat agent (Gemini) for interactive CV queries
- [x] Plain text endpoint for LLM consumption (noindex for search engines)
- [x] Google Search Console verified and monitored
### Duplicate Content Prevention
- [x] `/text` endpoint: `X-Robots-Tag: noindex, nofollow`
- [x] `/text` endpoint: `Link: canonical` pointing to HTML version
- [x] Sitemap only contains HTML pages (not `/text`)
- [x] Canonical URLs on all HTML pages
---
## Maintenance
### When to Update
1. **Content changes**: Update `data/cv-{lang}.json` SEO fields
2. **New sections**: Add corresponding Schema.org types
3. **New AI bots**: Add to `robots.txt`
4. **Annual review**: Update `llms.txt` with current info
### Monitoring
- Google Search Console for traditional SEO
- Matomo Analytics for traffic patterns
- Manual testing in AI chat interfaces (ChatGPT, Claude, Perplexity)
---
## References
- [Schema.org](https://schema.org/)
- [Google Search Central](https://developers.google.com/search)
- [llmstxt.org Standard](https://llmstxt.org/)
- [WPBeginner SEO Guide 2025](https://www.wpbeginner.com/opinion/does-seo-still-work/)