SEO Implementation Guide
Project: CV Interactive Website
Last Updated: 2026-04-09
Status: Production Ready
Overview
This document describes the SEO (search engine optimization) implementation for the CV website. It covers traditional search engine signals as well as newer measures aimed at LLM crawlers and AI Overviews.
SEO Architecture
1. Traditional SEO Elements
Meta Tags (templates/index.html)
<!-- Primary Meta Tags -->
<title>{{.CV.Personal.Name}} - {{.CV.SEO.PageTitle}}</title>
<meta name="title" content="...">
<meta name="description" content="...">
<meta name="keywords" content="...">
<meta name="author" content="...">
<meta name="robots" content="index, follow">
<link rel="canonical" href="{{.CanonicalURL}}">
International SEO (Hreflang)
<link rel="alternate" hreflang="en" href="{{.AlternateEN}}">
<link rel="alternate" hreflang="es" href="{{.AlternateES}}">
<link rel="alternate" hreflang="x-default" href="https://juan.andres.morenorub.io/?lang=en">
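The template values AlternateEN and AlternateES have to be computed somewhere on the server. The following is a minimal sketch of one way to derive them, assuming the language is selected with a lang query parameter as in the x-default link above. The function name alternateURL and the example.com base are hypothetical and not taken from the project.

```go
package main

import (
	"fmt"
	"net/url"
)

// alternateURL returns base with its "lang" query parameter set to lang.
// The name and signature are illustrative assumptions, not the project's API.
func alternateURL(base, lang string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("lang", lang)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Print one hreflang link per supported language.
	for _, lang := range []string{"en", "es"} {
		href, err := alternateURL("https://example.com/", lang)
		if err != nil {
			panic(err)
		}
		fmt.Printf("<link rel=\"alternate\" hreflang=\"%s\" href=\"%s\">\n", lang, href)
	}
}
```

Building the URL through net/url rather than string concatenation keeps any existing query parameters intact and encodes the value safely.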
Social Media Integration
| Platform | Meta Type | Implementation |
|---|---|---|
| | Open Graph | og:type, og:title, og:description, og:image |
| Twitter/X | Twitter Cards | twitter:card, twitter:title, twitter:description |
| | Open Graph | Uses same og:* tags |
2. Structured Data (JSON-LD)
The site implements multiple Schema.org types for comprehensive semantic understanding:
Person Schema (Primary)
{
"@type": "Person",
"@id": "{{.CV.Personal.Website}}/#person",
"name": "...",
"jobTitle": "...",
"description": "...",
"knowsAbout": [...],
"knowsLanguage": [...],
"worksFor": [...],
"hasOccupation": [...]
}
Fields included:
- Basic info: name, givenName, familyName, jobTitle
- Contact: email, telephone, url
- Demographics: birthDate, birthPlace, nationality
- Location: address with locality and country
- Social: sameAs (LinkedIn, GitHub, Domestika)
- Education: alumniOf
- Skills: knowsAbout (array of expertise areas)
- Languages: knowsLanguage (with Language type)
- Employment: worksFor (multiple organizations)
- Occupations: hasOccupation (dynamically generated from experience)
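As an illustration of how such a Person block could be produced from Go data, here is a minimal sketch using encoding/json. It covers only a handful of the fields listed above. The struct, the function name personJSONLD, and the sample values are assumptions for the example, since the project itself renders its schemas through templates.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person mirrors a small subset of the Schema.org Person fields.
// Field selection and names are illustrative, not the project's data model.
type Person struct {
	Context    string   `json:"@context"`
	Type       string   `json:"@type"`
	ID         string   `json:"@id"`
	Name       string   `json:"name"`
	JobTitle   string   `json:"jobTitle,omitempty"`
	KnowsAbout []string `json:"knowsAbout,omitempty"`
	SameAs     []string `json:"sameAs,omitempty"`
}

// personJSONLD fills in the fixed @context and @type and returns compact JSON.
func personJSONLD(p Person) (string, error) {
	p.Context = "https://schema.org"
	p.Type = "Person"
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	doc, err := personJSONLD(Person{
		ID:         "https://example.com/#person", // placeholder URL
		Name:       "Jane Doe",                    // placeholder name
		JobTitle:   "Senior Technical Consultant",
		KnowsAbout: []string{"SAP Customer Data Cloud"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("<script type=\"application/ld+json\">%s</script>\n", doc)
}
```

json.Marshal escapes the characters <, > and & by default, which makes the result safe to embed inside a script element.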
WebSite Schema
{
"@type": "WebSite",
"name": "... - Professional CV",
"url": "...",
"inLanguage": ["en", "es"],
"potentialAction": { "@type": "SearchAction", ... }
}
BreadcrumbList Schema
{
"@type": "BreadcrumbList",
"itemListElement": [
{ "position": 1, "name": "Home", "item": "..." },
{ "position": 2, "name": "CV (English/Español)", "item": ".../?lang=..." }
]
}
ProfilePage Schema
{
"@type": "ProfilePage",
"mainEntity": { "@id": ".../#person" },
"dateCreated": "...",
"dateModified": "...",
"inLanguage": "..."
}
EducationalOccupationalCredential Schema
Generated dynamically for each education entry:
{
"@type": "EducationalOccupationalCredential",
"name": "{{.Degree}}",
"description": "{{.Field}}",
"educationalLevel": "Bachelor's Degree",
"credentialCategory": "degree",
"recognizedBy": { "@type": "CollegeOrUniversity", ... }
}
Course Schema
Generated dynamically for each course/certification:
{
"@type": "Course",
"name": "{{.Title}}",
"description": "...",
"provider": { "@type": "Organization", ... },
"hasCourseInstance": { "@type": "CourseInstance", ... },
"timeRequired": "{{.Duration}}"
}
3. AI-Era SEO Optimizations
llms.txt File (static/llms.txt)
A dedicated file for AI crawlers, following the llms.txt proposal published at llmstxt.org:
# llms.txt - AI Crawler Information
name: Juan Andrés Moreno Rubio - Professional CV
description: Interactive curriculum vitae...
## Professional Summary
- Senior Technical Consultant...
## Key Expertise
- SAP Customer Data Cloud...
## Contact
- Website: ...
- LinkedIn: ...
Purpose: Provides AI systems (ChatGPT, Claude, Perplexity, etc.) with structured, human-readable information about the site content.
Plain Text Auto-Detection (/text endpoint)
The site automatically detects text-based browsers and CLI tools and serves them a clean plain text version wrapped at 80 characters.
Auto-detected clients:
| Client | Type |
|---|---|
| curl | CLI tool |
| wget | CLI tool |
| HTTPie | CLI tool |
| Lynx | Text browser |
| w3m | Text browser |
| Links/ELinks | Text browser |
| Browsh | Terminal browser |
| Carbonyl | Terminal browser |
Usage:
# Auto-detected (serves plain text):
curl https://juan.andres.morenorub.io/
# Explicit endpoint:
curl https://juan.andres.morenorub.io/text?lang=en
# With Accept header:
curl -H "Accept: text/plain" https://juan.andres.morenorub.io/
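The detection described above could be approximated by inspecting the Accept and User-Agent request headers. The sketch below is an assumption about how that might look, not the project's actual logic. The token list and the function name wantsPlainText are hypothetical, and real User-Agent strings vary by client version (terminal browsers built on a full browser engine may not identify themselves at all).

```go
package main

import (
	"fmt"
	"strings"
)

// textClientTokens holds lowercase User-Agent substrings for the clients in
// the table above. The exact tokens are assumptions; "links" also matches ELinks.
var textClientTokens = []string{
	"curl", "wget", "httpie", "lynx", "w3m", "links", "browsh", "carbonyl",
}

// wantsPlainText reports whether a request should get the plain text version,
// based on an explicit text/plain Accept header or a known User-Agent token.
func wantsPlainText(userAgent, accept string) bool {
	if strings.Contains(strings.ToLower(accept), "text/plain") {
		return true
	}
	ua := strings.ToLower(userAgent)
	for _, token := range textClientTokens {
		if strings.Contains(ua, token) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(wantsPlainText("curl/8.5.0", "*/*"))
	fmt.Println(wantsPlainText("Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0", "text/html"))
}
```

Substring matching is deliberately loose. A stricter variant could match only the product token at the start of the User-Agent value.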
Output features:
- 80-character line wrapping
- ASCII art section headers
- Clean, structured text
- All CV content preserved
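The 80-character wrapping can be done with a simple greedy word wrap. The following sketch shows the idea. The function name wrap is hypothetical, and the project's template may wrap text differently. Width is counted in runes rather than bytes so that accented characters in the Spanish version count as one column each.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrap splits text into lines of at most width runes, breaking only at
// whitespace. A single word longer than width is kept whole on its own line.
func wrap(text string, width int) []string {
	var lines []string
	cur := ""
	curLen := 0
	for _, word := range strings.Fields(text) {
		wordLen := utf8.RuneCountInString(word)
		// Start a new line if adding a space plus the word would overflow.
		if curLen > 0 && curLen+1+wordLen > width {
			lines = append(lines, cur)
			cur, curLen = "", 0
		}
		if curLen > 0 {
			cur += " "
			curLen++
		}
		cur += word
		curLen += wordLen
	}
	if curLen > 0 {
		lines = append(lines, cur)
	}
	return lines
}

func main() {
	summary := "Senior Technical Consultant with experience in identity, " +
		"integration and full-stack development for enterprise clients."
	for _, line := range wrap(summary, 80) {
		fmt.Println(line)
	}
}
```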
Duplicate Content Prevention (April 2026)
Problem discovered: Google was indexing /text instead of the main HTML page, causing the plain text version to appear as the primary search result.
Root cause: The /text endpoint served the same CV content as the HTML page but carried no SEO signals (no meta tags, no canonical link, no noindex directive). Google probably favored it because the plain text is cheap to crawl and dense in keywords.
Solution implemented:
- X-Robots-Tag: noindex, nofollow HTTP header on /text responses
  - Tells search engines not to index the plain text version
  - Does NOT block crawling; LLMs and text browsers can still access it
  - Implementation: internal/handlers/cv_text.go
- Link: canonical HTTP header on /text responses
  - Points to the HTML version: <https://juan.andres.morenorub.io/?lang=en>; rel="canonical"
  - Tells search engines which version is the "official" one
- robots.txt comment (not a Disallow; intentionally crawlable for LLMs)
  - /text remains accessible for AI crawlers, curl, and text browsers
  - Only search engine indexing is prevented via the HTTP header
- Google Search Console verification
  - <meta name="google-site-verification"> tag added to <head>
  - Manual re-indexation requested for /?lang=en and /?lang=es
  - Manual removal of /text from search index
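The two response headers described above could be set in a Go handler roughly as follows. This is a sketch under stated assumptions and not the code in internal/handlers/cv_text.go. The names textHandler, canonicalLink, probe and siteRoot are hypothetical, the origin is a placeholder, and pointing the Spanish text version at the ?lang=es page is an assumption (the source only shows the ?lang=en case).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// siteRoot is a placeholder origin used only for this example.
const siteRoot = "https://example.com/"

// canonicalLink builds the Link header value that points at the HTML page.
func canonicalLink(lang string) string {
	return "<" + siteRoot + "?lang=" + lang + `>; rel="canonical"`
}

// textHandler serves plain text and marks it as non-indexable while leaving
// it fully crawlable for LLMs and terminal clients.
func textHandler(w http.ResponseWriter, r *http.Request) {
	lang := r.URL.Query().Get("lang")
	if lang != "es" {
		lang = "en" // fall back to English for missing or unknown values
	}
	h := w.Header()
	h.Set("Content-Type", "text/plain; charset=utf-8")
	h.Set("X-Robots-Tag", "noindex, nofollow")
	h.Set("Link", canonicalLink(lang))
	fmt.Fprintln(w, "plain text CV goes here")
}

// probe runs textHandler in-process and returns the two SEO headers.
func probe(target string) (robots, link string) {
	rec := httptest.NewRecorder()
	textHandler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Header().Get("X-Robots-Tag"), rec.Header().Get("Link")
}

func main() {
	robots, link := probe("/text?lang=es")
	fmt.Println("X-Robots-Tag:", robots)
	fmt.Println("Link:", link)
}
```

Headers must be set before the first write to the response body, which is why they appear ahead of the Fprintln call.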
Verification:
# Check that /text has noindex header:
curl -sI 'https://juan.andres.morenorub.io/text?lang=en' | grep X-Robots
# → X-Robots-Tag: noindex, nofollow
# Check canonical points to HTML version:
curl -sI 'https://juan.andres.morenorub.io/text?lang=en' | grep Link
# → Link: <https://juan.andres.morenorub.io/?lang=en>; rel="canonical"
Key principle: The /text endpoint is for consumption (LLMs, terminals), not for discovery (search engines). Search results should always point to the rich HTML version with structured data, icons, and the AI chat agent.
robots.txt AI Bot Rules (static/robots.txt)
Explicit permissions for AI crawlers:
| Bot | Service | Status |
|---|---|---|
| GPTBot | OpenAI/ChatGPT | Allowed |
| ChatGPT-User | OpenAI | Allowed |
| ClaudeBot | Anthropic | Allowed |
| Claude-Web | Anthropic | Allowed |
| anthropic-ai | Anthropic | Allowed |
| Google-Extended | Google AI/Gemini | Allowed |
| PerplexityBot | Perplexity AI | Allowed |
| cohere-ai | Cohere | Allowed |
| CCBot | Common Crawl | Allowed |
| Amazonbot | Amazon/Alexa | Allowed |
| Applebot | Apple/Siri | Allowed |
| Copilot | Microsoft | Allowed |
| YouBot | You.com | Allowed |
| BraveBot | Brave Search | Allowed |
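If the list of bots grows, the Allow rules could be generated instead of maintained by hand. The sketch below is one hypothetical way to do that. The function name robotsRules is an assumption, the project serves a static file, and only a few user agents from the table are used as sample input.

```go
package main

import (
	"fmt"
	"strings"
)

// robotsRules renders one "User-agent / Allow" group per bot name.
func robotsRules(bots []string) string {
	var b strings.Builder
	for _, bot := range bots {
		fmt.Fprintf(&b, "User-agent: %s\nAllow: /\n\n", bot)
	}
	return b.String()
}

func main() {
	// Sample subset of the user agents listed in the table above.
	fmt.Print(robotsRules([]string{"GPTBot", "ClaudeBot", "PerplexityBot"}))
}
```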
E-E-A-T Signals
The implementation supports Google's E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) framework:
Experience
- Detailed work history with responsibilities
- Real project descriptions
- Duration and dates for credibility
Expertise
- Skills categorized by domain
- Technologies listed per job
- Certifications and courses
Authority
- Links to LinkedIn, GitHub, portfolio
- Company associations (SAP, Olympic Broadcasting)
- Client count and project metrics in summary
Trust
- Canonical URLs prevent duplicate content
- HTTPS enforced
- Clear contact information
- Privacy-respecting analytics (Matomo)
Files Overview
| File | Purpose |
|---|---|
| templates/partials/layout/head.html | Meta tags, canonical, hreflang, Google verification |
| templates/partials/layout/head-structured-data.html | JSON-LD schemas (Person, WebSite, etc.) |
| static/robots.txt | Search engine and AI bot directives |
| static/llms.txt | AI crawler information file (llmstxt.org) |
| static/sitemap.xml | XML sitemap for search engines |
| data/cv-en.json | SEO fields (pageTitle, metaTitle, etc.) |
| data/cv-es.json | Spanish SEO fields |
| internal/handlers/cv_text.go | Plain text endpoint with noindex + canonical headers |
| templates/cv-text.txt | Plain text template |
SEO Data Model
The SEO-specific fields in data/cv-{lang}.json:
{
"seo": {
"pageTitle": "Curriculum Vitae",
"metaTitle": "Professional CV",
"metaDescription": "18 years of experience in...",
"ogDescription": "Senior Technical Consultant...",
"keywords": "CV, Resume, FullStack Developer, SAP CDC..."
}
}
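A Go type matching this JSON shape might look like the sketch below. The type and function names (SEO, parseSEO, missing) are hypothetical and chosen for illustration, and the project's real structs may be organized differently. The small missing helper shows how empty SEO fields could be caught before deployment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// SEO mirrors the "seo" object in data/cv-{lang}.json.
type SEO struct {
	PageTitle       string `json:"pageTitle"`
	MetaTitle       string `json:"metaTitle"`
	MetaDescription string `json:"metaDescription"`
	OGDescription   string `json:"ogDescription"`
	Keywords        string `json:"keywords"`
}

// parseSEO extracts the "seo" object from a CV JSON document.
func parseSEO(data []byte) (SEO, error) {
	var doc struct {
		SEO SEO `json:"seo"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return SEO{}, err
	}
	return doc.SEO, nil
}

// missing lists the JSON names of fields that are empty or whitespace only.
func (s SEO) missing() []string {
	fields := []struct{ name, value string }{
		{"pageTitle", s.PageTitle},
		{"metaTitle", s.MetaTitle},
		{"metaDescription", s.MetaDescription},
		{"ogDescription", s.OGDescription},
		{"keywords", s.Keywords},
	}
	var out []string
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			out = append(out, f.name)
		}
	}
	return out
}

func main() {
	seo, err := parseSEO([]byte(`{"seo":{"pageTitle":"Curriculum Vitae","metaTitle":"Professional CV"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("missing fields:", seo.missing())
}
```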
Validation & Testing
Schema Validation
Test structured data at:
Expected Schema Count
The site generates 12+ JSON-LD blocks:
- 1 Person schema
- 1 WebSite schema
- 1 BreadcrumbList schema
- 1 ProfilePage schema
- N EducationalOccupationalCredential schemas (1 per education)
- N Course schemas (1 per course)
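The expected count follows directly from the list above: four fixed schemas plus one per education entry and one per course. A quick check against rendered HTML could look like this sketch. The function names and the exact script tag spelling are assumptions, and a tag written with different attribute order or quoting would not be counted.

```go
package main

import (
	"fmt"
	"strings"
)

// ldMarker is the opening tag assumed to introduce every JSON-LD block.
const ldMarker = `<script type="application/ld+json">`

// countJSONLD counts JSON-LD script blocks in rendered HTML.
func countJSONLD(html string) int {
	return strings.Count(html, ldMarker)
}

// expectedSchemas returns the four fixed schemas (Person, WebSite,
// BreadcrumbList, ProfilePage) plus one per education entry and per course.
func expectedSchemas(education, courses int) int {
	return 4 + education + courses
}

func main() {
	page := ldMarker + `{"@type":"Person"}</script>` +
		ldMarker + `{"@type":"WebSite"}</script>`
	fmt.Println("found:", countJSONLD(page))
	fmt.Println("expected for 2 degrees and 6 courses:", expectedSchemas(2, 6))
}
```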
robots.txt Validation
Test with the robots.txt report in Google Search Console (Google has retired the standalone robots.txt Tester).
Best Practices Implemented
Content Structure
- Clear H1-H6 heading hierarchy
- Semantic HTML5 elements (article, section, nav)
- Alt text for images
- Descriptive link text
Technical SEO
- Mobile-responsive design
- Fast page load (bundled CSS, preload fonts)
- Canonical URLs
- Hreflang for multilingual
- Sitemap.xml
- robots.txt with AI bot rules
Modern SEO (AI-Era)
- llms.txt file
- Comprehensive JSON-LD schemas
- AI bot permissions in robots.txt
- Clear, parseable content structure
- AI chat agent (Gemini) for interactive CV queries
- Plain text endpoint for LLM consumption (noindex for search engines)
- Google Search Console verified and monitored
Duplicate Content Prevention
- /text endpoint: X-Robots-Tag: noindex, nofollow
- /text endpoint: Link: canonical pointing to HTML version
- Sitemap only contains HTML pages (not /text)
- Canonical URLs on all HTML pages
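The sitemap item in this checklist lends itself to an automated check. The following sketch parses a sitemap and reports any entry whose path is /text. The function name textURLsInSitemap and the sample document are hypothetical, and the check assumes the standard urlset/url/loc layout.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"net/url"
	"strings"
)

// urlset models only the parts of a sitemap needed for this check.
type urlset struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// textURLsInSitemap returns every <loc> whose path is exactly /text.
func textURLsInSitemap(data []byte) ([]string, error) {
	var s urlset
	if err := xml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	var found []string
	for _, entry := range s.URLs {
		loc := strings.TrimSpace(entry.Loc)
		u, err := url.Parse(loc)
		if err != nil {
			return nil, err
		}
		if u.Path == "/text" {
			found = append(found, loc)
		}
	}
	return found, nil
}

func main() {
	sitemap := []byte(`<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/?lang=en</loc></url>
  <url><loc>https://example.com/text?lang=en</loc></url>
</urlset>`)
	bad, err := textURLsInSitemap(sitemap)
	if err != nil {
		panic(err)
	}
	fmt.Println("plain text URLs that should be removed:", bad)
}
```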
Maintenance
When to Update
- Content changes: Update data/cv-{lang}.json SEO fields
- New sections: Add corresponding Schema.org types
- New AI bots: Add to robots.txt
- Annual review: Update llms.txt with current info
Monitoring
- Google Search Console for traditional SEO
- Matomo Analytics for traffic patterns
- Manual testing in AI chat interfaces (ChatGPT, Claude, Perplexity)