cv-site/doc/15-SEO.md
juanatsap f8b48b92a3 docs: update SEO guide — duplicate content fix, Search Console, AI-era strategy
- Document /text noindex + canonical header solution
- Add duplicate content prevention checklist
- Document Google Search Console verification setup
- Update files overview table with correct paths
- Add AI chat agent as modern SEO signal
2026-04-09 12:56:22 +01:00


SEO Implementation Guide

Project: CV Interactive Website
Last Updated: 2026-04-09
Status: Production Ready


Overview

This document describes the comprehensive SEO (Search Engine Optimization) implementation for the CV website, including traditional search engine optimization and modern AI-era optimizations for LLM crawlers and AI Overviews.


SEO Architecture

1. Traditional SEO Elements

Meta Tags (templates/index.html)

<!-- Primary Meta Tags -->
<title>{{.CV.Personal.Name}} - {{.CV.SEO.PageTitle}}</title>
<meta name="title" content="...">
<meta name="description" content="...">
<meta name="keywords" content="...">
<meta name="author" content="...">
<meta name="robots" content="index, follow">
<link rel="canonical" href="{{.CanonicalURL}}">

International SEO (Hreflang)

<link rel="alternate" hreflang="en" href="{{.AlternateEN}}">
<link rel="alternate" hreflang="es" href="{{.AlternateES}}">
<link rel="alternate" hreflang="x-default" href="https://juan.andres.morenorub.io/?lang=en">
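The canonical and hreflang values above could be computed per request along the following lines. This is a minimal sketch, not the site's actual handler: the field names mirror the template variables (`CanonicalURL`, `AlternateEN`, `AlternateES`), while `pageURLs`, `langURL`, `buildPageURLs`, and the fallback to English for unknown languages are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// pageURLs holds the values the head template expects (hypothetical type).
type pageURLs struct {
	CanonicalURL string
	AlternateEN  string
	AlternateES  string
}

// langURL returns base with its "lang" query parameter set to lang.
func langURL(base, lang string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("lang", lang)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// buildPageURLs computes the canonical URL for the requested language and
// the alternates for both supported languages. Unknown languages fall back
// to "en" (an assumption, chosen to match the x-default link).
func buildPageURLs(base, lang string) (pageURLs, error) {
	if lang != "en" && lang != "es" {
		lang = "en"
	}
	canonical, err := langURL(base, lang)
	if err != nil {
		return pageURLs{}, err
	}
	en, err := langURL(base, "en")
	if err != nil {
		return pageURLs{}, err
	}
	es, err := langURL(base, "es")
	if err != nil {
		return pageURLs{}, err
	}
	return pageURLs{CanonicalURL: canonical, AlternateEN: en, AlternateES: es}, nil
}

func main() {
	urls, err := buildPageURLs("https://juan.andres.morenorub.io/", "es")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", urls)
}
```

Each language variant is canonical for itself, and every variant lists the full set of alternates, which is what hreflang annotations require.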

Social Media Integration

| Platform | Meta Type | Implementation |
|----------|-----------|----------------|
| Facebook | Open Graph | og:type, og:title, og:description, og:image |
| Twitter/X | Twitter Cards | twitter:card, twitter:title, twitter:description |
| LinkedIn | Open Graph | Uses same og:* tags |

2. Structured Data (JSON-LD)

The site implements multiple Schema.org types for comprehensive semantic understanding:

Person Schema (Primary)

{
  "@type": "Person",
  "@id": "{{.CV.Personal.Website}}/#person",
  "name": "...",
  "jobTitle": "...",
  "description": "...",
  "knowsAbout": [...],
  "knowsLanguage": [...],
  "worksFor": [...],
  "hasOccupation": [...]
}

Fields included:

  • Basic info: name, givenName, familyName, jobTitle
  • Contact: email, telephone, url
  • Demographics: birthDate, birthPlace, nationality
  • Location: address with locality and country
  • Social: sameAs (LinkedIn, GitHub, Domestika)
  • Education: alumniOf
  • Skills: knowsAbout (array of expertise areas)
  • Languages: knowsLanguage (with Language type)
  • Employment: worksFor (multiple organizations)
  • Occupations: hasOccupation (dynamically generated from experience)
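As an illustration of how a subset of these fields might be assembled on the server side, the sketch below marshals a Person object with `encoding/json`. The site itself renders JSON-LD from a template, so the `person` type, `newPerson`, and `jsonLD` are hypothetical names used only to show the shape of the data; the `@id` convention (`<website>/#person`) is taken from the schema above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// person models a small subset of the Schema.org Person fields listed above.
type person struct {
	Context    string   `json:"@context"`
	Type       string   `json:"@type"`
	ID         string   `json:"@id"`
	Name       string   `json:"name"`
	JobTitle   string   `json:"jobTitle,omitempty"`
	KnowsAbout []string `json:"knowsAbout,omitempty"`
	SameAs     []string `json:"sameAs,omitempty"`
}

// newPerson fills in the fixed JSON-LD keys and derives @id from the website URL.
func newPerson(website, name, jobTitle string, skills, profiles []string) person {
	return person{
		Context:    "https://schema.org",
		Type:       "Person",
		ID:         website + "/#person",
		Name:       name,
		JobTitle:   jobTitle,
		KnowsAbout: skills,
		SameAs:     profiles,
	}
}

// jsonLD returns the compact JSON encoding of the person.
func (p person) jsonLD() (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder values, not the real CV data.
	p := newPerson("https://example.com", "Jane Doe", "Consultant",
		[]string{"Go", "SEO"}, []string{"https://github.com/janedoe"})
	out, err := p.jsonLD()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Using a stable `@id` lets the other schemas (for example ProfilePage's `mainEntity`) point at the same Person node instead of repeating it.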

WebSite Schema

{
  "@type": "WebSite",
  "name": "... - Professional CV",
  "url": "...",
  "inLanguage": ["en", "es"],
  "potentialAction": { "@type": "SearchAction", ... }
}

BreadcrumbList Schema

{
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "position": 1, "name": "Home", "item": "..." },
    { "position": 2, "name": "CV (English/Español)", "item": ".../?lang=..." }
  ]
}

ProfilePage Schema

{
  "@type": "ProfilePage",
  "mainEntity": { "@id": ".../#person" },
  "dateCreated": "...",
  "dateModified": "...",
  "inLanguage": "..."
}

EducationalOccupationalCredential Schema

Generated dynamically for each education entry:

{
  "@type": "EducationalOccupationalCredential",
  "name": "{{.Degree}}",
  "description": "{{.Field}}",
  "educationalLevel": "Bachelor's Degree",
  "credentialCategory": "degree",
  "recognizedBy": { "@type": "CollegeOrUniversity", ... }
}

Course Schema

Generated dynamically for each course/certification:

{
  "@type": "Course",
  "name": "{{.Title}}",
  "description": "...",
  "provider": { "@type": "Organization", ... },
  "hasCourseInstance": { "@type": "CourseInstance", ... },
  "timeRequired": "{{.Duration}}"
}
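"Generated dynamically" here means one JSON-LD object per entry in the CV data. A rough sketch of that loop is shown below; the `course` struct, its `Provider` field, and `courseSchemas` are assumptions for illustration, while `Title` and `Duration` correspond to the template fields above. Note that Schema.org defines `timeRequired` as an ISO 8601 duration (for example `PT10H`), so the stored duration should use that format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// course is a hypothetical view of one course entry in the CV data.
type course struct {
	Title    string
	Provider string
	Duration string // ISO 8601 duration, e.g. "PT10H"
}

// courseSchemas builds one Schema.org Course object per course entry.
func courseSchemas(courses []course) []map[string]any {
	out := make([]map[string]any, 0, len(courses))
	for _, c := range courses {
		out = append(out, map[string]any{
			"@context": "https://schema.org",
			"@type":    "Course",
			"name":     c.Title,
			"provider": map[string]any{
				"@type": "Organization",
				"name":  c.Provider,
			},
			"timeRequired": c.Duration,
		})
	}
	return out
}

func main() {
	// Placeholder entries, not the real CV data.
	schemas := courseSchemas([]course{
		{Title: "Go Basics", Provider: "Example Academy", Duration: "PT10H"},
		{Title: "Advanced SEO", Provider: "Example Academy", Duration: "PT6H"},
	})
	for _, s := range schemas {
		b, err := json.MarshalIndent(s, "", "  ")
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(b))
	}
}
```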

3. AI-Era SEO Optimizations

llms.txt File (static/llms.txt)

A dedicated file for AI crawlers, modeled on the llmstxt.org proposal (a community convention, not a formal standard):

# llms.txt - AI Crawler Information
name: Juan Andrés Moreno Rubio - Professional CV
description: Interactive curriculum vitae...

## Professional Summary
- Senior Technical Consultant...

## Key Expertise
- SAP Customer Data Cloud...

## Contact
- Website: ...
- LinkedIn: ...

Purpose: Provides AI systems (ChatGPT, Claude, Perplexity, etc.) with structured, human-readable information about the site content.

Plain Text Auto-Detection (/text endpoint)

The site automatically detects text-based browsers and CLI tools and serves them a clean plain text version wrapped at 80 characters per line.

Auto-detected clients:

| Client | Type |
|--------|------|
| curl | CLI tool |
| wget | CLI tool |
| HTTPie | CLI tool |
| Lynx | Text browser |
| w3m | Text browser |
| Links/ELinks | Text browser |
| Browsh | Terminal browser |
| Carbonyl | Terminal browser |

Usage:

# Auto-detected (serves plain text):
curl https://juan.andres.morenorub.io/

# Explicit endpoint:
curl 'https://juan.andres.morenorub.io/text?lang=en'

# With Accept header:
curl -H "Accept: text/plain" https://juan.andres.morenorub.io/
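The detection itself could look roughly like the sketch below, which checks the User-Agent against a list of known text clients and falls back to the Accept header. The function names and the exact User-Agent substrings are assumptions (the real tokens sent by each client should be confirmed before relying on them); only the list of clients and the `Accept: text/plain` behavior come from this document.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// textClientTokens are lowercase User-Agent substrings for the clients
// listed above. The exact tokens are assumptions, not verified values.
// "links" also matches ELinks.
var textClientTokens = []string{
	"curl/", "wget/", "httpie/", "lynx/", "w3m/", "links", "browsh", "carbonyl",
}

// wantsPlainText reports whether the request should get the plain text CV:
// either the User-Agent looks like a text client, or the client explicitly
// accepts text/plain.
func wantsPlainText(r *http.Request) bool {
	ua := strings.ToLower(r.UserAgent())
	for _, token := range textClientTokens {
		if strings.Contains(ua, token) {
			return true
		}
	}
	return strings.Contains(r.Header.Get("Accept"), "text/plain")
}

// newRequest builds a bare request carrying only the two headers of interest.
func newRequest(userAgent, accept string) *http.Request {
	r := &http.Request{Header: http.Header{}}
	r.Header.Set("User-Agent", userAgent)
	r.Header.Set("Accept", accept)
	return r
}

func main() {
	fmt.Println(wantsPlainText(newRequest("curl/8.5.0", "*/*")))
	fmt.Println(wantsPlainText(newRequest("Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0", "text/html")))
}
```

Substring matching is deliberately loose; a stricter variant could match only the leading product token of the User-Agent to reduce false positives.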

Output features:

  • 80-character line wrapping
  • ASCII art section headers
  • Clean, structured text
  • All CV content preserved
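The 80-character wrapping can be done with a simple greedy word wrap. The sketch below is one possible implementation, not necessarily the one used by the site's text template; `wrap` is a hypothetical helper, and words longer than the width are left unbroken on their own line.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrap breaks text into lines of at most width characters, splitting on
// whitespace. Length is counted in runes so accented characters count once.
// A single word longer than width is kept whole on its own line.
func wrap(text string, width int) string {
	var lines []string
	line := ""
	for _, word := range strings.Fields(text) {
		if line != "" && utf8.RuneCountInString(line)+1+utf8.RuneCountInString(word) > width {
			lines = append(lines, line)
			line = word
			continue
		}
		if line != "" {
			line += " "
		}
		line += word
	}
	if line != "" {
		lines = append(lines, line)
	}
	return strings.Join(lines, "\n")
}

func main() {
	summary := "Senior Technical Consultant with experience across identity, " +
		"full-stack development, and broadcasting projects for international clients."
	fmt.Println(wrap(summary, 80))
}
```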

Duplicate Content Prevention (April 2026)

Problem discovered: Google was indexing /text instead of the main HTML page, causing the plain text version to appear as the primary search result.

Root cause: The /text endpoint served the same CV content as the HTML page but with no SEO signals (no meta tags, no canonical, no noindex). With nothing marking the HTML page as the preferred version, Google was free to pick either URL as the representative one, and the keyword-dense plain text version was the one it chose.

Solution implemented:

  1. X-Robots-Tag: noindex, nofollow HTTP header on /text responses

    • Tells search engines not to index the plain text version
    • Does NOT block crawling — LLMs and text browsers can still access it
    • Implementation: internal/handlers/cv_text.go
  2. Link: canonical HTTP header on /text responses

    • Points to the HTML version: <https://juan.andres.morenorub.io/?lang=en>; rel="canonical"
    • Tells search engines which version is the "official" one
  3. robots.txt comment (not a Disallow — intentionally crawlable for LLMs)

    • /text remains accessible for AI crawlers, curl, and text browsers
    • Only search engine indexing is prevented via the HTTP header
  4. Google Search Console verification

    • <meta name="google-site-verification"> tag added to <head>
    • Manual re-indexation requested for /?lang=en and /?lang=es
    • Manual removal of /text from search index

Verification:

# Check that /text has noindex header:
curl -sI 'https://juan.andres.morenorub.io/text?lang=en' | grep X-Robots
# → X-Robots-Tag: noindex, nofollow

# Check canonical points to HTML version:
curl -sI 'https://juan.andres.morenorub.io/text?lang=en' | grep Link
# → Link: <https://juan.andres.morenorub.io/?lang=en>; rel="canonical"
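A handler that sets both headers might look like the sketch below. It is not the contents of internal/handlers/cv_text.go; `textHandler`, `headersFor`, the `siteBase` constant, and the choice to make the canonical follow the requested language are assumptions for illustration. The header names and values match the verification output above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// siteBase is the public origin used in the canonical link.
const siteBase = "https://juan.andres.morenorub.io"

// textHandler serves the plain text CV and marks it as non-indexable,
// pointing search engines at the HTML version for the same language.
func textHandler(w http.ResponseWriter, r *http.Request) {
	lang := r.URL.Query().Get("lang")
	if lang != "es" {
		lang = "en" // assumption: default to English
	}
	h := w.Header()
	h.Set("X-Robots-Tag", "noindex, nofollow")
	h.Set("Link", fmt.Sprintf(`<%s/?lang=%s>; rel="canonical"`, siteBase, lang))
	h.Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprintln(w, "plain text CV goes here")
}

// headersFor runs the handler against an in-memory request and returns
// the response headers, which is convenient for tests.
func headersFor(target string) http.Header {
	rec := httptest.NewRecorder()
	textHandler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Result().Header
}

func main() {
	h := headersFor("/text?lang=en")
	fmt.Println("X-Robots-Tag:", h.Get("X-Robots-Tag"))
	fmt.Println("Link:", h.Get("Link"))
}
```

Setting the headers before the first write matters: once the body starts, `net/http` has already sent the header block and later changes are ignored.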

Key principle: The /text endpoint is for consumption (LLMs, terminals), not for discovery (search engines). Search results should always point to the rich HTML version with structured data, icons, and the AI chat agent.


robots.txt AI Bot Rules (static/robots.txt)

Explicit permissions for AI crawlers:

| Bot | Service | Status |
|-----|---------|--------|
| GPTBot | OpenAI/ChatGPT | Allowed |
| ChatGPT-User | OpenAI | Allowed |
| ClaudeBot | Anthropic | Allowed |
| Claude-Web | Anthropic | Allowed |
| anthropic-ai | Anthropic | Allowed |
| Google-Extended | Google AI/Gemini | Allowed |
| PerplexityBot | Perplexity AI | Allowed |
| cohere-ai | Cohere | Allowed |
| CCBot | Common Crawl | Allowed |
| Amazonbot | Amazon/Alexa | Allowed |
| Applebot | Apple/Siri | Allowed |
| Copilot | Microsoft | Allowed |
| YouBot | You.com | Allowed |
| BraveBot | Brave Search | Allowed |
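To show the shape of the rules behind this list, the sketch below renders one `User-agent` / `Allow` group per bot. The project keeps robots.txt as a static file, so this generator (`renderRobots`) is purely illustrative, and the sitemap URL passed in `main` is an assumed location rather than a confirmed one.

```go
package main

import (
	"fmt"
	"strings"
)

// renderRobots produces a robots.txt body that allows everything for all
// crawlers, adds an explicit Allow group for each named bot, and ends with
// a Sitemap line.
func renderRobots(bots []string, sitemapURL string) string {
	var b strings.Builder
	b.WriteString("User-agent: *\nAllow: /\n\n")
	for _, bot := range bots {
		fmt.Fprintf(&b, "User-agent: %s\nAllow: /\n\n", bot)
	}
	fmt.Fprintf(&b, "Sitemap: %s\n", sitemapURL)
	return b.String()
}

func main() {
	bots := []string{"GPTBot", "ClaudeBot", "PerplexityBot"}
	// The sitemap URL below is an assumption for illustration.
	fmt.Print(renderRobots(bots, "https://juan.andres.morenorub.io/sitemap.xml"))
}
```

The per-bot groups are redundant with the wildcard group from a crawler's point of view; listing them explicitly mainly documents intent.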

E-E-A-T Signals

The implementation supports Google's E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) framework:

Experience

  • Detailed work history with responsibilities
  • Real project descriptions
  • Duration and dates for credibility

Expertise

  • Skills categorized by domain
  • Technologies listed per job
  • Certifications and courses

Authority

  • Links to LinkedIn, GitHub, portfolio
  • Company associations (SAP, Olympic Broadcasting)
  • Client count and project metrics in summary

Trust

  • Canonical URLs prevent duplicate content
  • HTTPS enforced
  • Clear contact information
  • Privacy-respecting analytics (Matomo)

Files Overview

| File | Purpose |
|------|---------|
| templates/partials/layout/head.html | Meta tags, canonical, hreflang, Google verification |
| templates/partials/layout/head-structured-data.html | JSON-LD schemas (Person, WebSite, etc.) |
| static/robots.txt | Search engine and AI bot directives |
| static/llms.txt | AI crawler information file (llmstxt.org) |
| static/sitemap.xml | XML sitemap for search engines |
| data/cv-en.json | SEO fields (pageTitle, metaTitle, etc.) |
| data/cv-es.json | Spanish SEO fields |
| internal/handlers/cv_text.go | Plain text endpoint with noindex + canonical headers |
| templates/cv-text.txt | Plain text template |

SEO Data Model

The SEO-specific fields in data/cv-{lang}.json:

{
  "seo": {
    "pageTitle": "Curriculum Vitae",
    "metaTitle": "Professional CV",
    "metaDescription": "18 years of experience in...",
    "ogDescription": "Senior Technical Consultant...",
    "keywords": "CV, Resume, FullStack Developer, SAP CDC..."
  }
}
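On the Go side, these fields could be loaded into a struct like the one sketched below. The type and function names (`seo`, `cvFile`, `parseSEO`, `fullTitle`) are hypothetical; the JSON keys come from the data model above, and the title format mirrors the `<title>` tag in the head template (`Name - PageTitle`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// seo mirrors the "seo" object in data/cv-{lang}.json.
type seo struct {
	PageTitle       string `json:"pageTitle"`
	MetaTitle       string `json:"metaTitle"`
	MetaDescription string `json:"metaDescription"`
	OGDescription   string `json:"ogDescription"`
	Keywords        string `json:"keywords"`
}

// cvFile captures only the part of the CV file needed here.
type cvFile struct {
	SEO seo `json:"seo"`
}

// parseSEO extracts the SEO fields from the raw JSON of a CV file.
func parseSEO(data []byte) (seo, error) {
	var f cvFile
	if err := json.Unmarshal(data, &f); err != nil {
		return seo{}, err
	}
	return f.SEO, nil
}

// fullTitle builds the page title in the "Name - PageTitle" form.
func (s seo) fullTitle(name string) string {
	return name + " - " + s.PageTitle
}

func main() {
	raw := []byte(`{"seo": {"pageTitle": "Curriculum Vitae", "metaTitle": "Professional CV"}}`)
	s, err := parseSEO(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.fullTitle("Jane Doe"))
}
```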

Validation & Testing

Schema Validation

Test structured data at:

Expected Schema Count

The site generates 12+ JSON-LD blocks:

  • 1 Person schema
  • 1 WebSite schema
  • 1 BreadcrumbList schema
  • 1 ProfilePage schema
  • N EducationalOccupationalCredential schemas (1 per education)
  • N Course schemas (1 per course)
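The expected count can be checked automatically by extracting the JSON-LD script blocks from the rendered HTML and reading each block's `@type`. The sketch below uses a regular expression for brevity, which is adequate for a quick check but not a general HTML parser; it also assumes each block is a single object with a string `@type` (blocks using an array type or `@graph` would need extra handling). `schemaTypes` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// jsonLDScript matches <script type="application/ld+json"> blocks and
// captures their contents.
var jsonLDScript = regexp.MustCompile(
	`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// schemaTypes returns the @type of every JSON-LD block found in html,
// in document order.
func schemaTypes(html string) ([]string, error) {
	var types []string
	for _, m := range jsonLDScript.FindAllStringSubmatch(html, -1) {
		var doc struct {
			Type string `json:"@type"`
		}
		if err := json.Unmarshal([]byte(m[1]), &doc); err != nil {
			return nil, err
		}
		types = append(types, doc.Type)
	}
	return types, nil
}

func main() {
	page := `<head>
<script type="application/ld+json">{"@type": "Person", "name": "Jane Doe"}</script>
<script type="application/ld+json">{"@type": "WebSite", "name": "Example"}</script>
</head>`
	types, err := schemaTypes(page)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(types), types)
}
```

Feeding the function the output of a request to the home page gives a quick regression check that no schema block was dropped by a template change.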

robots.txt Validation

Test with the robots.txt report in Google Search Console (Google has retired the standalone Robots.txt Tester).


Best Practices Implemented

Content Structure

  • Clear H1-H6 heading hierarchy
  • Semantic HTML5 elements (article, section, nav)
  • Alt text for images
  • Descriptive link text

Technical SEO

  • Mobile-responsive design
  • Fast page load (bundled CSS, preload fonts)
  • Canonical URLs
  • Hreflang for multilingual
  • Sitemap.xml
  • robots.txt with AI bot rules

Modern SEO (AI-Era)

  • llms.txt file
  • Comprehensive JSON-LD schemas
  • AI bot permissions in robots.txt
  • Clear, parseable content structure
  • AI chat agent (Gemini) for interactive CV queries
  • Plain text endpoint for LLM consumption (noindex for search engines)
  • Google Search Console verified and monitored

Duplicate Content Prevention

  • /text endpoint: X-Robots-Tag: noindex, nofollow
  • /text endpoint: Link: canonical pointing to HTML version
  • Sitemap only contains HTML pages (not /text)
  • Canonical URLs on all HTML pages

Maintenance

When to Update

  1. Content changes: Update data/cv-{lang}.json SEO fields
  2. New sections: Add corresponding Schema.org types
  3. New AI bots: Add to robots.txt
  4. Annual review: Update llms.txt with current info

Monitoring

  • Google Search Console for traditional SEO
  • Matomo Analytics for traffic patterns
  • Manual testing in AI chat interfaces (ChatGPT, Claude, Perplexity)
