juanatsap f8b48b92a3 docs: update SEO guide — duplicate content fix, Search Console, AI-era strategy

- Document /text noindex + canonical header solution
- Add duplicate content prevention checklist
- Document Google Search Console verification setup
- Update files overview table with correct paths
- Add AI chat agent as modern SEO signal

2026-04-09 12:56:22 +01:00

SEO Implementation Guide

Project: CV Interactive Website
Last Updated: 2026-04-09
Status: Production Ready

Overview

This document describes the SEO (search engine optimization) implementation for the CV website. It covers traditional search engine optimization as well as AI-era measures aimed at LLM crawlers and AI Overviews.

SEO Architecture

1. Traditional SEO Elements

Meta Tags (`templates/index.html`)

<!-- Primary Meta Tags -->
<title>{{.CV.Personal.Name}} - {{.CV.SEO.PageTitle}}</title>
<meta name="title" content="...">
<meta name="description" content="...">
<meta name="keywords" content="...">
<meta name="author" content="...">
<meta name="robots" content="index, follow">
<link rel="canonical" href="{{.CanonicalURL}}">

International SEO (Hreflang)

<link rel="alternate" hreflang="en" href="{{.AlternateEN}}">
<link rel="alternate" hreflang="es" href="{{.AlternateES}}">
<link rel="alternate" hreflang="x-default" href="https://juan.andres.morenorub.io/?lang=en">
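
The `.CanonicalURL`, `.AlternateEN`, and `.AlternateES` values used in the templates above have to be computed before rendering. The following Go sketch shows one way that could look, assuming languages are selected with a `?lang=` query parameter (as in the x-default URL) and that unsupported values fall back to English. The names `seoURLs`, `langURL`, and `buildSEOURLs` are hypothetical and not taken from the project.

```go
package main

import (
	"fmt"
	"net/url"
)

// seoURLs holds the values the head template expects (hypothetical type).
type seoURLs struct {
	Canonical   string
	AlternateEN string
	AlternateES string
}

// langURL returns a copy of base with its lang query parameter set to lang.
func langURL(base *url.URL, lang string) string {
	u := *base // copy so the caller's URL is not mutated
	q := u.Query()
	q.Set("lang", lang)
	u.RawQuery = q.Encode()
	return u.String()
}

// buildSEOURLs derives the canonical and alternate URLs for the requested
// language. Anything other than "en" or "es" falls back to "en" (assumption).
func buildSEOURLs(baseURL, lang string) (seoURLs, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return seoURLs{}, err
	}
	if lang != "en" && lang != "es" {
		lang = "en"
	}
	return seoURLs{
		Canonical:   langURL(base, lang),
		AlternateEN: langURL(base, "en"),
		AlternateES: langURL(base, "es"),
	}, nil
}

func main() {
	urls, err := buildSEOURLs("https://juan.andres.morenorub.io/", "es")
	if err != nil {
		panic(err)
	}
	fmt.Println("canonical:", urls.Canonical)
	fmt.Println("en:", urls.AlternateEN)
	fmt.Println("es:", urls.AlternateES)
}
```

Keeping the canonical URL and the hreflang alternates in one function makes it harder for the two to drift apart, which matters because each language version should list itself among its alternates.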

Social Media Integration

Platform	Meta Type	Implementation
Facebook	Open Graph	`og:type`, `og:title`, `og:description`, `og:image`
Twitter/X	Twitter Cards	`twitter:card`, `twitter:title`, `twitter:description`
LinkedIn	Open Graph	Uses same `og:*` tags

2. Structured Data (JSON-LD)

The site implements multiple Schema.org types for comprehensive semantic understanding:

Person Schema (Primary)

{
  "@type": "Person",
  "@id": "{{.CV.Personal.Website}}/#person",
  "name": "...",
  "jobTitle": "...",
  "description": "...",
  "knowsAbout": [...],
  "knowsLanguage": [...],
  "worksFor": [...],
  "hasOccupation": [...]
}

Fields included:

Basic info: name, givenName, familyName, jobTitle
Contact: email, telephone, url
Demographics: birthDate, birthPlace, nationality
Location: address with locality and country
Social: sameAs (LinkedIn, GitHub, Domestika)
Education: alumniOf
Skills: knowsAbout (array of expertise areas)
Languages: knowsLanguage (with Language type)
Employment: worksFor (multiple organizations)
Occupations: hasOccupation (dynamically generated from experience)
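
To illustrate how `hasOccupation` might be generated from experience entries, here is a minimal Go sketch that builds a reduced Person object and serializes it with `encoding/json`. It is not the project's method: the site renders JSON-LD through templates, and the types `person`, `occupation`, and `experience`, the function `personJSONLD`, and the example.com data are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// occupation is a minimal Schema.org Occupation.
type occupation struct {
	Type string `json:"@type"`
	Name string `json:"name"`
}

// person mirrors a small subset of the Person fields listed above.
type person struct {
	Context       string       `json:"@context"`
	Type          string       `json:"@type"`
	ID            string       `json:"@id"`
	Name          string       `json:"name"`
	KnowsAbout    []string     `json:"knowsAbout,omitempty"`
	HasOccupation []occupation `json:"hasOccupation,omitempty"`
}

// experience stands in for one work-history entry (hypothetical shape).
type experience struct {
	Title string
}

// personJSONLD builds the Person object and derives hasOccupation from the
// distinct, non-empty job titles in jobs.
func personJSONLD(website, name string, skills []string, jobs []experience) (string, error) {
	p := person{
		Context:    "https://schema.org",
		Type:       "Person",
		ID:         website + "/#person",
		Name:       name,
		KnowsAbout: skills,
	}
	seen := map[string]bool{}
	for _, job := range jobs {
		if job.Title == "" || seen[job.Title] {
			continue // skip blanks and repeated titles
		}
		seen[job.Title] = true
		p.HasOccupation = append(p.HasOccupation, occupation{Type: "Occupation", Name: job.Title})
	}
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := personJSONLD(
		"https://example.com",
		"Jane Doe",
		[]string{"Go", "SEO"},
		[]experience{{Title: "Consultant"}, {Title: "Developer"}, {Title: "Consultant"}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A side benefit of `json.Marshal` is that it escapes `<`, `>`, and `&` by default, so the result can be placed inside a `<script type="application/ld+json">` element without a stray `</script>` in the data closing the tag early.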

WebSite Schema

{
  "@type": "WebSite",
  "name": "... - Professional CV",
  "url": "...",
  "inLanguage": ["en", "es"],
  "potentialAction": { "@type": "SearchAction", ... }
}

BreadcrumbList Schema

{
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "position": 1, "name": "Home", "item": "..." },
    { "position": 2, "name": "CV (English/Español)", "item": ".../?lang=..." }
  ]
}

ProfilePage Schema

{
  "@type": "ProfilePage",
  "mainEntity": { "@id": ".../#person" },
  "dateCreated": "...",
  "dateModified": "...",
  "inLanguage": "..."
}

EducationalOccupationalCredential Schema

Generated dynamically for each education entry:

{
  "@type": "EducationalOccupationalCredential",
  "name": "{{.Degree}}",
  "description": "{{.Field}}",
  "educationalLevel": "Bachelor's Degree",
  "credentialCategory": "degree",
  "recognizedBy": { "@type": "CollegeOrUniversity", ... }
}

Course Schema

Generated dynamically for each course/certification:

{
  "@type": "Course",
  "name": "{{.Title}}",
  "description": "...",
  "provider": { "@type": "Organization", ... },
  "hasCourseInstance": { "@type": "CourseInstance", ... },
  "timeRequired": "{{.Duration}}"
}

3. AI-Era SEO Optimizations

llms.txt File (`static/llms.txt`)

A dedicated file for AI crawlers, modeled on the llmstxt.org proposal:

# llms.txt - AI Crawler Information
name: Juan Andrés Moreno Rubio - Professional CV
description: Interactive curriculum vitae...

## Professional Summary
- Senior Technical Consultant...

## Key Expertise
- SAP Customer Data Cloud...

## Contact
- Website: ...
- LinkedIn: ...

Purpose: Provides AI systems (ChatGPT, Claude, Perplexity, etc.) with structured, human-readable information about the site content.

Plain Text Auto-Detection (`/text` endpoint)

The site automatically detects text-based browsers and CLI tools and serves them a clean plain text version wrapped at 80 characters.

Auto-detected clients:

Client	Type
curl	CLI tool
wget	CLI tool
HTTPie	CLI tool
Lynx	Text browser
w3m	Text browser
Links/ELinks	Text browser
Browsh	Terminal browser
Carbonyl	Terminal browser
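
Detection of this kind usually comes down to inspecting the `User-Agent` and `Accept` request headers. The Go sketch below shows the idea under stated assumptions: the token list is a guess at what the clients in the table send (Browsh and Carbonyl in particular may identify themselves differently), and `textClientTokens`, `isTextClient`, and `wantsPlainText` are hypothetical names rather than the project's code.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// textClientTokens are lowercase User-Agent substrings for the clients in the
// table above. The exact tokens are assumptions; verify them against real logs.
var textClientTokens = []string{
	"curl/", "wget/", "httpie/", "lynx/", "w3m/",
	"links", // also matches "elinks"
	"browsh", "carbonyl",
}

// isTextClient reports whether the headers suggest the client wants plain text:
// either an explicit text/plain Accept value or a known text-client User-Agent.
func isTextClient(userAgent, accept string) bool {
	if strings.Contains(strings.ToLower(accept), "text/plain") {
		return true
	}
	ua := strings.ToLower(userAgent)
	for _, token := range textClientTokens {
		if strings.Contains(ua, token) {
			return true
		}
	}
	return false
}

// wantsPlainText adapts isTextClient to an incoming request.
func wantsPlainText(r *http.Request) bool {
	return isTextClient(r.UserAgent(), r.Header.Get("Accept"))
}

func main() {
	req, err := http.NewRequest(http.MethodGet, "https://example.com/", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("User-Agent", "curl/8.4.0")
	fmt.Println("curl:", wantsPlainText(req))
	fmt.Println("browser:", isTextClient("Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0", "text/html"))
}
```

Substring matching is deliberately loose. If false positives become a concern, matching on the leading product token only would be a stricter alternative.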

Usage:

# Auto-detected (serves plain text):
curl https://juan.andres.morenorub.io/

# Explicit endpoint:
curl https://juan.andres.morenorub.io/text?lang=en

# With Accept header:
curl -H "Accept: text/plain" https://juan.andres.morenorub.io/

Output features:

80-character line wrapping
ASCII art section headers
Clean, structured text
All CV content preserved
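
The 80-character wrapping can be done with a simple greedy word wrap. The sketch below is one possible approach, not the project's implementation; `wrapText` is a hypothetical name. It counts runes rather than bytes so that accented characters in the Spanish CV do not shorten lines, and it leaves words longer than the width on a line of their own.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrapText splits text into lines of at most width runes, breaking only at
// whitespace. A single word longer than width is kept intact on its own line.
func wrapText(text string, width int) []string {
	var lines []string
	line := ""
	for _, word := range strings.Fields(text) {
		switch {
		case line == "":
			line = word
		case utf8.RuneCountInString(line)+1+utf8.RuneCountInString(word) > width:
			lines = append(lines, line) // current line is full, start a new one
			line = word
		default:
			line += " " + word
		}
	}
	if line != "" {
		lines = append(lines, line)
	}
	return lines
}

func main() {
	summary := "Senior Technical Consultant with experience across web platforms, " +
		"identity management, and full-stack development for enterprise clients."
	for _, l := range wrapText(summary, 80) {
		fmt.Println(l)
	}
}
```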

Duplicate Content Prevention (April 2026)

Problem discovered: Google was indexing /text instead of the main HTML page, causing the plain text version to appear as the primary search result.

Root cause: The /text endpoint served the same CV content as the HTML page but carried no SEO signals (no meta tags, no canonical, no noindex). With nothing marking the HTML page as the preferred version, Google was free to pick either URL, and it likely favored /text because plain text is easy to crawl and keyword-dense.

Solution implemented:

1. X-Robots-Tag: noindex, nofollow HTTP header on /text responses
   - Tells search engines not to index the plain text version
   - Does NOT block crawling: LLMs and text browsers can still access it
   - Implementation: internal/handlers/cv_text.go
2. Link: canonical HTTP header on /text responses
   - Points to the HTML version: <https://juan.andres.morenorub.io/?lang=en>; rel="canonical"
   - Tells search engines which version is the "official" one
3. robots.txt comment (not a Disallow; /text is intentionally crawlable for LLMs)
   - /text remains accessible for AI crawlers, curl, and text browsers
   - Only search engine indexing is prevented via the HTTP header
4. Google Search Console verification
   - <meta name="google-site-verification"> tag added to <head>
   - Manual re-indexation requested for /?lang=en and /?lang=es
   - Manual removal of /text from search index
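
The two response headers from the solution above could be set in a Go handler roughly as follows. This is a sketch under assumptions, not the contents of internal/handlers/cv_text.go: `siteURL`, `canonicalFor`, `textHandler`, and `headersFor` are hypothetical names, the body is a placeholder, and pointing `lang=es` requests at the Spanish HTML page is an assumption (the text above only documents the English canonical).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const siteURL = "https://juan.andres.morenorub.io"

// canonicalFor returns the HTML page that should rank instead of /text.
// Unknown languages fall back to English (assumption).
func canonicalFor(lang string) string {
	if lang != "es" {
		lang = "en"
	}
	return siteURL + "/?lang=" + lang
}

// textHandler serves the plain text CV while telling search engines not to
// index it and which URL is the canonical version.
func textHandler(w http.ResponseWriter, r *http.Request) {
	lang := r.URL.Query().Get("lang")
	h := w.Header()
	h.Set("Content-Type", "text/plain; charset=utf-8")
	h.Set("X-Robots-Tag", "noindex, nofollow")
	h.Set("Link", "<"+canonicalFor(lang)+`>; rel="canonical"`)
	fmt.Fprintln(w, "plain text CV goes here") // placeholder body
}

// headersFor runs the handler in-process and returns the two SEO headers.
func headersFor(target string) (robots, link string) {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	rec := httptest.NewRecorder()
	textHandler(rec, req)
	res := rec.Result()
	defer res.Body.Close()
	return res.Header.Get("X-Robots-Tag"), res.Header.Get("Link")
}

func main() {
	robots, link := headersFor("/text?lang=en")
	fmt.Println("X-Robots-Tag:", robots)
	fmt.Println("Link:", link)
}
```

Headers are used here instead of meta tags because a `text/plain` response has no `<head>` to put them in.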

Verification:

# Check that /text has noindex header:
curl -sI 'https://juan.andres.morenorub.io/text?lang=en' | grep X-Robots
# → X-Robots-Tag: noindex, nofollow

# Check canonical points to HTML version:
curl -sI 'https://juan.andres.morenorub.io/text?lang=en' | grep Link
# → Link: <https://juan.andres.morenorub.io/?lang=en>; rel="canonical"

Key principle: The /text endpoint is for consumption (LLMs, terminals), not for discovery (search engines). Search results should always point to the rich HTML version with structured data, icons, and the AI chat agent.

robots.txt AI Bot Rules (`static/robots.txt`)

Explicit permissions for AI crawlers:

Bot	Service	Status
GPTBot	OpenAI/ChatGPT	Allowed
ChatGPT-User	OpenAI	Allowed
ClaudeBot	Anthropic	Allowed
Claude-Web	Anthropic	Allowed
anthropic-ai	Anthropic	Allowed
Google-Extended	Google AI/Gemini	Allowed
PerplexityBot	Perplexity AI	Allowed
cohere-ai	Cohere	Allowed
CCBot	Common Crawl	Allowed
Amazonbot	Amazon/Alexa	Allowed
Applebot	Apple/Siri	Allowed
Copilot	Microsoft	Allowed
YouBot	You.com	Allowed
BraveBot	Brave Search	Allowed
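
In robots.txt terms, "Allowed" presumably means each bot has its own group with `Allow: /`. The project keeps this as a static file, so the Go sketch below is only an illustration of what those groups look like when rendered from a list; `allowGroups` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// allowGroups renders one robots.txt group per bot, each allowing the
// whole site. Groups are separated by a blank line.
func allowGroups(bots []string) string {
	var b strings.Builder
	for _, bot := range bots {
		fmt.Fprintf(&b, "User-agent: %s\nAllow: /\n\n", bot)
	}
	return b.String()
}

func main() {
	fmt.Print(allowGroups([]string{"GPTBot", "ClaudeBot", "PerplexityBot"}))
}
```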

E-E-A-T Signals

The implementation supports Google's E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) framework:

Experience

Detailed work history with responsibilities
Real project descriptions
Duration and dates for credibility

Expertise

Skills categorized by domain
Technologies listed per job
Certifications and courses

Authority

Links to LinkedIn, GitHub, portfolio
Company associations (SAP, Olympic Broadcasting)
Client count and project metrics in summary

Trust

Canonical URLs prevent duplicate content
HTTPS enforced
Clear contact information
Privacy-respecting analytics (Matomo)

Files Overview

File	Purpose
`templates/partials/layout/head.html`	Meta tags, canonical, hreflang, Google verification
`templates/partials/layout/head-structured-data.html`	JSON-LD schemas (Person, WebSite, etc.)
`static/robots.txt`	Search engine and AI bot directives
`static/llms.txt`	AI crawler information file (llmstxt.org)
`static/sitemap.xml`	XML sitemap for search engines
`data/cv-en.json`	SEO fields (pageTitle, metaTitle, etc.)
`data/cv-es.json`	Spanish SEO fields
`internal/handlers/cv_text.go`	Plain text endpoint with noindex + canonical headers
`templates/cv-text.txt`	Plain text template

SEO Data Model

The SEO-specific fields in data/cv-{lang}.json:

{
  "seo": {
    "pageTitle": "Curriculum Vitae",
    "metaTitle": "Professional CV",
    "metaDescription": "18 years of experience in...",
    "ogDescription": "Senior Technical Consultant...",
    "keywords": "CV, Resume, FullStack Developer, SAP CDC..."
  }
}
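
On the Go side, these fields map naturally onto a struct with JSON tags. The sketch below assumes that shape (the template reference `.CV.SEO.PageTitle` suggests something similar, but the project's actual types are not shown here); `seoFields`, `cvFile`, `parseSEO`, and `missing` are hypothetical names. The `missing` helper shows how empty fields could be caught before they reach the rendered `<head>`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// seoFields mirrors the "seo" object in data/cv-{lang}.json.
type seoFields struct {
	PageTitle       string `json:"pageTitle"`
	MetaTitle       string `json:"metaTitle"`
	MetaDescription string `json:"metaDescription"`
	OGDescription   string `json:"ogDescription"`
	Keywords        string `json:"keywords"`
}

// cvFile captures only the part of the CV file this sketch needs.
type cvFile struct {
	SEO seoFields `json:"seo"`
}

// parseSEO extracts the SEO fields from the raw JSON of a CV file.
func parseSEO(data []byte) (seoFields, error) {
	var f cvFile
	if err := json.Unmarshal(data, &f); err != nil {
		return seoFields{}, err
	}
	return f.SEO, nil
}

// missing lists the JSON names of fields that are empty or whitespace only.
func (s seoFields) missing() []string {
	fields := []struct{ name, value string }{
		{"pageTitle", s.PageTitle},
		{"metaTitle", s.MetaTitle},
		{"metaDescription", s.MetaDescription},
		{"ogDescription", s.OGDescription},
		{"keywords", s.Keywords},
	}
	var out []string
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			out = append(out, f.name)
		}
	}
	return out
}

func main() {
	raw := []byte(`{"seo": {"pageTitle": "Curriculum Vitae", "metaTitle": "Professional CV"}}`)
	seo, err := parseSEO(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("page title:", seo.PageTitle)
	fmt.Println("missing:", seo.missing())
}
```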

Validation & Testing

Schema Validation

Test structured data with a validator such as Google's Rich Results Test or the Schema.org Markup Validator.

Expected Schema Count

The site generates 12+ JSON-LD blocks:

1 Person schema
1 WebSite schema
1 BreadcrumbList schema
1 ProfilePage schema
N EducationalOccupationalCredential schemas (1 per education)
N Course schemas (1 per course)
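
The expected counts can be checked against rendered HTML with a small script. The Go sketch below extracts each `application/ld+json` script block and tallies its `@type`. It assumes every block is a single JSON object with a string `@type` (no `@graph` wrappers or type arrays), and it uses a regular expression rather than a full HTML parser, which is adequate for a quick check but not robust against unusual markup. `schemaTypes` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// ldJSON matches <script type="application/ld+json"> elements and captures
// their contents. (?is) makes the match case-insensitive and lets . span lines.
var ldJSON = regexp.MustCompile(`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// schemaTypes counts JSON-LD blocks in html by their @type value.
func schemaTypes(html string) (map[string]int, error) {
	counts := map[string]int{}
	for _, m := range ldJSON.FindAllStringSubmatch(html, -1) {
		var block struct {
			Type string `json:"@type"`
		}
		if err := json.Unmarshal([]byte(m[1]), &block); err != nil {
			return nil, fmt.Errorf("invalid JSON-LD block: %w", err)
		}
		counts[block.Type]++
	}
	return counts, nil
}

func main() {
	page := `<head>
<script type="application/ld+json">{"@type": "Person", "name": "Jane Doe"}</script>
<script type="application/ld+json">{"@type": "Course", "name": "A"}</script>
<script type="application/ld+json">{"@type": "Course", "name": "B"}</script>
</head>`
	counts, err := schemaTypes(page)
	if err != nil {
		panic(err)
	}
	fmt.Println("Person:", counts["Person"], "Course:", counts["Course"])
}
```

In practice the input would be the body of a request to the live page; the inline sample keeps the sketch self-contained.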

robots.txt Validation

Test with the robots.txt report in Google Search Console (the standalone Robots.txt Tester has been retired).

Best Practices Implemented

Content Structure

Clear H1-H6 heading hierarchy
Semantic HTML5 elements (article, section, nav)
Alt text for images
Descriptive link text

Technical SEO

Mobile-responsive design
Fast page load (bundled CSS, preload fonts)
Canonical URLs
Hreflang for multilingual
Sitemap.xml
robots.txt with AI bot rules

Modern SEO (AI-Era)

llms.txt file
Comprehensive JSON-LD schemas
AI bot permissions in robots.txt
Clear, parseable content structure
AI chat agent (Gemini) for interactive CV queries
Plain text endpoint for LLM consumption (noindex for search engines)
Google Search Console verified and monitored
