cv-site/static/robots.txt
juanatsap ae430e6ea7 feat: Implement comprehensive AI-era SEO optimizations
- Add llms.txt file for AI crawlers (llmstxt.org standard)
- Enhance robots.txt with 15+ AI bot rules (GPTBot, ClaudeBot, etc.)
- Expand JSON-LD structured data from 1 to 12+ schema blocks:
  - Person (enhanced with occupations, languages, employers)
  - WebSite, BreadcrumbList, ProfilePage
  - EducationalOccupationalCredential (dynamic per education)
  - Course (dynamic per certification)
- Create doc/15-SEO.md with comprehensive SEO documentation
- Update MODERN-WEB-TECHNIQUES.md with SEO section (techniques 11-13)

Based on WPBeginner 2025 SEO recommendations for AI Overviews,
structured data, and E-E-A-T signals.
2025-11-30 13:23:22 +00:00


# robots.txt for juan.andres.morenorub.io
# Last Updated: 2025-11-30
# =============================================================================
# DEFAULT RULES - Allow all search engines
# =============================================================================
User-agent: *
Allow: /
# Disallow admin/internal paths
Disallow: /admin/
Disallow: /api/internal/
Disallow: /.git/
Disallow: /.env
# =============================================================================
# SITEMAPS & AI CONTENT
# =============================================================================
Sitemap: https://juan.andres.morenorub.io/static/sitemap.xml
# LLMs.txt for AI crawlers (standard: https://llmstxt.org/)
# Location: https://juan.andres.morenorub.io/static/llms.txt
# =============================================================================
# TRADITIONAL SEARCH ENGINES
# =============================================================================
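# Named groups do not inherit from "User-agent: *", so the path exclusions are repeated here.
User-agent: Googlebot
User-agent: Bingbot
User-agent: Slurp
User-agent: DuckDuckBot
User-agent: Baiduspider
User-agent: YandexBot
Allow: /
Disallow: /admin/
Disallow: /api/internal/
Disallow: /.git/
Disallow: /.env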
# =============================================================================
# AI CRAWLERS & LLM BOTS - Explicitly allowed
# =============================================================================
# Named groups do not inherit from "User-agent: *", so the path exclusions
# are listed once at the end of this shared group.
# OpenAI - ChatGPT, GPT-4
User-agent: GPTBot
User-agent: ChatGPT-User
# Anthropic - Claude
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: anthropic-ai
# Google AI - Gemini (formerly Bard); Google-Extended is a control token, not a separate crawler
User-agent: Google-Extended
# Meta AI
User-agent: FacebookBot
User-agent: Meta-ExternalAgent
# Perplexity AI
User-agent: PerplexityBot
# Cohere AI
User-agent: cohere-ai
# Common Crawl (used by many AI models)
User-agent: CCBot
# Amazon/Alexa
User-agent: Amazonbot
# Apple - Applebot (for Siri, Spotlight)
User-agent: Applebot
# Microsoft Copilot
User-agent: Copilot
# You.com AI
User-agent: YouBot
# Brave Search
User-agent: BraveBot
Allow: /
Disallow: /admin/
Disallow: /api/internal/
Disallow: /.git/
Disallow: /.env
# =============================================================================
# CRAWL RATE LIMITS (Optional)
# =============================================================================
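# Uncomment if needed to prevent server overload.
# Crawl-delay is a non-standard directive (Googlebot ignores it) and only takes
# effect inside a User-agent group; uncommented here, it would attach to the
# last group above rather than to all crawlers.
# Crawl-delay: 1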
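#
# A minimal sketch (an illustration, not part of the live rules) of scoping the
# delay to a single crawler: the directive is appended to that crawler's own
# group. Bingbot is used here only as an example of a crawler that documents
# support for Crawl-delay; the value 1 is carried over from the line above.
#
# User-agent: Bingbot
# Allow: /
# Crawl-delay: 1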