# robots.txt for quotewise.io (Authoritative Domain)
#
# quotewise.io is the canonical domain for all quote content, API access,
# and AI/LLM integrations. All other domains canonicalize to quotewise.io.

# ============================================
# AI & LLM Crawler Guidelines (IETF AIPREF / Content Signals)
# ============================================
# We welcome responsible AI access for quote verification.
#
# Domain Authority:
#   quotewise.io     = Authoritative domain for AI, API, and professional use
#   quotosaurus.com  = Consumer discovery (canonicalized to quotewise.io)
#   api.quotewise.io = REST API for integrations
#   mcp.quotewise.io = MCP server for AI agents
#
# For structured API access optimized for LLMs:
#   - llms.txt: https://quotewise.io/llms.txt
#   - Full docs: https://quotewise.io/llms-full.txt
#   - REST API: https://api.quotewise.io/docs/
#
# Commercial AI training or large-scale harvesting?
# Contact: q@quotewise.io

# ============================================
# Content Signals (IETF AIPREF draft / Cloudflare Content Signals)
# As a condition of accessing this website, you agree to abide by
# the following content signals:
# (a) If a content-signal = yes, you may collect content for the
#     corresponding use.
# (b) If a content-signal = no, you may not collect content for
#     the corresponding use.
# (c) If absent, the website operator neither grants nor restricts
#     permission via content signal for that use.
#
# Categories:
#   search: building search index, returning hyperlinks/excerpts
#   ai-input: RAG, grounding, real-time AI answers (NOT training)
#   ai-train: training or fine-tuning AI models

# AI Training Crawlers - Rate Limited, Search/AI-Input OK, Training Restricted
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
Content-Signal: ai-train=no, search=yes, ai-input=yes
Crawl-delay: 5
Allow: /
Disallow: /admin/
Disallow: /accounts/
Disallow: /api/
Disallow: /htmx/
Disallow: /collections/private/

# Standard crawlers
User-agent: *
Content-Signal: ai-train=no, search=yes, ai-input=yes
#
# Allow indexing of main content
Allow: /
Allow: /o/
Allow: /q/
Allow: /privacy/
Allow: /terms/
Allow: /plans/
Allow: /developers/
#
# Disallow specific content sections
# Disallow: /originators/ - handled with noindex, follow tags in the template's header block
# Disallow: /trending/
Disallow: /tags/
Disallow: /sources/
Disallow: /similar/
Disallow: /q/*/similar/
Disallow: /search/
#
# Disallow admin, accounts, API endpoints, and other non-user-facing areas
Disallow: /admin/
Disallow: /accounts/
Disallow: /api/
Disallow: /oembed/
Disallow: /htmx/
Disallow: /qrawler/
Disallow: /invitations/
Disallow: /moderator/
Disallow: /collections/private/
Disallow: /404/
#
# Block search params (consistent with /search/ being blocked)
Disallow: /*?q=*
Disallow: /*?search_text=*
Disallow: /*?search_type=*
#
# Block sort/ordering (same content, different presentation)
Disallow: /*?sort=*
Disallow: /*?ordering=*
Disallow: /*?method=*
#
# Block UI preferences and limits
Disallow: /*?filter=*
Disallow: /*?paginate_by=*
Disallow: /*?limit=*
#
# Block auth/session parameters
Disallow: /*?next=*
Disallow: /*?auth=*
Disallow: /*?new=*
Disallow: /*?username=*
Disallow: /*?current=*
Disallow: /*?collection_slug=*
#
# Pagination (?page=) is NOT blocked - Google needs to crawl these
# Letter filter (?letter=) is NOT blocked - unique content subsets

# Sitemap location (authoritative domain)
Sitemap: https://quotewise.io/sitemap.xml