Our Methodology

This page explains how Quotewise approaches the hardest problems in quote attribution: finding sources, evaluating evidence, and building trust when AI can generate any quote and attribute it to anyone. Transparency about our methods builds more credibility than claiming perfection.

How does semantic search actually work?

Instead of matching exact words, we convert every quote into a mathematical fingerprint of its meaning: what AI researchers call a "vector embedding." Think of it like this: if quotes were songs, keyword search would match notes, but semantic search matches the melody. When you describe what you're thinking ("courage during setbacks"), our system finds quotes that capture that concept, even if they use completely different words.

We use 1024-dimensional embeddings generated by OpenAI's text-embedding-3-large model (the model natively outputs 3,072 dimensions; the API's dimensions parameter reduces them to 1,024), stored in PostgreSQL with pgvector for fast similarity search. The difference is that we show you real quotes from our curated corpus, not AI-generated text.

Technical note: Search results return quotes with similarity scores. You're seeing retrieval, not generation. Every result came from somewhere real.
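The retrieval idea above can be sketched in a few lines. This is an illustrative toy, not Quotewise's production code: real embeddings have 1,024 dimensions and the ranking runs inside PostgreSQL via pgvector's distance operators, but the underlying math is the same cosine similarity shown here.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embedding vectors; closer to 1.0 means closer in meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, corpus):
    """Rank stored quotes by similarity to the query embedding: retrieval, not generation."""
    return sorted(
        corpus,
        key=lambda q: cosine_similarity(query_vec, q["embedding"]),
        reverse=True,
    )

# Toy 3-dimensional vectors stand in for real 1024-dimensional embeddings.
corpus = [
    {"text": "Fall seven times, stand up eight.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Don't count your chickens before they hatch.", "embedding": [0.0, 0.2, 0.9]},
]
query = [0.8, 0.2, 0.1]  # hypothetical embedding of "courage during setbacks"
results = search(query, corpus)
```

Because every result is a row pulled from the corpus, there is nothing for the system to hallucinate: the worst case is a weak match, never an invented quote.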

How does Quotewise verify who said what?

Quote attribution isn't simple. Quotes evolve through retelling, translation, and cultural adaptation. Someone may have said something similar but not those exact words. A speechwriter may have crafted the phrase, but the speaker made it famous. A quote widely attributed to Einstein might actually be from someone else entirely.

We classify every quote using 8 attribution types, each signaling different levels of certainty.

A direct attribution means a primary source exists: a manuscript, recording, or verified transcript. This is as close to certainty as quote attribution gets. When no primary source has been found but multiple independent sources agree, we call that attributed, which is where most historical quotes fall. Some quotes are popularized by a well-known figure who didn't originate them; in those cases, we maintain both the original attribution and the person who made the quote famous.

Translations and imperfect memory produce paraphrased quotes, where the core meaning survives but the exact words have shifted. When multiple plausible sources offer conflicting evidence, the quote is disputed. Churchill quotes are notorious for this. We show all the evidence and let you evaluate.

Apocryphal quotes are widely attributed to someone but probably aren't authentic, with the true source unknown. "Let them eat cake" attributed to Marie Antoinette is the classic example. Misattributed quotes go further: we know who didn't say it, and we know who did. "Insanity is doing the same thing and expecting different results" is not from Einstein, but millions of people think it is. These stay in our database to correct the record. Finally, traditional covers cultural wisdom, proverbs, and folk sayings without an individual author. No single person said "Don't count your chickens before they hatch." It's collective wisdom.
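The eight types described above form a closed vocabulary, which can be modeled as a simple enumeration. The names and string values here are illustrative, not Quotewise's actual schema:

```python
from enum import Enum

class AttributionType(Enum):
    """The eight attribution types, ordered roughly from most to least certain."""
    DIRECT = "direct"                # primary source exists: manuscript, recording, transcript
    ATTRIBUTED = "attributed"        # no primary source, but multiple independent sources agree
    POPULARIZED = "popularized"      # made famous by someone who didn't originate it
    PARAPHRASED = "paraphrased"      # core meaning survives; exact wording has shifted
    DISPUTED = "disputed"            # plausible sources offer conflicting evidence
    APOCRYPHAL = "apocryphal"        # probably not authentic; true source unknown
    MISATTRIBUTED = "misattributed"  # the credited person didn't say it, and the real source is known
    TRADITIONAL = "traditional"      # proverbs and folk wisdom with no individual author
```

A closed set like this keeps classifications comparable across the corpus and gives the upcoming confidence scores a consistent categorical baseline to build on.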

We're building attribution confidence scoring (0.0-1.0 scale, launching 2026) that weighs multiple factors: earliest documented appearance, number of independent sources, source reputation, temporal consistency, and verifiability. Today we classify quotes into types. Soon every quote will have a numerical confidence score with transparent evidence trails.

Where do the quotes come from?

Our 611K quotes come from diverse sources.

Primary sources include books (with ISBN and page numbers), speeches, interviews, letters, and manuscripts. When we say "page 347 of Abraham Lincoln: The War Years by Carl Sandburg," you can verify it yourself.

We also import from curated platforms like WikiQuote (community-validated), Goodreads (reader collections), and specialty sources. Each source type receives different weight in our verification process. We capture contemporary voices too, from social media insights to podcast conversations, before their authors become famous. Quotable before the book deal.

We recognize pre-digital reference works (Bartlett's Familiar Quotations 1855, Oxford Dictionary of Quotations 1941) as valuable historical evidence, while respecting copyright for modern editions.

Every quote has a QuoteSightings section showing where we found it: the URL, the book, the platform, the date. This is the evidence layer that separates us from quote sites that hide their sources.
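A QuoteSightings entry can be pictured as a small evidence record. The field names below are illustrative assumptions, not Quotewise's actual data model; they only show the kind of information each sighting carries:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuoteSighting:
    """One piece of evidence that a quote appeared somewhere (illustrative schema)."""
    platform: str                      # "book", "wikiquote", "goodreads", "social", ...
    sighted_on: str                    # ISO date the sighting was recorded
    url: Optional[str] = None          # where the sighting was found online, if anywhere
    source_title: Optional[str] = None # e.g. a book title for print sources
    page_number: Optional[int] = None  # page-level citation, when available

sighting = QuoteSighting(
    platform="book",
    sighted_on="2025-06-01",
    source_title="Abraham Lincoln: The War Years",
    page_number=347,
)
```

Keeping sightings as separate records, rather than a single "source" field, is what lets evidence accumulate: one quote can carry a book citation, a WikiQuote page, and a newspaper-archive hit at the same time.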

What we don't do: generate quotes with AI, accept submissions without source documentation, or claim every quote is verified. We show you what we know and what we're still researching.

When we discover misattributions, we update records and maintain history. We track source reputation over time. Sources that frequently misattribute lose credibility in our scoring system.

What makes Quotewise different from AI-generated quotes?

ChatGPT and Claude generate text that sounds like quotes, sometimes real, sometimes hallucinated, with no way to tell which. Quotewise finds and verifies existing quotes. Every result comes from our curated corpus with source documentation. That's the core difference: retrieval, not generation.

When you search Quotewise, you're querying a database of 611K quotes with documented sources from 32K originators. When you search ChatGPT, you're generating plausible text from a neural network trained on the internet.

Consider the Ryan Holiday problem. In February 2026, author Ryan Holiday spent hours verifying a Lincoln quote he'd handwritten on a notecard years earlier. ChatGPT first said it was Tolstoy, then claimed it was from Lincoln's secretaries, then said the quote didn't exist. Holiday went page by page through an 800-page Sandburg biography to confirm his notecard was correct. The quote's actual source was a 19th-century journalist, easily findable in newspaper databases.

Quotewise exists to solve this. The Sandburg citation should be discoverable in seconds, not hours. Make the verification artifact findable so the next person doesn't repeat the research.

We don't claim algorithmic magic. We show you the QuoteSightings, the sources, the attribution type, and (coming 2026) the confidence score with its component factors. You see the same evidence we see.

The Quotekey Standard: Open Identifiers for Quote Attribution

Every quote in Quotewise has a unique identifier called a Quotekey, a permanent, open standard that works like DOIs for academic papers or Placekeys for physical locations.

Example: Socrates' quote "The definition of terms is the beginning of wisdom" has the Quotekey qk-4eqg9-q913-7xyb5.

The format encodes both what was said (text hash) and who said it (Wikidata Q-ID for Socrates = Q913). This creates three practical advantages.

First, deduplication across platforms. When the same quote appears on Twitter, Goodreads, and WikiQuote, Quotekey identifies them as the same entity, enabling cross-platform evidence aggregation. Second, permanent citability. Link to /qk/4eqg9-q913-7xyb5/ and it resolves forever. No more broken quote links when sites redesign or disappear. Third, joinable data. Researchers, AI systems, and fact-checkers can reference quotes by Quotekey and access our verification layer via API. The identifier is free and open. Verification depth scales with engagement.
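The "what was said + who said it" structure can be sketched as follows. This is a loose illustration under stated assumptions: the real Quotekey encoding (hash function, segment lengths, alphabet) is not specified here, so treat the hashing details as hypothetical. Only the shape, a text hash combined with a Wikidata Q-ID, mirrors the published format:

```python
import hashlib

def make_quotekey(text: str, wikidata_qid: str) -> str:
    """Build a Quotekey-style identifier from quote text and an originator's
    Wikidata Q-ID. Hashing and segment choices are illustrative, not the
    actual Quotekey specification."""
    # Normalize so trivial whitespace/case differences map to the same key,
    # which is what enables cross-platform deduplication.
    normalized = " ".join(text.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    prefix, suffix = digest[:5], digest[5:10]  # two short hash segments (illustrative)
    return f"qk-{prefix}-{wikidata_qid.lower()}-{suffix}"
```

Because the key is derived from the normalized text and the originator identity, any platform can compute the same identifier independently, without calling Quotewise first; that property is what makes an open identifier standard viable.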

Why open? Inspired by SafeGraph's Placekey standard (500+ enterprises adopted the free identifier, making SafeGraph's paid data more valuable), we're making the identifier open because verification is the hard part, and we're building strong verification tools around it.

Progressive disclosure: free tier shows basic sources (book title, ISBN). Premium tier adds chapter and page numbers. Professional tier provides full citation format and change history. The Quotekey itself is always free, always permanent.

What is Quotewise building next?

We frame our methodology as "Working Draft v0.8," transparent about current state and honest about future direction.

The biggest 2026 milestone is attribution confidence scoring. We're moving from categorical types (Direct, Attributed, Disputed) to numerical scores on a 0.0-1.0 scale. Each score will decompose into source authority (0-30 points, covering primary evidence, reference works, and platform credibility), temporal consistency (0-25 points, for earliest appearance and timeline validation), corroboration (0-25 points, for independent source count), and pattern analysis (0-20 points, for consistency with the author's known corpus).

To prevent gaming, no single evidence type can contribute more than 60% of the score. High confidence requires multiple independent sources. Low-verifiability evidence like private conversations or inaccessible archives receives downward modifiers, because easily verifiable evidence is harder to fake.
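The point budget and the anti-gaming cap combine into a score like the sketch below. The component maxima (30/25/25/20) follow the decomposition above; how the 60% cap and verifiability modifiers are actually applied is our simplifying assumption for illustration:

```python
def confidence_score(source_authority, temporal, corroboration, pattern):
    """Sketch of the planned 0.0-1.0 attribution confidence score.
    Component weights match the published budget; cap mechanics are assumed."""
    components = {
        "source_authority": min(max(source_authority, 0), 30),  # 0-30 points
        "temporal":         min(max(temporal, 0), 25),          # 0-25 points
        "corroboration":    min(max(corroboration, 0), 25),     # 0-25 points
        "pattern":          min(max(pattern, 0), 20),           # 0-20 points
    }
    total = sum(components.values())
    if total == 0:
        return 0.0
    # Anti-gaming cap: no single component may supply more than 60% of the total.
    capped = {k: min(v, 0.6 * total) for k, v in components.items()}
    return round(sum(capped.values()) / 100, 2)
```

Under this sketch, a quote with only strong source authority (30 points and nothing else) is capped well below high confidence, while full marks across all four components yield 1.0; that matches the stated goal that high confidence requires multiple independent lines of evidence.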

We're also building community verification workflows where contributors submit evidence artifacts (photos of book pages, Archive.org links, database screenshots) that others can independently verify. Reputation systems will track both source reliability and contributor accuracy over time.

Quotes from oral tradition, family sayings, and personal wisdom lack documentary evidence by nature. Rather than rejecting these, we're building systems to acknowledge them with appropriate confidence scores while maintaining verification standards where evidence exists.

As AI inference costs drop through 2026, we're developing automated pipelines for two-source verification, citation chain resolution, and cross-language consistency checking. AI handles bulk processing; humans calibrate and validate.

What we won't do: claim 100% accuracy, hide our limitations, or promise features before they're ready. This page documents where we are today and where we're honestly going.

How to Contribute

See a quote attributed to the wrong person? Flag it. We track misattributions for correction.

Found a quote in a book with page numbers? Submit the citation. We're building contributor reputation systems to reward quality evidence.

Developers can access our verification layer through the API to build quote-checking tools, fact-verification systems, or AI grounding mechanisms. Free tier for experimentation, paid tiers for production scale.

We're seeking partnerships with academic institutions, libraries, and quote researchers (especially Quote Investigator's Garson O'Toole) to establish evidence standards and improve verification methods.

When you share quotes on social media, blogs, or articles, include the Quotekey. This creates a discovery flywheel: we crawl the web for Quotekey patterns, find new sightings, and strengthen our evidence base.
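The crawl step in that flywheel reduces to pattern matching. The regular expression below is inferred from the example key qk-4eqg9-q913-7xyb5 (a "qk-" prefix, a hash segment, a Wikidata Q-ID segment, a final hash segment); the exact Quotekey grammar is an assumption:

```python
import re

# Inferred Quotekey shape: qk-<hash>-q<digits>-<hash>. Grammar is assumed,
# based on the published example, not an official specification.
QUOTEKEY_RE = re.compile(r"\bqk-[0-9a-z]+-q\d+-[0-9a-z]+\b", re.IGNORECASE)

def find_quotekeys(page_text: str) -> list[str]:
    """Extract candidate Quotekeys from a crawled page for sighting aggregation."""
    return QUOTEKEY_RE.findall(page_text)
```

Each match found in the wild becomes a new QuoteSighting candidate: the page URL, the crawl date, and the surrounding context all feed back into the evidence base.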