Matched passages
Every sentence with overlap, colored by match confidence, linked to source URLs.
Plagiarism Checker · 47B pages indexed
Cross-reference an essay against the open web, academic preprints, and a 2.8M-essay corpus. See exactly which sentences match, with sources and percentage overlap.
A decade ago, catching academic dishonesty meant checking for copied Wikipedia passages. Today it means checking for three things simultaneously: copied sources, AI-generated prose, and paraphrased but uncited passages (the trickiest category). Our checker handles all three, but you should know which one you're dealing with, because the conversation with the student is different in each case.
Direct copying is the easiest case: the student lifted a paragraph from a published source without attribution. Our scan catches it, the source-attribution report shows where it came from, and the conversation with the student is brief. AI-generated prose is harder because there's no source document to point at, only a statistical signature; that's what the AI Detector tab handles. Paraphrased-but-uncited text is the most difficult case: the student read a source, rewrote the ideas in their own words, and didn't cite it. Our paraphrase detector identifies semantic similarity even when the surface wording differs, but it cannot tell you whether the missing citation was an oversight or a deliberate omission. That's a conversation, not a verdict.
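For readers curious about the mechanics, direct copying is the case that exact-text matching handles well. A common textbook technique is word n-gram "shingling": split both texts into overlapping five-word runs and measure what fraction of the submission's runs also appear in a candidate source. The sketch below is a generic illustration, not a description of our production scanner; the function names, the five-word window, and the sample sentences are assumptions. It also shows why paraphrase needs a different approach: reworded text shares almost no exact runs with its source.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Word n-grams ("shingles") of the text, lowercased with punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Texts shorter than n words produce an empty range, hence an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(submission: str, source: str, n: int = 5) -> float:
    """Fraction of the submission's shingles that also appear in the source.

    Assumption: n = 5 is an illustrative window size, not a tuned value.
    """
    sub = shingles(submission, n)
    if not sub:
        return 0.0
    return len(sub & shingles(source, n)) / len(sub)


# Hypothetical example texts.
source = "The mitochondrion is the powerhouse of the cell and produces most of its chemical energy."
copied = "As we learned, the mitochondrion is the powerhouse of the cell and produces most of its chemical energy."
paraphrased = "Cells get most of their chemical energy from organelles called mitochondria."

print(round(containment(copied, source), 2))       # 0.79: 11 of 14 shingles are shared
print(round(containment(paraphrased, source), 2))  # 0.0: same idea, no shared 5-word runs
```

The paraphrased sentence scores zero even though it restates the source, which is exactly the gap that semantic-similarity matching is meant to close.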
Our open-web index spans 47 billion pages, refreshed daily, covering news, blogs, open course-material repositories, and freely accessible academic content. Preprint coverage includes arXiv, SSRN, the open portion of ResearchGate, and the major OAI-PMH archives. The opted-in student-essay corpus holds 2.8 million submissions from a consortium of cooperating institutions and grows roughly 8% per year. We also index all of Wikipedia, Project Gutenberg, and the public-domain scholarly collections of the Internet Archive.
What we do not index: Turnitin's paywalled student-essay database (no vendor outside Turnitin can access it), proprietary academic databases behind paywalls (JSTOR, Wiley, Springer), and commercial textbooks. Institutions that need paywalled-corpus matching typically use Turnitin for that workflow and us for everything else, with both LMS plugins running side by side.
Citation handling matters more than people realize. Block quotes (5+ lines) are recognized and, because they're explicitly attributed, excluded from the overall similarity score by default. Inline quotations with proper citation markers are also excluded. The score you see therefore represents unattributed overlap: the part that requires a conversation. Bibliography entries, reference lists, and works-cited pages are excluded entirely, since matching them would only add noise. The full citation-handling logic is documented on the methodology page.
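The exclusion rules above can be sketched as a small scoring function. This is a minimal illustration under stated assumptions, not the logic on the methodology page: the 5-line block-quote threshold comes from the text, but the `Segment` type, its fields, weighting by word count, and dropping excluded segments from both numerator and denominator are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

BLOCK_QUOTE_MIN_LINES = 5  # the "5+ lines" threshold described above


@dataclass
class Segment:
    """A span of the essay, as a hypothetical upstream parser might label it."""
    text: str
    matched: bool                  # matcher found this text in an indexed source
    is_quote: bool = False         # set off in quotation marks or as an indented block
    cited: bool = False            # followed by a citation marker, e.g. "(Doe 1998)"
    in_bibliography: bool = False  # part of a works-cited or reference list


def is_excluded(seg: Segment) -> bool:
    """True if the segment should not count toward the similarity score."""
    if seg.in_bibliography:
        return True  # reference lists are excluded entirely
    if seg.is_quote and len(seg.text.splitlines()) >= BLOCK_QUOTE_MIN_LINES:
        return True  # block quote: treated as explicitly attributed
    if seg.is_quote and seg.cited:
        return True  # inline quotation with a citation marker
    return False


def similarity_score(segments: list[Segment]) -> float:
    """Percentage of countable words that matched an indexed source.

    Assumption: excluded segments are dropped from both numerator and
    denominator; other weightings are possible.
    """
    counted = [s for s in segments if not is_excluded(s)]
    total = sum(len(s.text.split()) for s in counted)
    if total == 0:
        return 0.0
    matched = sum(len(s.text.split()) for s in counted if s.matched)
    return 100.0 * matched / total


# Hypothetical essay: one original sentence, one uncited copied sentence,
# one cited inline quote, one five-line block quote, one bibliography entry.
essay = [
    Segment("The treaty failed because neither side trusted the other to disarm first.", matched=False),
    Segment("Historians agree that the conference collapsed under the weight of mutual suspicion.", matched=True),
    Segment('"The delegates never reached the second agenda item" (Doe 1998).', matched=True, is_quote=True, cited=True),
    Segment("\n".join(f"Quoted line {i} of a long primary-source passage." for i in range(1, 6)), matched=True, is_quote=True),
    Segment("Doe, J. (1998). A Hypothetical Conference History.", matched=True, in_bibliography=True),
]
print(similarity_score(essay))  # 50.0: 12 uncited matched words out of 24 countable words
```

Under these assumptions, a quotation without a citation marker still counts toward the score, which keeps the number focused on unattributed overlap.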
Percentage of the submitted text that matches any indexed source.
We identify semantically similar passages even when the wording is rewritten.
Properly cited block quotes are marked and excluded from the overall score.
Top 10 most-matched sources with overlap percentages, for quick triage.
Optionally scan an essay against a cohort of classmates' submissions for collusion patterns.
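The overall match percentage and the top-sources list can both be derived from the same sentence-level match data. A minimal sketch, assuming each sentence arrives tagged with the URLs it matched and that overlap is weighted by word count; the `Sentence` type and the example.com/example.org URLs are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    sources: list[str] = field(default_factory=list)  # URLs this sentence matched


def word_count(s: Sentence) -> int:
    # Assumption: overlap is measured in whitespace-separated words.
    return len(s.text.split())


def overall_percentage(sentences: list[Sentence]) -> float:
    """Share of words in sentences that matched any indexed source."""
    total = sum(word_count(s) for s in sentences)
    if total == 0:
        return 0.0
    matched = sum(word_count(s) for s in sentences if s.sources)
    return 100.0 * matched / total


def top_sources(sentences: list[Sentence], n: int = 10) -> list[tuple[str, float]]:
    """The n sources covering the most words, with each one's overlap percentage."""
    total = sum(word_count(s) for s in sentences)
    if total == 0:
        return []
    per_source = Counter()
    for s in sentences:
        for url in set(s.sources):  # count each source at most once per sentence
            per_source[url] += word_count(s)
    return [(url, 100.0 * words / total) for url, words in per_source.most_common(n)]


doc = [
    Sentence("The first sentence has six words.", ["https://example.com/a"]),
    Sentence("This second one matches both sources.", ["https://example.com/a", "https://example.org/b"]),
    Sentence("The third sentence is entirely original."),
    Sentence("A fourth sentence matches source a.", ["https://example.com/a"]),
]
print(overall_percentage(doc))  # 75.0
print(top_sources(doc))         # [('https://example.com/a', 75.0), ('https://example.org/b', 25.0)]
```

Because one sentence can match several sources, the per-source figures can add up to more than the overall percentage (here 75% plus 25% against 75% overall); the overall number counts each matched word only once.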
Index coverage