Plagiarism Checker · 47B pages indexed

Find matched passages in three seconds.

Cross-reference an essay against the open web, academic preprints, and a 2.8M-essay corpus. See exactly which sentences match, with sources and percentage overlap.

Check for plagiarism → How it works

Web pages indexed: 47B
Student essays (opted-in): 2.8M
Median scan time: 2.1s

v2.4 · Free · No signup

Up to 5,000 characters per scan · 🔒 Processed in-memory, then deleted

Plagiarism and AI: two different problems.

A decade ago, catching academic dishonesty meant checking for copied Wikipedia passages. Today it means checking for three things simultaneously: copied sources, AI-generated prose, and paraphrased but uncited passages (the trickiest category). Our checker handles all three, but you should know which one you're dealing with, because the conversation with the student is different in each case.

Direct copying is the easiest case. The student lifted a paragraph from a published source without attribution. Our scan catches it; the source-attribution report shows where it came from; the conversation with the student is brief. AI-generated prose is harder because there's no source document to point at, only a statistical signature. That's what the AI Detector tab handles. Paraphrased-but-uncited is the most difficult: the student read a source, rewrote the ideas in their own words, and didn't cite. Our paraphrase detector identifies semantic similarity even when the surface wording differs, but it cannot tell you whether the missing citation was an oversight or a deliberate omission. That's a conversation, not a verdict.
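To see why direct copying is the easy case and paraphrase the hard one, here is a minimal sketch of purely lexical matching using word trigrams and Jaccard overlap. It is an illustration only, not a description of our production matcher; the function names `shingles` and `jaccard` and the trigram size are assumptions made for the example.

```python
import re

def shingles(text, n=3):
    """Return the set of word n-grams (as tuples) in lowercase text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b, n=3):
    """Lexical overlap between two passages: shared n-grams / all n-grams."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0  # passages shorter than n words have nothing to compare
    return len(sa & sb) / len(sa | sb)

source = "The industrial revolution transformed how goods were produced in Europe."
copied = "The industrial revolution transformed how goods were produced in Europe."
paraphrase = "European manufacturing changed dramatically once factories replaced workshops."

print(jaccard(source, copied))      # identical wording: full overlap
print(jaccard(source, paraphrase))  # same idea, no shared trigrams
```

A verbatim copy shares every trigram with its source, while the paraphrase shares none even though it restates the same idea. That gap is why paraphrase detection needs a semantic comparison rather than a wording comparison.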

What our index covers, and what it doesn't.

Our open-web index spans 47 billion pages refreshed daily, including news, blogs, course-material repositories (the open ones), and freely accessible academic content. Our preprint coverage includes arXiv, SSRN, ResearchGate's open portion, and the major OAI-PMH archives. The opted-in student-essay corpus is 2.8 million submissions from a consortium of cooperating institutions, growing roughly 8% per year. We index Wikipedia in full, Project Gutenberg, and the public-domain Internet Archive scholarly collections.

What we do not index: Turnitin's paywalled student-essay database (no vendor outside Turnitin can access it), proprietary academic databases behind paywalls (JSTOR, Wiley, Springer), and commercial textbooks. For an institution that needs paywalled-corpus matching, the typical setup is Turnitin for that workflow and us for everything else, with both LMS plugins running side by side.

Citation handling matters more than people realize. Block quotes (5+ lines) are recognized and excluded from the overall similarity score by default, since they're explicitly attributed. In-line quoted passages with proper citation markers are also excluded. The score you see represents unattributed overlap, the part that requires a conversation. Bibliography entries, reference lists, and works-cited pages are excluded entirely; matching them is noise. The full citation-handling logic is documented on the methodology page.
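The exclusion rules above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the documented methodology: the `Sentence` fields, the character-weighted score, and the choice to keep cited quotes in the denominator while dropping reference lists from it are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    text: str
    matched: bool                 # overlaps some indexed source
    cited_quote: bool = False     # block quote or in-line quote with a citation marker
    in_references: bool = False   # bibliography, reference list, or works-cited page

def similarity_score(sentences):
    """Percent of scoreable characters that are unattributed matches."""
    # Reference lists are dropped entirely (assumption: from numerator and denominator).
    scoreable = [s for s in sentences if not s.in_references]
    total = sum(len(s.text) for s in scoreable)
    if total == 0:
        return 0.0
    # Properly cited quotes still count as essay text, but never as flagged overlap.
    flagged = sum(len(s.text) for s in scoreable if s.matched and not s.cited_quote)
    return 100.0 * flagged / total

essay = [
    Sentence("a" * 50, matched=False),                     # original prose
    Sentence("b" * 30, matched=True),                      # unattributed overlap
    Sentence("c" * 20, matched=True, cited_quote=True),    # cited quotation
    Sentence("d" * 40, matched=True, in_references=True),  # works-cited entry
]
print(similarity_score(essay))  # 30.0
```

Only the 30 unattributed characters out of 100 scoreable characters contribute, so the cited quotation and the works-cited entry do not inflate the score.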

What the plagiarism scan returns.

Matched passages

Every sentence with overlap, colored by match confidence, linked to source URLs.

Overall similarity

Percentage of the submitted text that matches any indexed source, excluding properly cited quotations and reference lists.

Paraphrase detection

We identify semantically similar passages even when the wording is rewritten.

Citation-aware

Properly cited block quotes are marked and excluded from the overall score.

Source ranking

Top 10 most-matched sources with overlap percentages, for quick triage.

Cross-essay matching

Optionally scan an essay against a cohort of classmates' submissions for collusion patterns.
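As a rough illustration of the source-ranking triage listed above, the sketch below totals matched characters per source and returns the top entries with their overlap percentages. The input shape (pairs of source URL and matched character count), the function name `rank_sources`, and the example URLs are assumptions for the example, not the scan's actual output format.

```python
from collections import defaultdict

def rank_sources(matches, essay_length, top_n=10):
    """Rank sources by total matched characters.

    matches: iterable of (source_url, matched_chars) pairs (hypothetical shape).
    Returns a list of (source_url, overlap_percent), largest overlap first.
    """
    if essay_length <= 0:
        return []
    totals = defaultdict(int)
    for url, chars in matches:
        totals[url] += chars
    # Sort by matched characters descending, then URL for a stable order on ties.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(url, round(100.0 * chars / essay_length, 1)) for url, chars in ranked[:top_n]]

matches = [
    ("https://example.org/a", 120),
    ("https://example.org/b", 60),
    ("https://example.org/a", 80),
]
print(rank_sources(matches, essay_length=1000))
```

Several matched passages from the same source are summed first, so one heavily reused source ranks above many sources with a single short match each.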

Index coverage

What's actually in our match index.

47B

Web pages

Open web + Common Crawl, refreshed daily.

2.8M

Student essays (opted-in)

Submissions from a consortium of cooperating institutions.

180

Languages

Cross-language paraphrase matching for translated reuse.

94%

Citation recall

Properly quoted passages auto-excluded from score.

Frequently asked questions

What databases do you check against?

Open web (47B pages indexed), Common Crawl, major academic preprint servers (arXiv, SSRN, ResearchGate open), a curated database of 2.8M student essays (opted-in), and the public portions of Wikipedia, Project Gutenberg, and course-material repositories. We do not scan Turnitin's paywalled corpus.

How is this different from Turnitin?

Turnitin has a bigger paywalled student-essay database, built over 20 years. For matching against that specific corpus, Turnitin wins. For open-web and AI-generated content, we win on recency and speed: our index refreshes daily, while Turnitin's refreshes less frequently. Many institutions use both.

Does the plagiarism check also flag AI-written text?

No, these are separate checks. AI-generated text isn't 'plagiarized' in the traditional sense (there's no source document to match). Use the AI Detector tab for AI detection; use this tab for source-matching. Both run on the same submission.

What counts as plagiarism vs. acceptable citation?

We flag passages with substantial lexical or semantic overlap. Whether that's plagiarism depends on your institution's policy: properly cited quotations are usually fine; unattributed paraphrasing is usually not. We surface the matches; you make the call.

Can I export a matched-source report?

Yes. Signed-in accounts get a PDF report with every matched passage, the source URL, percentage overlap, and citation suggestions.

Scan an essay for plagiarism.

Start a scan →