aiessaydetector.ai

How-to · 6 steps · ~7 minutes

How to check an essay for AI.

A 6-step process that uses detector output as one of several pieces of evidence, not the verdict.

Published 2026-02-18 · Updated 2026-04-14 · Editorial Team

Tools alone don't answer the question. This is the workflow that the teachers we talk to say works. It combines detector evidence with writing-process signals and a short student conversation.

Step-by-step

  1. Run the essay through a detector with sentence-level highlighting.

    Don't just look at the essay-level percentage. Look at which sentences are flagged. Three specific paragraphs being flagged is actionable evidence; a '68% overall' number with no breakdown is not.

  2. Run it through a second detector.

    If the two detectors disagree by more than 30 percentage points, neither score is reliable. Treat the essay as 'unclear' and move to the process-evidence steps.

  3. Ask the student for their draft history.

    Google Docs version history and Word's tracked changes are both strong evidence. A single final .docx with no edit history isn't damning, but it removes one of the stronger authenticity signals.

  4. Compare the writing voice to a known baseline.

    If you've seen the student's in-class writing (a timed essay, a short response), compare sentence rhythm, vocabulary, and argument style against the submitted essay. A student's own voice is hard to fake across a full essay, so a sustained shift from the baseline is worth noting.

  5. Have a 2-minute conversation about the thesis.

    Ask the student to explain the argument of their essay in plain language. Students who wrote their essay can usually do this easily; students who didn't tend to stumble over specific claims.

  6. Make a decision using all four lines of evidence.

    If detector + voice baseline + conversation + missing draft history all point the same direction, you have a case. If only the detector is flagging, you have a prompt for a longer conversation, not a finding.
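
Steps 2 and 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any detector's API: the names `detectors_agree`, `Evidence`, and `assess` are hypothetical, scores are assumed to be percentages from 0 to 100, and the only number taken from the workflow is the 30-point disagreement threshold. The "no concern" and "mixed" outcomes are assumptions added to make the function total; the steps above describe only the other two.

```python
from dataclasses import dataclass

DISAGREEMENT_THRESHOLD = 30.0  # percentage points, from step 2


def detectors_agree(score_a: float, score_b: float) -> bool:
    """Return True when two detector scores (0-100) differ by at most 30 points."""
    return abs(score_a - score_b) <= DISAGREEMENT_THRESHOLD


@dataclass
class Evidence:
    """Hypothetical record of the four lines of evidence used in step 6."""
    detector_flagged: bool
    voice_mismatch: bool
    conversation_concern: bool
    draft_history_missing: bool


def assess(evidence: Evidence) -> str:
    """Map the four signals to an outcome label (labels are illustrative)."""
    signals = [
        evidence.detector_flagged,
        evidence.voice_mismatch,
        evidence.conversation_concern,
        evidence.draft_history_missing,
    ]
    if all(signals):
        return "case"  # everything points the same direction
    if evidence.detector_flagged and not any(signals[1:]):
        return "longer conversation"  # only the detector is flagging
    if not any(signals):
        return "no concern"  # assumption: not spelled out in the steps
    return "mixed: keep gathering evidence"  # assumption: partial agreement
```

For example, `detectors_agree(90, 12)` is `False`, so that essay would be treated as unclear and moved to the process-evidence steps, while `assess(Evidence(True, False, False, False))` returns `"longer conversation"` rather than a finding.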

Frequently asked questions

How long should this process take per essay?
If everything looks normal, about 30 seconds: detector pass, no flags, move on. If the essay is flagged, budget 15 minutes for the follow-up: detector cross-check, draft history review, student conversation.

What if the student refuses to share their draft history?
Refusing isn't evidence of cheating on its own, but it removes one of the stronger authenticity signals. Academic-integrity policies typically allow a school to require process artifacts as part of an investigation.

Why a single detector score isn't enough.

The temptation, especially when you have eighty essays to grade by Friday, is to paste each one into a detector, look at the percentage, and act on it. That workflow gets you in trouble for two reasons.

First, every detector has a false-positive rate (FPR). On well-controlled academic corpora, the best detectors land around 3% FPR at meaningful confidence thresholds. In a class of 80 students who all wrote their own essays, that's roughly two essays falsely flagged per assignment (80 × 0.03 = 2.4), every assignment, all year. If your only evidence is "the detector said so," you'll spend the year either accusing innocent students or losing the trust of the ones you don't accuse but should have.
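
The arithmetic behind that estimate is short enough to check directly. The sketch below assumes every essay is honestly written and that false flags are independent across essays, which is a simplification; the function names are hypothetical.

```python
def expected_false_flags(class_size: int, fpr: float) -> float:
    """Expected number of honest essays flagged, given a false-positive rate."""
    return class_size * fpr


def prob_at_least_one_false_flag(class_size: int, fpr: float) -> float:
    """Chance that at least one honest essay is flagged (assumes independence)."""
    return 1.0 - (1.0 - fpr) ** class_size


if __name__ == "__main__":
    # 80 students and a 3% FPR, the figures used in the paragraph above.
    print(f"expected false flags: {expected_false_flags(80, 0.03):.1f}")
    print(f"P(at least one): {prob_at_least_one_false_flag(80, 0.03):.2f}")
```

Under these assumptions the expected count is 2.4 per assignment, and the chance of at least one false flag in the class comes out at roughly 0.91, which is why a bare detector score cannot carry an accusation on its own.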

Second, a single percentage hides the structure of the evidence. An essay that's 80% AI in the first three paragraphs and 20% AI everywhere else tells a very different story than an essay that's 50% AI throughout. The first looks like a student who got stuck on the introduction and used a model to draft it. The second looks like a hybrid where the student used AI as a writing partner. Both are policy questions, but they're different policy questions, and the essay-level number flattens them.
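
A small sketch makes the flattening concrete. The per-paragraph scores below are invented for illustration (two hypothetical six-paragraph essays, scores from 0 to 1), and the 0.7 flag cutoff is an assumption, not a threshold any particular detector uses.

```python
from statistics import mean, pstdev

FLAG_CUTOFF = 0.7  # hypothetical per-paragraph cutoff

# Invented per-paragraph AI-likelihood scores for two essays.
front_loaded = [0.8, 0.8, 0.8, 0.2, 0.2, 0.2]  # high in the first three paragraphs
uniform = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]       # the same score throughout


def summarize(scores):
    """Return the overall mean, the spread, and the 1-based flagged paragraphs."""
    return {
        "mean": round(mean(scores), 2),
        "spread": round(pstdev(scores), 2),
        "flagged": [i for i, s in enumerate(scores, start=1) if s >= FLAG_CUTOFF],
    }


if __name__ == "__main__":
    print(summarize(front_loaded))
    print(summarize(uniform))
```

Both essays average 0.5, so an essay-level number cannot tell them apart. Only the spread and the list of flagged paragraphs show that the first essay's signal is concentrated in paragraphs 1 through 3.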

The six-step process above is built on a simple idea: detector output is one signal among several. Combine it with the writing-process signals (draft history, voice baseline) and a brief conversational check, and you'll be right far more often, with far less risk of false accusation.

What 'evidence' actually looks like in an integrity hearing.

If your check escalates to an academic-integrity panel, the conversation isn't going to be about a percentage. It's going to be about whether your evidence package would convince a skeptical third party. Three things matter:

  • The detector report itself, signed and exportable. A screenshot of a detector page is weak evidence. A timestamped PDF report from a detector that publishes its methodology is strong evidence. Our methodology page exists exactly so panel chairs can audit the basis of any score we publish.
  • A second independent detector run. Cross-tool agreement is the single strongest signal you can show. If two detectors built on different model architectures both flag the same paragraphs, that's persuasive. If one flags 90% and another flags 12%, you don't have a case yet.
  • Process evidence from the student's writing environment. Google Docs version history, Word's tracked changes, browser activity logs (if your school's LMS captures them), or even just the student's class notebook. Process artifacts answer a different question than detector output: not "does this look like AI?" but "did this student plausibly write this?"

Notice that none of those three is the detector's percentage by itself. The percentage gets you to the conversation; the package gets you through the panel.

When to skip steps 3 through 5.

The full six-step process is for ambiguous cases, not for every essay. If the detector returns a clean low score and nothing in the writing voice surprises you, you're done at step one. The 30-second pass is meant to be 30 seconds. The follow-up steps exist for the 5–10% of essays that don't return a clear answer.

The question to ask after step two is: do I need to be sure? If the stakes are low (a daily homework check, a draft submission), you can stop after the detector cross-check. If the stakes are high (a final essay, a personal statement, a graded thesis chapter), keep going through process evidence and the conversation. The point of the workflow isn't bureaucracy. It's matching evidence to consequence.
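
As a rough sketch, the stopping rule in this section can be written as one function. The name `steps_to_run`, the `"low"`/`"high"` labels, and the `clean_first_pass` flag are all hypothetical; deciding what counts as a clean pass or as high stakes remains a judgment call for the teacher.

```python
def steps_to_run(stakes: str, clean_first_pass: bool) -> list[int]:
    """Return which of the six steps to run, following the rule described above.

    stakes: "low" (homework check, draft) or "high" (final essay, thesis chapter).
    clean_first_pass: True when step 1 returns a clean low score and nothing
    in the writing voice is surprising.
    """
    if stakes not in ("low", "high"):
        raise ValueError("stakes must be 'low' or 'high'")
    if clean_first_pass:
        return [1]  # the 30-second pass
    if stakes == "low":
        return [1, 2]  # stop after the detector cross-check
    return [1, 2, 3, 4, 5, 6]  # full workflow for high-stakes work
```

For instance, `steps_to_run("low", clean_first_pass=False)` returns `[1, 2]`, while a flagged final essay gets all six steps.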