Reference

Glossary of AI-detection terms.

Plain-language definitions for the vocabulary you'll see on our reports, in our methodology, and across the field. We link to source material where relevant.

A

Accuracy

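The share of predictions a detector gets right. On its own it can hide bias: when most essays in a batch are human-written, a detector that rarely flags anything still scores high. Pair it with precision and recall.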
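A minimal sketch of why accuracy alone can mislead, assuming a hypothetical batch of 95 human essays and 5 AI essays (labels 1 = AI, 0 = human) and a "detector" that never flags anything. The function names are illustrative, not part of any real tool.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label (1 = AI, 0 = human)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Share of truly AI-generated texts that were flagged."""
    flagged_ai = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return flagged_ai / sum(y_true)

# Hypothetical batch: 95 human essays followed by 5 AI essays.
y_true = [0] * 95 + [1] * 5
# A "detector" that never flags anything.
never_flags = [0] * 100

print(accuracy(y_true, never_flags), recall(y_true, never_flags))  # 0.95 0.0
```

The useless detector is 95% accurate while catching none of the AI essays, which is exactly the gap recall exposes.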

Adversarial evasion

Tactics used to make AI-generated text score lower: paraphrase tools, humanizers, and character substitution. Most stop working after a single round of detector updates.

AUC (area under the ROC curve)

A single number from 0 to 1 that summarizes how well a detector separates AI from human writing across all thresholds. A score of 0.5 is no better than chance; 1.0 is perfect separation.
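One way to compute AUC without drawing the curve: it equals the probability that a randomly chosen AI text gets a higher score than a randomly chosen human text, with ties counted as half (the Mann-Whitney formulation). The scores below are hypothetical.

```python
def auc(ai_scores, human_scores):
    """AUC as P(AI score > human score) over all AI/human pairs; ties count 0.5."""
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))

# Hypothetical detector scores for three AI texts and three human texts.
ai = [0.9, 0.8, 0.4]
human = [0.3, 0.5, 0.1]
print(round(auc(ai, human), 3))  # 0.889
```

Eight of the nine AI/human pairs are ordered correctly (only 0.4 vs 0.5 is wrong), giving 8/9. No threshold appears anywhere, which is why AUC summarizes performance across all of them.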

B

Baseline

Your own writing, collected over time: the reference point a teacher uses to judge whether a flagged essay is really yours.

Burstiness

How much sentence length and complexity vary across a passage. Human writing is usually more bursty than AI writing.
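There is no single standard formula for burstiness. A minimal sketch, assuming one common proxy: the coefficient of variation of sentence length (standard deviation divided by mean), using a naive punctuation-based sentence splitter. Both choices are assumptions for illustration.

```python
import re
import statistics

def sentence_lengths(text):
    # Naive split on ., !, or ? followed by whitespace; real tools use better tokenizers.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (a proxy, not a standard definition)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

flat = "The cat sat down. The dog ran off. The bird flew up."
varied = "Stop. The storm rolled in over the hills before anyone had noticed it coming. We ran."
print(round(burstiness(flat), 2), round(burstiness(varied), 2))
```

The first passage has three four-word sentences and scores 0; the second mixes 1-, 13-, and 2-word sentences and scores much higher.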

C

Classifier

The machine-learning model that takes text features (perplexity, burstiness, embeddings) and returns an AI-likelihood score.
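A toy sketch of the idea, assuming the simplest possible classifier: logistic regression over two scalar features. The weights and bias are hypothetical placeholders; a real detector learns its parameters from labeled data and typically uses many more features, including embeddings.

```python
import math

# Hypothetical parameters for illustration only; a real classifier learns these.
WEIGHTS = {"perplexity": -0.08, "burstiness": -2.0}
BIAS = 3.0

def ai_likelihood(features):
    """Logistic model: weighted sum of features squashed into a 0-1 score."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature values for two passages.
predictable = {"perplexity": 12.0, "burstiness": 0.2}
surprising = {"perplexity": 60.0, "burstiness": 0.9}
print(round(ai_likelihood(predictable), 2), round(ai_likelihood(surprising), 2))
```

The negative weights encode the glossary's intuition: lower perplexity and lower burstiness push the score toward "AI".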

D

Draft history

The version history of your document, such as Google Docs version history, Word Track Changes, or Git commits.

E

Embedding

A numerical representation of text that captures meaning. Modern detectors use embeddings as a core feature.

F

F1 score

The harmonic mean of precision and recall, a single number that balances both.
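A minimal sketch computing precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical: 40 AI essays caught (true positives), 10 human essays wrongly flagged (false positives), and 20 AI essays missed (false negatives).

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```

An equivalent form is F1 = 2TP / (2TP + FP + FN), here 80 / 110. Because it is a harmonic mean, F1 stays low if either precision or recall is low.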

False negative

AI-generated text that the detector missed.

False positive

Human-written text that the detector flagged as AI. The most damaging error in academic-integrity contexts.

False positive rate

The percentage of human-written texts that the detector wrongly flags as AI. Vendors publish it as a core accountability metric.
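A short worked example, assuming a hypothetical audit of 1,000 human-written essays in which 12 were flagged as AI. Only human-written texts enter this calculation: false positives over false positives plus true negatives.

```python
def false_positive_rate(fp, tn):
    """Share of human-written texts that were wrongly flagged as AI."""
    return fp / (fp + tn)

# Hypothetical audit: 1,000 human essays, 12 flagged as AI, 988 correctly cleared.
print(false_positive_rate(fp=12, tn=988))  # 0.012
```

A rate that sounds small still adds up: at 1.2%, a course with 500 honest essays would see about 6 students wrongly flagged.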

Feature

A measurable property of text that feeds into the classifier. Perplexity and burstiness are classical features.

Fine-tuning

Training a pre-existing model on a specific dataset to specialize it. Detectors are often fine-tuned on AI-vs-human text pairs.

H

Hallucination

When an AI model confidently invents something: a false citation, a fake quote, a wrong fact. Not the same as a detection error.

Humanizer

A tool that rewrites AI-generated text to evade detection. Not what we do, for reasons explained on /humanizer-policy.

Hybrid draft

A document that contains both AI-generated and human-written sections. The most common real-world case, and the hardest to score.

L

LLM (large language model)

The kind of model that produces AI writing. Examples include GPT, Claude, Gemini, and Llama.

M

Model family

A group of related LLM versions that share training lineage. GPT-3.5 and GPT-4 are one family; Claude 3 and Claude 4 are another.

P

Perplexity

How surprising a passage is, word by word, to a reference language model. Low perplexity (highly predictable text) is one signal of AI authorship.
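Perplexity is the exponential of the average negative log-probability the reference model assigned to each actual next token. A minimal sketch, assuming we already have those per-token probabilities; the values below are hypothetical rather than taken from a real model.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of each observed token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities a reference model gave each actual next token.
predictable = [0.9, 0.8, 0.95, 0.85]
surprising = [0.2, 0.05, 0.3, 0.1]
print(round(perplexity(predictable), 2), round(perplexity(surprising), 2))
```

Intuitively, a perplexity of k means the model was, on average, about as uncertain as if it were choosing among k equally likely words. The predictable passage scores close to 1; the surprising one scores far higher.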

Precision

When the detector flags an essay, how often is the flag right?
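Precision depends on how common AI text is in the batch, not just on the detector. A minimal sketch, assuming a hypothetical detector with 90% recall and a 2% false positive rate, applied in a class where half the essays are AI-generated and in one where 5% are.

```python
def precision_at_base_rate(base_rate, recall, fpr):
    """Expected precision when a share `base_rate` of essays are AI-generated."""
    true_flags = base_rate * recall          # AI essays correctly flagged
    false_flags = (1 - base_rate) * fpr      # human essays wrongly flagged
    return true_flags / (true_flags + false_flags)

# Same hypothetical detector (90% recall, 2% false positive rate) in two classrooms.
for base_rate in (0.5, 0.05):
    print(base_rate, round(precision_at_base_rate(base_rate, 0.9, 0.02), 3))
```

With the same detector, precision falls from about 0.98 to about 0.70: when AI essays are rare, a larger share of flags land on human writing.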

R

Recall

Of the AI-generated essays in a batch, what share did the detector catch?

S

Sentence-level scoring

Reporting an AI-likelihood for each sentence, not just one score for the whole essay.
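A minimal sketch of sentence-level reporting, assuming a naive punctuation-based splitter and a hypothetical lookup table standing in for a real per-sentence model. The essay is a hybrid draft with one AI-like sentence in the middle.

```python
import re

def split_sentences(text):
    # Naive split on ., !, or ? followed by whitespace; production tools use better tokenizers.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def score_sentences(text, score_fn):
    """Return (sentence, score) pairs, one per sentence."""
    return [(s, score_fn(s)) for s in split_sentences(text)]

# Hypothetical per-sentence scores standing in for a real detector model.
HYPOTHETICAL_SCORES = {
    "I started the lab late on Tuesday.": 0.1,
    "In conclusion, these results underscore the importance of careful measurement.": 0.9,
    "Then my partner spilled the buffer.": 0.1,
}

essay = " ".join(HYPOTHETICAL_SCORES)
pairs = score_sentences(essay, lambda s: HYPOTHETICAL_SCORES.get(s, 0.5))
for sentence, score in pairs:
    print(f"{score:.1f}  {sentence}")
print(f"whole-essay mean: {sum(score for _, score in pairs) / len(pairs):.2f}")
```

The whole-essay mean (about 0.37) looks unremarkable, while the per-sentence view points straight at the one passage worth a closer look.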

T

Threshold

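The score cutoff above which a detector flags a passage as AI. It can be moved, with tradeoffs: raising it cuts false positives but lets more AI text through, and lowering it does the reverse.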
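A minimal sketch of the tradeoff, assuming hypothetical scores for five AI texts and five human texts. A passage is flagged when its score is strictly above the threshold, matching the definition.

```python
def flag_rates(ai_scores, human_scores, threshold):
    """Recall and false positive rate when flagging scores above the threshold."""
    recall = sum(s > threshold for s in ai_scores) / len(ai_scores)
    fpr = sum(s > threshold for s in human_scores) / len(human_scores)
    return recall, fpr

# Hypothetical detector scores.
ai = [0.95, 0.85, 0.7, 0.55, 0.4]
human = [0.6, 0.35, 0.3, 0.2, 0.1]
for t in (0.3, 0.5, 0.8):
    recall, fpr = flag_rates(ai, human, t)
    print(f"threshold={t}: recall={recall:.1f}, false positive rate={fpr:.1f}")
```

Moving the threshold from 0.3 to 0.8 drops the false positive rate from 0.4 to 0.0, but recall falls from 1.0 to 0.4 along the way.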

Training data

The dataset a detector learned from. Determines what it generalizes to, and where it fails.

W

Watermarking

A hidden pattern embedded in AI output at generation time so that a detector can later recognize it. Promising in theory, rare in practice.
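A toy sketch loosely modeled on published "green-list" text watermarking: at each step, a hash of the previous token marks a fraction of candidate tokens as green, the generator prefers green tokens, and detection counts green tokens and computes a z-score. The hash function, 50-word vocabulary, and gamma = 0.5 are all assumptions; real schemes use a secret key, the model's full vocabulary, and usually bias sampling softly rather than forcing green tokens.

```python
import hashlib
import math

GAMMA = 0.5  # assumed share of candidate tokens that count as "green" at each step

def is_green(prev_token, token, gamma=GAMMA):
    """Toy green-list test: hash the (previous token, token) pair to a number in [0, 1)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def watermark_z_score(tokens, gamma=GAMMA):
    """How far the green-token count sits above what chance alone would produce."""
    pairs = list(zip(tokens, tokens[1:]))
    t = len(pairs)
    green = sum(is_green(a, b, gamma) for a, b in pairs)
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))

# Toy "watermarked generator": at each step it picks the first green word in a small vocabulary.
vocab = [f"w{i}" for i in range(50)]
tokens = ["w0"]
for _ in range(40):
    greens = [w for w in vocab if is_green(tokens[-1], w)]
    tokens.append(greens[0] if greens else vocab[0])

print(round(watermark_z_score(tokens), 2))
```

Here every pair is green, so z = (40 - 20) / sqrt(10), about 6.3. Text written without knowledge of the green list should land near 0, which is what lets a detector tell the two apart.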

Z

Zero-shot

A detection approach that works without being trained on examples of the specific target model. Typically less accurate than a trained classifier, but more robust to new LLMs.

See these concepts in practice.

Run a sample essay through the detector to see perplexity, burstiness, and sentence-level scoring in action.

Open the detector