Glossary

Baseline.

Your own writing, over time, the reference point a teacher uses to judge whether a flagged essay really isn't yours.

In academic-integrity conversations, "baseline" means the body of your previous writing: earlier essays in the class, in-class writes, timed exam answers. Detectors measure an essay against statistical patterns of AI output; teachers measure it against your patterns. A flag on a single essay is suggestive; a flag on an essay that reads nothing like your baseline is stronger evidence.

This is why pedagogical guidance (and our own policy templates) recommends establishing a writing baseline early in the term.

How a baseline gets used in a real conversation

A teacher meeting with a student about a flagged essay typically asks: "Walk me through this paragraph, what made you choose this argument?" Students who wrote the essay can answer in their own voice, often using phrasing similar to their previous work. Students who didn't usually can't sustain the explanation past the first specific question. The baseline isn't a forensic tool; it's a comparison reference that makes the conversation productive.

Building a baseline at the start of term

The strongest baselines come from in-class writes during the first two weeks of a course, a writing diagnostic, a short reflection on the syllabus, a hand-written exam answer. These give you authentic samples of each student's voice before any AI temptation is in play. Pedagogical guidance in our policy templates includes a recommended diagnostic to anchor the baseline.

Where this concept is most often misunderstood

The most common mistake users make is treating baseline as a static threshold rather than a contextual reference point. Many assume that if a text scores above baseline, it is definitively AI-generated, when in fact baseline represents the central tendency of human writing under specific conditions. A score of 0.6 in one domain may fall above baseline while the same score in another domain falls below it. The baseline for creative fiction differs substantially from the baseline for technical documentation because human writing patterns vary by genre, purpose, and audience.

Another frequent misunderstanding involves conflating baseline with accuracy. A detection system may establish a highly reliable baseline for its training corpus yet perform poorly on out-of-distribution samples. For example, a baseline calibrated on undergraduate essays written in 2023 may not generalize to professional content, non-native English writing, or texts produced after a major model update. Practitioners must recalibrate baselines when the population or context shifts, yet many institutions continue applying outdated reference points to new writing samples.

Practical implications for institutions and educators

Institutions relying on AI detection tools must establish discipline-specific baselines rather than applying universal cutoffs. A nursing program's baseline for clinical reflection essays will differ from an engineering program's baseline for lab reports because the underlying human writing exhibits distinct lexical density, sentence structure, and coherence patterns. Schools that ignore these differences generate higher false positive rates, penalizing students whose natural writing deviates from an inappropriate reference standard. Effective implementation requires collecting representative samples from each major program and periodically validating that the baseline remains aligned with current student populations.

Educators should communicate baseline methodology transparently when discussing detection results with students. Explaining that a flagged essay scored 0.8 when the baseline for that assignment type is 0.5 provides actionable context, whereas simply labeling it as AI-generated obscures the statistical reasoning. This transparency also helps students understand that revision, collaboration, or use of writing assistance tools may shift their text closer to or further from baseline in ways that do not reflect dishonesty. Institutions that document and share their baseline calibration process build trust and reduce appeals rooted in perceived arbitrariness.

Back to the full glossary.

All terms