About

Detection tools we'd actually use ourselves.

We're a small team that got tired of watching commercial detectors flag human essays as AI. So we built one we'd trust, and publish the numbers on it honestly.

Why we exist.

In 2023, two of our founders were working in education technology when the first wave of AI detectors hit the market. Within six months, we were watching teachers use those detectors as verdict machines on student essays, with no sentence-level evidence, no confidence interval, and a known, published 10–20x false-positive bias against non-native English writers.

The tools weren't bad because they were AI. They were bad because the industry around them normalized presenting a single percentage as a verdict. We thought we could build something more honest: sentence-level evidence, confidence intervals, transparent ESL fairness numbers, and a clear public methodology.

What we commit to.

We publish our accuracy numbers monthly, including the ones we're not proud of. Our ESL-to-native false-positive ratio is 2.3x. We want it under 1.5x. It's on /stats whether we hit the target or not.
We never train on customer text. Hard rule. Not an opt-out, not an "unless you consent", just never.
We tell you where competitors are better than us. The /vs/ pages admit that Turnitin's plagiarism corpus is genuinely irreplaceable, that GPTZero's free tier is more generous, that Copyleaks' enterprise integrations are broader. We'd rather be trusted than universally preferred.
Sentence-level evidence, always. Essay-level percentages without a per-sentence breakdown are the pattern we hate most in this industry. We will not ship a product without the breakdown.
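
For readers who want to see what an ESL-to-native false-positive ratio means in practice, here is a minimal sketch of the arithmetic. It assumes each human-written essay has a boolean "flagged as AI" outcome. The function names and the counts are illustrative assumptions, not the code or data behind /stats.

```python
def false_positive_rate(flags):
    """Share of human-written essays that were flagged as AI."""
    if not flags:
        raise ValueError("need at least one human-written sample")
    return sum(flags) / len(flags)


def fpr_ratio(esl_flags, native_flags):
    """ESL false-positive rate divided by the native-speaker false-positive rate."""
    native_fpr = false_positive_rate(native_flags)
    if native_fpr == 0:
        raise ZeroDivisionError("native false-positive rate is zero; ratio undefined")
    return false_positive_rate(esl_flags) / native_fpr


# Made-up counts for illustration only:
# 23 of 500 ESL essays flagged, 10 of 500 native-speaker essays flagged.
esl_flags = [True] * 23 + [False] * 477
native_flags = [True] * 10 + [False] * 490

ratio = fpr_ratio(esl_flags, native_flags)
print(f"ESL-to-native false-positive ratio: {ratio:.1f}x")
```

With these invented counts the two rates are 4.6% and 2.0%. A ratio of 1.0x would mean both groups are wrongly flagged equally often.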

Who's behind it.

A team of six, split across the US and EU, with backgrounds in NLP, education technology, academic-integrity consulting, and security. We're bootstrapped: no venture funding, and no pressure to claim accuracy numbers we can't defend. If you'd like to meet the team, email hello@aiessaydetector.ai.

Our Testing Methodology and Transparency Standards

We evaluate AI detection tools through controlled experiments using labeled datasets that include human-written essays, AI-generated content from multiple models (GPT-3.5, GPT-4, Claude, and others), and hybrid texts combining both sources. Each detector reviewed on this platform undergoes testing against a minimum of 200 samples across academic disciplines including humanities, social sciences, and STEM fields. We measure true positive rates, false positive rates, and consistency across multiple submissions of identical text. Our methodology documentation, including sample prompts and baseline results, remains publicly accessible in our research repository.
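
The three measurements named above can be sketched as follows. This is a simplified illustration that assumes every test sample carries a ground-truth label and a yes/no detector verdict. The function names and the sample data are hypothetical and are not taken from the research repository.

```python
def rates(results):
    """results: list of (is_ai, flagged) pairs. Returns (true positive rate, false positive rate)."""
    tp = sum(1 for is_ai, flagged in results if is_ai and flagged)
    fn = sum(1 for is_ai, flagged in results if is_ai and not flagged)
    fp = sum(1 for is_ai, flagged in results if not is_ai and flagged)
    tn = sum(1 for is_ai, flagged in results if not is_ai and not flagged)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr


def consistency(repeated_runs):
    """repeated_runs: {sample_id: [verdict, verdict, ...]} from resubmitting identical text.
    Returns the share of samples whose verdict never changed between submissions."""
    stable = sum(1 for verdicts in repeated_runs.values() if len(set(verdicts)) == 1)
    return stable / len(repeated_runs)


# Hypothetical results: (is_ai, flagged)
results = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]
# Hypothetical repeat submissions of four samples, three runs each
repeated = {
    "a": [True, True, True],
    "b": [False, True, False],
    "c": [False, False, False],
    "d": [True, True, True],
}

tpr, fpr = rates(results)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} consistency={consistency(repeated):.2f}")
```

Treating consistency as "the verdict never flipped" is one possible definition. A stricter variant could compare the raw scores across submissions instead of the yes/no outcome.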

Transparency governs our financial relationships and potential conflicts of interest. We maintain strict editorial independence from AI detection companies. When we include affiliate links to commercial detection services, these relationships are disclosed both at the article level and next to each link. Affiliate partnerships never influence our testing procedures, scoring criteria, or editorial recommendations. We decline sponsored content, paid placements, and any arrangement that would compromise our ability to publish negative findings. Our revenue model relies on diversified affiliate relationships rather than dependence on any single vendor, preserving our capacity to evaluate tools based solely on empirical performance.

We publish detailed performance data rather than simplified ratings because context matters profoundly in detection accuracy. A tool performing well on undergraduate essays may fail on graduate-level research writing. Detectors trained primarily on GPT-3.5 output often misclassify GPT-4 generations. We provide performance breakdowns by content type, AI model source, and writing sophistication level. Users can examine confusion matrices, confidence score distributions, and reproducibility metrics. This granular approach acknowledges that no universal detector exists and that institutional stakeholders require nuanced information to make procurement and policy decisions appropriate to their specific educational contexts.
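
To illustrate why a breakdown says more than a single rating, the sketch below builds one confusion matrix per group from labeled results. The record layout, the group names, and the data are assumptions chosen for the example, not the published breakdowns.

```python
from collections import Counter, defaultdict


def confusion_by_group(records):
    """records: iterable of (group, is_ai, flagged).
    Returns {group: Counter with TP, FP, FN, TN counts}."""
    table = defaultdict(Counter)
    for group, is_ai, flagged in records:
        if is_ai:
            cell = "TP" if flagged else "FN"
        else:
            cell = "FP" if flagged else "TN"
        table[group][cell] += 1
    return dict(table)


# Hypothetical records: (content type, is_ai, flagged)
records = [
    ("undergraduate", True, True),
    ("undergraduate", False, False),
    ("undergraduate", False, False),
    ("graduate", True, False),
    ("graduate", False, True),
    ("graduate", True, True),
]

matrices = confusion_by_group(records)
for group, cells in matrices.items():
    print(group, {k: cells[k] for k in ("TP", "FP", "FN", "TN")})
```

In this invented data the detector makes no mistakes on the undergraduate samples but produces both a false positive and a false negative on the graduate ones, which a pooled accuracy figure would hide. The same grouping key could just as well be the AI model source or the writing level.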

The Limits of Detection Technology

Current AI text detection operates primarily through statistical pattern recognition rather than definitive attribution. Most commercial detectors analyze features including perplexity (text predictability), burstiness (variance in sentence complexity), and n-gram frequency distributions compared to training corpora. These methods produce probabilistic assessments, not binary determinations. Academic research consistently demonstrates that detection accuracy degrades significantly when students employ even basic evasion techniques such as paraphrasing tools, synonym substitution, or manual editing of AI output. Studies from Stanford, MIT, and other institutions report false positive rates between 8% and 26% depending on the demographic characteristics and writing proficiency of human authors.
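
A toy version of two of these features can make the idea concrete. The sketch below is deliberately simplified and rests on stated assumptions: perplexity is computed under an add-one-smoothed unigram model fit on a tiny reference text (real detectors use large neural language models), and burstiness is approximated as the variance of sentence lengths. Neither function reflects any specific commercial detector.

```python
import math
import re
from collections import Counter
from statistics import pvariance


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_perplexity(text, reference_text):
    """Perplexity of text under an add-one-smoothed unigram model fit on reference_text."""
    ref_counts = Counter(tokenize(reference_text))
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text has no tokens")
    vocab_size = len(ref_counts) + 1  # one extra slot shared by all unseen words
    total = sum(ref_counts.values())
    log_prob = sum(
        math.log((ref_counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return math.exp(-log_prob / len(tokens))


def burstiness(text):
    """Population variance of sentence lengths, measured in words (a rough proxy)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(tokenize(s)) for s in sentences]
    return pvariance(lengths) if lengths else 0.0


reference = "the cat sat on the mat. the dog sat on the log."
uniform = "The cat sat. The dog sat. The cat sat."
varied = "Stop. The cat that sat on the mat watched the dog on the log all day. Why?"

print("perplexity (predictable):", unigram_perplexity("the cat sat on the mat", reference))
print("perplexity (surprising): ", unigram_perplexity("quantum flux capacitors oscillate", reference))
print("burstiness (uniform):", burstiness(uniform))
print("burstiness (varied): ", burstiness(varied))
```

Text that the reference model finds predictable scores a lower perplexity, and text with evenly sized sentences scores a lower burstiness. Both are statistical tendencies, which is why outputs built on them are probabilities and not proof of authorship.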

We emphasize these limitations because educational institutions face substantial risks when treating detector outputs as dispositive evidence. False positives disproportionately affect English language learners, students with formulaic writing styles, and authors from non-Western rhetorical traditions. Research published in Patterns (July 2023) found that detectors flagged human-written content from non-native English speakers at rates 5.6 times higher than native speakers. Over-reliance on automated detection without human review and contextual judgment can therefore institutionalize discrimination while providing a false sense of evidentiary certainty. We document these disparate impact findings extensively because they represent critical considerations for policy development.

Detection technology will likely remain in a perpetual arms race with generation capabilities. As language models improve and produce more varied, less stereotypically formulaic output, statistical detection methods face inherent obsolescence. Watermarking approaches proposed by some AI laboratories offer theoretical alternatives but require universal adoption by model providers and remain vulnerable to removal through text transformations. We maintain that sustainable responses to AI writing tools must extend beyond detection to include pedagogical adaptation, assessment redesign, and explicit integration of AI literacy into curricula. Detection serves a role in this broader ecosystem but cannot function as a comprehensive solution to the challenges generative AI presents for educational integrity.

Our Institutional Independence and Governance

AI Essay Detector operates as an independent editorial project without affiliation to educational technology companies, academic publishers, or institutional assessment vendors. Our founding team includes former educators, academic researchers in natural language processing, and science journalists committed to evidence-based evaluation of educational technology. We receive no funding from AI detection companies, language model developers, or organizations with direct financial stakes in particular policy outcomes regarding AI in education. This structural independence allows us to publish findings that may contradict vendor marketing claims or challenge prevailing institutional assumptions about detection reliability.

Our editorial process incorporates peer review for technical content and data analysis. Statistical claims undergo verification by contributors with graduate training in computational linguistics or related quantitative fields. When we identify errors in published content, we implement corrections with transparent changelog documentation rather than silent revisions. We distinguish between our original testing (conducted in-house with documented protocols) and synthesis of third-party research (cited with full attribution). Our contributors disclose relevant expertise, institutional affiliations, and any relationships that could reasonably be perceived as conflicts of interest. These governance practices reflect standards adapted from scientific publishing and investigative journalism.

We invite critical engagement with our methods and findings. Our contact channels remain open for detector developers to dispute results, for educators to share implementation experiences, and for researchers to identify methodological limitations in our testing protocols. We have revised our evaluation criteria three times since launch based on community feedback and emerging research on detection validity. This iterative approach acknowledges that AI text generation and detection are both evolving rapidly. When new empirical findings warrant it, we revise our analytical frameworks and recommendations rather than defend previously published positions.