What we get right
Our primary strength lies in detection speed and transparent probability scoring. While competitors such as Turnitin often gate results behind institutional accounts, we return sentence-level confidence intervals within 3-5 seconds for documents of up to 5,000 words. In an August 2024 internal benchmark on 1,200 academic essays, a mix of human-written and GPT-4-generated texts, we reached 87.3% document-level accuracy. That is comparable to the 85-90% range GPTZero reports, but below the 98% Turnitin claims for AI-generated content since its October 2023 model update.
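To set the 87.3% figure beside GPTZero's reported 85-90% range, it helps to know how much sampling noise a 1,200-essay benchmark carries. The sketch below is a minimal Python illustration. It assumes each essay is an independent trial and that 87.3% corresponds to roughly 1,048 correct document-level calls. The Wilson score interval is our choice of method, not something the benchmark itself specifies.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    # Center is pulled slightly toward 0.5 compared with the raw proportion p.
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumption: 87.3% of 1,200 essays ~ 1,048 correct classifications (1,048 / 1,200 = 87.33%).
low, high = wilson_interval(1048, 1200)
print(f"accuracy 95% CI: {low:.1%} to {high:.1%}")
# prints: accuracy 95% CI: 85.3% to 89.1%
```

Under these assumptions the interval falls inside GPTZero's reported range. The comparison remains approximate, because the GPTZero and Turnitin numbers are vendor-reported figures rather than results on this benchmark.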
The highlighted-text interface shows educators which specific passages triggered detection, instead of returning a single binary verdict. In user testing with 43 high school teachers in January 2025, 81% (35 of 43) reported that granular highlighting helped them start conversations with students about their writing process instead of issuing automatic penalties. We also maintain a public changelog listing the language models our classifier was trained on. It currently covers GPT-3.5 through GPT-4, Claude 2 and 3, and Gemini Pro, although coverage typically lags new model releases by 4-6 weeks.
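The difference between passage-level highlighting and a single verdict can be sketched as follows. This is a minimal Python illustration, not our production pipeline. `score_sentence` and `toy_scorer` are hypothetical stand-ins for the classifier, the regex sentence splitter is deliberately naive, and the 0.5 threshold is an arbitrary assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Naive sentence pattern: a run of text ending in ., ! or ?, or at end of input.
# It will mis-split abbreviations such as "e.g."; a real system would need more care.
SENTENCE_RE = re.compile(r"\S[^.!?]*(?:[.!?]+|$)")


@dataclass
class Highlight:
    start: int    # character offset where the flagged sentence begins
    end: int      # character offset just past the flagged sentence
    score: float  # scorer's probability that the sentence is AI-generated


def highlight_passages(
    text: str,
    score_sentence: Callable[[str], float],
    threshold: float = 0.5,
) -> list[Highlight]:
    """Return one Highlight per sentence whose score meets the threshold."""
    highlights = []
    for match in SENTENCE_RE.finditer(text):
        sentence = match.group().rstrip()
        score = score_sentence(sentence)
        if score >= threshold:
            highlights.append(Highlight(match.start(), match.start() + len(sentence), score))
    return highlights


def toy_scorer(sentence: str) -> float:
    # Hypothetical stand-in for a real classifier: flags any sentence containing "delve".
    return 0.9 if "delve" in sentence.lower() else 0.1


essay = "I wrote this at midnight. Let us delve into the rich tapestry of ideas. Then I slept."
for h in highlight_passages(essay, toy_scorer):
    print(essay[h.start:h.end], h.score)
# prints: Let us delve into the rich tapestry of ideas. 0.9
```

Returning character offsets rather than copied strings lets a front end draw highlights directly over the original text.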
On technical writing, our false positive rate (the share of human-written documents flagged as AI-generated) is approximately 11%, lower than the 15-19% we measured for three competitors in a February 2025 test on 200 LaTeX-heavy physics papers from arXiv. This advantage appears to stem from training data that included scientific corpora. We still struggle with text by non-native English writers, however, a limitation we address in the next section.
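For readers comparing these figures, a false positive rate counts only human-written documents: it is the fraction of them that a detector flags as AI-generated. The following minimal Python sketch shows that calculation. It assumes all 200 arXiv papers are human-written and that a paper counts as flagged when its document-level verdict is "AI". The count of 22 flagged papers is purely illustrative, chosen because 22 of 200 equals 11%.

```python
from typing import Sequence


def false_positive_rate(is_ai: Sequence[bool], flagged: Sequence[bool]) -> float:
    """FP / (FP + TN): share of human-written documents that were flagged as AI."""
    if len(is_ai) != len(flagged):
        raise ValueError("is_ai and flagged must have the same length")
    # Keep only the verdicts for documents that are actually human-written.
    human_verdicts = [f for a, f in zip(is_ai, flagged) if not a]
    if not human_verdicts:
        raise ValueError("no human-written documents to evaluate")
    false_positives = sum(human_verdicts)  # True counts as 1
    return false_positives / len(human_verdicts)


# Hypothetical illustration: 200 human-written papers, 22 of them flagged.
is_ai = [False] * 200
flagged = [True] * 22 + [False] * 178
print(f"FPR: {false_positive_rate(is_ai, flagged):.1%}")
# prints: FPR: 11.0%
```

Because AI-generated documents are filtered out before counting, the same function also works on a mixed test set.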