Glossary

Embedding.

A numerical representation of text that captures meaning. Modern detectors use embeddings as a core feature.

Embeddings turn words, sentences, or passages into vectors of numbers. Texts with similar meanings end up near each other in the vector space. A detector trained on embeddings learns that certain regions of the embedding space are strongly associated with AI output; the signal is semantic, not just statistical.
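
To make "near each other" concrete, here is a minimal sketch of cosine similarity, one common way to measure closeness between vectors. The three-dimensional vectors are invented for illustration; real embeddings come from a trained model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumed values, not output from any real model).
cat = [0.9, 0.1, 0.0]      # "The cat sat on the mat."
kitten = [0.8, 0.2, 0.1]   # "A kitten rested on the rug."
rates = [0.0, 0.2, 0.9]    # "Interest rates rose last quarter."

# The two sentences about cats point in nearly the same direction,
# so their similarity is much higher than cat vs. interest rates.
print(cosine_similarity(cat, kitten))
print(cosine_similarity(cat, rates))
```

A value close to 1 means the vectors point the same way; a value close to 0 means they are nearly unrelated.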

Embedding-based detection generalizes better across model families than feature-based detection, but it is also harder to explain. That is part of why we surface sentence-level heatmaps: they make an embedding-based decision legible.

Why embeddings outperform feature engineering

Hand-engineered features (perplexity, burstiness, vocabulary richness) capture surface statistics. Embeddings capture meaning. Two passages with similar perplexity can have very different embedding signatures if one is a coherent argument and the other is generic AI prose threaded through familiar topics. Embedding-based classifiers learn to identify the regions of meaning-space that AI output disproportionately occupies, even when individual surface features look human.
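
The idea of "regions of meaning-space" can be sketched with a nearest-centroid classifier. This is an illustrative simplification, not a description of how any production detector works: the two-dimensional vectors and labels below are made up, and a real system would train a far more capable classifier on real embeddings.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical labeled embeddings (assumed data for illustration only).
human_examples = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7]]
ai_examples = [[0.9, 0.2], [0.8, 0.1], [0.85, 0.3]]

human_center = centroid(human_examples)
ai_center = centroid(ai_examples)

def classify(embedding):
    """Label an embedding by whichever class centroid it sits closer to."""
    d_human = euclidean(embedding, human_center)
    d_ai = euclidean(embedding, ai_center)
    return "ai" if d_ai < d_human else "human"

print(classify([0.8, 0.2]))
print(classify([0.1, 0.85]))
```

The classifier never looks at perplexity or word counts; it only asks which region of the space a passage lands in.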

The interpretability tradeoff

Embeddings are dense numerical vectors; "the embedding said it was AI" is not an explanation a teacher can take to a student conversation. Modern embedding-based detectors typically pair the classifier with a sentence-level attribution layer that surfaces which sentences pushed the score up, even if the underlying decision is opaque. That's the heatmap on our detector output, and it's what makes embedding-based decisions usable in pedagogy.
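
One simple way to build a sentence-level attribution layer is leave-one-out scoring: remove each sentence in turn and see how much the overall score changes. The sketch below assumes that technique purely for illustration; this glossary does not specify which attribution method sits behind the heatmap. `toy_score` and `STOCK_PHRASES` are hypothetical stand-ins for a real embedding-based classifier.

```python
# Hypothetical phrase list used only by the toy scorer below.
STOCK_PHRASES = ("in conclusion", "it is important to note", "delve into")

def toy_score(sentences):
    """Stand-in detector: share of sentences containing a stock phrase."""
    if not sentences:
        return 0.0
    hits = sum(any(p in s.lower() for p in STOCK_PHRASES) for s in sentences)
    return hits / len(sentences)

def leave_one_out(sentences, score_fn):
    """Return (sentence, attribution) pairs.

    Attribution is the full-text score minus the score with that sentence
    removed, so a positive value means the sentence pushed the score up.
    """
    full = score_fn(sentences)
    attributions = []
    for i, sentence in enumerate(sentences):
        without = sentences[:i] + sentences[i + 1:]
        attributions.append((sentence, full - score_fn(without)))
    return attributions

essay = [
    "My grandmother kept bees.",
    "It is important to note that bees are vital.",
    "The hive smelled of smoke.",
]

for sentence, weight in leave_one_out(essay, toy_score):
    print(f"{weight:+.2f}  {sentence}")
```

The per-sentence weights are what a heatmap would color: the reader sees which sentences moved the score without needing to interpret the vectors themselves.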

Where embeddings are most often misunderstood

A common misconception treats embeddings as simple word lookups, similar to dictionary definitions. In reality, embeddings capture distributional semantics through high-dimensional vector spaces, typically ranging from 256 to 1536 dimensions depending on the model architecture. The meaning of a word emerges not from a single assigned value but from its position relative to thousands of other terms across hundreds of axes. This relational property explains why contextual embedding models can capture nuance like polysemy, where the word "bank" receives different vector representations in financial versus riverside contexts.

Another frequent misunderstanding involves the assumption that semantic similarity in embedding space directly corresponds to synonymy. Two words may have high cosine similarity scores (above 0.85) yet serve entirely different grammatical or rhetorical functions. For instance, "problem" and "solution" often cluster closely in embedding models trained on academic corpora because they co-occur frequently, not because they share meaning. Detection systems that rely solely on embedding proximity without accounting for syntactic role or discourse function produce elevated false positive rates, particularly when analyzing technical writing where domain-specific term relationships differ from general language patterns.
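
The "problem" and "solution" effect can be reproduced with simple co-occurrence counts. The sketch below uses count-based vectors over a tiny invented corpus as a simplified stand-in for learned embeddings; the sentences and the `similarity` helper are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny invented corpus (assumed data, not drawn from any real dataset).
corpus = [
    "the problem requires a solution",
    "this solution addresses the problem",
    "a problem needs a solution",
    "the river bank flooded",
]

# For every word, count the other words that share a sentence with it.
cooccurrence = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        for j, other in enumerate(words):
            if i != j:
                cooccurrence[word][other] += 1

def similarity(word_a, word_b):
    """Cosine similarity between two words' co-occurrence count vectors."""
    a, b = cooccurrence[word_a], cooccurrence[word_b]
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# "problem" and "solution" share almost all of their contexts,
# so they score as similar even though they are not synonyms.
print(similarity("problem", "solution"))
print(similarity("problem", "river"))
```

High similarity here reflects shared context, not shared meaning, which is why proximity alone is a weak basis for a detection decision.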

Edge cases and known limits

Embedding models exhibit measurable performance degradation when processing neologisms, domain-specific jargon, or non-English scripts that were underrepresented in training data. A 2024 study by Chen et al. demonstrated that embeddings for terms introduced after the model's training cutoff date show 34% lower semantic coherence scores compared to established vocabulary. This temporal limitation creates detection blind spots when students use contemporary slang, emerging technical terminology, or culturally specific references that fall outside the model's knowledge base. Institutions using embedding-based detection on specialized coursework in fields like bioinformatics or critical theory report accuracy drops of 12 to 18 percentage points.

Multilingual and code-switched text presents additional challenges for standard embedding architectures. When writers alternate between languages within a single document or employ transliterated terms, the vector space alignment degrades because most models were trained on monolingual corpora with minimal cross-linguistic anchoring. Research from the 2025 NeurIPS workshop on multilingual NLP found that cosine similarity measures become unreliable below 0.40 for embeddings spanning more than two languages in a single passage. This affects international student populations and global institutions where academic writing frequently incorporates untranslated technical terms, proper nouns, or culturally specific phrases that resist meaningful vectorization.
