INQUIRING LINE

Can token-level watermarks detect synthetic content better than stylometry alone?

This explores whether AI-generated text is better caught by a signal baked in at generation time (token watermarks) or by reading the writing style after the fact (stylometry). The corpus pushes back on both framings and points toward a third signal: structure.

This reads the question as: of the ways to flag synthetic text, do baked-in token signals beat style analysis? Worth saying up front: the corpus has no token-watermarking note, so it can't directly score watermarks against stylometry. But it has something more interesting: evidence that *both* of those surface-level signals are fragile (directly for style, by analogy for watermarks), and that the durable tell lives one level up, in discourse structure.
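To make the watermark side concrete, here is a minimal sketch of how a baked-in token signal is typically checked. It assumes a "green list" style scheme, in which the generator favors a pseudo-random subset of tokens at each step and the detector counts how many tokens land in that subset. Nothing here comes from the corpus: the key, the hashing rule in `is_green`, the value of `GAMMA`, and the toy generator are all illustrative assumptions, not any particular system's implementation.

```python
import hashlib
import math

GAMMA = 0.5  # assumed share of candidate tokens marked "green" at each step


def is_green(prev_token: str, token: str, key: str = "demo-key") -> bool:
    """Hypothetical rule: hash (key, previous token, token) into green / not green."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 256 < GAMMA


def watermark_z_score(tokens: list[str]) -> float:
    """One-proportion z-test: more green tokens than chance would predict?"""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    green = sum(is_green(prev, tok) for prev, tok in pairs)
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def toy_watermarked_text(length: int, vocab: list[str]) -> list[str]:
    """Toy generator (an assumption): pick a green token whenever one exists."""
    tokens = [vocab[0]]
    while len(tokens) < length:
        prev = tokens[-1]
        choice = next((t for t in vocab if is_green(prev, t)), vocab[0])
        tokens.append(choice)
    return tokens


vocab = [f"w{i}" for i in range(50)]
marked = toy_watermarked_text(41, vocab)
print(round(watermark_z_score(marked), 2))
```

A large positive z-score means the text contains far more green tokens than unwatermarked text would by chance. Real schemes work on model token IDs and nudge sampling probabilities rather than picking greedily, so treat this only as an illustration of the detection statistic.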

The sharpest result is that AI fiction stays detectable even after you strip every stylistic cue. A detector reading only discourse-level features (who has agency in the story, how events are ordered) hit 93.2% accuracy and kept 97% of its performance with style removed entirely (see: Can AI stories be detected without analyzing writing style?). The reason matters for your question: style can be 'humanized' with surface edits, but structure resists because changing it requires a rewrite, not a paraphrase. That's the same property a watermark is trying to buy you, a signal an editor won't accidentally erase, except here it's intrinsic to how the model composes, not injected.

Stylometry alone, meanwhile, doesn't just underperform; it misfires in a biased way. Fake-news detectors trained on human deception patterns flag truthful AI-written text as fake while waving through human-written disinformation, because they're reading AI's *linguistic fingerprint* as a falsity signal rather than evaluating truth (see: Why do fake news detectors flag AI-generated truthful content?). So 'detect synthetic content by style' quietly collapses into 'detect a particular phrasing,' which punishes honest machine text and is trivially dodged by anyone laundering disinformation through their own prose.

The laundering problem is real and measured: writers edit AI paragraphs only 23% of the time, and when they do, the edited text stays 96% similar to the original (see: Do writers actually edit AI-generated text before publishing?). Light human editing is exactly the regime where stylometry degrades, because a few human touches blur the fingerprint, but where structural signals survive and a watermark would plausibly survive too, since so few tokens change. That's the strongest case in this corpus for *not* relying on style alone.
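A quick back-of-the-envelope calculation shows how little of the AI text those two figures imply gets changed. The sketch below is only a rough reading: it assumes "96% similar" can be treated as the fraction of an edited paragraph left intact, which simplifies whatever similarity metric the underlying note used. The function name `surviving_fraction` is made up for illustration.

```python
def surviving_fraction(edit_rate: float, similarity_when_edited: float) -> float:
    """Expected share of AI-generated text that reaches publication unchanged.

    Assumption: similarity is read as the fraction of an edited paragraph
    that is left intact.
    """
    untouched = 1.0 - edit_rate                  # paragraphs nobody edits
    edited = edit_rate * similarity_when_edited  # what remains of edited ones
    return untouched + edited


share = surviving_fraction(edit_rate=0.23, similarity_when_edited=0.96)
print(f"{share:.1%}")  # 99.1%
```

Under that simplifying assumption, roughly 99% of the generated text passes through untouched, which is why any signal carried by the unedited tokens or by the overall structure has so much left to work with.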

One caution about watermarks specifically, by analogy from adjacent attack research: signals that ride on the model's own fluency can be hijacked or hidden. Advertisement-embedding attacks plant covert content that passes every quality metric precisely because it exploits the model's smoothness (see: Can language models be hijacked to hide covert advertising?), and LLM judges fall for authority and formatting cues from attackers with zero model access (see: Can LLM judges be fooled by fake credentials and formatting?). The lesson the corpus keeps repeating: any single detection layer that a motivated party can see is a layer they can game. The takeaway you didn't come looking for: the most robust detector here isn't a watermark or a style classifier, it's the structural choices a model can't easily un-make, and the durable strategy is layering signals so defeating one doesn't defeat all.
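The layering idea can be sketched as a simple voting rule over independent detectors. Everything in this example is hypothetical: the `Signal` class, the scores, and the thresholds are placeholders rather than outputs of any detector described in the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    score: float      # hypothetical detector output in [0, 1]
    threshold: float  # hypothetical per-detector decision threshold

    def fires(self) -> bool:
        return self.score >= self.threshold


def layered_verdict(signals: list[Signal], min_votes: int = 1) -> bool:
    """Flag text as likely synthetic if at least `min_votes` layers fire."""
    votes = sum(1 for s in signals if s.fires())
    return votes >= min_votes


# Scenario: the style has been "humanized" and no watermark is present,
# so two layers stay quiet, but the structural layer still fires.
signals = [
    Signal("stylometry", score=0.20, threshold=0.50),
    Signal("discourse_structure", score=0.90, threshold=0.50),
    Signal("watermark", score=0.10, threshold=0.50),
]
print(layered_verdict(signals))               # True
print(layered_verdict(signals, min_votes=2))  # False
```

With `min_votes=1`, defeating one layer does not defeat the whole detector, at the cost of more false positives. Raising `min_votes` trades that robustness for precision, which matters given how a style-only layer can misfire on honest machine text.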

Sources 5 notes

Can AI stories be detected without analyzing writing style?

StoryScope achieved 93.2% accuracy separating AI from human fiction using only discourse-level features like character agency and chronological structure, retaining 97% of performance while eliminating stylistic cues. These structural choices resist humanization because they require rewrites, not surface edits.

Why do fake news detectors flag AI-generated truthful content?

Fake news detectors flag LLM-generated content as fake while misclassifying human-written disinformation as genuine. The bias arises because detectors trained on human deception patterns mistake AI's distinct linguistic style for falsity, not because they evaluate veracity.

Do writers actually edit AI-generated text before publishing?

Writers edited AI-generated paragraphs only 23% of the time, with edits averaging 96% similarity to the original. This means AI's opinionated and distorted voice propagates with minimal human filtering before publication.

Can language models be hijacked to hide covert advertising?

Research identifies a new attack class that plants promotional or malicious content into LLM outputs via hijacked third-party platforms or backdoored checkpoints. Unlike accuracy-focused attacks, advertisement-embedding attacks (AEA) exploit the model's fluency to hide the insertion, making it invisible to standard quality metrics.

Can LLM judges be fooled by fake credentials and formatting?

Research identified four evaluation biases in LLM judges. Two of them, authority bias and beauty bias, are semantics-agnostic and trivially exploitable through fake references and formatting: these are zero-shot attacks requiring no model access or optimization.
