AI Detection
April 2026

We Ran Our AI Fiction Through Grammarly's #1-Ranked AI Detector. It Scored 4%.

Raw ChatGPT fiction scores 80-95% AI-detected on the same test. Raw Claude scores around 50-65%. We scored 4%. Not after manual editing. Not after running it through a humaniser tool. Not after a professional writer rewrote it. This is raw output from our engine, pasted straight into Grammarly's AI detector — the tool that ranks #1 on RAID's independent benchmark for AI detection quality.

4% AI-detected by Grammarly (ranked #1 on RAID benchmark)
96% of the text showed no AI patterns at all.

Raw ChatGPT: 80-95%
Raw Claude: 50-65%
Lightly edited AI: ~30%
Ghostproof: 4%

The Problem Every AI Writing Tool Ignores

AI-generated fiction has a fingerprint. Not one pattern — dozens of them, layered on top of each other, repeating across every paragraph.

We know because we spent six weeks cataloguing them. We had an experienced editor review AI-generated chapters across multiple genres and catalogue every pattern that flagged the text as machine-written. The reviewer didn't know which tool produced each sample. They just read and marked what they found.

Here's what appeared in raw AI output:

Show-then-tell. The AI writes a physical reaction ("Her hands trembled"), then immediately explains the emotion it just showed ("The fear was overwhelming"). Human writers trust the image. AI writers explain it. This single pattern appeared 30+ times across four chapters of a single story.

The same metaphor on repeat. A mysterious object would "pulse," "warm," "glow," or "burn" every time the protagonist felt something. One story had 25 of these reactions across four chapters. Readers notice by the third one.

Every character sounds identical. The gruff mercenary speaks with the same vocabulary and rhythm as the aristocratic villain. The teenager uses the same sentence length as the elderly mentor. Every NPC sounds like the same polite, articulate narrator wearing different costumes.

Convenient NPCs. Characters appear and immediately explain everything the protagonist needs to know. No resistance, no withholding, no agenda of their own. A stranger arrives and delivers a three-paragraph monologue about the entire power structure of the world.

Template endings. "They left. The drive was quiet." This exact sentence appeared in a fantasy forest, on a European train, and in a hospital corridor. Different stories, different genres, same AI default.

Perfectly consistent quality. Every paragraph written at exactly the same level — polished, competent, and uniform. Human writing doesn't work that way. Human writing has brilliant paragraphs next to mediocre ones, sharp sentences next to clunky ones. Perfect consistency is itself an AI tell.

These patterns are invisible to the writer but obvious to readers, editors, and AI detectors. They're the reason raw AI fiction gets flagged, and they're the reason it doesn't sell.

What We Built to Fix It

Ghostproof doesn't produce fiction by asking an AI model to "write a chapter." It runs every piece of generated text through an editorial engine with over 250 rules, quality gates, and automatic corrections — the same kind of editorial intelligence that a professional editor applies, but faster and more consistent.

Here's what the engine does, and why each piece matters for detection:

Show-then-tell elimination. Every paragraph is scanned for physical reactions followed by emotion words. "Her hands trembled. The fear was overwhelming" gets cut to "Her hands trembled." The AI is given concrete before-and-after examples in its instructions, and a client-side scanner catches anything that gets through. The result: our reviewer found 2-3 borderline instances across two full chapters, down from 30+ in raw output. That's a 90% reduction.
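As an illustration of how a client-side scanner like this might work, here is a minimal sketch. The word lists and the `find_show_then_tell` helper are hypothetical examples, not the engine's actual rules:

```python
import re

# Illustrative word lists -- the real engine's rules are far more extensive.
PHYSICAL_REACTIONS = ["trembled", "shook", "clenched", "tightened", "froze"]
EMOTION_WORDS = ["fear", "anger", "grief", "relief", "joy", "dread"]

def find_show_then_tell(paragraph):
    """Flag sentence pairs where a physical reaction is immediately
    followed by a sentence naming the emotion it already showed."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    hits = []
    for shown, told in zip(sentences, sentences[1:]):
        shows = any(w in shown.lower() for w in PHYSICAL_REACTIONS)
        tells = any(w in told.lower() for w in EMOTION_WORDS)
        if shows and tells:
            hits.append((shown, told))
    return hits
```

On "Her hands trembled. The fear was overwhelming." this flags the second sentence for removal; on "Her hands trembled. She reached for the door." it stays silent.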

Voice DNA per character. Each character gets a distinct voice profile: sentence length, vocabulary register, verbal tics, rhythm, emotional expression style. The result: our reviewer noted that five different characters in two chapters all sounded different from each other — "Ruby's 'Emergency caffeine delivery' establishes her personality in four words."

NPC friction rules. No character is allowed to deliver more than one major piece of information per scene. New characters must have partial knowledge, conflicting agendas, or reasons to withhold. The result: our reviewer said "Every character has a reason to withhold, and every interaction requires the protagonist to push."

Deliberate imperfection. This is the counterintuitive one. Most AI tools aim for consistently polished output. We do the opposite. Our engine deliberately introduces the kind of variance that human writing naturally has — a brilliant paragraph next to a functional one, a rough transition between scenes, a flat sentence that a creative writing teacher might flag, a metaphor that reaches slightly too far. Because when every paragraph is equally good, the uniformity itself becomes the tell. AI detectors look for statistical consistency in burstiness and perplexity. Human writers are inconsistent. So we engineered inconsistency.
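The variance being engineered here is measurable. A minimal sketch of one burstiness proxy, variance in sentence length, follows; the `sentence_length_variance` function is an illustrative assumption, not the engine's internal metric:

```python
import re
import statistics

def sentence_length_variance(text):
    """A crude burstiness proxy: population variance of sentence
    lengths in words. Uniformly polished prose scores near zero;
    prose that mixes long and short sentences scores higher."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pvariance(lengths) if len(lengths) > 1 else 0.0
```

Three six-word sentences in a row score 0; "Rain. The wind moved slowly through the old trees behind the house. It stopped." scores much higher, which is the shape human drafts tend to have.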

ICK word filtering. A curated list of words and phrases that AI defaults to but humans rarely use in fiction. "Orbs" for eyes, "electricity coursed through," "a dance of," "the weight of," "something adjacent to." These are auto-removed or flagged before the text reaches the writer.
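A filter like this can be sketched in a few lines. The list below is a small illustrative sample, not the engine's curated list, and `flag_ick_phrases` is a hypothetical helper:

```python
import re

# A handful of the phrases described above (illustrative sample only).
ICK_PHRASES = [
    r"\borbs\b",
    r"electricity coursed through",
    r"a dance of",
    r"the weight of",
    r"something adjacent to",
]

def flag_ick_phrases(text):
    """Return each banned phrase found, with its character offset,
    so it can be auto-removed or surfaced for review."""
    found = []
    for pattern in ICK_PHRASES:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((m.group(0), m.start()))
    return found
```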

Template ending removal. "They left. The drive was quiet." and its variants are caught by pattern matching and stripped automatically. They never appear in the output.

Quality gates. Client-side pattern counting that runs on every chapter: emotion-marker repetition, tricolon overuse, perception filter frequency, monologue length. The engine measures its own output and flags problems before the writer sees them.
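One of these gates, emotion-marker repetition, might look like the following sketch. The marker list, the threshold, and `run_quality_gate` are assumptions for illustration, not the engine's actual limits:

```python
import re
from collections import Counter

# Illustrative markers and limit -- the real gates track more patterns.
EMOTION_MARKERS = {"fear", "dread", "relief", "rage", "sorrow"}
MAX_MARKER_REPEATS = 3

def run_quality_gate(chapter):
    """Count emotion-marker occurrences in a chapter and return any
    marker that exceeds the per-chapter limit, so the engine can
    flag the problem before the writer sees the output."""
    words = re.findall(r"[a-z']+", chapter.lower())
    counts = Counter(w for w in words if w in EMOTION_MARKERS)
    return {marker: n for marker, n in counts.items() if n > MAX_MARKER_REPEATS}
```

An empty result means the chapter passes this gate; a non-empty one names each overused marker and its count.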

None of these are manual steps. They run automatically on every chapter. The writer sees the finished output, not the editorial process.

The Detection Test

We took the first chapter of a Gothic mystery generated by the Ghostproof engine. No editing. No rewriting. No humaniser tools. Raw output, copied and pasted into Grammarly's AI detector.

Result: 4% AI-detected. 96% human.

For context, we ran equivalent fiction through the same detector:

Raw ChatGPT fiction: 80-95%
Raw Claude fiction: 50-65%
Lightly edited AI text (five minutes of manual changes): ~30%
Ghostproof raw output: 4%

The Grammarly detector is ranked #1 on the RAID independent benchmark for AI detection quality. It analyses perplexity, burstiness, and statistical patterns across every sentence. It's designed to catch exactly the kind of content we're producing.

It couldn't tell.

What This Means for Authors

If you're using AI to help write fiction, detection is one of your biggest concerns — whether you're submitting to agents, publishing on Amazon KDP, or sharing with readers who care about authenticity.

The 4% score means that Ghostproof output falls within the range where human-written text naturally sits. AI detectors produce false positives on human writing at rates of 15-34% depending on the tool and the writing style. Our output is flagged less often than most human academic writing.

This doesn't mean AI detection doesn't matter. It means that the gap between "obviously AI" and "indistinguishable from human" is an engineering problem, not an impossible one. With the right editorial intelligence applied to every paragraph, AI fiction can reach the quality threshold where it needs the same editing a human first draft needs — not a fundamentally different kind of editing.

That's the threshold Ghostproof crossed this week.

Try It Yourself

Ghostproof is live at ghostproof.uk. Generate your first chapter free — no credit card, no signup wall.
Then run it through any AI detector you like. We're confident in the result.

Start Writing Free
Ghostproof is a UK-based AI book production engine. The editorial engine applies 250+ rules to every chapter, producing fiction that passes editorial review and AI detection simultaneously. Detection results may vary by genre, chapter length, and detector tool. We encourage authors to run their own tests.