The Draft — Looks Great, Right?

Gemini 3.1 Flash Lite generates a 1,500-word story from a one-line prompt

$ bun run src/cli/index.ts generate "A Software Engineer was L9, Down-leveling to L3, then quickly climb up to CEO"

Reads well. Flows naturally. Grammar is perfect. But is it actually good?

But The Harness Found 9 Issues

Same draft — now with StoryHarness verification

⚖️ DeonticChecker He had promised himself he wouldn't get involved in management. He had promised himself he was just an L3. But watching a company commit digital suicide out of sheer incompetence was more than his engineer's soul could bear. Marcus stood up.
→ Promise established here — but broken 3 paragraphs later with zero consequence
😬 ReaderExperience Marcus didn't look at the CEO. His eyes were locked on the terminal. His fingers hit the mechanical keyboard with the speed and precision of a concert pianist.
→ "Feels like a juvenile power fantasy rather than a grounded story"
🔮 EpistemicChecker "Marcus? Marcus Vance? What in God's name are you doing in a Synthetix cubicle?"
→ Evelyn recognizes Marcus through a webcam — but he used a fake name. No source for this knowledge.
📢 DialogueChecker "Richard, you absolute clown. That man is the former L9 Chief Architect of NovaTech. He designed the cloud infrastructure that half the modern internet runs on. We tried to headhunt him for three years and he ignored our emails."
→ "As you know, Bob" — Evelyn reads Marcus's resume aloud to a room of strangers

What If Stories Had Type Checkers?

Source Code
Compiler
Type Errors
Fix
Working Program
Story Draft
Harness
Plot Holes, Clichés
Fix
Published Story
  • LLMs are good at understanding language but unreliable at judging it.
  • Phase A: a cheap LLM extracts structure into a knowledge graph.
  • Phase B: deterministic code verifies 48 formal rules.
  • 88% accuracy vs 53% for pure LLM evaluation.
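
The Phase A / Phase B split can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not StoryHarness's code: `StoryGraph`, `extractGraph`, `verify`, and the toy `unresolved_goal` rule are hypothetical names, and Phase A is stubbed with a hard-coded graph (with made-up goals) instead of a real LLM call.

```typescript
// Minimal two-phase sketch. Every name here is hypothetical, not StoryHarness's API.

interface Goal {
  id: string;
  owner: string;
  resolved: boolean; // achieved or explicitly abandoned somewhere in the draft
}

interface StoryGraph {
  goals: Goal[];
}

interface Finding {
  check: string;
  target: string;
}

type Check = (graph: StoryGraph) => Finding[];

// Phase A (stubbed): a cheap LLM would normally extract this graph from the draft.
// Hard-coded so the example runs offline; these goals are made up for illustration.
function extractGraph(_draft: string): StoryGraph {
  return {
    goals: [
      { id: "restore_production", owner: "Marcus", resolved: true },
      { id: "find_saboteur", owner: "Evelyn", resolved: false },
    ],
  };
}

// Phase B: plain functions over the graph, with no model calls.
// Toy rule: every goal the story sets up must be resolved.
const checks: Check[] = [
  (graph) =>
    graph.goals
      .filter((goal) => !goal.resolved)
      .map((goal) => ({ check: "unresolved_goal", target: `${goal.owner}:${goal.id}` })),
];

function verify(draft: string): Finding[] {
  const graph = extractGraph(draft); // Phase A
  const findings: Finding[] = [];
  for (const check of checks) {
    findings.push(...check(graph)); // Phase B
  }
  // Sort by a fixed key so output order never depends on rule registration order.
  const key = (f: Finding) => `${f.check}|${f.target}`;
  return findings.sort((a, b) => (key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0));
}

const first = JSON.stringify(verify("draft text"));
const second = JSON.stringify(verify("draft text"));
console.log(first === second); // true: same graph, same findings
```

In a real pipeline Phase A is an LLM call and can vary between runs; the determinism guarantee applies from the graph onward.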

We X-Ray Every Story

{
  "name": "Marcus Vance",
  "mask": "A mid-level code monkey content with Jira tickets",
  "trueNature": "A brilliant L9 Architect offended by incompetence",
  "maskMatchesTruth": false
}
✅ maskMatchesTruth is false: the character has depth!
Graph           What It Captures                                 Checks
LogicGraph      Propositions, events, knowledge, world rules     27
DialogueGraph   Speeches, subtext, exposition, voices            8
CharacterGraph  Mask vs truth, pressure choices, contradictions  6
NarrativeGraph  Turning values, stakes, goals, theme             7
Total                                                            48

Same graph → same result. Phase B makes no LLM calls: no hallucination, no randomness.
Deterministic.
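
The CharacterGraph row can be illustrated with a single rule over the Marcus Vance node. A sketch, assuming a `CharacterNode` type that mirrors the JSON fields above; the `flat_character` rule name, its message, and `checkCharacterDepth` are hypothetical.

```typescript
// Hypothetical node shape mirroring the Marcus Vance JSON; not the real schema.
interface CharacterNode {
  name: string;
  mask: string; // how the character presents to others
  trueNature: string; // who the character really is
  maskMatchesTruth: boolean;
}

interface CharacterFinding {
  check: string;
  character: string;
  message: string;
}

// Hypothetical "flat_character" rule: when mask and true nature coincide,
// there is no hidden self for pressure to reveal.
function checkCharacterDepth(character: CharacterNode): CharacterFinding[] {
  if (!character.maskMatchesTruth) {
    return []; // a gap between mask and truth: the character has depth
  }
  return [
    {
      check: "flat_character",
      character: character.name,
      message: "mask matches true nature; nothing for pressure to reveal",
    },
  ];
}

const marcus: CharacterNode = {
  name: "Marcus Vance",
  mask: "A mid-level code monkey content with Jira tickets",
  trueNature: "A brilliant L9 Architect offended by incompetence",
  maskMatchesTruth: false,
};

console.log(checkCharacterDepth(marcus)); // []
```

The rule only reads a field that Phase A already extracted, which is why its verdict cannot drift between runs.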

What The Harness Catches

Real findings from a real draft — with the exact text that triggered each check

🔮 EpistemicChecker/psychic_knowledge
❌ Evelyn recognizes Marcus — but the story never explains how
Marcus used a fake name and left his L9 career off the resume. Yet Evelyn instantly identifies him through a webcam. The checker flags knowledge that appears with no narrative source — nobody told her, she didn't investigate, the recognition just happens.
Earlier — Richard has no idea:
"Security!" Richard screamed. "Get this junior dev out of here!"

Later — suddenly everyone knows:
"Richard, you absolute clown. That man is the former L9 Chief Architect of NovaTech. He designed the cloud infrastructure that half the modern internet runs on."

↑ How does Evelyn recognize Marcus from a webcam? He used a fake name. No one in the story establishes the connection — the knowledge just appears.
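
One way to express the psychic_knowledge rule is as an ordered walk over knowledge events. A sketch under assumptions: the `KnowledgeEvent` model, the three source kinds, and `checkPsychicKnowledge` are hypothetical, and the paragraph numbers and the `outage_root_cause` fact are made up for illustration.

```typescript
// Hypothetical event model; the real LogicGraph schema is not shown here.
type KnowledgeSource = "told" | "investigated" | "witnessed";

interface KnowledgeEvent {
  kind: "learns" | "acts_on";
  character: string;
  fact: string;
  paragraph: number;
  source?: KnowledgeSource; // how the character learned it (for "learns" events)
}

interface KnowledgeFinding {
  check: "psychic_knowledge";
  character: string;
  fact: string;
  paragraph: number;
}

// Walk the story in order. A character may only act on a fact after an
// earlier "learns" event, with a narrative source, for that same fact.
function checkPsychicKnowledge(timeline: KnowledgeEvent[]): KnowledgeFinding[] {
  const ordered = [...timeline].sort((a, b) => a.paragraph - b.paragraph);
  const known = new Set<string>();
  const findings: KnowledgeFinding[] = [];
  for (const e of ordered) {
    const key = `${e.character}|${e.fact}`;
    if (e.kind === "learns" && e.source !== undefined) {
      known.add(key);
    } else if (e.kind === "acts_on" && !known.has(key)) {
      findings.push({ check: "psychic_knowledge", character: e.character, fact: e.fact, paragraph: e.paragraph });
    }
  }
  return findings;
}

// Illustrative events; paragraph numbers are made up.
const events: KnowledgeEvent[] = [
  { kind: "learns", character: "Marcus", fact: "outage_root_cause", paragraph: 6, source: "investigated" },
  { kind: "acts_on", character: "Marcus", fact: "outage_root_cause", paragraph: 9 },
  // Evelyn names Marcus's L9 past, but nothing earlier tells her.
  { kind: "acts_on", character: "Evelyn", fact: "marcus_is_former_l9", paragraph: 14 },
];

console.log(checkPsychicKnowledge(events)); // one finding: Evelyn, marcus_is_former_l9, paragraph 14
```

Adding a single sourced "learns" event for Evelyn before paragraph 14 (someone tells her, or she looks him up) is enough to clear the finding.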
⚖️ DeonticChecker/broken_obligation
❌ Marcus breaks his own promise — with zero consequence
Marcus establishes a self-imposed obligation: "I won't get involved." Then he does — and the story treats it as heroic, with no guilt, regret, or internal cost. The checker flags obligations that are violated without narrative acknowledgment.
"He had promised himself he wouldn't get involved in management. He had promised himself he was just an L3."

...then, three paragraphs later:

"Marcus reached out and gently, but firmly, pushed Greg's rolling chair out of the way."

↑ Promise established → promise broken → no internal conflict or consequence
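
The broken_obligation pattern (promise, breach, no acknowledgment) can be sketched as follows. Assumptions: `Obligation`, `DeonticEvent`, and `checkBrokenObligations` are hypothetical names, "acknowledges" stands in for any guilt, regret, or consequence tied to the promise, and the paragraph numbers are made up.

```typescript
// Hypothetical DeonticChecker sketch; types and names are illustrative only.
interface Obligation {
  id: string;
  holder: string;
  establishedAt: number; // paragraph where the promise is made
}

interface DeonticEvent {
  kind: "violates" | "acknowledges"; // acknowledges = guilt, regret, or a consequence
  obligationId: string;
  paragraph: number;
}

interface DeonticFinding {
  check: "broken_obligation";
  holder: string;
  obligationId: string;
  violatedAt: number;
}

function checkBrokenObligations(obligations: Obligation[], events: DeonticEvent[]): DeonticFinding[] {
  const findings: DeonticFinding[] = [];
  for (const ob of obligations) {
    // Paragraphs where this promise is broken after being made.
    const violations = events
      .filter((e) => e.kind === "violates" && e.obligationId === ob.id && e.paragraph > ob.establishedAt)
      .map((e) => e.paragraph);
    if (violations.length === 0) continue;
    const violatedAt = Math.min(...violations); // earliest breach
    // The story must register the breach at or after the moment it happens.
    const acknowledged = events.some(
      (e) => e.kind === "acknowledges" && e.obligationId === ob.id && e.paragraph >= violatedAt,
    );
    if (!acknowledged) {
      findings.push({ check: "broken_obligation", holder: ob.holder, obligationId: ob.id, violatedAt });
    }
  }
  return findings;
}

// Marcus's promise, then the chair push three paragraphs later (paragraph numbers made up).
const promises: Obligation[] = [{ id: "stay_out_of_management", holder: "Marcus", establishedAt: 10 }];
const deonticEvents: DeonticEvent[] = [{ kind: "violates", obligationId: "stay_out_of_management", paragraph: 13 }];

console.log(checkBrokenObligations(promises, deonticEvents)); // one broken_obligation finding
```

Note that the rule does not forbid breaking the promise; it only requires the story to acknowledge the breach.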
📢 DialogueChecker/exposition_dump
❌ "As you know, Bob" — characters explain things for the reader's benefit
Evelyn delivers a monologue of backstory facts purely so the reader learns Marcus's history. As McKee argues, exposition should be weaponized, not dumped.
"That man is the former L9 Chief Architect of NovaTech. He designed the cloud infrastructure that half the modern internet runs on. We tried to headhunt him for three years and he ignored our emails."

↑ Evelyn is explaining Marcus's resume to people who have no reason to care — it's a Wikipedia dump disguised as dialogue
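
A crude version of the exposition_dump rule counts backstory facts that do no dramatic work in the scene. A sketch, assuming a hypothetical `SpeechNode` with a `usedAsLeverage` flag (is the exposition "weaponized" toward the speaker's goal?) and an arbitrary threshold `MAX_INERT_FACTS`; neither is from the source.

```typescript
// Hypothetical DialogueGraph speech node; fields and threshold are assumptions.
interface SpeechNode {
  speaker: string;
  line: string;
  backstoryFacts: string[]; // facts the line conveys about the past
  usedAsLeverage: boolean; // does the speaker use these facts to pursue a goal in the scene?
}

interface DialogueFinding {
  check: "exposition_dump";
  speaker: string;
  factCount: number;
}

const MAX_INERT_FACTS = 2; // hypothetical tolerance for backstory that does no dramatic work

function checkExpositionDump(speeches: SpeechNode[]): DialogueFinding[] {
  return speeches
    .filter((s) => !s.usedAsLeverage && s.backstoryFacts.length > MAX_INERT_FACTS)
    .map((s): DialogueFinding => ({ check: "exposition_dump", speaker: s.speaker, factCount: s.backstoryFacts.length }));
}

const evelyn: SpeechNode = {
  speaker: "Evelyn",
  line: "That man is the former L9 Chief Architect of NovaTech. ...",
  backstoryFacts: [
    "former L9 Chief Architect of NovaTech",
    "designed the cloud infrastructure half the modern internet runs on",
    "ignored three years of headhunting emails",
  ],
  usedAsLeverage: false,
};

console.log(checkExpositionDump([evelyn])); // one finding: Evelyn, 3 facts
```

The same three facts would pass if Evelyn used them as leverage, for example to overrule Richard, which is the weaponized-exposition fix.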

5 Rounds of Automated Editing

Round 1 — Raw LLM ❌ REJECTED
Marcus didn't look at the CEO. His eyes were locked on the terminal. His fingers hit the mechanical keyboard with the speed and precision of a concert pianist.

"You reckless idiot! You could have destroyed the entire company! You are fired!"

"L3?" Evelyn laughed. "Richard, you absolute clown. That man is the former L9 Chief Architect."
  • ❌ 😬 "Juvenile power fantasy"
  • ❌ 🔮 Psychic knowledge
  • ❌ 📢 Exposition dump
  • ❌ 🎭 On-the-nose dialogue
Round 5 — Refined ✅ IMPROVED
Marcus stood up without announcing himself, walking over to step directly into Greg's space and force the CTO to stumble back.

"Security," Richard barked. "Get this idiot away from the console."

Marcus ignored them as his fingers flew across the keys, his muscle memory taking over.
  • ✅ No "concert pianist" cringe
  • ✅ Richard reacts proportionately
  • ✅ Competence shown, not told
  • ✅ Real subtext in dialogue
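
The round structure can be sketched as a bounded check-and-revise loop. Assumptions: `editLoop`, `Checker`, and `Reviser` are hypothetical names; in practice the reviser would presumably be an LLM call that receives the findings, but here it is a toy string replacement so the example runs offline. Round 1 checks the raw draft; each later round revises once and re-checks, stopping early once no findings remain.

```typescript
// Hypothetical edit loop; not StoryHarness's implementation.
interface Finding {
  check: string;
  message: string;
}

type Checker = (draft: string) => Finding[];
type Reviser = (draft: string, findings: Finding[]) => string;

interface EditResult {
  draft: string;
  rounds: number;
  remaining: Finding[];
}

function editLoop(initial: string, check: Checker, revise: Reviser, maxRounds = 5): EditResult {
  let draft = initial;
  let findings = check(draft); // round 1: the raw draft
  let rounds = 1;
  while (findings.length > 0 && rounds < maxRounds) {
    draft = revise(draft, findings); // feed findings back into the reviser
    findings = check(draft);
    rounds++;
  }
  return { draft, rounds, remaining: findings };
}

// Toy checker and reviser targeting the "concert pianist" simile.
const cliche: Checker = (d) =>
  d.includes("concert pianist") ? [{ check: "cliche", message: "simile: concert pianist" }] : [];
const stripCliche: Reviser = (d) => d.replace(" with the speed and precision of a concert pianist", "");

const result = editLoop(
  "His fingers hit the mechanical keyboard with the speed and precision of a concert pianist.",
  cliche,
  stripCliche,
);
console.log(result.rounds, result.draft); // 2 His fingers hit the mechanical keyboard.
```

Capping the loop matters because an LLM reviser is not guaranteed to converge; whatever findings survive the last round are returned rather than silently dropped.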

Live: Watch StoryHarness Run

Output reconstructed from the generation logs of a real StoryHarness run

Every Competitor Using Raw LLMs Has This Problem

48 Deterministic Checks

"Domain expertise in code — not prompt engineering"

34 Structured Interfaces

"4 knowledge graphs encoding narrative theory"

88% Accuracy

"vs 53% for pure LLM evaluation"

<$0.001 Per Story

"Tier 1 is free. Tier 2 is $0.0002"

100% Deterministic

"Same graph → same result. Auditable."

ICML Research Foundation

"AutoHarness — tree-search + Thompson sampling"

Anyone can prompt an LLM to write a story. Nobody else has formal verification for narrative.

Roadmap

NOW

"48 checks across 4 domains — logic, dialogue, character, narrative"

NEXT

"More domains: worldbuilding, pacing, humor"

NEXT

"Beam search: generate N diverse drafts, pick the best"

LATER

"Intentional rule-breaking annotations for artistic freedom"

LATER

"Interactive author mode — human-in-the-loop editing"

StoryHarness
A Compiler for Stories