We tried using AI to detect AI-generated code.
It didn't work.
Here's what it actually told us:
→ "This code is too perfect for a normal person"
→ "The folder structure is unusually clean"
→ "A typical developer wouldn't write this well"
That's not detection. That's bias. A good developer would get flagged. A clever cheater wouldn't.
So we scrapped it completely.
We went back to basics.
We started reading hundreds of AI-generated submissions manually. Patterns showed up fast.
→ Overly formal comments explaining obvious things
→ Massive block comments on every function
→ Repetitive code a human would just abstract away
→ @param-style blocks on every single function
→ Comments that read like documentation for strangers
Humans comment like they're leaving notes for themselves. AI comments like it's writing a textbook.
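Three of these patterns are easy to express as regexes. Here is a minimal Python sketch, assuming C-style or Python comment syntax. The pattern names (param_block, formal_comment, long_block_comment), the exact expressions, and the five-line cutoff for a "massive" block comment are illustrative assumptions, not the actual rule set. Repetitive code is left out because it is harder to catch with a single regex.

```python
import re
from typing import List, Tuple

# Illustrative rules only (assumed, not a tuned production set).
LINE_PATTERNS = {
    # JSDoc/PHPDoc "@param" tags or Sphinx-style ":param name:" lines
    "param_block": re.compile(
        r"^[ \t]*(?:\*|#|//)?[ \t]*(?:@param\b|:param[ \t])", re.MULTILINE
    ),
    # Textbook-style narration such as "// This function handles ..."
    "formal_comment": re.compile(
        r"^[ \t]*(?:#|//|\*)[ \t]*This (?:function|method|class)\b",
        re.MULTILINE | re.IGNORECASE,
    ),
}

# /* ... */ comments and """ ... """ docstrings; non-greedy, so each match
# stops at the first closing delimiter.
BLOCK_COMMENT = re.compile(r'/\*[\s\S]*?\*/|"""[\s\S]*?"""')
MIN_BLOCK_LINES = 5  # assumed threshold for a "massive" block comment


def line_of(source: str, offset: int) -> int:
    """1-based line number of a character offset."""
    return source.count("\n", 0, offset) + 1


def find_signals(source: str) -> List[Tuple[str, int]]:
    """Return (pattern_name, line_number) pairs, sorted by line then name."""
    hits = []
    for name, pattern in LINE_PATTERNS.items():
        for m in pattern.finditer(source):
            hits.append((name, line_of(source, m.start())))
    for m in BLOCK_COMMENT.finditer(source):
        if m.group().count("\n") + 1 >= MIN_BLOCK_LINES:
            hits.append(("long_block_comment", line_of(source, m.start())))
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

On a JSDoc-heavy function, find_signals reports the block comment, the "This function..." line, and each @param line with its line number. A terse function with a "# quick hack" comment produces no hits.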
The new detection layer
We rebuilt the detection layer from scratch.
No LLM call. Just regex pattern matching across the entire codebase. When a pattern matches, we surface the complete code block to the reviewer. Not a verdict. Not a score. Just "look at this, you decide."
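A rough sketch of the "surface the complete block" step in Python. It assumes brace-delimited code (JavaScript, Java, C and similar) and that most hits land in a comment sitting right above a function, so it walks back over the comment run and forward to the end of the next balanced {...} block. Finding, expand_to_block and surface are hypothetical names, and the brace counting ignores strings and comments, so treat it as a starting point rather than a parser.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: a line that belongs to a C-style comment starts with /*, * or //
COMMENT_LINE = re.compile(r"^\s*(?:/\*|\*|//)")


@dataclass
class Finding:
    pattern: str     # which rule fired
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    snippet: str     # the full block shown to the reviewer; no score attached


def expand_to_block(lines: List[str], hit: int) -> Tuple[int, int]:
    """Grow a 0-based hit line into a reviewable block.

    Backward: pull in the contiguous comment lines above the hit.
    Forward: run to the end of the first brace-balanced block at or after it.
    Naive on purpose: braces inside strings or comments are not special-cased.
    """
    start = hit
    while start > 0 and COMMENT_LINE.match(lines[start - 1]):
        start -= 1
    depth, opened = 0, False
    for i in range(hit, len(lines)):
        for ch in lines[i]:
            if ch == "{":
                depth += 1
                opened = True
            elif ch == "}" and opened:
                depth -= 1
        if opened and depth <= 0:
            return start, i
    return start, hit  # no block follows; surface only the comment run


def surface(source: str, patterns: Dict[str, re.Pattern]) -> List[Finding]:
    """Return one Finding per distinct block that contains a pattern hit."""
    lines = source.splitlines()
    findings: List[Finding] = []
    covered_until = -1  # last 0-based line already included in a finding
    for i, line in enumerate(lines):
        if i <= covered_until:
            continue  # already shown as part of an earlier block
        for name, rx in patterns.items():
            if rx.search(line):
                start, end = expand_to_block(lines, i)
                snippet = "\n".join(lines[start : end + 1])
                findings.append(Finding(name, start + 1, end + 1, snippet))
                covered_until = end
                break
    return findings
```

Each Finding carries the rule name and the full snippet but no score, which keeps the final call with the reviewer. Several hits inside the same block collapse into one finding.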
The missing signal: Git commits
Then we added one more signal: Git commits.
AI-assisted commits are huge. "Implemented full authentication system with JWT, refresh tokens, middleware and role-based access control" in one commit.
Human commits are small and honest:
- "fix login bug"
- "forgot to add env"
- "wip"
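One way to turn commit size into a signal, sketched in Python. It assumes the history is read with git log --numstat and a custom --format whose header lines start with a made-up @@ marker. The Commit type, the function names, and the thresholds (500 changed lines, 15 files) are assumptions for illustration, not tuned values.

```python
import subprocess
from dataclasses import dataclass
from typing import List

MARK = "@@"  # hypothetical marker that prefixes each commit header line


@dataclass
class Commit:
    sha: str
    subject: str
    files: int = 0
    lines_changed: int = 0


def parse_numstat(text: str) -> List[Commit]:
    """Parse output of: git log --numstat --format=@@%H%x09%s"""
    commits: List[Commit] = []
    for line in text.splitlines():
        if line.startswith(MARK):
            sha, _, subject = line[len(MARK):].partition("\t")
            commits.append(Commit(sha, subject))
        elif line.strip() and commits:
            parts = line.split("\t", 2)  # added, deleted, path
            if len(parts) != 3:
                continue
            added, deleted, _path = parts
            commits[-1].files += 1
            # Binary files report "-" for both counts: count the file, not lines
            if added != "-":
                commits[-1].lines_changed += int(added) + int(deleted)
    return commits


def flag_large_commits(
    commits: List[Commit], max_lines: int = 500, max_files: int = 15
) -> List[Commit]:
    """Commits worth a human look; the thresholds are assumed, not tuned."""
    return [c for c in commits if c.lines_changed > max_lines or c.files > max_files]


def read_history(repo: str) -> List[Commit]:
    # %x09 emits a tab between the commit hash and the subject line
    out = subprocess.run(
        ["git", "-C", repo, "log", "--numstat", f"--format={MARK}%H%x09%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)
```

Like the regex layer, this produces a list for the reviewer, not a verdict: a large commit can also be a legitimate merge, a vendored dependency, or a regenerated lockfile.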
The result
→ More accurate than our LLM approach
→ One fewer API call on every single assessment
→ Faster, cheaper, and more honest
Sometimes the smartest solution is the simplest one.