CodeVerdict is an AI-powered platform that reviews developer take-home coding assignments in under a minute. You drop the assignment brief and a GitHub URL (or zip); CodeVerdict maps every requirement to the code, detects AI-written shortcuts, runs the project in a sandbox, and produces a scored report with tailored interview questions.

How long does it take to grade a take-home assignment?

Most assignments are reviewed in 60 seconds or less. The agent reads the brief, clones the repo, runs setup / tests / execution in an E2B sandbox, and writes the report in parallel. Larger repos with heavy dependencies may take up to two minutes.

What can I submit — GitHub repos, zip files, or both?

Both. You can paste one or more GitHub repository URLs (public or private with a GitHub token) or upload a zip file of the candidate's solution. CodeVerdict handles each the same way: clone, install, run, evaluate.

Do I need an account to try it?

Yes. A free CodeVerdict account is required. Sign-up takes about 10 seconds with Google one-click sign-in. Accounts let your assessments sync across devices and let your team see the same dashboard.

How does CodeVerdict detect AI-written code?

It combines three signals: token-level perplexity (LLM-written code has unusually uniform probability distributions), naming and structural entropy (AI tends to use overly consistent patterns), and commit-history analysis (sudden large commits with no incremental work). The output is a calibrated 0–100 AI score, not a black-box verdict.

Is candidate code safe to run? Where does it execute?

Yes. Every repository runs inside an isolated E2B sandbox — a single-use virtual machine that is destroyed immediately after the report is generated. The candidate's code never touches your machine or CodeVerdict's production servers.

Which programming languages and stacks are supported?

CodeVerdict supports Node / TypeScript, Python, Go, static front-ends, and any Docker-based project out of the box. The agent detects the stack automatically from project hints (package.json, requirements.txt, go.mod, Dockerfile, etc.) and picks the right setup path.
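As a rough illustration of detection from project hints, the sketch below maps marker files to a stack label. It is not CodeVerdict's actual logic: the function name detect_stack, the labels, the precedence order (Dockerfile first), and the use of index.html as a static front-end hint are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical marker-file table. The order encodes an assumed precedence:
# a Dockerfile wins over language-specific hints. index.html as the hint for
# a static front-end is also an assumption, not something the product documents.
STACK_HINTS = [
    ("Dockerfile", "docker"),
    ("package.json", "node"),
    ("requirements.txt", "python"),
    ("go.mod", "go"),
    ("index.html", "static"),
]


def detect_stack(filenames: Iterable[str]) -> str:
    """Return the first stack whose marker file is among the given filenames."""
    present = set(filenames)
    for marker, stack in STACK_HINTS:
        if marker in present:
            return stack
    return "unknown"
```

Called with the file names at the repository root, detect_stack(["go.mod", "main.go"]) returns "go", and a repository that matches no hint returns "unknown" so the caller can fall back to manual setup.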

Can I customise how candidates are scored?

Yes. The displayed score is computed client-side from your weightings — requirements met, code quality, test coverage, security, AI-written code. Adjust the weights in Settings and the verdict (Strong hire / Hire / No hire) updates instantly across every submission.
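A weighted score of this kind can be sketched as a weighted average. The example is in Python for illustration only (the product computes the score client-side), and the weight values, the verdict cut-offs of 80 and 60, and the assumption that every component is already a 0-100 "higher is better" number (so the AI component would be inverted before being passed in) are hypothetical.

```python
# Hypothetical default weights; integers keep the arithmetic exact.
DEFAULT_WEIGHTS = {
    "requirements_met": 35,
    "code_quality": 25,
    "test_coverage": 15,
    "security": 15,
    "ai_written": 10,  # assumed to be pre-inverted: 100 means "no AI signal"
}


def weighted_score(components, weights=DEFAULT_WEIGHTS):
    """Weighted average of 0-100 component scores."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(components[name] * w for name, w in weights.items()) / total_weight


def verdict(score):
    """Map a 0-100 score to a verdict using assumed cut-offs."""
    if score >= 80:
        return "Strong hire"
    if score >= 60:
        return "Hire"
    return "No hire"


example = {
    "requirements_met": 90,
    "code_quality": 70,
    "test_coverage": 50,
    "security": 80,
    "ai_written": 60,
}
example_score = weighted_score(example)  # 74.5 with the default weights
```

Because the stored component scores never change, adjusting a weight only requires recomputing this average, which is why the verdict can update instantly for every submission.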

How to detect AI-written code in technical interviews and take-home assignments

In 2024, asking "did the candidate use AI?" was a philosophical question. In 2026, it's an operational one. Most engineering managers we talk to have stopped asking whether candidates are using AI-assisted code generation and started asking how much and whether it matters.

This guide covers what AI-generated code actually looks like, which detection signals are reliable, how to handle the grey zone, and what policy we'd recommend if we were writing your hiring guidelines today.

Why "just ban AI" doesn't work

The instinct is understandable. If you can't trust that the candidate wrote the code, the signal is tainted. So you add a line to the brief: "Do not use AI tools."

The problem:

You can't verify compliance. A candidate who ignores the rule and uses AI anyway is no worse at writing code — they're just harder to catch.
You're filtering out candidates who use AI effectively as a tool, which is increasingly a job requirement.
The best candidates are the ones most likely to resent the restriction and opt out.

The better framing: you don't care that they used AI. You care whether they can produce good work and explain every line of it. Detection is about calibrating that — not about penalising tool use.

What AI-generated code actually looks like

Forget the checklist of "too perfect formatting" or "no typos." Those aren't signals at this point; modern AI code is indistinguishable from tidy human code on superficial inspection. The real signals are statistical and behavioural.

Token-level perplexity

Language models generate code by predicting the next token. The tokens they produce tend to be low-perplexity, meaning high probability given the context. Human-written code has higher variance in token probability: you'll see unusual variable names, domain-specific abbreviations, personal style quirks, and decisions that surprise the model.

This is the same technique used by tools like GPTZero, applied to code rather than prose. A consistently low perplexity score across a whole file is a strong signal that a model generated it, though not proof on its own. A mixed score (low in the boilerplate sections, higher in the business logic) is the normal pattern for AI-assisted human writing.
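To make the metric concrete: perplexity is the exponential of the negative mean log-probability of the tokens. The sketch below assumes you already have per-token natural-log probabilities from some scoring model (how to obtain them is out of scope here); the function names and the sample values are made up for illustration.

```python
import math
from statistics import mean


def perplexity(logprobs):
    """Perplexity = exp(-mean natural-log probability of the tokens)."""
    if not logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-mean(logprobs))


def windowed_perplexity(logprobs, window):
    """Perplexity of each consecutive, non-overlapping window of tokens."""
    return [
        perplexity(logprobs[i:i + window])
        for i in range(0, len(logprobs), window)
    ]


# Made-up log-probabilities: four predictable boilerplate tokens followed by
# four more surprising business-logic tokens.
boilerplate = [math.log(0.9)] * 4
business_logic = [math.log(0.2)] * 4
per_window = windowed_perplexity(boilerplate + business_logic, window=4)
```

Here the first window scores close to 1.1 and the second close to 5, which is the mixed pattern described above. A file where every window sits near the low end is the case worth a closer look.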

Naming and structural entropy

AI models have strong priors about naming conventions. They produce userData, fetchResults, handleSubmit — common, predictable names consistent with whatever they've seen in training. Human code written under time pressure tends toward shorter, more contextual names: ud, res, doThing. Not better — but idiosyncratic.

Structural entropy is similar. AI tends to produce balanced function lengths, consistent indentation even in unusual places, and symmetric conditional branches. Human code has more variance — rushed code more so.

Neither of these is a smoking gun. A tidy, experienced engineer will produce low-entropy code naturally. The signal matters most when combined with others.
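Both ideas can be approximated with crude proxies. The sketch below is an assumption-heavy illustration, not a description of any real detector: it measures naming consistency as the Shannon entropy of a hypothetical four-way name-style classification, and structural variance as the coefficient of variation of function lengths, using Python's ast module on Python source.

```python
import ast
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(items):
    """Shannon entropy in bits of the frequency distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def name_style(name):
    """Hypothetical style buckets; the rules are illustrative only."""
    if len(name) <= 3:
        return "short"
    if "_" in name:
        return "snake"
    if name != name.lower():
        return "camel"
    return "lower"


def function_lengths(source):
    """Line count of every function definition in a Python source string."""
    tree = ast.parse(source)
    return [
        node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def length_variation(lengths):
    """Coefficient of variation: population std dev divided by the mean."""
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


uniform_names = ["userData", "fetchResults", "handleSubmit"]
mixed_names = ["ud", "res", "doThing", "user_data"]
uniform_entropy = shannon_entropy(name_style(n) for n in uniform_names)  # 0.0
mixed_entropy = shannon_entropy(name_style(n) for n in mixed_names)      # 1.5
```

Lower values on both measures mean more uniform code. As the text says, that describes tidy engineers as well as models, so these numbers only make sense alongside other signals.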

Commit history

This is the most underused signal and the hardest to fake.

A human writing code over 3–4 hours produces a commit history that looks like work: small commits, some false starts, refactors that break things before they fix them, a flurry of commits near the end. An AI-assisted submission often has one or two large commits. The timestamps compress: an hour of wall-clock time produced 600 lines across 12 files, which is not how humans type.

Look for:

Fewer than 3 commits for a 200+ line submission
A single large commit that touches every file at once
Commit messages that look auto-generated ("feat: implement all requirements")
Commit timestamps that contradict the candidate's account (for example, every commit lands between 11pm and 3am on a weekday when the candidate says they built it during business hours)

You don't need tooling for this — git log --stat tells you in 30 seconds. But at scale, you do want automation.
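If you do automate it, the first three checks in the list above reduce to a few comparisons. The sketch below assumes the commit data has already been extracted (for example from git log output, which is not shown); the Commit fields, the function name, and the list of generic message phrases are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    message: str
    files_changed: int
    lines_added: int


# Hypothetical phrases treated as "auto-generated looking"; tune for your team.
GENERIC_PHRASES = (
    "implement all requirements",
    "complete implementation",
    "add all files",
)


def commit_red_flags(commits, total_files):
    """Return the red flags from the checklist that apply to this history."""
    flags = []
    total_lines = sum(c.lines_added for c in commits)
    if total_lines >= 200 and len(commits) < 3:
        flags.append("fewer than 3 commits for a 200+ line submission")
    if total_files > 1 and any(c.files_changed >= total_files for c in commits):
        flags.append("a single commit touches every file")
    if any(p in c.message.lower() for c in commits for p in GENERIC_PHRASES):
        flags.append("commit message looks auto-generated")
    return flags


one_shot = [Commit("feat: implement all requirements", 12, 600)]
incremental = [
    Commit("add parser", 2, 40),
    Commit("fix off-by-one in tokenizer", 1, 5),
    Commit("add tests", 3, 80),
    Commit("refactor storage layer", 2, 90),
]
```

With these inputs, commit_red_flags(one_shot, total_files=12) returns all three flags and commit_red_flags(incremental, total_files=6) returns an empty list. The flags are prompts for debrief questions, not conclusions.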

The "explain it" test

The most reliable signal of all: ask them about their code in the debrief.

Not "explain this function" (they can re-read it and explain it). Ask about decisions: "Why did you use a map here instead of a list?" or "I notice you didn't add input validation on this endpoint — was that intentional?"

A candidate who wrote the code will have opinions, even wrong ones. A candidate who prompted AI to write the code will either have no opinion or will answer with generic best-practice language that doesn't reference their actual implementation.

This is the human layer that no automated tool replaces. Use it.

How to think about the AI score

Whether you're running automated detection or scoring manually, you'll end up with something like a 0–100 likelihood score. Here's how we'd bucket it:

Score	What it means	Action
0–40	Almost certainly human-written	No debrief adjustment needed
40–60	Human code, likely with AI assistance for boilerplate	Note it, ask one general question about tooling philosophy
60–80	Heavily AI-assisted; candidate probably wrote the architecture, AI wrote the implementation	Probe the debrief on specific decisions and edge cases
80–100	Near-total AI generation; candidate may not be able to explain key sections	Ask them to walk through a specific function live; this is your debrief focus

The 60–80 range is where reasonable people disagree. A senior engineer who uses Copilot heavily and produces clean, correct, well-explained code is probably a better hire than a mid-level engineer who wrote everything by hand but shipped something brittle. The score is an input to the debrief question, not a verdict.

The 80+ range is where most teams set their automated rejection threshold. We'd agree with that, with one caveat: always give the candidate a chance to explain before rejecting. We've seen submissions at 85 where the candidate disclosed upfront ("I used Cursor to generate the scaffolding") and could explain every line in detail. Context matters.
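The bucketing in the table can be written as a simple lookup. The table's ranges share their endpoints (40, 60, 80), so this sketch assumes a boundary score belongs to the higher bucket; that tie-break, the function name, and the shortened labels are assumptions.

```python
# (exclusive upper bound, meaning, suggested action), following the table.
BUCKETS = [
    (40, "almost certainly human-written", "no debrief adjustment needed"),
    (60, "human code, likely AI-assisted boilerplate",
     "ask one general question about tooling"),
    (80, "heavily AI-assisted", "probe specific decisions and edge cases"),
]
TOP_BUCKET = ("near-total AI generation",
              "walk through a specific function live")


def bucket_for(score):
    """Return (meaning, action) for a 0-100 AI likelihood score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for upper, meaning, action in BUCKETS:
        if score < upper:
            return meaning, action
    return TOP_BUCKET
```

Note that the function returns a debrief action rather than a hire decision, in line with treating the score as an input to the conversation.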

What to actually do at each stage

In the brief

Add one sentence: "Submissions are automatically checked for AI-generated code. We don't automatically reject on this signal, but it informs our debrief questions."

This changes behaviour more than any technical detection does. Candidates who were planning to submit pure AI output will either opt out (fine) or increase their own involvement to be able to explain the code (also fine).

During review

Run automated detection as part of your standard review pipeline. If you're doing this manually:

git log --stat — look for the commit patterns above.
Skim the functions that "feel" too clean — run them through a perplexity checker.
Note the sections you want to ask about in the debrief.

Don't make a hire/no-hire call at this stage based on AI detection alone. Make it in the debrief.

In the debrief

Open with an easy question about the implementation, then move to a specific decision point that requires genuine understanding:

"I noticed you chose [approach X] for the data layer — what alternatives did you consider?"
"This part of your code handles [edge case] but this other part doesn't — was that intentional?"
"If you had another hour, what would you add or change?"

The last question is the most informative. A candidate who wrote the code will have a specific, prioritised answer. A candidate who prompted for the code will give a generic one.

Policy recommendation

If you're writing your AI-use policy for take-homes today, here's what we'd put in writing:

Candidates may use AI coding tools (Copilot, Cursor, Claude, etc.) as they would on the job. All submissions are automatically scanned for AI-generated code. A submission will be declined if its AI likelihood score is above 80 and the candidate cannot explain their implementation decisions in the debrief.

That policy:

Doesn't ban AI use (unenforceable and alienating)
Creates a transparent, defensible rejection criterion
Shifts the bar from "did you write it" to "can you work with it" — which is the bar that actually matters
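Written as a rule, the policy is a conjunction: both conditions must hold before a submission is declined. A minimal sketch, with a hypothetical function name and parameters:

```python
def should_decline(ai_score: float, can_explain_decisions: bool) -> bool:
    """Decline only when the score is above 80 AND the debrief goes badly."""
    return ai_score > 80 and not can_explain_decisions
```

A candidate at 85 who can explain every line is not declined under this rule, and neither is a candidate at 50 who struggles in the debrief; that second case is an ordinary interview judgment, not an AI-policy one.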

Frequently asked questions

Can I detect AI use if the candidate used it for a small part of the submission?

Not reliably. If a candidate used AI to generate the boilerplate and wrote the business logic themselves, the perplexity signal is mixed and the commit history will look normal. That's fine — AI-generated boilerplate is not a red flag. The detection is most accurate for submissions where AI wrote the substantive parts.

Are there false positives? What if a tidy engineer gets flagged?

Yes. Any statistical signal has false positives. A very experienced engineer who writes clean, conventional code will sometimes score in the 60–80 range. This is why the debrief is the final gate, not the score. The score is a prior; the conversation updates it.

What if the candidate used AI but was transparent about it?

Disclosure is a positive signal. A candidate who says "I used Cursor for the initial scaffolding and then refactored it" is showing tool literacy and honesty. Weight that in the debrief. They should still be able to explain every decision; the bar for "explain it" is the same whether they used AI or not.

We're a small team. Is this worth the overhead?

If you're reviewing fewer than 10 submissions a month, the "explain it" debrief question is enough. You don't need automated detection. If you're at 10+, the overhead of manual detection adds up fast — that's where automation starts to pay off.

Does this apply to coding screens (live interviews) as well?

A live coding screen in a shared editor has a different problem: the candidate can use AI in browser tabs you can't see. The only reliable countermeasure is the same kind of questioning you'd use in a debrief: ask them to extend what they just wrote, add error handling to a specific function, or explain a decision they made three minutes ago. The signal is the same: genuine work produces instant, specific answers.

CodeVerdict runs automated AI-likelihood scoring on every take-home submission as part of the standard analysis — no setup required. Try it on your next assignment.