CodeVerdict is an AI-powered platform that reviews developer take-home coding assignments in under a minute. You drop the assignment brief and a GitHub URL (or zip); CodeVerdict maps every requirement to the code, detects AI-written shortcuts, runs the project in a sandbox, and produces a scored report with tailored follow-up questions.

How long does it take to grade a take-home assignment?

Most assignments are reviewed in 60 seconds or less. The agent reads the brief, clones the repo, runs setup, tests, and execution in an E2B sandbox, and writes the report, running these steps in parallel where possible. Larger repos with heavy dependencies may take up to two minutes.

What can I submit — GitHub repos, zip files, or both?

Both. You can paste one or more GitHub repository URLs (public or private with a GitHub token) or upload a zip file of the candidate's solution. CodeVerdict handles each the same way: clone, install, run, evaluate.

Do I need an account to try it?

Yes. A free CodeVerdict account is required. Sign-up takes about 10 seconds with Google one-click sign-in. An account lets your assessments sync across devices and lets your team see the same dashboard.

How does CodeVerdict detect AI-written code?

It combines three signals: token-level perplexity (LLM-written code tends to show unusually uniform, predictable token probabilities), naming and structural entropy (AI tends to use overly consistent patterns), and commit-history analysis (sudden large commits with no incremental work). The output is a calibrated 0–100 AI score with the contributing signals visible, not a black-box verdict.
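To make the first signal concrete, here is a minimal sketch of how perplexity is conventionally computed from per-token log-probabilities. The function names and the idea of using standard deviation as a "uniformity" measure are illustrative assumptions, not CodeVerdict's actual scoring code, and the log-probabilities would come from whichever language model scores the candidate's code.

```python
import math
import statistics
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity is exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def logprob_spread(token_logprobs: Sequence[float]) -> float:
    """Population standard deviation of the log-probs (assumed uniformity measure).

    A low value means every token was about equally predictable.
    """
    return statistics.pstdev(token_logprobs)


# Hypothetical input: four tokens, each predicted with probability 0.5.
sample = [math.log(0.5)] * 4
sample_perplexity = perplexity(sample)   # about 2.0
sample_spread = logprob_spread(sample)   # 0.0, since all tokens are equally likely
```

Lower perplexity combined with a low spread is the pattern the answer above describes as characteristic of LLM-written code; how those numbers are calibrated into a 0–100 score is not shown here.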

Is candidate code safe to run? Where does it execute?

Yes. Every repository runs inside an isolated E2B sandbox — a single-use virtual machine that is destroyed immediately after the report is generated. The candidate's code never touches your machine or CodeVerdict's production servers.

Which programming languages and stacks are supported?

CodeVerdict supports Node / TypeScript, Python, Go, static front-ends, and any Docker-based project out of the box. The agent detects the stack automatically from project hints (package.json, requirements.txt, go.mod, Dockerfile, etc.) and picks the right setup path.
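As an illustration of hint-based detection, the sketch below maps marker files in the repository root to a stack label. The labels, the priority order (Dockerfile first), and the use of index.html as the static front-end hint are assumptions made for the example; the real agent's rules may differ.

```python
from typing import Iterable, List, Tuple

# Hypothetical (marker file, stack label) pairs, checked in priority order.
STACK_HINTS: List[Tuple[str, str]] = [
    ("Dockerfile", "docker"),
    ("package.json", "node"),
    ("requirements.txt", "python"),
    ("go.mod", "go"),
    ("index.html", "static"),
]


def detect_stack(paths: Iterable[str]) -> str:
    """Return the first stack whose marker file sits in the repository root."""
    root_files = {p for p in paths if "/" not in p}
    for marker, stack in STACK_HINTS:
        if marker in root_files:
            return stack
    return "unknown"


stack = detect_stack(["package.json", "src/index.ts", "README.md"])  # "node"
```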

Can I customise how candidates are scored?

Yes. The displayed score is computed client-side from your weightings — requirements met, code quality, test coverage, security, AI-written code. Adjust the weights in Settings and the verdict (Strong hire / Hire / No hire) updates instantly across every submission.
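A weighted score of this kind can be sketched as a normalized weighted average. Everything specific below is an assumption for illustration: the dimension keys, the convention that every dimension is scored 0 to 100 with higher being better (so the AI dimension is passed as an "originality" value), and the verdict thresholds of 80 and 60.

```python
from typing import Mapping


def weighted_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-dimension scores, normalized by the total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weight for name, weight in weights.items()) / total


def verdict(score: float, strong: float = 80.0, hire: float = 60.0) -> str:
    """Map a 0-100 score to a verdict using hypothetical thresholds."""
    if score >= strong:
        return "Strong hire"
    if score >= hire:
        return "Hire"
    return "No hire"


# Hypothetical submission and weightings.
scores = {"requirements": 90, "quality": 80, "tests": 70, "security": 60, "ai_originality": 100}
weights = {"requirements": 40, "quality": 20, "tests": 20, "security": 10, "ai_originality": 10}
overall = weighted_score(scores, weights)  # 82.0
label = verdict(overall)                   # "Strong hire"
```

Because the calculation needs only the stored per-dimension scores and the current weights, changing a weight can re-rank every submission without re-running any analysis.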

How we built the RAG-powered chat for candidate code reviews

When you're reviewing a candidate's take-home assignment, you often have specific questions: "Why did they use this specific sorting algorithm here?" or "How are they handling edge cases in the payment service?".

While generic AI models are powerful, they lack the specific context of the candidate's code unless you provide it. But pasting 2,000 lines of code into a chat window is tedious and uses up much of the model's context window. To solve this, we built a Retrieval-Augmented Generation (RAG) pipeline specifically tuned for source code.

The Challenge: Context is everything

Large Language Models (LLMs) have limited "memory" (context windows). Even with modern models supporting 100k+ tokens, cramming an entire repository into a single prompt for every question is inefficient, slow, and expensive.

More importantly, it's noisy. A developer reviewing a React app doesn't need the AI to read the package-lock.json when asking about a specific state management hook.

Our Approach: Retrieval-Augmented Generation (RAG)

Instead of sending everything at once, we use a multi-stage pipeline:

1. Pre-indexing (The Code Parser)

When a candidate submits their code, we don't just store it as text. We run it through a parser that breaks the codebase into meaningful chunks: functions, classes, and distinct logic blocks. We then generate a vector embedding for each chunk. An embedding is a list of numbers that represents the meaning of that code, arranged so that chunks with similar meaning end up close together.
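The chunking step can be sketched with Python's standard ast module, which splits a Python file into its top-level functions and classes. This is only an illustration for one language: the Chunk type and function name are hypothetical, and a parser covering every supported stack would need a multi-language tool instead.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    name: str
    kind: str        # "function" or "class"
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def chunk_python_source(source: str) -> List[Chunk]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[Chunk] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            # lineno and end_lineno are 1-indexed and inclusive (Python 3.8+).
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append(Chunk(node.name, kind, node.lineno, node.end_lineno, text))
    return chunks


example_source = (
    "import os\n"
    "\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "class Greeter:\n"
    "    def hi(self):\n"
    "        return 'hi'\n"
)
example_chunks = chunk_python_source(example_source)
```

Each chunk keeps its line range, so an answer can point back to the exact location in the candidate's file. Each chunk's text is what would then be passed to an embedding model.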

2. Semantic Search (Retrieval)

When you ask the chat a question, we convert your question into a vector embedding too. We then perform a similarity search against our indexed database to find the pieces of code most relevant to your query.

If you ask about "database connection logic," we pull chunks from files such as db.ts or prisma/schema.prisma, not the CSS files.
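A minimal sketch of the retrieval step, assuming cosine similarity as the distance measure and a plain in-memory dictionary as the index. The two-dimensional vectors and chunk names are made up so the arithmetic is easy to follow; real embeddings have hundreds or thousands of dimensions and would normally live in a vector database.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical index: the first axis loosely means "database", the second "styling".
toy_index = {
    "db.ts": [1.0, 0.0],
    "styles.css": [0.0, 1.0],
    "auth.ts": [0.6, 0.8],
}
hits = top_k([1.0, 0.0], toy_index, k=2)  # db.ts first, then auth.ts
```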

3. Contextual Prompting (Generation)

We take the relevant code snippets, combine them with your question, and send a carefully structured prompt to our LLM. The AI now has exactly what it needs to answer your question accurately, without the distraction of irrelevant files.
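The prompt assembly might look something like the sketch below. The instruction wording, the section labels, and the character budget are all assumptions made for the example; the production prompt is not shown in this post.

```python
from typing import List, Tuple

# Hypothetical instruction text, written for this example.
SYSTEM_INSTRUCTION = (
    "You are reviewing a candidate's take-home assignment. "
    "Answer only from the code context and requirements provided. "
    "If the context does not contain the answer, say so."
)


def build_prompt(question: str, snippets: List[Tuple[str, str]], max_chars: int = 6000) -> str:
    """Combine retrieved (path, code) snippets and the question into one prompt.

    Snippets are expected best-first; adding stops at the first one that would
    exceed the character budget.
    """
    parts: List[str] = []
    used = 0
    for path, code in snippets:
        block = f"### {path}\n{code.strip()}\n"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    context = "\n".join(parts) if parts else "(no relevant code found)\n"
    return f"{SYSTEM_INSTRUCTION}\n\nCODE CONTEXT:\n{context}\nQUESTION:\n{question.strip()}\n"


prompt = build_prompt(
    "How is the DB connection opened?",
    [("db.ts", "export const db = connect();")],
)
```

Labeling each snippet with its file path lets the model cite where an answer comes from, and the budget keeps the prompt small even when many chunks match.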

Designing for Real-time Interaction

Wait times kill the review flow. To make the experience feel snappy, we implemented:

Streaming Responses: We stream the AI's output token by token so you can start reading the answer before it has finished generating.
Worker-based Indexing: Heavy lifting like parsing and embedding happens in background workers, ensuring the main submission flow remains fast.
Scoped Knowledge: The AI is instructed to answer only from the candidate's code and the provided requirements, which reduces the risk of "hallucinations" (the AI making things up).
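The worker-based indexing idea can be sketched with a queue and a background thread from Python's standard library. This is a simplified stand-in: the function names are hypothetical, fake_embed replaces a real embedding model, and a production setup would more likely use a separate worker process or job queue service.

```python
import queue
import threading
from typing import Callable, Dict, List


def start_index_worker(
    jobs: "queue.Queue",
    index: Dict[str, List[float]],
    embed: Callable[[str], List[float]],
) -> threading.Thread:
    """Start a background thread that embeds queued (chunk_id, text) jobs.

    Putting None on the queue tells the worker to stop.
    """
    def run() -> None:
        while True:
            job = jobs.get()
            try:
                if job is None:
                    return
                chunk_id, text = job
                index[chunk_id] = embed(text)
            finally:
                # Runs for every job, including the stop signal.
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: one number, the text length."""
    return [float(len(text))]


jobs: "queue.Queue" = queue.Queue()
index: Dict[str, List[float]] = {}
worker = start_index_worker(jobs, index, fake_embed)

# The submission flow only enqueues work and returns immediately.
jobs.put(("db.ts#connect", "export const db = connect();"))
jobs.put(None)
jobs.join()  # waiting here is only for the demo
```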

Why this matters for hiring

By giving you a conversational interface over the candidate's code, we move from "static review" to "interactive auditing." You can probe the candidate's logic, ask the AI to find potential bugs, or even request a summary of how a specific complex feature was implemented.

It transforms the review from a chore into a high-signal conversation.

Want to see it in action? Explore a demo report and try asking the chat about the candidate's implementation.