
Inside an AI Video Screen: A 24-Minute Technical Interview, Annotated

TL;DR: We ran an AI video screen for a Senior Backend Engineer role. The candidate looked great on paper and answered all seven questions. Their answers were technically correct but lacked specifics, the timing of their responses did not match what we would expect from a real senior engineer, and the phrasing in a couple of answers tracked closely with what an LLM produces for the same prompt. We delivered a 68 out of 100 score with a "review with caution" flag. The recruiter did a 15-minute follow-up call, the candidate stumbled on the same concepts, and they did not move forward.

This post breaks down what actually happened in those 24 minutes, with anonymized snippets from the transcript and what the recruiter saw at the end.

If you have ever wondered what the inside of an AI video interview looks like, this is it.

What this interview was for

A US-based fintech client was hiring a Senior Backend Engineer. The role required Python, distributed systems experience, and meaningful exposure to event-driven architecture. They had been getting roughly 280 applications per role and were burning two engineering hours per hire on first-round screens that mostly went nowhere.

They configured an EvoHire video screen with seven questions: six drawn from our calibrated bank for that seniority and stack, plus one custom question they wrote themselves. The screen was 25 minutes long with an automatic cutoff. We are looking at one anonymized run, which we will call Candidate A.

The candidate clicked the link from a recruiting email, hit "Start Interview," granted camera and microphone access, and was on the screen with our agent inside of 90 seconds. Nothing was scheduled. No human time was used to set this up.

Minutes 0 to 3: Identity, transparency, and warmup

The first three minutes are not really a test. They are setup.

Our agent does three things in this window. It introduces itself as an automated assistant (this matters for candidate experience and for compliance in jurisdictions with AI hiring disclosure laws like New York City's Local Law 144 and the Illinois Artificial Intelligence Video Interview Act). It confirms the candidate's name and the role. And it asks a low-stakes warmup question to get a feel for how the candidate naturally speaks.

For Candidate A, the warmup was: "Briefly, what kind of backend systems have you worked on most recently?"

Their answer was clean and conversational. Nothing about it stood out. We moved into the technical questions.

Minutes 4 to 12: The first three technical questions

The first three real questions are designed to be answerable from genuine working knowledge in roughly 90 to 120 seconds each. They are not trick questions. They are the kind of thing a senior engineer should be able to talk about extemporaneously.

For this role, they were:

  1. "Walk me through what happens when a request hits your service and you need to query a database that is currently failing health checks."
  2. "Explain the difference between at-least-once and exactly-once message delivery semantics, and when you would choose one over the other."
  3. "Describe how you would design a rate limiter for a public API that needs to handle 10,000 requests per second."

Candidate A answered all three. Each answer was technically correct in broad strokes. But here is where we started to see drift.

Question 1

A clean explanation of circuit breakers, retries with exponential backoff, and graceful degradation. They mentioned the term "bulkhead pattern" without prompting, which is a positive signal.
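For readers who want the shape of that answer in code, here is a minimal sketch of a retry loop with exponential backoff sitting behind a simple circuit breaker and falling back to a degraded response. The class, thresholds, and fallback shape are illustrative assumptions, not anything the candidate wrote.

    import random
    import time


    class CircuitBreaker:
        """Tiny illustrative circuit breaker: opens after N consecutive failures,
        then rejects calls until a cool-down period has passed."""

        def __init__(self, failure_threshold=5, reset_timeout=30.0):
            self.failure_threshold = failure_threshold
            self.reset_timeout = reset_timeout
            self.failures = 0
            self.opened_at = None

        def allow(self):
            if self.opened_at is None:
                return True
            # Half-open: let one request through after the cool-down.
            return (time.monotonic() - self.opened_at) >= self.reset_timeout

        def record_success(self):
            self.failures = 0
            self.opened_at = None

        def record_failure(self):
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()


    def call_with_backoff(breaker, fn, max_attempts=4, base_delay=0.1):
        """Retry fn() with exponential backoff and jitter, respecting the breaker.
        Returns a degraded response instead of raising to the caller."""
        for attempt in range(max_attempts):
            if not breaker.allow():
                break  # Fail fast while the breaker is open.
            try:
                result = fn()
                breaker.record_success()
                return result
            except Exception:
                breaker.record_failure()
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.05))
        return {"status": "degraded", "data": None}  # Graceful degradation.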

Question 2

A textbook explanation of delivery semantics. The phrase "idempotency keys allow consumers to deduplicate messages on receipt" appeared in the response. We flagged this internally. Not because it is wrong (it is correct), but because the exact phrasing matched a top-ranked ChatGPT response to a similar prompt we had tested two weeks earlier. Could be coincidence. Could be the candidate read the same blog post we did. We logged it and moved on.
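The claim itself is standard practice. As a rough sketch of what consumer-side deduplication with idempotency keys looks like under at-least-once delivery (the in-memory set stands in for a durable store, and the message shape is made up):

    def process_once(message, seen_keys, handler):
        """Consumer-side dedup for at-least-once delivery.

        The message is assumed to carry an idempotency key chosen by the producer.
        seen_keys stands in for a durable store (e.g. a table with a unique
        constraint); an in-memory set is only safe for illustration.
        """
        key = message["idempotency_key"]
        if key in seen_keys:
            return "skipped"          # Duplicate redelivery: drop it.
        handler(message["payload"])   # Side effect runs once per key.
        seen_keys.add(key)            # In production, commit atomically with the effect.
        return "processed"


    # Example: the broker redelivers the same message twice.
    seen = set()
    msg = {"idempotency_key": "order-1234", "payload": {"amount": 42}}
    print(process_once(msg, seen, handler=print))  # processed
    print(process_once(msg, seen, handler=print))  # skipped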

Question 3

The candidate described a sliding window rate limiter with Redis as the backing store. Solid answer. But they did not mention a single trade-off, did not anticipate any of the obvious follow-up issues (clock skew, hot keys, the cost of Redis at 10K RPS), and did not bring up alternatives like token bucket or leaky bucket without being asked.
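For reference, here is roughly the design the candidate described: a sliding-window counter per client kept in a Redis sorted set. This is a hedged sketch using the redis-py client; the key naming, limit, and window are placeholders, and it deliberately ignores the issues listed above.

    import time
    import uuid

    import redis  # assumes the redis-py client is installed

    r = redis.Redis()

    def allow_request(client_id: str, limit: int = 100, window_s: int = 1) -> bool:
        """Sliding-window rate limiter: one sorted set per client, scored by timestamp.
        Illustrative only -- ignores hot keys, clock skew, and Redis round-trip cost."""
        now = time.time()
        key = f"ratelimit:{client_id}"
        member = f"{now}:{uuid.uuid4()}"  # Unique member so concurrent requests don't collide.
        pipe = r.pipeline()
        pipe.zremrangebyscore(key, 0, now - window_s)  # Drop entries outside the window.
        pipe.zadd(key, {member: now})                  # Record this request.
        pipe.zcard(key)                                # Count requests still in the window.
        pipe.expire(key, window_s + 1)                 # Let idle keys fall out of Redis.
        _, _, count, _ = pipe.execute()
        return count <= limit

Even this toy version makes the trade-offs visible: every check costs a Redis round trip, and a single busy client piles all of its traffic onto one key.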

Senior engineers volunteer trade-offs. It is one of the clearest tells we have. The answers were correct. They were not the answers of someone who had built this themselves.

Minutes 13 to 19: Adaptive follow-ups

This is where our agent earns its keep.

EvoHire does not just ask a fixed list of questions. After each answer, it generates a follow-up tailored to what the candidate said. The follow-up tries to do two things: probe deeper on a specific claim, and create a small surprise that is harder to predict in advance.

For Candidate A, the follow-up to Question 3 was: "You mentioned Redis as the backing store. In your experience, what specifically went wrong the first time you ran a Redis-backed rate limiter at scale, and how did you fix it?"

They started, stopped, restarted, and gave a generic answer about "monitoring memory usage" without referencing any specific incident, error, or fix. No story. No texture. No proper noun. No date.

For comparison, here is the kind of answer this question normally gets from someone who has actually done this:

"Yeah, the first time we ran this in production we hit hot keys really hard during a viral promotion. One Redis node was doing about 80% of the work. We ended up sharding by user ID with a hash, and we added a local in-memory cache with a 100 millisecond TTL as a first hop. It was a Sunday night, I remember because my partner was annoyed."

Specific incident. Specific fix. A throwaway personal detail. That is the texture of real experience. Candidate A had none of it.
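The fix in that answer is also easy to sketch: hash the user ID to pick a Redis shard, and answer repeat checks from a short-lived local cache before touching Redis at all. The shard names, TTL, and function names below are ours, not the engineer's.

    import hashlib
    import time

    SHARDS = ["redis-0", "redis-1", "redis-2", "redis-3"]  # hypothetical shard names
    _local_cache = {}   # user_id -> (decision, expires_at); per-process, best effort
    LOCAL_TTL = 0.1     # the 100 millisecond first-hop TTL from the answer above

    def shard_for(user_id: str) -> str:
        """Stable hash of the user ID spreads hot traffic across shards."""
        h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
        return SHARDS[h % len(SHARDS)]

    def check_rate_limit(user_id: str, query_shard) -> bool:
        """Serve repeat decisions from the local cache for LOCAL_TTL seconds,
        then fall through to the Redis shard chosen by shard_for().
        query_shard is any callable(shard_name, user_id) -> bool, e.g. the
        sliding-window check sketched earlier."""
        now = time.monotonic()
        cached = _local_cache.get(user_id)
        if cached and cached[1] > now:
            return cached[0]
        allowed = query_shard(shard_for(user_id), user_id)
        _local_cache[user_id] = (allowed, now + LOCAL_TTL)
        return allowed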

We delivered two more follow-up questions in this window. Both produced similarly thin answers. The pattern was now clear.

Minutes 20 to 24: The behavioral question and the wrap-up

The seventh question was the client's own custom question, behavioral rather than technical: "Tell me about a time you disagreed with a senior engineer or architect about a system design decision. What did you do?"

This question is intentionally hard for an AI copilot to handle well, because it requires a personal story with internal consistency. ChatGPT can generate a plausible-sounding story, but if you press on the details (what was the engineer's name and role, what was the actual technical disagreement), the seams show.

Candidate A's answer was structurally well-formed (situation, task, action, result) but oddly generic. The senior engineer was not named. The disagreement was described as being about "architectural patterns" without specifying which patterns. The resolution was "we discussed it as a team and reached consensus."

A real story has a name, a specific point of contention, and a small detail that nobody would invent. This had none.

The interview wrapped at 24 minutes 11 seconds.

What our cheating detection flagged

By the end of the screen, our cheating detection engine had flagged Candidate A for review.

EvoHire uses a combination of lexical analysis, audio signal pattern detection, and proprietary behavioral metrics to identify when a candidate is receiving outside assistance during an interview. To protect the integrity of the system, we do not publicly disclose every metric we track. Our detection goes well beyond tab monitoring or eye tracking, and EvoHire can flag external assistance even when a candidate is using a separate device placed out of view, such as a phone hidden beneath their monitor.

For Candidate A, none of the individual signals on their own would have been conclusive. Together, combined with the lack of texture in their follow-up answers, they painted a coherent picture.

We do not tell the recruiter "this candidate cheated." That is not a call we are willing to make. What we tell them is: "Multiple signals flagged. Confidence of unassisted performance: 41%. Recommend live verification."

What the final report actually looks like

The recruiter saw a single dashboard view per candidate. For Candidate A, it contained:

  • A 68 out of 100 overall score, broken out by question with a 1-to-5 rating per response.
  • A two-paragraph executive summary highlighting strengths (terminology, structural correctness) and concerns (lack of specific examples, generic behavioral answer).
  • The cheating detection panel with the overall confidence number and a high-level summary of what was flagged.
  • The full transcript, searchable and timestamped.
  • The full audio and video recording.
  • A short list of recommended verification questions for the human follow-up call, generated based on where the AI screen flagged uncertainty.

The recruiter spent maybe four minutes on this report before deciding to do a 15-minute live call. On the live call, the candidate stumbled on the same Redis question they had answered confidently in the AI screen. That was enough for the client to pass.

Total engineering time spent on this candidate: zero. Total recruiter time: about 20 minutes. Without the AI screen, this candidate would have eaten a one-hour technical interview slot on the engineering team's calendar, and probably would have gotten through it on the strength of their resume.

A few things people get wrong about AI video screens

A few questions we get all the time, answered briefly.

"Is it not creepy to record candidates on video?" It is, if you do it badly. We disclose the recording up front. The candidate has to consent before the screen begins. The recording is encrypted, retained for the period the client configures, and not used for any other purpose. The same standard most companies already apply to video conferencing tools.

"Will candidates think it is rude to interview with a bot?" Some will. Most will not, especially if the alternative is waiting two weeks for a recruiter to schedule them. Completion rates on EvoHire video screens are above 87% across the platform. People finish.

"Is the AI biased?" Every interview process has bias. The honest answer is that an automated process replaces some kinds of bias (which side of the bed the recruiter woke up on, whether the candidate's name pattern-matches familiar) with other kinds (whatever shows up in the calibration data, whatever the question bank emphasizes). What we can do, and do, is run the same rubric for every candidate, log every score, and let the client audit the calibration set. That is more accountability than most human screens have. The EEOC's technical guidance on AI in employment decisions is the right starting point if you want to understand what defensible looks like in this space.

"Phone or video, which is better?" For technical screens, video is better. It gives the candidate a more legitimate-feeling experience and gives our detection engine more behavioural signal to work with. Phone is faster to set up and friendlier to candidates with bandwidth issues, but if you can do video, do video. Most of our customers run video screens for technical roles and reserve phone screens for non-technical ones.

What this means for how you hire

The point of an AI video screen is not to replace an engineering interview. It is to make sure the engineering interview is happening with someone worth the engineering team's time.

For Candidate A, the screen did its job. It did not say "do not hire this person." It said "this person looks better on paper than the data suggests, here is why, and here are the questions you should ask if you decide to take it further." The recruiter took it further, the questions worked, and the team did not waste an hour.

Multiply that by 280 applicants per role, and the math changes.

If you want to see what one of these screens looks like for a role you are actually hiring for, you can set one up in about ten minutes at evohire.ai. Use a real job description. We will give you a link you can send to a few candidates this week.

FAQ

Q: How long is a typical EvoHire AI video screen?
Most screens run between 18 and 28 minutes depending on the seniority of the role and the number of questions configured. Our default for senior technical roles is seven questions and a 25-minute cap.

Q: Does the AI agent give the candidate the score at the end?
No. The score and report are delivered to the recruiter, not the candidate. The candidate sees a confirmation that the screen is complete and that the recruiter will be in touch.

Q: Can recruiters customize the questions?
Yes. Every job has its own question bank. You can use our calibrated bank, write your own questions, or mix both. Custom questions are particularly useful for behavioral screening and for company-specific scenarios.

Q: What languages does EvoHire support for video screens?
The agent conducts interviews in most major languages, and our cheating detection works across languages. The transcript and report are generated in the language the recruiter selects.

Q: What happens to the video recording after the screen?
Recordings are encrypted at rest, retained for the period configured in the client account, and deleted automatically after that. Candidates can request deletion at any time under the privacy policy.

Nitish Kasturia, Founder
Published April 26, 2026