Jun 19, 2026
How to Standardize Hiring When Five Managers Run Five Different Interviews
Five managers, five interviews, five different levels of hire quality. The four-part SOP (one rubric, one question set, two raters, one written decision rule) closes the variance gap in a week.
You run an operations-driven business. You have inventory systems, ordering systems, scheduling systems, P&L by location, and you can tell me to the dollar what last Tuesday's labor variance was. And yet, when you ask five managers how they pick a new hire, you get five different answers — and your turnover and your unit economics show it.
This post is about closing the variance gap on the one process you've left to gut feel: hiring. Same logic as standardizing any other operational SOP, just applied to interviews.
TL;DR
Five managers running five different interviews is an inter-rater reliability problem. The fix is the same as any operational standardization: one scoring rubric, one set of questions in one order, two raters per finalist, and a written decision rule. Structured interviews predict job performance at roughly r = 0.42 (Sackett et al., Journal of Applied Psychology, 2022; a downward revision of Schmidt & Hunter's earlier r = 0.51 that removes an overcorrection for range restriction). Unstructured interviews, the kind your managers are running today, predict at roughly half that. And the A-Player you'd get with the higher-validity process is worth 2–4× the output of an average hire, per Bradford Smart, Topgrading (2012). The standardization is a one-week project that pays back on the next hire.
The hidden cost of five different interviews
Here's the operational picture most owners don't draw. You have five managers. Manager A asks behavioral questions and takes notes. Manager B asks "tell me about yourself" and goes off-script. Manager C tests for "fit." Manager D talks for 40 minutes about the company. Manager E hires whoever showed up on time. Each one believes they're the one with good judgment. None of them can prove it, because none of them are measuring the same thing.
The output you see downstream is variance:
- Some managers turn over staff at meaningfully higher rates than others.
- Some locations or shifts produce A-Players consistently. Some never do.
- Cost-per-hire and time-to-fill differ widely from manager to manager when benchmarked against the SHRM Talent Acquisition Benchmarking baselines most SMBs eventually use.
- The same role pays the same wage and produces wildly different results — because the selection upstream wasn't the same.
If your inventory shrink varied that much between managers, you'd fix it that week. The same fix applies here. You don't need everyone to be a great interviewer. You need everyone to run the same interview.
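To put a number on that variance, a minimal sketch like the one below computes early turnover per hiring manager. The record layout, the manager labels, and the 90-day window are all assumptions for illustration; swap in whatever your HR or payroll export actually provides.

```python
from collections import defaultdict

# Hypothetical hire records: (hiring manager, still employed at 90 days?)
hires = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", False), ("B", False), ("B", True), ("B", False),
]

def early_turnover_by_manager(records):
    """Return each manager's share of hires who left before the 90-day mark."""
    totals = defaultdict(int)
    exits = defaultdict(int)
    for manager, stayed in records:
        totals[manager] += 1
        if not stayed:
            exits[manager] += 1
    return {manager: exits[manager] / totals[manager] for manager in totals}

rates = early_turnover_by_manager(hires)
for manager, rate in sorted(rates.items()):
    print(f"Manager {manager}: {rate:.0%} early turnover")
```

With only a handful of hires per manager the rates will be noisy, so treat a gap like this as a prompt for a conversation rather than a verdict.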
The four-part SOP
1. One scoring rubric, written down
Pick the 4–6 attitudes that predict success in the role. Not "skills" — those are easier to assess in a trial shift or on a résumé. Attitudes. Coachability. Ownership. Pace under pressure. Customer empathy. Reliability. Whatever your top performers have in common.
For each attitude, write a 1–5 scale where each level describes an observable behavior. A "5" on coachability looks like "gave a specific recent example of feedback they disagreed with and named what changed in their behavior afterward." A "1" looks like "vague, no example, or said they disagreed and didn't change." These are called behaviorally anchored rating scales (BARS), and they're the entire point: they make two managers score the same answer the same way.
We walk through a full BARS build, minute by minute, in our structured interview scoring rubric guide.
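If you keep the rubric in a spreadsheet or a small script, one way to make it checkable is to store each attitude with its question and its anchors. The sketch below is an illustration, not a prescribed format: the `Attitude` class and `missing_levels` helper are hypothetical names, and only the two coachability anchors quoted above come from this post. Levels 2 through 4 are deliberately left blank so the check has something to flag.

```python
from dataclasses import dataclass, field

SCALE = range(1, 6)  # the 1-5 scale described above

@dataclass
class Attitude:
    name: str
    question: str
    anchors: dict = field(default_factory=dict)  # level -> observable behavior

    def missing_levels(self):
        """Levels on the 1-5 scale that still lack a written anchor."""
        return [lvl for lvl in SCALE if not self.anchors.get(lvl, "").strip()]

coachability = Attitude(
    name="coachability",
    question="Tell me about a time you got feedback you disagreed with.",
    anchors={
        5: "Gave a specific recent example of feedback they disagreed with "
           "and named what changed in their behavior afterward.",
        1: "Vague, no example, or said they disagreed and didn't change.",
    },
)

# The rubric is not ready to hand out until this list is empty.
print(coachability.missing_levels())  # [2, 3, 4]
```

Running the same check over all 4–6 attitudes before the manager meeting catches half-written scales early.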
2. One set of questions, in one order
One behavioral question per attitude. Past-tense, specific, story-shaped. "Tell me about a time you got feedback you disagreed with." "Tell me about a project that went wrong where you stepped in without being asked."
The same questions, in the same order, every candidate, every manager. No improvisation. No skipping. No "I had a feeling so I went a different direction." Variability in the interview is variability in the data, and the data is the hiring decision. See our behavioral interview questions guide for the full question bank.
3. Two raters per finalist, scored independently
This is the cheapest reliability check you'll ever run. Two managers each score the candidate on the rubric before they talk to each other. Inter-rater agreement — how often they land within 1 point on the same attitude — is your direct read on whether the rubric is working. Aim for 80%+. Below that, your anchors aren't specific enough and need a rewrite.
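The within-1-point agreement check is simple enough to compute by hand, but a short sketch makes the definition unambiguous. The attitude names and scores below are made up for illustration, and `within_one_agreement` is a hypothetical helper name.

```python
def within_one_agreement(rater_a, rater_b, tolerance=1):
    """Share of attitudes where the two raters' scores differ by at most `tolerance`."""
    if rater_a.keys() != rater_b.keys():
        raise ValueError("Both raters must score the same attitudes")
    if not rater_a:
        raise ValueError("No scores to compare")
    hits = sum(abs(rater_a[k] - rater_b[k]) <= tolerance for k in rater_a)
    return hits / len(rater_a)

# Hypothetical independent scores for one finalist
manager_a = {"coachability": 4, "ownership": 3, "pace": 5, "empathy": 2, "reliability": 4}
manager_b = {"coachability": 5, "ownership": 3, "pace": 3, "empathy": 2, "reliability": 3}

rate = within_one_agreement(manager_a, manager_b)
print(f"{rate:.0%}")  # 4 of 5 attitudes within one point: 80%

# The attitude to discuss in the debrief is the one where scores diverge.
disputed = [k for k in manager_a if abs(manager_a[k] - manager_b[k]) > 1]
print(disputed)  # ['pace']
```

One candidate gives you only a handful of comparisons, so the percentage is more trustworthy when you pool scores across several finalists before deciding an anchor needs a rewrite.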
The two-rater rule also removes the single most expensive failure mode in SMB hiring: one manager falls in love with a candidate, hires them on the spot, and the rest of the team has to live with it. With two raters and a rubric, the conversation in the debrief is about the scores, not the candidate's personality.
4. One written decision rule, set before the interview
"Anyone scoring under a 3 on more than one attitude is a no. Anyone scoring a 5 on at least three attitudes is a strong yes. Anything else gets a second-round." Write it on the rubric. Print it. Hand it to the managers.
Founders and managers break their own rules when a charming candidate walks in. The written rule is what protects you from yourself.
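Writing the rule as code is a quick way to find out whether it is actually unambiguous. The sketch below encodes the example rule above. It makes two assumptions the rule itself does not state: the "no" condition is checked first, so a candidate with three 5s and two scores under 3 is still a no, and the rule is applied to a single agreed scoresheet rather than to each rater separately.

```python
def decide(scores):
    """Apply the written decision rule to one scoresheet (attitude -> 1-5 score)."""
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("Scores must be on the 1-5 scale")
    low = sum(1 for s in scores.values() if s < 3)    # attitudes scored under 3
    top = sum(1 for s in scores.values() if s == 5)   # attitudes scored 5
    if low > 1:           # assumption: the "no" check takes precedence
        return "no"
    if top >= 3:
        return "strong yes"
    return "second round"

# Hypothetical scoresheets
print(decide({"coachability": 5, "ownership": 5, "pace": 5, "empathy": 4, "reliability": 2}))
print(decide({"coachability": 5, "ownership": 5, "pace": 5, "empathy": 2, "reliability": 2}))
print(decide({"coachability": 4, "ownership": 4, "pace": 3, "empathy": 3, "reliability": 2}))
```

If your managers disagree about what the second example should return, that is a sign the printed rule needs one more sentence.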
Why the standardization is worth your time
Two numbers. First: structured interviews predict performance at roughly r = 0.42 (Sackett et al., 2022), vs. roughly half that for the unstructured interviews your managers are running now. Second: the difference between an A-Player and an average hire in the same role is 2–4× the output, per Bradford Smart, Topgrading (2012). Combine the two and the math is straightforward: a higher-validity interview raises the probability of hiring an A-Player, and an A-Player is worth multiple times what an average hire is. The standardization is a one-week project that compounds on every hire after.
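For a rough feel of how validity turns into hit rate, here is a small simulation. It is a sketch under loud assumptions, none of which come from the cited studies: performance and interview scores are treated as jointly normal with correlation equal to the validity figure, an A-Player is defined as the top 10% of the applicant population, each opening draws a pool of 10 candidates, and you hire the top scorer. The 0.21 figure simply stands in for "roughly half" of 0.42.

```python
import random
from statistics import NormalDist

def a_player_hit_rate(validity, pool_size=10, top_share=0.10, trials=20_000, seed=0):
    """Estimate how often the top-scoring candidate in a pool is an A-Player."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - top_share)  # performance level marking the top share
    noise_sd = (1 - validity ** 2) ** 0.5         # keeps corr(score, performance) == validity
    hits = 0
    for _ in range(trials):
        best_score, best_perf = float("-inf"), 0.0
        for _ in range(pool_size):
            perf = rng.gauss(0, 1)                                 # true job performance
            score = validity * perf + noise_sd * rng.gauss(0, 1)   # noisy interview score
            if score > best_score:
                best_score, best_perf = score, perf
        if best_perf > cutoff:  # did we hire an A-Player this time?
            hits += 1
    return hits / trials

structured = a_player_hit_rate(0.42)    # validity figure quoted above
unstructured = a_player_hit_rate(0.21)  # assumption: "roughly half that"
print(f"structured:   {structured:.1%} of hires are A-Players")
print(f"unstructured: {unstructured:.1%} of hires are A-Players")
```

The exact percentages depend entirely on the assumed pool size and A-Player definition, so read the gap between the two lines, not the individual values.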
A documented selection process is also what the EEOC Uniform Guidelines on Employee Selection Procedures (29 CFR Part 1607) expect to see if a hiring decision is ever challenged. A rubric and signed scoresheets are that documentation. "Manager Joe had a feeling" is not.
How to roll this out across five managers in one week
- Monday: You and one other operator define the 4–6 attitudes (see step 1). 90 minutes.
- Tuesday: You write the BARS scale + the question for each attitude. 90 minutes.
- Wednesday: 60-minute meeting with all five managers. Walk through the rubric. Calibrate on two past candidates that every manager remembers: score them independently, then compare. The disagreements are the conversation.
- Thursday: Each manager runs one interview with the new rubric on their next open req. Two raters per finalist.
- Friday: 30-minute debrief. What scored cleanly. What didn't. Adjust anchors. The rubric is now your SOP.
A useful diagnostic before you start: have each manager describe the last hire they made. If the descriptions sound like five different processes, you have your business case. If they sound like one process already, you have a different problem.
FAQ
My managers will resist this; they think they're good at hiring.
Most of them are wrong, but you don't have to lead with that. Lead with the data. Show them the turnover variance between their stores or shifts. Show them the cost-per-hire variance. The rubric is a tool that makes their good instincts defensible, and protects them from their bad ones. Frame it as standardization, not surveillance.
What if a manager wants to hire someone who scored badly on the rubric?
That's exactly the situation the written decision rule is designed for. The manager can override, but the override is documented in writing alongside the rubric. After 6 months, you'll have the data on whether overrides outperform rubric hires (they almost never do). The exercise of having to write down the override is itself a discipline.
Does this work for hourly roles or just management?
Both. Hourly roles benefit even more, because hourly hires happen more often and the volume is what compounds. The BARS anchors are different (coachability and reliability matter more than strategic thinking), but the four-part SOP is identical.
How long does the rubric stay valid?
Refresh the attitudes when the role meaningfully changes (new equipment, new customer segment, new compensation model), or annually as a forcing function. The behavioral anchors themselves often stay stable for years.
What if I only have one manager doing the interviews?
Single-rater processes lose the inter-rater calibration check, which is the cheapest reliability signal you'll ever run. The fix: score immediately after the interview, not 24 hours later, and have a second person (you, or an experienced operator) review the scoresheet against the interview notes before any offer goes out. Worse than two raters; still defensible.
What to do next
Block 90 minutes on Monday. Identify the 4–6 attitudes. Send your managers this post on Tuesday. You'll be running the same interview by Friday.