Build a Structured Interview Scoring Rubric in 90 Minutes (No HRIS Required)

You are the only person at your company who thinks about hiring as a discipline. The CEO wants to move faster. The hiring managers want to "trust their gut." Legal wants documentation. You sit in the middle and produce the artifact that makes every other stakeholder defensible - and right now, you're being asked to produce it without budget, without an HRIS, and without a year-long project plan.

This is a 90-minute project. You'll come out the other end with a one-page scoring rubric tied to a published validity coefficient and a paper trail you can hand to a leadership review or an EEOC inquiry. The hardest part is not technical - it's getting four hiring managers to score the same candidate the same way. The rubric is what does that.

TL;DR

A scoring rubric pairs each interview question with a 1–5 scale whose levels are anchored in observable behavior - what a "5" answer literally sounds like vs. a "3" vs. a "1." Structured interviews scored this way reach about r ≈ 0.42 validity for predicting job performance - the current best estimate (Sackett et al., Journal of Applied Psychology, 2022), which revises Schmidt & Hunter's 1998 figure of r = 0.51 downward after correcting how range restriction was handled. Unstructured interviews sit closer to r ≈ 0.20. The rubric below takes 90 minutes to build, uses no software, and is the document you hand to leadership when they ask "how do we know the hire was defensible?"

Why a rubric - and not just better questions

Behavioral questions ("tell me about a time...") are necessary but not sufficient. Two interviewers can hear the same answer and rate it differently because they're measuring against different mental models. The validity gain across 85 years of personnel-selection research isn't from the questions alone - it's from structure: same questions, same order, same scoring scale, written before the interview. The original Schmidt & Hunter 1998 meta-analysis put structured-interview validity at r = 0.51 (APA / Psychological Bulletin, 1998); the 2022 Sackett et al. correction adjusts that downward to ≈ r = 0.42 after fixing systematic over-correction for range restriction (Journal of Applied Psychology, 2022). Either number is more than double unstructured-interview validity.

EEOC's Uniform Guidelines on Employee Selection Procedures (29 CFR §1607) don't require a numeric rubric, but they require that selection procedures be "job-related and consistent with business necessity" and that disparate-impact analysis be possible. A documented rubric makes both defensible. "I just had a good feeling about her" does not.

The 90-minute build

Minutes 0–20: Pick the 4–6 attitudes that predict success

You can't score what you haven't defined. Ask three current top performers (anyone you'd hire again tomorrow): "When a new hire on this team works out, what is consistently true about them in week one? When one doesn't work out, what was wrong from week one?" Write down what they say. Group it. You'll surface 4–6 named attitudes - coachability, ownership, pace under ambiguity, customer empathy, attention to detail, and similar. These are the things your rubric scores.

This is the only step that's company-specific. Skip it and you'll end up with a generic "humble, hungry, smart" rubric that doesn't predict success at your company.

Minutes 20–50: Write a Behaviorally Anchored Rating Scale (BARS) for each attitude

For each attitude, write a 1–5 scale where each level describes an observable behavior. Generic 1–5 scales ("excellent / good / average / weak / poor") are worthless because two interviewers will calibrate them differently. BARS - first formalized in 1963 - fixes this by making each level a verbal anchor:

Example: Coachability

Score	What the candidate did during the interview
5	Gave a specific, recent example of feedback they initially disagreed with, named what changed in their behavior over the following 30 days, and explained what they'd do differently the next time similar feedback came up.
3	Gave a feedback example but framed it as "they were right and I just listened" without describing what they changed.
1	Gave a vague answer ("I'm always open to feedback") with no example, or named feedback they'd received but said they disagreed and didn't change.

Write one BARS like this for each of your 4–6 attitudes. Anchors should describe what the candidate did during the interview, not the candidate's character. That distinction is what makes the scores defensible: a reviewer can check a score against what was actually said.
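The rubric works fine on paper, but if you later keep it in a script or shared file, one way to represent a BARS is sketched below. This is a minimal illustration, not a required tool: the class name `Attitude`, the `missing_levels` helper, and the choice to require anchors at levels 5, 3, and 1 (the levels shown in the example table) are all assumptions made for the sketch, and the anchor text is abbreviated from the Coachability example.

```python
from dataclasses import dataclass

# Assumption: the three levels anchored in the Coachability example table.
REQUIRED_LEVELS = {5, 3, 1}


@dataclass(frozen=True)
class Attitude:
    name: str
    question: str            # the one behavioral question asked for this attitude
    anchors: dict[int, str]  # score -> what the candidate did during the interview

    def missing_levels(self) -> set[int]:
        """Return required score levels that have no written anchor yet."""
        return REQUIRED_LEVELS - set(self.anchors)


coachability = Attitude(
    name="Coachability",
    question=(
        "Tell me about feedback you received in the last 6 months that you "
        "initially disagreed with. Walk me through the 30 days after."
    ),
    anchors={
        5: "Specific recent example, named what changed over 30 days, "
           "explained what they'd do differently next time.",
        3: "Gave an example but did not describe what they changed.",
        1: "Vague answer with no example, or disagreed and did not change.",
    },
)

# An empty set means every required level has a written anchor.
print(coachability.missing_levels())
```

Running the same check over all 4–6 attitudes before the first interview is a quick way to catch a scale that still has an unanchored level.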

Minutes 50–70: Write one behavioral question per attitude

One question, past-tense, specific, story-shaped. Examples that pair with the rubric:

Coachability: "Tell me about feedback you received in the last 6 months that you initially disagreed with. Walk me through the 30 days after."
Ownership: "Describe a project where something went wrong and you took action no one specifically asked you to take."
Pace under ambiguity: "Walk me through the last week you had where the priorities weren't handed to you. How did you decide what to work on?"
Customer empathy: "Tell me about a time a customer was upset for what you thought was an unreasonable reason. What did you do?"

Same questions, same order, every candidate. No improvisation, no skipping. Variability in the interview is variability in the data.

Minutes 70–90: Two-rater calibration + decision rule

Two interviewers per finalist, scoring independently using the BARS, then comparing before the debrief. Inter-rater agreement - how often two raters land within 1 point on the same attitude - is the single best signal of whether the rubric is working. Aim for 80%+ agreement; below that, your anchors aren't specific enough and need a rewrite.
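The within-1-point agreement check can be done by hand on the scoresheets; as a rough sketch, the same arithmetic looks like this. The function name, the attitude keys, and the scores are hypothetical examples, and the 0.80 threshold is the 80% target stated above.

```python
def within_one_agreement(rater_a: dict[str, int], rater_b: dict[str, int]) -> float:
    """Share of attitudes where the two raters' scores differ by at most 1 point."""
    if rater_a.keys() != rater_b.keys():
        raise ValueError("Both raters must score the same attitudes")
    if not rater_a:
        raise ValueError("No scores to compare")
    hits = sum(abs(rater_a[k] - rater_b[k]) <= 1 for k in rater_a)
    return hits / len(rater_a)


# Hypothetical independent scores for one finalist.
rater_a = {"coachability": 4, "ownership": 3, "pace under ambiguity": 5, "customer empathy": 2}
rater_b = {"coachability": 5, "ownership": 3, "pace under ambiguity": 3, "customer empathy": 2}

agreement = within_one_agreement(rater_a, rater_b)
print(f"{agreement:.0%}")                    # 75%
print("rewrite anchors:", agreement < 0.80)  # rewrite anchors: True
```

With only four attitudes, a single disagreement moves the figure by 25 points, so pooling the comparison across several finalists gives a steadier read. In this example the two-point gap on "pace under ambiguity" is the anchor to look at first.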

Set the decision rule before the first interview: "Anyone scoring under a 3 on more than one attitude is a no. Anyone scoring a 5 on at least three attitudes is a strong yes." Write it on the rubric. Founders and hiring managers break their own rules when a charming candidate walks in; the written rule is what protects you.
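Writing the rule out as logic is a useful test of whether it is unambiguous. The sketch below encodes the example rule quoted above; it makes two assumptions the text does not state: the "no" condition is checked first (so it wins if a candidate meets both conditions), and anyone who meets neither condition falls into a hypothetical "discuss" bucket for the debrief. It also assumes one agreed score per attitude.

```python
def decide(scores: dict[str, int]) -> str:
    """Apply the example decision rule to one candidate's agreed scores."""
    below_three = sum(score < 3 for score in scores.values())
    fives = sum(score == 5 for score in scores.values())

    # Assumption: the "no" rule takes precedence over the "strong yes" rule.
    if below_three > 1:
        return "no"
    if fives >= 3:
        return "strong yes"
    # Assumption: everything in between goes to the debrief for discussion.
    return "discuss"


# Hypothetical scoresheets.
print(decide({"coachability": 5, "ownership": 5, "pace": 5, "empathy": 3}))  # strong yes
print(decide({"coachability": 2, "ownership": 2, "pace": 5, "empathy": 4}))  # no
print(decide({"coachability": 4, "ownership": 3, "pace": 4, "empathy": 2}))  # discuss
```

If your team cannot agree on what the middle bucket should be, that is worth settling before the first interview, for the same reason the rest of the rule is written down in advance.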

What this costs you and what it produces

Time: 90 minutes one-time per role. The rubric is reusable for every hire into that role until the role itself meaningfully changes.

Defensibility deliverable: a one-page scoring sheet per interview, signed and dated. If a hiring decision is ever challenged - internally, by a candidate's attorney, or by EEOC - you produce the rubric, the scoresheets, and the decision rule. Those three documents are the paper trail.

Validity gain: from unstructured-interview r ≈ 0.20 to structured-interview r ≈ 0.42 - more than double the validity coefficient, per the corrected meta-analysis (Sackett et al., 2022, revising Schmidt & Hunter's 1998 r = 0.51 downward for range restriction). That's the gap between "I had a good feeling" and "I have evidence."
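The "more than double" figure is a ratio of the two correlation coefficients. A quick check of the arithmetic, using only the two numbers quoted above; the squared values (share of variance in job performance accounted for) are a common rule-of-thumb reading of r, added here as an assumption rather than something the cited papers report in this form.

```python
r_unstructured = 0.20  # approximate figure quoted above
r_structured = 0.42    # Sackett et al. (2022) estimate quoted above

ratio = r_structured / r_unstructured
print(f"ratio of coefficients: {ratio:.1f}x")               # ratio of coefficients: 2.1x

# Rule-of-thumb reading: r squared as share of variance accounted for.
print(f"unstructured r^2: {r_unstructured ** 2:.3f}")       # unstructured r^2: 0.040
print(f"structured   r^2: {r_structured ** 2:.3f}")         # structured   r^2: 0.176
```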

FAQ

Does the rubric need to be legally reviewed before I use it? The rubric itself doesn't usually require legal review, but the underlying attitudes should be job-related and not proxies for protected characteristics (age, race, national origin, disability, etc.). Review the attitudes with your employment counsel if you're in a regulated industry or scaling past 50 employees. EEOC's Uniform Guidelines (29 CFR §1607) describe the analytical standards selection procedures need to meet.

What if I only have one interviewer? You lose the inter-rater calibration check, which is the cheapest reliability signal you'll ever run. If you must run a single-interviewer process, score immediately after the interview (not 24 hours later) and have a second person review the scoresheet against the audio or notes before the decision. It's a worse process than two raters, but it's still defensible.

How is BARS different from the 1–5 star ratings most ATSes default to? Generic 1–5 ratings have no anchor - "5" means whatever the rater thinks "excellent" means. BARS replaces the abstract label with a written description of an observable behavior, so two raters score the same answer the same way. BARS was developed by Smith and Kendall in 1963 specifically because abstract ratings produced inter-rater agreement so low the data was unusable.

How often do I need to revisit the rubric? Refresh the attitudes when the role meaningfully changes (new tools, new customer segment, new team size), or annually as a forcing function. The behavioral anchors themselves often stay stable for years.

Can I use this rubric for promotions and performance reviews too? Yes, with adjustments. The same BARS approach works for performance reviews and promotion decisions, and the consistency across hire and performance is itself a defensibility win. Just rewrite the anchors to describe on-the-job behavior rather than interview behavior.

What to do next

Block 90 minutes on your calendar tomorrow. Identify three top performers, send them the two-sentence question above by email, and read their answers. By the end of the 90 minutes you'll have one printable scoring rubric for one role - which is more than most companies 5× your size can produce.
