What I'm Actually Looking for When I Interview a Data QA/QE Engineer

I've been interviewing data QA/QE candidates lately, and I'll be honest: the bar is lower than I expected. Most candidates can tell me they've checked for nulls. Few can tell me why that's barely the start.

Here's the list of things I'm actually evaluating - shared in the spirit of helping the right people level up.

Security (Yes, This Should Be First)

I almost buried this section, but it probably matters most. You don't need to be a compliance officer. But you do need to know enough to raise a red flag.

If your standard move is to download a sample to Excel on your local machine for testing, be prepared to explain why that might be a problem. Know the difference between PI (personal information) and PII (personally identifiable information). Understand in which environments masking should be applied and who should have access to unmasked data. If production PI values are flowing into a QA environment, you should be uncomfortable - and vocal about it.
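To make the masking point concrete, here's a minimal sketch of masking sensitive columns before a sample leaves production. The column list, the salt, and the function names are all hypothetical, and salted hashing is pseudonymization rather than true anonymization, so treat this as a conversation starter, not a compliance control.

```python
import hashlib

# Hypothetical list of sensitive columns; in practice this would come
# from a data catalog or classification tags, not a hard-coded set.
PII_COLUMNS = {"email", "ssn"}


def mask_value(value: str, salt: str = "qa-salt") -> str:
    """Replace a value with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_row(row: dict) -> dict:
    """Return a copy of the row with sensitive columns masked.

    Nulls stay null so completeness checks behave the same in QA.
    """
    return {
        column: mask_value(str(value))
        if column in PII_COLUMNS and value is not None
        else value
        for column, value in row.items()
    }
```

The masking is deterministic on purpose: the same input always produces the same token, so joins and duplicate checks still work on masked data.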

Data Quality as a Discipline

Checking for nulls and duplicates is table stakes. A strong QA/QE candidate can articulate what a mature data quality practice looks like across an organization.

That means understanding the data quality dimensions - completeness, accuracy, consistency, timeliness, uniqueness, validity - and knowing that different teams often have different names and groupings for them. Part of the job is helping the org align on which dimensions matter for which data products. Ideally this gets codified in a data product spec: what does "good" actually look like for this dataset?
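As a rough illustration, three of those dimensions can be expressed as simple ratios. This is a sketch in plain Python with made-up sample rows and my own function names; a real practice would run the equivalent checks in SQL or a quality framework, and the thresholds would come from the data product spec.

```python
# Made-up sample data: one duplicate order_id, one null email,
# and one email that fails a basic format rule.
SAMPLE_ROWS = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 2, "email": "b@example.com"},
    {"order_id": 3, "email": "not-an-email"},
]


def completeness(rows, column):
    """Fraction of rows where the column is not null."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)


def uniqueness(rows, column):
    """Distinct non-null values divided by total non-null values."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def validity(rows, column, predicate):
    """Fraction of non-null values that satisfy a business rule."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if predicate(v)) / len(values)
```

Calling `validity(SAMPLE_ROWS, "email", lambda v: "@" in v)` scores only the non-null emails, which keeps validity from double-counting what completeness already measures.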

Many orgs aren't mature here. That's actually an opportunity. A QA/QE who understands the principles, processes, tools, and measurement approaches can grow into a QA lead role with a clear, stable path.

Contracts

Five years ago I wouldn't have used the word "contract" in a data quality conversation. Now it's inseparable from a data product approach to enterprise data.

Know what a data contract is, where you can implement one, and what common configuration attributes look like (owner, SLA, schema expectations, quality thresholds). Bonus points if you can suggest managing contracts outside of any specific pipeline so they can be reused across ingestion and enrichment patterns. Contracts tied to a specific tool or pipeline are contracts waiting to break when something gets refactored.
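Here's one way a pipeline-independent contract could look, sketched in plain Python. The field names, the `check_contract` helper, and the example values are my own assumptions for illustration; they don't follow any particular contract standard or tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract shape covering the attributes named above."""
    dataset: str
    owner: str
    freshness_sla_hours: int                  # SLA
    schema: dict                              # column name -> expected Python type
    min_completeness: dict = field(default_factory=dict)  # quality thresholds


def check_contract(contract, rows, age_hours):
    """Return a list of violations; an empty list means the contract holds."""
    violations = []
    if age_hours > contract.freshness_sla_hours:
        violations.append("freshness SLA exceeded")
    for column, expected_type in contract.schema.items():
        for row in rows:
            if column not in row:
                violations.append(f"missing column: {column}")
                break
            value = row[column]
            if value is not None and not isinstance(value, expected_type):
                violations.append(f"type mismatch: {column}")
                break
    for column, threshold in contract.min_completeness.items():
        non_null = sum(1 for r in rows if r.get(column) is not None)
        ratio = non_null / len(rows) if rows else 0.0
        if ratio < threshold:
            violations.append(f"completeness below threshold: {column}")
    return violations


# Example contract with invented values.
ORDERS_CONTRACT = DataContract(
    dataset="orders",
    owner="data-platform",
    freshness_sla_hours=24,
    schema={"order_id": int, "email": str},
    min_completeness={"email": 0.9},
)
```

Because `check_contract` only takes rows and an age, an ingestion job and an enrichment job can both call it against the same `ORDERS_CONTRACT` without either one owning the definition.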

Frameworks

I'd rather hear you talk about managing test cases and contracts in a framework than hear you describe what you did ad hoc for a single pipeline.

If you've used one - Great Expectations, Soda, dbt tests, Monte Carlo - talk about it. If you haven't, at least know what they are and what value they provide. The ability to configure quality checks at scale, version them, and audit them over time is what separates a QA practice from a QA habit.

Many orgs are still figuring this out. Knowing the landscape of frameworks and being able to advocate for one is honestly enough to stay in the running.

Technical Skills

You don't need to write production data pipelines. But you need more technical depth than a technical BA.

SQL is your primary language - not just writing queries but debugging them, knowing why a test is failing and how to fix it. YAML configs are table stakes if you're working with any modern quality framework. Python is a bonus, but it matters if you're expected to integrate unit tests into a CI/CD pipeline or maintain a regression library.

The key distinction: you should be able to debug a failing test, not just run it and escalate.
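A small sketch of what "debug it" can mean in practice: when a uniqueness or not-null test fails, the next query should tell you which rows are responsible. This uses Python's built-in sqlite3 with an invented `orders` table; the helper names are mine, and the table and column names are interpolated into the SQL only because they are trusted constants in this example.

```python
import sqlite3

# In-memory database with invented data: order_id 2 is duplicated
# and one row has a null customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 10, 25.0),
        (2, 11, 40.0),
        (2, 11, 40.0),
        (3, NULL, 15.0);
    """
)


def duplicate_keys(conn, table, key):
    """Return (key, count) pairs for keys that appear more than once."""
    sql = (
        f"SELECT {key}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return conn.execute(sql).fetchall()


def null_count(conn, table, column):
    """Return how many rows have a null in the given column."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]
```

The failing test tells you something is wrong; `duplicate_keys` tells you where to look, which is the difference between escalating a red build and escalating a diagnosis.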

Migrations vs. New Data Products

This one separates candidates who've been in the trenches from those who haven't. Be able to explain the difference between validating data for a new data product versus validating data in a migration.

Hint: migration validation is harder. Why? Because you're not just asking "is this data correct?" - you're asking "does this data match what it used to be, and should it?" You're reconciling two systems, potentially with different business logic, different transformations, and different definitions of truth. New data products have a clean slate. Migrations have history, and history is messy.
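The reconciliation side of that can be sketched as a keyed comparison between the two systems. This is a simplified illustration with a hypothetical `reconcile` function operating on in-memory rows; at real volumes you'd push the same logic into SQL or compare per-key hashes.

```python
def reconcile(legacy_rows, target_rows, key):
    """Compare two row sets by key and bucket the differences.

    Assumes the key is unique within each system.
    """
    legacy_by_key = {row[key]: row for row in legacy_rows}
    target_by_key = {row[key]: row for row in target_rows}

    legacy_keys = legacy_by_key.keys()
    target_keys = target_by_key.keys()

    return {
        # Present in the old system but lost in the migration.
        "missing_in_target": sorted(legacy_keys - target_keys),
        # Present only in the new system.
        "extra_in_target": sorted(target_keys - legacy_keys),
        # Same key, different values: each one needs a decision.
        "changed": sorted(
            k for k in legacy_keys & target_keys
            if legacy_by_key[k] != target_by_key[k]
        ),
    }
```

The `changed` bucket is where the "and should it?" question lives: every entry is either a migration defect or an intentional change in business logic, and the code can't tell you which.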

AI in QA

If you're using AI for quality checks, be prepared to explain how - specifically. Where do you trust it? Anomaly detection and pattern recognition on large volumes of data: reasonable. Deciding whether a business rule is correct: probably not without human validation in the loop.

Vague answers here are a red flag. "I use AI to help" is not an answer.
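For a sense of what a specific answer sounds like, here's a deliberately simple statistical stand-in for anomaly detection: a z-score on daily row counts. It isn't AI, and the function name and threshold are my own assumptions, but it shows the shape of the answer I want, which is what gets flagged automatically and where a human takes over.

```python
from statistics import mean, stdev


def needs_review(history, latest, threshold=3.0):
    """Flag the latest row count if it sits far outside recent history.

    Returns True when a human should look at the load; it never
    decides on its own whether the data is actually wrong.
    """
    if len(history) < 2:
        # Not enough history to estimate spread, so ask for a review.
        return True
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold
```

The function only routes attention. Whether a flagged load is a real defect or a legitimate business event is still a human call.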


The candidates who impress me aren't necessarily the ones with the most experience. They're the ones who've thought about data quality as a system - not a checklist. If you're preparing for a QA/QE role on a data platform team, this is the lens I'm using.
