In 2023, Claude passed the bar exam. In 2024, it passed the CPA exam and medical licensing exams. By 2026, there’s barely an exam left that AI can’t pass, often on the first try.

This has broken something we relied on without thinking too hard about it: the equivalence between “passed the exam” and “has expertise.”

For a century, we used exams as a proxy for skill. You pass the bar, you’re qualified to practice law. You pass medical licensing, you’re qualified to practice medicine. You get a degree in computer science, you’re qualified to be a programmer. The exam was the evidence that you had the knowledge and could apply it.

But now we have machines that can pass every exam better than every human, and they have no expertise whatsoever.

This is not a semantic argument. It’s a professional crisis wearing a technical disguise.

What Exams Were Actually Testing

Exams are not bad. They test something real: the ability to retrieve and apply knowledge under time constraints, with incomplete information, and under pressure.

This is a valuable skill. A surgeon who can’t retrieve the right procedure under pressure is dangerous. A lawyer who can’t apply contract law on the fly will lose cases. A pilot who can’t recognize a deteriorating situation in time is a risk.

But exams test only this skill.

They do not test:

  • Judgment about whether to use the knowledge — An exam asks “what’s the treatment?” A doctor’s real expertise is knowing when not to treat, or when to treat differently because the patient’s actual constraint is not medical.
  • Integration with context — An exam is abstract. Real expertise is knowing that this context matters, and this one doesn’t.
  • Responsibility — An exam has no stakes. Real expertise emerges because someone made a decision and lived with the consequences.
  • Taste — Exams have right answers. Real practice has trade-offs.

The exam tested knowledge retrieval and application. It was a proxy for expertise because, in the pre-AI world, knowledge retrieval was the bottleneck. If you knew the material well enough to pass the exam, you probably also had the other stuff—experience, judgment, skin in the game.

AI breaks this correlation.

Now you can retrieve and apply knowledge perfectly, better than any human, and still have no judgment about when to use it. Better yet, you can apply knowledge you don’t even understand, because understanding is not necessary for reciting the right answer.

The Real Expertise Problem

So what is expertise now?

It’s what’s left when knowledge retrieval stops being the bottleneck.

Start with a surgeon. Years ago, being an expert surgeon meant having encyclopedic knowledge of anatomy, procedures, and outcomes. You had to know the material because you couldn’t look it up mid-surgery.

Now you can look anything up. During surgery, you have access to the entire medical literature in your pocket. The knowledge is not scarce anymore.

The expertise that remains is:

  • Judgment about what’s relevant — Given 10,000 papers on this condition, which ones matter to this patient?
  • Recognition of the unexpected — This presentation looks like X but I’ve seen this before when it was actually Y.
  • Integration across domains — This is not a pure orthopedic problem; there’s a vascular component that changes everything.
  • Acceptance of responsibility — I’m making a call that might hurt this person, and I’m accountable.

None of these show up on the licensing exam.

For a lawyer, expertise used to mean knowing the law. Now it means:

  • Knowing what the law actually means in this context — The letter vs. the spirit, and which one matters here.
  • Anticipating what opposing counsel will do — Not because you don’t know the law, but because you’ve fought them before and know their tactics.
  • Seeing the risk that’s not in the contract — What did the parties not think to worry about?
  • Making the call when there’s no case law — When the precedent doesn’t apply, what guides you?

Again, not on the exam.

For an engineer, expertise used to mean knowing how to build things. Now it’s much harder to operationalize:

  • Knowing when not to use the standard solution — This looks like a typical scaling problem but it’s actually a latency problem, which is different.
  • Seeing the architecture that will survive the next five years — Not the one that works today, but the one that won’t need rewriting.
  • Integrating non-technical constraints — This is the right solution technically, but the team can’t maintain it, so we need the second-best solution that they can maintain.
  • Bearing the cost of mistakes — If this scales wrong, people will blame me, and I’ll have to wake up at 3 AM to fix it.

How We’ll Actually Test Expertise

This is the reckoning ahead. Professions that relied on exams are going to have to figure out what they actually care about.

Some fields are moving fast. Medicine is already grappling with this. The licensing exam is moving away from pure knowledge recall toward “can you actually practice medicine?” Board certifications are adding simulations, case reviews, and accountability measures.

Law is slower to adapt. The bar exam will probably persist as a credentialing filter (you still have to pass it), but everyone knows it’s no longer evidence of expertise.

Programming doesn’t have licensing, which might be a feature—we’ve never had a good way to measure expertise anyway. “Years of experience” is a proxy, but a bad one.

What we might see instead:

  • Portfolio-based credentialing — Show what you’ve built, why it matters, what trade-offs you made.
  • Apprenticeship models — Spend time with someone who’s exercised good judgment, and let them vouch for you.
  • Outcomes-based assessment — Track people who’ve made major decisions and see how those decisions aged.
  • Reputation in community — In fields with clear professional communities, people know who’s trustworthy.

None of these scale like exams do. All of them are harder to game. And all of them test something closer to actual expertise.

The Uncomfortable Truth

Here’s what I think is actually happening:

We built exams because we needed an efficient way to filter people. Exams are scalable. You can give them to thousands of people, grade them mechanically, and have a clear credential.

But efficiency came at a cost. Exams tested the wrong thing—they tested knowledge retrieval, not wisdom. And for a long time, we got away with it because knowledge retrieval was hard, so proxying for wisdom with “good at knowledge retrieval” mostly worked.

Now that AI handles knowledge retrieval better than we do, the proxy breaks down. And we have to do the harder work of actually assessing judgment, taste, and responsibility.

This is uncomfortable because it’s non-scalable. You can’t efficiently assess whether someone will make good trade-off decisions or recognize hidden risks. You have to know them, or know someone who does.

But maybe that’s the point. Maybe expertise was never supposed to be scalable. Maybe it was always supposed to be based on apprenticeship, reputation, and skin in the game.

The exam was a convenient shortcut. It was always a lie—a useful lie, but a lie. AI didn’t break expertise. It just broke the lie we told ourselves about what the exam meant.

Now we get to build something more honest.