The Verifier's Dilemma

Intelligence is being commoditized. This is no longer speculation. The cost of achieving a given level of AI performance on standard benchmarks has dropped by orders of magnitude in two years. What recently cost thousands of dollars in compute now costs pennies. The trend shows no signs of stopping.

Most conversations about AI start with the wrong question. Executives ask, "Can AI do this task?" The better question, the one that actually predicts success or failure, is: "Can we verify that AI did this task correctly?"

That distinction matters more than most people realize.

· · ·

The Commoditization of Intelligence

Think about what the Industrial Revolution did to physical strength. Before mechanization, the strongest person in the village had real economic advantage. After? Strength became table stakes. Anyone could rent a tractor.

We are watching the same thing happen to certain types of cognitive work. The ability to synthesize information, draft documents, analyze data, write code . . . these are becoming commodities. Not worthless, but no longer scarce.

The implications are significant, but they are also uneven. Not all tasks are affected equally. Which brings me to what I have started calling the "jagged edge."

The Jagged Edge

AI capability is not a smooth surface. It is jagged.

The models are remarkably good at some things and surprisingly bad at others. They excel at competition math and certain types of coding. They struggle with tasks that seem trivial. For years, models would confidently assert that 9.11 is greater than 9.9.

More importantly, different tasks improve at different rates. Competition math went from impossible to essentially solved in about 18 months. Translation between major languages? Solved. But translation to Tlingit (a language spoken by a few hundred Native Americans)? Still poor, and likely to remain so.

The pattern is predictable if you know what to look for.

AI improves fastest when the task is digital (not physical), data is abundant, and the task is similar to things humans find tractable.

AI improves slowly when the task requires physical manipulation, data is scarce or proprietary, or the task involves edge cases with limited examples.

This means there will not be a single moment when "AI takes over." Instead, we will see a gradual, uneven transformation. Software development will be heavily augmented within a few years. Hairdressing and plumbing? Probably not in our lifetimes. The economic returns to skilled trades may actually increase relative to knowledge work. This would represent an inversion of the pattern we have seen for decades.

The practical question for any institution is: where do your critical tasks sit on this jagged edge?

Verifier's Law

Here is the framework I find most useful for evaluating AI projects.

There is an old idea in computer science called asymmetry of verification: for some tasks, it is much easier to verify a solution than to find one. Sudoku is the classic example. Solving it is hard; checking if a solution is correct takes seconds.
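
To make the asymmetry concrete, here is a minimal sketch in Python (my own illustration, not from the original): verifying a completed Sudoku grid takes a handful of lines, while finding a solution requires searching a combinatorial space.

```python
# Verification asymmetry in miniature: checking a completed 9x9 Sudoku grid
# is a few lines of code, while *finding* a solution requires search
# (e.g., backtracking over a combinatorial space).

def is_valid_sudoku_solution(grid: list[list[int]]) -> bool:
    """Return True if `grid` is a correctly completed 9x9 Sudoku."""
    expected = set(range(1, 10))

    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]

    # Every row, column, and 3x3 box must contain the digits 1-9 exactly once.
    return all(set(unit) == expected for unit in rows + cols + boxes)
```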

I have started calling this principle "Verifier's Law" when applied to AI adoption:

The ability to successfully deploy AI on a task is proportional to how easily you can verify the output.

This sounds simple, but the implications are profound.

Consider a simple matrix. On one axis: how easy is it for AI to perform the task? On the other: how easy is it for a human to verify the result?

The best projects sit where AI performs well AND humans can verify easily. Examples include code generation (does it compile? do the tests pass?), document drafting (can a human review it quickly?), and form completion (did the system accept it or reject it?).
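
For code generation in particular, the check can be automated end to end. The sketch below is illustrative; the test command and scratch-checkout workflow are my assumptions, not a prescribed pipeline. The point is that the output is accepted only when an objective check passes.

```python
# Sketch of an automated verification gate for AI-generated code: the change
# is accepted only if an objective check (the project's test suite) passes.
# The test command and repository path are illustrative assumptions.

import subprocess


def verify_generated_code(repo_path: str, test_command: list[str] | None = None) -> bool:
    """Run the project's test suite; return True only if every test passes."""
    command = test_command or ["python", "-m", "pytest", "--quiet"]
    result = subprocess.run(command, cwd=repo_path, capture_output=True, text=True)
    return result.returncode == 0  # exit code 0 means the suite passed


# Usage: apply the AI-generated change in a scratch checkout, then gate on the tests.
# if verify_generated_code("/tmp/scratch-checkout"):
#     promote_change()   # hypothetical next step
# else:
#     discard_change()   # hypothetical next step
```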

The dangerous projects sit where AI performs well BUT verification is expensive or impossible. This is where institutions get hurt. Examples include insurance claims adjudication at scale, medical diagnosis without physician review, and legal document analysis with no human checkpoint.

I was recently in a conversation where someone mentioned a lawsuit against a major insurer for AI-driven claim denials. The AI could process claims at massive scale (easy for AI). But verifying each decision? Expensive, slow, and often skipped. The company could afford the lawsuit. Most institutions cannot.

Practical Application

So how do you use this framework?

First, map your candidate AI projects on both dimensions. For each potential use case, ask: Where does this sit on the jagged edge? Is AI currently good at this, or are we hoping it will get good? How easily can we verify the output? Is there an objective check, or does verification require as much effort as doing the task manually?
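
One way to make that mapping concrete is to score each candidate on both axes and rank them, treating verifiability as a gate rather than a bonus. The project names and scores below are hypothetical placeholders, not real assessments.

```python
# Hypothetical sketch: score candidate AI projects on the two dimensions
# (current AI capability, ease of verification) and rank them.
# Names and scores are illustrative placeholders, not real assessments.

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    ai_capability: float   # 0.0 (AI is poor at this) to 1.0 (AI is strong)
    verifiability: float   # 0.0 (expensive or subjective) to 1.0 (cheap, objective)


candidates = [
    Candidate("Form auto-completion", ai_capability=0.9, verifiability=0.9),
    Candidate("Code suggestions with tests", ai_capability=0.8, verifiability=0.8),
    Candidate("Claims adjudication at scale", ai_capability=0.8, verifiability=0.2),
    Candidate("Unreviewed medical diagnosis", ai_capability=0.6, verifiability=0.1),
]

# Verifier's Law suggests multiplying rather than adding: a project you cannot
# verify stays risky no matter how capable the model is.
ranked = sorted(candidates, key=lambda c: c.ai_capability * c.verifiability, reverse=True)

for c in ranked:
    print(f"{c.name:32s} capability={c.ai_capability:.1f} verifiability={c.verifiability:.1f}")
```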

Second, start where verification is cheap. The boring projects are often the best ones. Form auto-completion. Document summarization with human review. Code suggestions that developers can accept or reject. These will not make headlines, but they will actually work.

Third, be honest about verification costs. If you are considering a project where AI can easily perform the task but verification is expensive, slow, or subjective . . . proceed with extreme caution. This is where the lawsuits come from. This is where trust erodes.

Fourth, build verification infrastructure before you need it. One insight that has emerged from recent conversations with healthcare CIOs and state technology officers: you can sometimes move a task from "hard to verify" to "easy to verify" by investing in verification infrastructure upfront.

Code is easier to verify if you have comprehensive test suites. Claims processing is easier to verify if you have clear decision trees and audit trails. Form completion is easy to verify because rejection is immediate feedback.
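
As an illustration of what that infrastructure can look like (a sketch, not a prescribed design), the snippet below logs every automated claims decision with the inputs and rule that produced it, so a reviewer or auditor can verify decisions after the fact without redoing the work. The field names and file path are assumptions.

```python
# Illustrative verification infrastructure for automated claims processing:
# every decision is appended to an audit trail with the inputs and rule that
# produced it, so it can be verified later without redoing the work.
# Field names and the log path are assumptions for the example.

import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ClaimDecision:
    claim_id: str
    decision: str          # e.g. "approved" or "denied"
    rule_applied: str      # the decision-tree rule or model version used
    inputs_summary: dict   # the facts the decision was based on
    timestamp: str


def record_decision(decision: ClaimDecision, log_path: str = "claims_audit.jsonl") -> None:
    """Append one decision to an append-only audit trail (JSON Lines)."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(decision)) + "\n")


record_decision(ClaimDecision(
    claim_id="C-1024",
    decision="denied",
    rule_applied="policy-exclusion-7b",
    inputs_summary={"procedure_code": "99213", "coverage": "out-of-network"},
    timestamp=datetime.now(timezone.utc).isoformat(),
))
```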

The institutions that will do best with AI are those that invest in verifiability before they deploy AI at scale.

A Note on Synthetic Data

One more tactical point. If you are exploring AI in a regulated environment, consider what I call the "special projects" approach: use synthetic data to test and iterate before touching production systems.

This does two things. First, it removes the compliance and privacy concerns that slow down experimentation. Second, it gives you a safe environment to discover where AI actually sits on the jagged edge for your specific use cases. You learn this before you have committed resources or made promises to stakeholders.

The goal is to fail cheaply, in a sandbox, rather than expensively, in production.
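
As a minimal sketch of that sandbox pattern, assuming a claims-like record shape, the snippet below generates synthetic records using only the standard library, so experiments never touch production systems or real personal data. Every field is a placeholder.

```python
# Minimal synthetic-data sandbox: generate fake, claims-like records with the
# standard library only, so AI experiments never touch production systems or
# real personal data. All fields are placeholders.

import random


def synthetic_claim(rng: random.Random) -> dict:
    return {
        "claim_id": f"SYN-{rng.randrange(100000, 999999)}",
        "procedure_code": rng.choice(["99213", "99214", "93000", "71046"]),
        "billed_amount": round(rng.uniform(50, 5000), 2),
        "coverage": rng.choice(["in-network", "out-of-network"]),
        "prior_denials": rng.randrange(0, 3),
    }


# A fixed seed keeps the sandbox reproducible, which matters when you are
# trying to pin down where AI sits on the jagged edge for your use cases.
rng = random.Random(42)
sandbox_dataset = [synthetic_claim(rng) for _ in range(1000)]
print(sandbox_dataset[0])
```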

What This Means

I do not think AI will replace most knowledge workers in the next few years. I also do not think it is a passing fad that can be safely ignored.

What I do think is that institutions that adopt a disciplined framework (understanding the jagged edge, respecting Verifier's Law, investing in verification infrastructure) will navigate this transition better than those chasing headlines or paralyzed by uncertainty.

The question is not "Should we do AI?" The question is "Which AI projects can we verify, and therefore trust?"

Start there.

Shaunak Mali is the founder of BlackPeak Research, an AI systems consultancy focused on regulated environments including public sector, healthcare, and hospitality. He previously co-founded KarmaCheck and held engineering roles at Box, Optimizely, and Checkr.