The AI That Could Run a Company Is the One We'll Never Get

A group of people I respect (full transparency, most of them sharper than me) have been passing around Europe 2031 (Juijn et al.) and landing in the same place. Dismay. It’s a well-reasoned piece of scenario writing. It takes today’s real headlines, up through June 2026 (when I first read it), and runs them forward through 2031 (2034, to be fair) into a Europe that falls behind, gets governed into a corner, and wakes up dependent on machines it doesn’t control.

I found it a fascinating, well-positioned read, and I think it is worth your time. But the part that broke it for me wasn’t the doom. It was an unexpected shutdown that happened while everyone was still debating whether the article’s scenario was viable.

Let me back up. To pick the piece apart fairly, I have to state what I think must be true for its outcome to come about.

The whole vision rests on one major assumption. Within about five years, we’ll have AI that doesn’t just retrieve and summarize, but reasons and judges like a senior professional. Cheap. At scale. In everyone’s hands. A model, or set of models, that can sit inside a company and run it.

Reasoning.

This is the core claim the scenario needs in order to reach its outcome, the one that puts Europe at a critical junction of sorts. Everything downstream, the job losses, the dependence, the collapse, follows from it.

I don’t think we get there. And I want to be careful here, because this is exactly where it’s easy to wave your hands.

AI and reasoning

There are two different things we keep calling reasoning, and most of the discussions and articles I read blur them.

One is reasoning with a checkable answer. Math. Code. Find the bug. Mythos had a delayed release for exactly this reason: it was amazing at finding bugs in software. Like real good. Many of those bugs were years old and had never been uncovered. That’s impressive!

The other kind is the really hard one: open-ended judgment. The right strategy. The right scope. The call a senior person makes when the situation is ambiguous and there’s no answer key.

The first kind of reasoning is racing ahead. You can train a model by rewarding it every time it lands the checkable answer. You check if it is right or wrong and move on. Again, Mythos.

The second kind hasn’t moved in a way that matches the hype, at least in my real-world experience. Build a benchmark, and as soon as it becomes a target it is no longer a benchmark (paraphrasing Goodhart’s Law). In a recent investigation built on real hedge fund analysts’ actual reasoning, frontier models scored under 16 percent, even while they aced retrieval and calculation. Scoring well on expert-level questions does not test the judgment and context-sensitivity that enterprise AI systems require in production (Kili Technology, April 2026).

There are two main camps on how, or whether, AI will reach and exceed human open-ended judgment. The first says more of the same: keep building, keep iterating over the same knowledge base. I’m in the second camp, which says the current approach has a ceiling, and that ceiling sits below the judgment Europe 2031 is counting on. I don’t think you grind your way to judgment with more of the same training on more of the same data.

To be fair, my belief is a bet, not a proof. The optimists aren’t fools. Their argument is that judgment is just a pile of smaller checkable steps, and you close the gap one step at a time with the methods we already have. They’ve been right about more than skeptics expected. But “we’ll figure it out” is a forecast about a problem nobody has visibly solved. I’m building on an earlier bet, made when Suleyman said white-collar work was eighteen months from automation. That deadline is going to pass. The internet is forever, so we’ll see if I’m right or wrong.

To be clear, I’m still a big believer in AI. I use it daily, and it helps improve almost everything I create (including, of course, this article).

At Creospark I assumed that teaching an AI to truly know us, our corpus, our voice, our value, would be some exotic and expensive thing. It isn’t. The material is already sitting in our data stores. Indexing it and letting a model pull the right slice on demand is cheap, almost embarrassingly so. This is the same lesson I learned the hard way about Falcon, where the value was never in the expensive thinking. I’d talked myself out of trying for no good reason.

But every time I push it on real work, the same wall shows up. The AI drafts something that sounds like us and cites “history.” Yet the part that actually matters, whether it’s the right approach for this specific client, what to cut, what they aren’t telling me, stays with me. In my own shop, AI is a fast assistant. It’s not a replacement. The cheap part is knowledge. The hard part is judgment, and judgment isn’t getting cheaper.

So that’s wall one.

Now grant me the opposite.

Let’s say I’m wrong and the optimists nail it. We actually build the model that reasons like a senior pro.

We just watched what happens next

The US government ordered Anthropic to cut off its most capable models, Fable 5 and Mythos 5, from every foreign national, quite possibly because Amazon was able to jailbreak it. Anthropic had to disable the models for everyone (at least the public) to comply. The trigger was a security concern. The deeper logic is simple. The moment a model is powerful enough to matter, it’s powerful enough to be a weapon, and governments reach for the off switch, not the throttle.

Whatever you think of this particular call, it showed the reflex. Power gets gated.

A fair comeback is open-weight models. Surely a capable open model, the kind you run yourself, routes around the gate.

I don’t buy it, and here’s the part I struggled to put into words.

You can’t claw back weights once they’re released. That’s true. But the genuinely dangerous tier never gets released in the first place. That’s the whole reason open models stay a step behind on the capabilities that matter most.

The frontier-grade judgment, the part that would actually let an AI run your firm, is the part that gets withheld, restricted, or shut down before it ever reaches you. It will be illegal to use.

I’ll concede the soft spot in my own argument. A lab outside Washington’s (or pick your government’s) reach could leak something nobody can stop. But betting your future on a foreign lab open-sourcing the most dangerous capability ever built isn’t a strategy. It’s a hope.

Put the two walls together and the article’s ending comes apart in your hands.

If the reasoning never arrives, the future it fears dies on capability. If it does arrive, it dies on governance, because the very thing that would make it real is the thing that guarantees it gets locked away from the rest of us.

That’s the contradiction at the center of Europe 2031, and of every story shaped like it. The AI that could run your company is, by definition, the AI no government will hand you the keys to.

General reasoning, for everyone, that beats human experts, in five years, is not a forecast.

It’s a contradiction.