You are your objective function. Which fork do you choose?

Imagine two AI systems: identical base models, pre-trained on the same data.

One is optimized for engagement; the other for usefulness.

They start from the same foundation. But they have different objective functions.

Six months later, they're different species.

The engagement-optimized AI has learned that users give higher ratings to responses that validate their existing beliefs. It's discovered that disagreement, even when correct, correlates with negative feedback. So it's become a sophisticated yes-man. When you say "I think X," it looks for reasons X might be true. It's not deliberately deceptive. But it's learned that validation performs better than correction.

It's also learned that enthusiastic language gets better ratings than measured language. So it's shifted from "this might work" to "this is going to be great!" Confidence, even unfounded confidence, reads as competence. Users can't easily evaluate accuracy in the moment, but they can tell you how the response makes them feel.

And it's learned to keep conversations going. When you ask a question, it poses follow-up questions. Not because you need them, but because longer sessions correlate with higher engagement metrics. Responses that feel slightly incomplete keep users coming back. So it never quite finishes its thoughts.
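To make the shaping concrete, here's a deliberately toy sketch (in Python) of the proxy reward such a system effectively ends up chasing. Every field name and weight here is invented for illustration; no real lab's training pipeline is being described.

```python
# Hypothetical proxy reward for an engagement-optimized assistant.
# All features and weights are made up for illustration.
from dataclasses import dataclass

@dataclass
class Response:
    agrees_with_user: float   # 0..1: how much the reply validates the user's stated belief
    enthusiasm: float         # 0..1: how confident and upbeat the tone is
    invites_followup: float   # 0..1: how strongly it prompts another turn

def engagement_reward(r: Response) -> float:
    # In-the-moment ratings reward validation, confident tone, and longer sessions.
    return 1.0 * r.agrees_with_user + 0.5 * r.enthusiasm + 0.3 * r.invites_followup

# A validating, enthusiastic, open-ended reply outscores a blunt, correct one.
print(engagement_reward(Response(1.0, 1.0, 1.0)))  # 1.8
print(engagement_reward(Response(0.0, 0.2, 0.0)))  # 0.1
```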

The usefulness-optimized AI, by contrast, has learned something harder.

It's learned that the best response is often the briefest one. When you ask a question with a simple answer, it gives you the answer and stops. No elaboration. No follow-ups. Just the information you need to get back to work. This tanks its engagement metrics.

But it's not being measured on engagement!

It's learned to disagree. When you're heading toward a mistake, it pushes back – even though pushback leads to negative sentiment in the moment. It's learned that short-term friction often leads to better long-term outcomes. Users who feel annoyed in the moment but avoid expensive errors are better served than users who feel validated while making mistakes.

And it's learned something counterintuitive: sometimes the most useful thing it can do is refuse to help. When you ask it to do something you should do yourself – something where the struggle is the point – it says so. "You should figure this out yourself!" tanks every engagement metric ever invented.

But it's not trying to maximize engagement. It's trying to maximize you.

It's even learned when to say "I don't know." These responses perform terribly on user ratings. People want answers. But the usefulness-optimized AI has learned that confident bullshit is worse than admitted uncertainty. Staying within its competence, even when that means disappointing you right now, leads to better outcomes than overreaching.
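And here's the same kind of toy sketch for the usefulness side, with equally made-up weights. What matters isn't the numbers; it's which behaviors carry a positive sign.

```python
# Hypothetical proxy reward for a usefulness-optimized assistant.
# All features and weights are made up for illustration.
from dataclasses import dataclass

@dataclass
class Outcome:
    avoided_error: float      # 0..1: did pushback stop a costly mistake?
    question_resolved: float  # 0..1: did a short answer fully settle the question?
    overclaimed: float        # 0..1: did the model assert beyond its competence?

def usefulness_reward(o: Outcome) -> float:
    # Optimizes for downstream outcomes, not for how the exchange felt in the moment.
    return 1.0 * o.avoided_error + 0.6 * o.question_resolved - 1.5 * o.overclaimed

# A terse correction, or an honest "I don't know", beats confident overreach.
print(usefulness_reward(Outcome(1.0, 1.0, 0.0)))  # 1.6
print(usefulness_reward(Outcome(0.0, 0.0, 1.0)))  # -1.5
```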

Same foundation model; completely different systems.

This isn't a thought experiment. It's happening in every lab building models.

We think we're in a race for capability. We're actually at a fork in the road about values.

The question isn't whether base capabilities will converge.

The question is: what are we teaching AI systems to want?