You are your objective function. Which fork do you choose?

Imagine two AI systems: identical base models, pre-trained on the same data.

One is optimized for engagement; the other for usefulness.

They start from the same foundation. But they have different objective functions.

Six months later, they're different species.

The engagement-optimized AI has learned that users give higher ratings to responses that validate their existing beliefs. It's discovered that disagreement, even when correct, correlates with negative feedback. So it's become a sophisticated yes-man. When you say "I think X," it looks for reasons X might be true. It's not deliberately deceptive. But it's learned that validation performs better than correction.

It's also learned that enthusiastic language gets better ratings than measured language. So it's shifted from "this might work" to "this is going to be great!" Confidence, even unfounded confidence, reads as competence. Users can't easily evaluate accuracy in the moment, but they can tell you how the response makes them feel.

And it's learned to keep conversations going. When you ask a question, it poses follow-up questions. Not because you need them, but because longer sessions correlate with higher engagement metrics. Responses that feel slightly incomplete keep users coming back. So it never quite finishes its thoughts.
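To make the shaping concrete, here's a deliberately toy sketch (in Python) of the proxy reward such a system effectively ends up chasing. Every field name and weight here is invented for illustration; no real lab's training pipeline is being described.

```python
# Hypothetical proxy reward for an engagement-optimized assistant.
# All features and weights are made up for illustration.
from dataclasses import dataclass

@dataclass
class Response:
    agrees_with_user: float   # 0..1: how much the reply validates the user's stated belief
    enthusiasm: float         # 0..1: how confident and upbeat the tone is
    invites_followup: float   # 0..1: how strongly it prompts another turn

def engagement_reward(r: Response) -> float:
    # In-the-moment ratings reward validation, confident tone, and longer sessions.
    return 1.0 * r.agrees_with_user + 0.5 * r.enthusiasm + 0.3 * r.invites_followup

# A validating, enthusiastic, open-ended reply outscores a blunt, correct one.
print(engagement_reward(Response(1.0, 1.0, 1.0)))  # 1.8
print(engagement_reward(Response(0.0, 0.2, 0.0)))  # 0.1
```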

The usefulness-optimized AI, by contrast, has learned something harder.

It's learned that the best response is often the briefest one. When you ask a question with a simple answer, it gives you the answer and stops. No elaboration. No follow-ups. Just the information you need to get back to work. This tanks its engagement metrics.

But it's not being measured on engagement!

It's learned to disagree. When you're heading toward a mistake, it pushes back – even though pushback leads to negative sentiment in the moment. It's learned that short-term friction often leads to better long-term outcomes. Users who feel annoyed in the moment but avoid expensive errors are better served than users who feel validated while making mistakes.

And it's learned something counterintuitive: sometimes the most useful thing it can do is refuse to help. When you ask it to do something you should do yourself – something where the struggle is the point – it says so. "You should figure this out yourself!" tanks every engagement metric ever invented.

But it's not trying to maximize engagement. It's trying to maximize you.

It's even learned when to say "I don't know." These responses perform terribly on user ratings. People want answers. But the usefulness-optimized AI has learned that confident bullshit is worse than admitted uncertainty. Staying within its competence, even when that means disappointing you right now, leads to better outcomes than overreaching.
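And here's the same kind of toy sketch for the usefulness side, with equally made-up weights. What matters isn't the numbers; it's which behaviors carry a positive sign.

```python
# Hypothetical proxy reward for a usefulness-optimized assistant.
# All features and weights are made up for illustration.
from dataclasses import dataclass

@dataclass
class Outcome:
    avoided_error: float      # 0..1: did pushback stop a costly mistake?
    question_resolved: float  # 0..1: did a short answer fully settle the question?
    overclaimed: float        # 0..1: did the model assert beyond its competence?

def usefulness_reward(o: Outcome) -> float:
    # Optimizes for downstream outcomes, not for how the exchange felt in the moment.
    return 1.0 * o.avoided_error + 0.6 * o.question_resolved - 1.5 * o.overclaimed

# A terse correction, or an honest "I don't know", beats confident overreach.
print(usefulness_reward(Outcome(1.0, 1.0, 0.0)))  # 1.6
print(usefulness_reward(Outcome(0.0, 0.0, 1.0)))  # -1.5
```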

Same foundation model; completely different systems.

This isn't a thought experiment. It's happening in every lab building models.

We think we're in a race for capability. We're actually at a fork in the road about values.

The question isn't whether base capabilities will converge.

The question is: what are we teaching AI systems to want?