Meta built an internal leaderboard called "Claudeonomics" that ranked 85,000 employees by token consumption. Titles like "Token Legend" and "Session Immortal" went to the top spenders. In 30 days, employees burned through 60 trillion tokens. Shopify did something similar before that. OpenAI has one too.
The intention makes sense. You want your team using AI, building with it, getting comfortable. Celebrating usage drives adoption, and in that narrow sense it worked.
The problem is what gets measured alongside it. Some employees left agents running idle for hours just to climb the rankings. Token consumption became the metric, so token consumption became the goal. Meta pulled the leaderboard two days after the story went public.
Using AI and spending on AI are not the same thing. Most companies are starting to treat them as if they are. And it's not just leaderboards. The same logic shows up in every team that ships an AI feature without ever asking what it actually costs to run.
What's Actually Happening
When a company tells every employee to use AI and not worry about it, what they're really doing is deferring a conversation. Not canceling it.
Someone builds an internal tool that calls a large model on every user interaction. Works great in the pilot. Feels fast, feels smart. Then it rolls out to the whole company and the API bill is three times what anyone expected. Or a customer-facing feature gets traction and the margin math stops working because the cost per transaction was never modeled at scale.
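The gap between pilot and rollout is just multiplication, which makes it easy to sketch before shipping. A back-of-envelope model, with every number here a hypothetical placeholder rather than real provider pricing:

```python
# All figures are illustrative assumptions; substitute your provider's
# actual pricing and your own measured usage before trusting the output.
PRICE_PER_1K_TOKENS = 0.01      # assumed blended input+output rate, USD
TOKENS_PER_CALL = 2_000         # assumed average prompt + completion size
CALLS_PER_USER_PER_DAY = 20     # assumed interaction rate

def monthly_cost(users: int, days: int = 30) -> float:
    """Estimated monthly API spend for a given user count."""
    calls = users * CALLS_PER_USER_PER_DAY * days
    return calls * TOKENS_PER_CALL / 1000 * PRICE_PER_1K_TOKENS

print(f"pilot, 50 users:  ${monthly_cost(50):,.0f}/month")      # $600
print(f"rollout, 5,000:   ${monthly_cost(5_000):,.0f}/month")   # $60,000
```

Under these assumptions, the cost scales linearly with users, so a 100x rollout is a 100x bill. That's the math that was never run in the scenario above.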
This isn't a hypothetical. It's the current state of most AI-assisted products that moved from experiment to production in the last 18 months. The cost was always there. It just wasn't visible until it was someone's problem.
The "We'll Optimize Later" Bet
The most common response to this is: everything is changing so fast that thinking about cost now is the wrong move. Ship, learn, adjust. That's a reasonable position. Lots of smart people hold it.
But there's a difference between not over-engineering and not thinking at all. The teams that are getting this right aren't the ones that spent three months modeling cost before writing a line of code. They're the ones that had a basic mental model of where the expensive calls were going to land before they shipped, not after.
Optimizing later sounds practical until "later" arrives: the architecture that made sense at ten users doesn't make sense at ten thousand, and changing it is now a six-week project that nobody budgeted for.
Why This Keeps Happening
It's not that people don't know. Most engineers and product people building with AI right now are aware that token costs compound. They're skipping the conversation because the pressure to ship is real, the costs feel abstract in the early stages, and "we'll figure it out" is an easy thing to say when everything is moving fast and nothing has broken yet.
The bills that make people pay attention don't arrive during the build. They arrive weeks later, in an infrastructure invoice that lands on someone's desk in finance, gets escalated, and turns into a conversation that product and engineering have to explain retroactively.
By that point, the options are: absorb the cost, rebuild the architecture, or reduce the scope of what the product does. None of those are good options when you're already in production.
What To Actually Ask Before You Build
Not at the idea stage. At the start of the build. Before the first line of production code. These are not questions that slow a team down. They're the questions that keep you from having to rebuild under pressure.
Do you actually need AI here, or does a simpler model do the job? LLMs are the default right now because they're accessible and impressive. But a lot of what gets built with a large model could run on a smaller one, a classifier, or plain ML. The cost difference isn't marginal. Before you wire up a large model, it's worth asking what you actually need it to do and whether something cheaper does that well enough.
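One lightweight way to force that question is a routing function that defaults to the cheapest tool that plausibly handles the task. This is a sketch under made-up heuristics, not a real classifier; the thresholds and model tiers are assumptions you'd replace with your own:

```python
# Hypothetical routing sketch: the tier names and heuristics are
# placeholders, not a real product's logic. The point is that "large
# model" is the fallback, not the default.
import re

def route(task: str) -> str:
    """Pick the cheapest tier that plausibly handles the task."""
    # Trivial, enumerable inputs need no model at all.
    if re.fullmatch(r"(yes|no|cancel|confirm)", task.strip().lower()):
        return "rule"
    # Short, closed-ended work (labeling, extraction) suits a small model.
    if len(task) < 200 and "?" not in task:
        return "small-model"
    # Open-ended generation is the only thing that earns the big bill.
    return "large-model"
```

Even a crude router like this makes the expensive path an explicit decision rather than the path of least resistance.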
Do you have a structured data layer? Most expensive AI systems are expensive because they're compensating for bad data. If the context window is full of noise because the underlying data isn't clean or organized, you're paying for tokens that add nothing. A structured data layer isn't just good engineering. It directly reduces what you spend per call.
Where are you calling the model, and how often? One call per user action feels fine at small scale. Before you ship, you should know which calls are high-frequency, which results can be cached, and where you're using a large model for something that doesn't need one.
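For calls that repeat verbatim, which is common with templated, high-frequency prompts, even the standard library gets you a long way. A minimal sketch, where `call_model` is a placeholder for whatever billed client you actually use:

```python
# Minimal caching sketch. call_model is a stand-in for a paid API call;
# the counter exists only to show how many billed calls actually happen.
from functools import lru_cache

api_calls = {"count": 0}

def call_model(prompt: str) -> str:
    """Placeholder for a billed API call; every invocation costs tokens."""
    api_calls["count"] += 1
    return f"response to: {prompt}"

@lru_cache(maxsize=10_000)
def cached_call(prompt: str) -> str:
    # Identical prompts hit the cache instead of the billed API.
    return call_model(prompt)

cached_call("classify: refund request")
cached_call("classify: refund request")  # served from cache, no new charge
```

In production you'd want a shared cache with expiry rather than an in-process one, but the shape of the question is the same: which calls repeat, and why are you paying for each repeat?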
Who owns this number? Token costs don't show up in the product. They show up in infrastructure bills that someone sees weeks later. If nobody on the product or engineering side is watching that number as a metric, it will grow until it becomes a crisis, and by then it's someone else's problem to untangle.
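Owning the number starts with seeing it in your own telemetry instead of the provider's invoice. A minimal sketch of wrapping every call so token counts and an estimated cost land in your logs, with the price and field names as assumptions:

```python
# Sketch of making spend a first-class metric. The rate is a hypothetical
# placeholder; in practice you'd read token counts off the API response
# and ship this to whatever metrics system your team already watches.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-spend")

ASSUMED_PRICE_PER_1K = 0.01  # hypothetical blended rate, USD

def record_call(feature: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Log the estimated cost of one model call, tagged by feature."""
    total = prompt_tokens + completion_tokens
    cost = total / 1000 * ASSUMED_PRICE_PER_1K
    log.info("feature=%s tokens=%d est_cost_usd=%.4f", feature, total, cost)
    return cost
```

Once the number is tagged by feature and visible on a dashboard someone checks weekly, it has an owner by construction.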
The goal isn't to slow down AI adoption. It's to make sure what you build actually holds up when it matters.