It’s Open Source, so open measurements are hard to come by, but given that they don’t offer their own platform APIs (apart from a “limited free trial” on a waitlist for US-based developers), OpenRouter is probably a decent proxy.
It’s pretty bad. Both DeepSeek and Qwen have surpassed Llama on daily token generation by a wide margin (~7x for DeepSeek). Both also offer official platform APIs with very competitive pricing, so OpenRouter is only a tiny fraction of their traffic.
Llama 4, widely derided for benchmark juicing, only accounts for about 40% of tokens served by Meta models, with the much older Llama 70B still dominating the distribution. Whatever you could do with that model, you can do at a fraction of the compute cost and with better performance using any of the newer Open Source models.
Anyone who has been in the industry long enough (or has once been on the Meta partnership team) sees the signs of Zuck’s ruthless prioritisation here: Llama is dead animal walking. As predicted in January, the Open Source whales from China have taken over, and without trade barriers the crown isn’t going back to Meta.
In a way, it probably doesn’t have to. Model-benchmark virtue signalling to investors has been replaced with vastly cheaper researcher-hire virtue signalling, a much easier metric to game, as the company with the most cash to spend will just win … a beautiful allegory for Meta’s product and the illusion of speech it offers.