38 trillion tokens. That's the scale of data Liquid AI used to train LFM2.5-8B-A1B, the latest addition to the company's Liquid Foundation Model line, announced on the company's blog. For context, Meta's Llama 3 70B, a model nearly nine times larger by parameter count, was trained on just over 15 trillion tokens. Liquid AI is making a substantial bet on data volume for a relatively small model.
The name unpacks as: 8 billion total parameters, 1 billion active. That last part matters. The model uses a mixture-of-experts (MoE) architecture: for each token it processes, a small router picks a few "expert" sub-networks to run and leaves the rest idle. In practice, this makes the model cheaper in compute per token at inference time (the moment when it's actually answering a question or generating output) than a standard dense 8-billion-parameter model, because each token passes through only about 1 billion parameters. The idle experts are not wasted: different tokens are routed to different experts, so the full 8 billion parameters still hold what the model learned during training.
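To make the routing idea concrete, here is a minimal toy sketch of top-k expert routing. The expert count (8), top-k (2), and vector size are hypothetical illustration values, not LFM2.5-8B-A1B's actual configuration, and real MoE layers use learned feed-forward experts rather than single random matrices.

```python
import math
import random

# Hypothetical toy sizes, NOT LFM2.5-8B-A1B's real configuration.
NUM_EXPERTS = 8
TOP_K = 2
DIM = 4

random.seed(0)

def make_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

router = make_matrix(NUM_EXPERTS, DIM)                         # one score per expert
experts = [make_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]  # one weight matrix per expert

def moe_forward(token):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = matvec(router, token)
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    weights = softmax([scores[i] for i in top])  # renormalize over the chosen experts only
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        y = matvec(experts[i], token)            # only the selected experts do any work
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_forward(token)
print("experts used:", used)
print("fraction of expert weights touched:", TOP_K / NUM_EXPERTS)
```

A different input vector can produce different scores and therefore a different pair of experts, which is how every expert ends up trained even though each token only touches a quarter of them in this toy setup.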
The Data Volume Bet
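Training a smaller model on significantly more data is a deliberate strategy. Research from DeepMind in 2022, commonly called the Chinchilla scaling laws, showed that for a fixed compute budget, parameter count and training tokens should be scaled up together (roughly 20 tokens per parameter), and that many large models at the time were undertrained relative to their size. DeepMind's own 70-billion-parameter Chinchilla, trained on 1.4 trillion tokens, outperformed its 280-billion-parameter Gopher, which had been trained on about 300 billion tokens. Training a smaller model on more data can produce results that match or beat a larger model trained on less.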
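The compute-optimal trade-off can be sketched with two widely used approximations: training compute C ≈ 6·N·D FLOPs (N parameters, D tokens) and the ~20 tokens-per-parameter heuristic. Both constants are rough rules of thumb drawn from the scaling-law literature, not exact values from the Chinchilla paper's fitted curves.

```python
import math

TOKENS_PER_PARAM = 20      # approximate Chinchilla heuristic
FLOPS_PER_PARAM_TOKEN = 6  # common estimate of training FLOPs per parameter per token

def compute_optimal(budget_flops):
    """Split a training budget into (params, tokens) assuming C = 6*N*D and D = 20*N."""
    # C = 6 * N * (20 * N) = 120 * N^2, so N = sqrt(C / 120)
    n = math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    d = TOKENS_PER_PARAM * n
    return n, d

# Roughly Chinchilla's own budget: 6 x 70B parameters x 1.4T tokens
budget = 6 * 70e9 * 1.4e12
params, tokens = compute_optimal(budget)
print(f"params ~ {params:.3g}, tokens ~ {tokens:.3g}")
```

Feeding in Chinchilla's approximate budget recovers roughly its actual shape of about 70 billion parameters and 1.4 trillion tokens, which is a quick sanity check on the two approximations.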
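At 38 trillion tokens on an 8-billion-parameter model, or roughly 4,750 tokens per parameter, Liquid AI is pushing that approach far past the roughly 20 tokens per parameter that the Chinchilla results suggest is compute-optimal, and further than most public training runs have gone at this size class. Training that far past the compute-optimal point spends extra compute up front in exchange for a small model that is cheaper to serve.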
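The tokens-per-parameter arithmetic behind these comparisons is short enough to show directly. The token and parameter counts below are the figures quoted in this article (using the nominal 1 billion active parameters from the model name); the 20:1 baseline is the approximate Chinchilla heuristic.

```python
# Figures as quoted in the article; 20:1 is the approximate Chinchilla heuristic.
CHINCHILLA_RATIO = 20

runs = {
    "LFM2.5-8B-A1B (total params)": (38e12, 8e9),
    "LFM2.5-8B-A1B (active params)": (38e12, 1e9),
    "Llama 3 70B": (15e12, 70e9),
}

ratios = {}
for name, (tokens, params) in runs.items():
    ratios[name] = tokens / params
    print(f"{name}: {ratios[name]:,.0f} tokens/param, "
          f"{ratios[name] / CHINCHILLA_RATIO:,.0f}x the ~20:1 heuristic")
```

By this measure LFM2.5-8B-A1B sees about 4,750 tokens per total parameter, versus roughly 214 for Llama 3 70B.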
What Makes Liquid AI Different
Liquid AI was founded by researchers from MIT who developed liquid neural networks, a model architecture distinct from the transformer design used in most commercial AI models. Where transformers process text through attention layers that model relationships across the full input, liquid neural networks use neurons whose state evolves according to differential equations, which, the researchers argue, handle sequential and time-varying data more efficiently.
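For a feel of what "state evolving according to differential equations" means, here is a minimal sketch of a single liquid time-constant (LTC) neuron, using the general form dx/dt = -(1/τ + f)·x + f·A from the liquid-network research. It rests on several assumptions: the weights, the sigmoid gate, and the simple explicit Euler solver are illustrative choices, and nothing here describes the layers actually used inside LFM2.5-8B-A1B.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical single-neuron parameters, for illustration only.
TAU = 1.0                   # base time constant
A = 1.0                     # level the gated term pulls the state toward
W_IN, W_REC, B = 2.0, -0.5, 0.0

def ltc_step(x, inp, dt=0.05):
    """One explicit-Euler step of dx/dt = -(1/tau + f) * x + f * A."""
    f = sigmoid(W_IN * inp + W_REC * x + B)  # input- and state-dependent gate
    dxdt = -(1.0 / TAU + f) * x + f * A
    return x + dt * dxdt

x = 0.0
trace = []
for t in range(200):
    inp = 1.0 if t < 100 else 0.0  # step input, switched off halfway through
    x = ltc_step(x, inp)
    trace.append(x)

print(f"state with input on: {trace[99]:.3f}, after input off: {trace[-1]:.3f}")
```

The neuron's effective time constant, τ / (1 + τ·f), shrinks or stretches as the gate f responds to the input; that input-dependent timing is what the "liquid" name refers to.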
The LFM2.5 line represents Liquid AI's production-grade output. A data-heavy MoE model with 8 billion total parameters that activates only about 1 billion of them per token is well suited to on-device deployment, edge inference, and cost-sensitive API workloads: use cases where compute cost per query matters more than achieving the highest possible benchmark score.
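A back-of-envelope sketch shows both sides of the deployment math. It assumes the common estimate of about 2 FLOPs per parameter used in a forward pass, ignores attention and KV-cache costs, and takes the nominal 8B-total / 1B-active figures from the model name. Compute per token tracks active parameters, but every expert must stay loaded, so weight memory tracks total parameters.

```python
# Rough per-token inference arithmetic (approximate; ignores attention/KV-cache costs).
TOTAL_PARAMS = 8e9
ACTIVE_PARAMS = 1e9  # nominal figure from the "A1B" in the model name

flops_moe = 2 * ACTIVE_PARAMS    # only the routed experts run for each token
flops_dense = 2 * TOTAL_PARAMS   # a dense 8B model runs every parameter

def weight_gb(params, bits):
    """Approximate weight storage in gigabytes at a given precision."""
    return params * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_gb(TOTAL_PARAMS, bits):.0f} GB")
print(f"compute per token: MoE ~{flops_moe:.0e} FLOPs vs dense ~{flops_dense:.0e} FLOPs")
```

Under these assumptions the model needs roughly an eighth of a dense 8B model's compute per token, while its weights still occupy about 16 GB at 16-bit precision or about 4 GB at 4-bit.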
Full benchmark comparisons are detailed in Liquid AI's blog post.