IBM has released Granite 4.0, its latest family of open-source small language models built for speed and low cost.
The Granite 4.0 models use a hybrid architecture that needs less memory than traditional transformer models, so you can run them on regular consumer GPUs instead of expensive server hardware. They work well for document summarization, RAG systems, and AI agents.
ibm-granite/granite-4.0-h-small is a 32 billion parameter long-context instruct model, and it’s now available on Replicate.
You can start using Granite models right away on Replicate. Here’s how to run them with an API:
Here’s an example using Replicate’s JavaScript client:
Here’s an example using Replicate’s Python client:
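A minimal sketch of such a call, assuming the `replicate` package is installed (`pip install replicate`) and the `REPLICATE_API_TOKEN` environment variable is set. The input names `prompt` and `max_tokens` are assumptions, not confirmed by this post, so check the model's input schema on Replicate before relying on them. The helpers `build_input` and `run_granite` are hypothetical names used only for illustration.

```python
import os

MODEL = "ibm-granite/granite-4.0-h-small"


def build_input(prompt, max_tokens=512):
    # Assumed input names; verify them against the model's schema on Replicate.
    return {"prompt": prompt, "max_tokens": max_tokens}


def run_granite(prompt):
    # Imported here so the helpers above work without the client installed.
    import replicate  # reads REPLICATE_API_TOKEN from the environment

    output = replicate.run(MODEL, input=build_input(prompt))
    # Language models on Replicate usually return an iterable of text chunks;
    # handle a plain string too, in case the output type differs.
    if isinstance(output, str):
        return output
    return "".join(str(chunk) for chunk in output)


# Only call the API when a token is actually configured.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    print(run_granite("Summarize the benefits of hybrid Mamba-2/Transformer models."))
```

Keeping the input construction in its own function makes it easy to adjust parameter names once you have checked the model's schema.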
Granite models are built around a hybrid design that combines two key ideas: the linear-scaling efficiency of Mamba-2 and the precision of Transformers.
Mamba-2 is a state space model whose compute grows linearly with sequence length, unlike traditional transformer attention, which scales quadratically. This makes it more efficient for very long inputs, like documents with hundreds of thousands of tokens. Transformer blocks complement this by better supporting tasks that require long-context reasoning.
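To make the scaling difference concrete, here is a toy sketch. It is not Mamba-2 itself: `ssm_scan` is a hypothetical scalar recurrence (with made-up coefficients `a` and `b`) that only illustrates how a state space model touches each token once while carrying a fixed-size state, and `attention_pair_count` simply counts the token pairs that full self-attention compares.

```python
def ssm_scan(xs, a=0.9, b=0.1):
    """Toy scalar state space recurrence: h_t = a * h_{t-1} + b * x_t.

    Returns the outputs and the number of update steps (one per token).
    """
    h = 0.0
    ys = []
    steps = 0
    for x in xs:
        h = a * h + b * x  # constant-size state, updated once per token
        ys.append(h)
        steps += 1
    return ys, steps


def attention_pair_count(n):
    """Token pairs compared by full self-attention over n tokens."""
    return n * n


for n in (1_000, 10_000, 100_000):
    _, steps = ssm_scan([1.0] * n)
    # The ratio grows with n: pairwise work outpaces per-token work.
    print(n, steps, attention_pair_count(n), attention_pair_count(n) // steps)
```

This ignores constant factors such as hidden size and hardware effects, so it shows only how the two costs grow, not actual runtimes.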
Select Granite 4.0 models also use a mixture-of-experts (MoE) routing strategy, which splits parts of the model into several “experts”. Instead of running every parameter at once, the model routes each token through only the experts it needs. For example, Granite 4.0 Small has 32 billion total parameters, only 9 billion of which are active for any given token.
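The routing idea can be sketched with a generic top-k gate. This is an illustration under assumptions, not Granite's actual router: the number of experts, the value of `k`, and the helper names (`route_top_k`, `moe_forward`) are all hypothetical, and the experts here are trivial scaling functions rather than neural network layers.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Four toy experts; only two of them run for this input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(10.0, experts, router_scores=[0.1, 2.0, 0.3, 2.0], k=2)

# Share of parameters active per token for the figures quoted above.
active_fraction = 9 / 32
print(y, active_fraction)
```

With the figures quoted above, 9 of 32 billion parameters means roughly 28% of the model is active for a given token, which is where the compute savings come from.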
Together, these two approaches let Granite models handle long contexts quickly and run on more modest hardware, like consumer-grade GPUs, without sacrificing performance.
Granite models are designed for real work, not just demos. They’re lightweight and efficient, which makes them a good fit for document summarization, RAG systems, and AI agents.
Granite models are released under the Apache 2.0 license, so you can use them in both commercial and non-commercial projects without licensing fees, provided you follow the license's attribution and notice requirements. You can also modify the models however you want (fine-tune them, add adapters, or train them on private datasets) and release those changes under your own terms. This openness makes Granite a practical choice for companies that need compliance, security, or customization.
For more details, check out IBM’s documentation on deployment, fine-tuning, and integration patterns. If you’re using LangChain, IBM has also built a LangChain integration for Replicate to make it even easier to work with Granite models.