




















The AI inference market is unlike anything the enterprise IT infrastructure industry has seen before. It’s not a product cycle. It’s a demand wave that’s broad, accelerating, and architecturally complex in ways that reward specialization. Generating video is a fundamentally different compute problem than running a chatbot. Optimizing a supply chain with an agentic AI system makes different demands on hardware than accelerating drug discovery. Real-time financial modeling is not the same as sovereign language model inference for a national government. The workloads are multiplying. The performance requirements are diverging. And the economics of getting the wrong silicon for the wrong job are increasingly severe.
Inference is a market that won’t be dominated by a single vendor, nor will it be served by a duopoly. The inference wave is too large, too fragmented, and too economically consequential for any two incumbents to capture. Instead, what’s emerging is a heterogeneous silicon landscape, where purpose-built accelerators (each optimized for a specific class of workload or deployment context) are earning real wins against the GPU establishment. Cerebras and a growing field of inference-focused startups have all found commercial traction.
The question has shifted away from whether alternatives to GPU-centric AI infrastructure can exist. The question is now which ones will scale, and which will find the architectural affinity with specific workloads and buyers that turns early adoption into a viable long-term market position.
Tenstorrent is one of the more interesting companies navigating this moment — and its recent TT-Deploy launch event is worth examining closely.
Into the inference market has stepped Tenstorrent. Led by semiconductor rockstar Jim Keller (the architect behind iconic chip designs at AMD, Apple, Intel, and Tesla), Tenstorrent has been building toward this for years. And at its TT-Deploy launch event, the company answered the central question the market had been asking: Can Tenstorrent’s AI accelerator architecture ship, scale, and deliver the economics it has long promised? The answer, based on what was presented, is a credible yes, with the appropriate caveats that accompany any company at this stage of its trajectory.
TT-Deploy wasn’t really a single product reveal. Rather, it felt like a repositioning of Tenstorrent as a full-stack AI infrastructure company: hardware, software, benchmarks, ecosystem partners, and paying production customers. At the center was the Galaxy Blackhole system, a 6U server outfitted with 32 Blackhole RISC-V AI accelerator chips, now shipping in volume. The system scales from a single Galaxy to a 36-node Super Cluster using standard Ethernet and a switchless torus topology, eliminating dedicated networking switch infrastructure entirely (more on this later).
The Blackhole chip is built around a two-dimensional torus array of Tensix cores. In a two-dimensional torus, each compute core connects directly to its neighbors in a grid, and the connections at the grid’s edges wrap around to the opposite side, enabling fast, distributed data movement across the processor. The benefit is more efficient scaling and lower communication overhead for complex AI workloads that depend heavily on moving data between compute elements.
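To make the torus idea concrete, here is a minimal sketch of neighbor lookup and hop counting on a generic 2D torus. The 4 x 4 grid size and the function names are illustrative assumptions, not Blackhole’s actual core layout or any Tenstorrent API.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the four direct neighbors of (row, col) on a rows x cols 2D torus."""
    return [
        ((row - 1) % rows, col),  # north (wraps from the top edge to the bottom)
        ((row + 1) % rows, col),  # south (wraps from the bottom edge to the top)
        (row, (col - 1) % cols),  # west
        (row, (col + 1) % cols),  # east
    ]


def torus_hops(a, b, rows, cols):
    """Minimum hop count between two cores, going the shorter way around each ring."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


if __name__ == "__main__":
    ROWS, COLS = 4, 4  # hypothetical grid size, chosen only for illustration
    print(torus_neighbors(0, 0, ROWS, COLS))
    print(torus_hops((0, 0), (3, 3), ROWS, COLS))
```

The wrap-around links are what separate a torus from a plain mesh: opposite corners of this 4 x 4 grid are two hops apart instead of six.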
Each Tensix core contains five RISC-V processors, a matrix multiply unit, a vector unit, and local SRAM, along with two NoC (network-on-chip) routers. These RISC-V processors are not CPUs in the traditional sense. They act more like specialized control and dataflow processors that handle different stages of work inside the core.
This architecture is fundamentally different from GPU design: There is no traditional SIMD-centric GPU execution hierarchy. Every core is independent and progresses at its own pace, and data movement is a first-class architectural priority rather than an afterthought.
Galaxy, the 6U system described above, comprises 32 Blackhole chips and serves as Tenstorrent’s primary deployment building block. At the system level, Galaxy delivers up to 23 PFLOPS of Block FP8 compute, 1 TB of GDDR6 memory with 16 TB/s of bandwidth, 6.2 GB of on-chip SRAM delivering 2.9 PB/s, and up to 56 x 800G Ethernet ports for scale-out connectivity.
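For a rough sense of what those system-level numbers imply per chip, the sketch below divides each quoted figure by 32. It assumes resources are split evenly across chips and treats the "up to" figures as peaks, so the results are back-of-envelope estimates rather than published per-chip specifications.

```python
CHIPS_PER_GALAXY = 32

# System-level figures as quoted for one Galaxy ("up to" peak values).
galaxy = {
    "block_fp8_pflops": 23.0,
    "gddr6_capacity_tb": 1.0,
    "gddr6_bandwidth_tb_s": 16.0,
    "sram_capacity_gb": 6.2,
    "sram_bandwidth_pb_s": 2.9,
}

# Assumption: every resource is divided evenly across the 32 chips.
per_chip = {name: value / CHIPS_PER_GALAXY for name, value in galaxy.items()}

# Peak FLOPs available per byte of GDDR6 traffic, a rough indicator of how much
# data reuse a kernel needs before it stops being limited by DRAM bandwidth.
flops_per_dram_byte = (galaxy["block_fp8_pflops"] * 1e15) / (
    galaxy["gddr6_bandwidth_tb_s"] * 1e12
)

if __name__ == "__main__":
    for name, value in per_chip.items():
        print(f"{name}: {value:.5f}")
    print(f"peak FLOPs per GDDR6 byte: {flops_per_dram_byte:.1f}")
```

Under these assumptions each chip sees roughly 0.72 PFLOPS of Block FP8 compute and 0.5 TB/s of GDDR6 bandwidth, while the on-chip SRAM bandwidth is about 180 times the GDDR6 bandwidth. That gap is one way to read the emphasis on keeping data movement on-chip.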
What matters more than the raw specifications, however, is the logic and philosophy behind them. Tenstorrent intentionally chose GDDR6 over HBM, standard Ethernet over proprietary fabrics, and air cooling over liquid cooling. These are deliberate architectural decisions aimed at reducing complexity and improving inference economics at scale, especially on the increasingly important metric of cost-per-token.
Galaxy scales in a structured way. As mentioned, 32 Blackhole chips form a Galaxy. Four Galaxies form a quad, and quads interconnect in an all-to-all cabled torus to form a supercluster of up to 36 nodes. Interestingly (and critically), there are no Ethernet switches in this path. Traffic routes through the Galaxy systems themselves. Idle quads can even be repurposed as fabric switches — an architectural flexibility that has no GPU equivalent.
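The scaling hierarchy described above reduces to simple arithmetic. The sketch below assumes that a "node" is one Galaxy and that peak compute and memory add up linearly across nodes, which ignores any communication overhead. The function name is hypothetical.

```python
CHIPS_PER_GALAXY = 32
GALAXIES_PER_QUAD = 4
MAX_NODES = 36  # assumption: one node equals one Galaxy

PFLOPS_PER_GALAXY = 23.0  # quoted Block FP8 peak per Galaxy
GDDR6_TB_PER_GALAXY = 1.0  # quoted GDDR6 capacity per Galaxy


def cluster_summary(nodes):
    """Summarize a cluster of `nodes` Galaxy systems under linear-scaling assumptions."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"nodes must be between 1 and {MAX_NODES}")
    return {
        "quads": -(-nodes // GALAXIES_PER_QUAD),  # ceiling division
        "chips": nodes * CHIPS_PER_GALAXY,
        "peak_pflops": nodes * PFLOPS_PER_GALAXY,
        "gddr6_tb": nodes * GDDR6_TB_PER_GALAXY,
    }


if __name__ == "__main__":
    print(cluster_summary(1))
    print(cluster_summary(MAX_NODES))
```

At the 36-node maximum this works out to 9 quads and 1,152 Blackhole chips, with a nominal peak of 828 PFLOPS if every Galaxy reached its quoted figure simultaneously.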
The Galaxy software stack is fully open-source and organized in layers, each targeting a different part of the development and deployment workflow:
One of the more interesting demonstrations at TT-Deploy was AI-assisted CUDA kernel conversion into TT-Lang. If that capability proves viable at enterprise scale, it could meaningfully reduce one of the biggest barriers facing alternative AI infrastructure providers: migrating existing CUDA-based software environments.
While every vendor wants to support everything, it’s good to have that “killer app” that sets a vendor apart. In Tenstorrent’s case, it appears to have three areas of strong alignment, and each was demonstrated using examples from customers.
The most obvious fit is large-scale AI inference, especially mixture-of-experts models like DeepSeek. Galaxy’s design seems naturally suited to these sparse, data-heavy workloads, where moving data efficiently matters as much as raw compute. The claim of more than 350 tokens per second per user (t/s/u) at roughly $6 per million tokens is really the core story here.
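To translate the headline claim into more familiar units, the sketch below converts tokens per second per user and price per million tokens into tokens per hour and cost per hour for one continuously active user stream. The inputs are Tenstorrent’s claimed figures. The helper names and the assumption of a fully saturated stream are illustrative.

```python
def tokens_per_hour(tokens_per_sec_per_user):
    """Tokens one continuously active user stream produces in an hour."""
    return tokens_per_sec_per_user * 3600


def cost_per_user_hour(tokens_per_sec_per_user, usd_per_million_tokens):
    """Hourly cost of one fully saturated user stream at a given token price."""
    return tokens_per_hour(tokens_per_sec_per_user) / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    CLAIMED_TPS = 350  # claimed tokens per second per user
    CLAIMED_PRICE = 6.0  # claimed USD per million tokens
    print(tokens_per_hour(CLAIMED_TPS))
    print(round(cost_per_user_hour(CLAIMED_TPS, CLAIMED_PRICE), 2))
```

At the claimed figures, one saturated stream generates 1.26 million tokens per hour, which costs about $7.56 per hour at $6 per million tokens.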
The second area is AI-generated video. These workloads tend to be memory-bandwidth-intensive and heavily dependent on fast collective communication between systems — both architectural strengths of Galaxy. An on-stage demonstration that generates a 720p 81-frame video in 2.4 seconds gives the company a tangible proof point in a market that is growing quickly.
The third category may actually be the most strategically interesting: sovereign AI infrastructure. Tenstorrent explicitly highlighted India and Japan, which feels intentional. Its use of RISC-V avoids some of the geopolitical and licensing concerns we’ve seen, while the open-source software stack and Equinix deployment model give governments and regional providers a way to build AI infrastructure more freely.
Tenstorrent is not trying to win a raw benchmark contest against NVIDIA or others. That would be the wrong fight. Rather, the company is making a fundamental case for how AI inference infrastructure should be built:
It is a smart framing. More importantly, it’s increasingly the framing that production AI buyers respond to.
The comparison table below reflects Moor Insights & Strategy analysis based on publicly available information as of May 2026. It is not a comprehensive benchmark comparison.
| Dimension | Tenstorrent Galaxy | NVIDIA NVL72 | Groq LPX | Cerebras CS‑3 |
| --- | --- | --- | --- | --- |
| Compute Approach | RISC-V + Tensix cores | GPU (CUDA cores) | LPU (dedicated) | Wafer-scale chip |
| Inference Throughput | 350+ t/s/u (claimed) | Industry-leading | Latency-focused | Throughput-focused |
| Token Economics | ~$6/M tokens (claimed) | ~$30/M tokens (est.) | Premium pricing | Premium pricing |
| Scale-out Fabric | Switchless Eth torus | NVLink/NVSwitch | Custom LPU fabric | Single wafer chip |
| Software Model | 100% open-source | CUDA (proprietary) | Proprietary | Proprietary |
| Model Coverage | ~90% HF (claimed) | Near-universal | Curated | Limited |
| Training Support | Limited / emerging | Full | Inference only | Full |
| Ecosystem Maturity | Early / building | Deep and established | Growing | Growing |
Tenstorrent’s sharpest competitive claim is economic: GPU systems become increasingly inefficient at the high end of the tokens-per-second-per-user curve, because delivering higher per-user throughput means serving fewer concurrent users on the same hardware, which reduces utilization and raises cost-per-token.
Tenstorrent claims to escape that constraint by maintaining $6 per million tokens at 350-plus t/s/u and targeting 500 t/s/u at the same cost floor. Against competitors like Groq and Cerebras, the differentiation is less about raw peak throughput and more about flexibility. The company is positioning Galaxy as an architecture that can support a wider variety of models and workload types, rather than being optimized primarily for a narrower set of inference scenarios. If those claims hold at production scale, that becomes a meaningful competitive advantage.
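The utilization argument can be illustrated with a toy cost model in which cost-per-token is a system’s hourly cost divided by the tokens it produces per hour. Every number below (the $100 hourly cost and the four operating points) is hypothetical and chosen only to show the shape of the trade-off. None of it is measured data for any vendor.

```python
def cost_per_million_tokens(hourly_cost_usd, concurrent_users, tokens_per_sec_per_user):
    """Cost per million tokens for a system serving a fixed number of user streams."""
    tokens_per_hour = concurrent_users * tokens_per_sec_per_user * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical operating points: (tokens/sec/user, concurrent users).
# As per-user speed rises, the number of streams the system can sustain falls faster.
OPERATING_POINTS = [(50, 256), (100, 96), (200, 32), (350, 8)]
HOURLY_COST_USD = 100.0  # hypothetical all-in hourly cost of the system

costs = [
    cost_per_million_tokens(HOURLY_COST_USD, users, tps)
    for tps, users in OPERATING_POINTS
]

if __name__ == "__main__":
    for (tps, users), cost in zip(OPERATING_POINTS, costs):
        print(f"{tps:>4} t/s/u x {users:>3} users -> ${cost:.2f} per million tokens")
```

In this toy model, cost-per-token rises more than fourfold between the slowest and fastest operating points. Tenstorrent’s claim amounts to saying its curve stays flat at about $6 per million tokens out to 350 t/s/u and beyond.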
In short, yes.
The accelerator landscape is not a winner-take-all competition for a single workload. We are entering an era of deeply heterogeneous AI compute, where the right silicon for generating video is different from the right silicon for running a chatbot, which is different from the architecture best suited for agentic AI optimizing supply chains, which is different again from the accelerator ideal for drug discovery. The era of “one chip to rule them all” is giving way to purpose-built compute matched to specific workload economics.
To understand why inference is different, it helps to look at AI training. NVIDIA effectively won the training era before most of the industry recognized there was a race. CUDA gave developers a powerful, accessible abstraction over GPU compute at a time when no credible alternative existed. Frameworks were built on it. Libraries were optimized for it. Talent was trained in it.
By the time the deep-learning wave arrived in earnest, CUDA wasn’t just a competitive advantage; it was the substrate of the AI software ecosystem. Competing in training meant competing against that, and almost no one could. The result was a near-monopoly not just on training silicon, but on the developer mindset, the tooling ecosystem, and the institutional defaults of every major AI team on the planet.
Inference is being contested before any single architecture has achieved that kind of lock-in. There is no inference equivalent of CUDA. Workloads are diverse and multiplying, and organizations from the smallest AI-native startup to the largest enterprise increasingly need the right silicon for the right job. Buyer priorities have shifted from raw throughput to cost-per-token, latency profiles, and operational economics at scale.
The absence of an entrenched moat is exactly what creates this opening, and it is why a company like Tenstorrent, purpose-built for inference around data movement and token economics, can build a credible position from a standing start.
RISC-V’s open, modular ISA enables the deep, instruction-level architectural customization that Tenstorrent’s tensor processor design demands. The five RISC-V processors in each Tensix core handle distinct pipeline tasks (ingestion, formatting, compute, reformatting, output) in a tightly coupled arrangement that no x86 or GPU architecture could efficiently replicate.
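As a purely conceptual illustration of the five-stage arrangement described above, the sketch below chains five Python generators, one per named stage, so that each tile flows through the pipeline as soon as the previous stage yields it. This is not Tenstorrent code and does not use any Tenstorrent API. The stage bodies are hypothetical placeholders that only show the dataflow structure.

```python
def ingest(tiles):
    """Stage 1 (ingestion): pull raw tiles from the input source."""
    for tile in tiles:
        yield list(tile)


def format_input(stream):
    """Stage 2 (formatting): convert each tile to the layout the compute stage expects."""
    for tile in stream:
        yield [float(x) for x in tile]


def compute(stream, weight):
    """Stage 3 (compute): placeholder math, scaling every element by a weight."""
    for tile in stream:
        yield [x * weight for x in tile]


def format_output(stream):
    """Stage 4 (reformatting): convert results to the output layout."""
    for tile in stream:
        yield [round(x, 2) for x in tile]


def emit(stream):
    """Stage 5 (output): drain the pipeline and collect the results."""
    return list(stream)


def run_pipeline(tiles, weight=2.0):
    """Chain the five stages so each tile streams through without a global barrier."""
    return emit(format_output(compute(format_input(ingest(tiles)), weight)))


if __name__ == "__main__":
    print(run_pipeline([(1, 2), (3, 4)]))
```

Generators are used here because each stage pulls work only when the next stage asks for it, which loosely mirrors the tightly coupled, stage-per-processor structure. Real hardware stages run concurrently, which this single-threaded sketch does not capture.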
The strategic dimension matters, too. RISC-V is an open standard with no single national owner, which could be a differentiator in sovereign AI procurement conversations.
One area to watch: RISC-V’s developer tooling and software library depth remain well behind Arm and x86. Tenstorrent’s open-source stack largely abstracts this from end users, but it creates ongoing engineering investment requirements that the company will need to sustain as the platform matures.
MI&S sees Tenstorrent’s likely near-term wins concentrated in three market segments:
Longer term, the Equinix partnership may be the most strategically important development from TT-Deploy. A distribution partner with 248 facilities across 75 metros and 35-plus countries gives Tenstorrent enterprise reach it could not build organically. As the BetterBrain application layer and OrionVM orchestration mature, this becomes a full-stack enterprise AI platform — and that is the path to the broader market.
Tenstorrent’s momentum is real. So are the challenges ahead. Here are four to keep in mind.
Ecosystem depth is measured in developer years, not benchmark results. NVIDIA’s moat isn’t the B300, it’s CUDA. More than 15 years of developer tooling, library integrations, and institutional familiarity create a stickiness (and switching costs) that no benchmark can dislodge.
Tenstorrent’s open-source stack and TT-Lang CUDA migration path are the right counter, but the gap between “runs the model” and “deeply optimized for enterprise production with full support coverage” is real. Closing it takes years, not quarters.
Independent validation remains thin. The Artificial Analysis benchmarks are a good first step, but they are Tenstorrent-initiated, not independently designed and executed. The token economics claims, the CUDA migration path, and the 90% model pass rate are all plausible and interesting, but until independently corroborated at production scale, they carry an asterisk with seasoned buyers. Independent third-party validation is the single highest-leverage credibility investment Tenstorrent can make in the next twelve months.
The innovator’s dilemma is real. Tenstorrent wins today with technically sophisticated buyers who can evaluate non-standard hardware. That is the right place to start. But expanding into the general enterprise is a different motion. These organizations have longer sales cycles, lower risk tolerance, and heavier integration requirements. Moving down-segment without a more explicit enterprise narrative risks leaving the company structurally confined to the innovator tier even as the broader market expands around it. The enterprise narrative needs to be built now, even though the revenue comes later.
Training coverage is a boundary condition. Galaxy Blackhole is an inference platform first and foremost, and Tenstorrent has been transparent about it. For organizations evaluating full-lifecycle AI infrastructure, NVIDIA will remain the default. This is not a fatal limitation, because inference is where the bulk of AI spend is heading. But it constrains the total addressable relationship Tenstorrent can have with large enterprises seeking a single integrated partner.
It is worth noting that Tenstorrent does claim training support, though the inference story is significantly stronger today. Tenstorrent should invest in maturing that capability and build the training-ecosystem partnerships that complement its inference-first positioning. That is the path to a more complete enterprise story.
Tenstorrent’s TT-Deploy event marked a genuine inflection point. Not because the architecture changed, but because the company demonstrated it can be deployed, scaled, and economically justified in production. That is harder to prove than architectural elegance, and Tenstorrent largely proved it.
Jim Keller has assembled a rare team: deep hardware architects, software engineers committed to openness, and a growing ecosystem of partners investing real capital alongside them. The architecture is sound. The economics, especially at scale, are compelling. And the Equinix partnership creates market reach that Tenstorrent could not have built organically.
The broader lesson of TT-Deploy is that we are entering a period of accelerating workload diversity. Video generation, agentic AI, sovereign infrastructure, and real-time edge inference are use cases and deployment models too different for any single architecture to serve all of them optimally. Tenstorrent is a well-timed, well-designed bet on that heterogeneous future.
The challenge ahead is to find the architectural affinity plays where Tenstorrent wins decisively, use those wins to build ecosystem depth, and press ahead from that beachhead into the broader enterprise.
The innovator’s dilemma is real, but so are Tenstorrent’s team, its architecture, and its window of opportunity.