Swiss boffins just trained a 'fully open' LLM on the Alps supercomputer

Source code and weights coming later this summer with an Apache 2.0 bow on top

Supercomputers are usually associated with scientific exploration, research and development, and ensuring our nuclear stockpiles actually work.

Typically, these workloads rely on highly precise calculations, with 64-bit floating point mathematics being the gold standard. But as support for lower-precision datatypes continues to find its way into the chips used to build these systems, supercomputers are increasingly being used to train AI models.

This is exactly what the boffins at ETH Zürich and the Swiss Federal Institute of Technology in Lausanne (EPFL) have done. At the International Open-Source LLM Builders Summit in Geneva this week, researchers teased a pair of open large language models (LLMs) trained using the nation's Alps supercomputer.

As supercomputers go, Alps is better suited than most for running AI workloads alongside more traditional high-performance computing (HPC) applications. The system is currently the third-most powerful supercomputer in Europe, and eighth worldwide in the bi-annual Top500 ranking. It's also among the first large-scale supercomputers based around Nvidia's Grace-Hopper Superchips.

Each of these GH200 Superchips features a custom Grace CPU powered by 72 Arm Neoverse V2 cores, connected via a 900GB/s NVLink-C2C fabric to a 96GB H100 GPU. Those GPUs account for the lion's share of Alps' total compute capacity, each delivering up to 34 teraFLOPS of FP64 vector performance. However, if you're willing to turn down the resolution a bit to, say, FP8, per-chip performance jumps to nearly four petaFLOPS of sparse compute.

Built by HPE's Cray division, Alps features a little over 10,000 of these chips across 2,688 compute blades, which have been stitched together using the OEM's custom Slingshot-11 interconnects. Combined, the system boasts 42 exaFLOPS of sparse FP8 performance, or roughly half that when using the more precise BF16 data type.
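For the curious, the system-wide figure follows from simple multiplication. Here is a back-of-the-envelope sketch in Python. Note the assumptions: four Superchips per blade is our guess (it gives 10,752 chips, which fits "a little over 10,000"), and the per-chip FP8 number is rounded up to an even four petaFLOPS, so the result lands slightly above the quoted 42 exaFLOPS.

```python
# Back-of-the-envelope check of the aggregate figures quoted above.
BLADES = 2688                       # from the article
CHIPS_PER_BLADE = 4                 # ASSUMPTION: not stated in the article
FP8_SPARSE_PFLOPS_PER_CHIP = 4.0    # "nearly four petaFLOPS", rounded up


def total_exaflops(blades: int, chips_per_blade: int, pflops_per_chip: float) -> float:
    """Aggregate peak throughput in exaFLOPS (1 exaFLOPS = 1,000 petaFLOPS)."""
    return blades * chips_per_blade * pflops_per_chip / 1000


chips = BLADES * CHIPS_PER_BLADE
fp8_exaflops = total_exaflops(BLADES, CHIPS_PER_BLADE, FP8_SPARSE_PFLOPS_PER_CHIP)
bf16_exaflops = fp8_exaflops / 2    # the article says BF16 is "roughly half"

print(f"{chips:,} chips, ~{fp8_exaflops:.0f} EF sparse FP8, ~{bf16_exaflops:.0f} EF sparse BF16")
```

These are peak, sparse numbers; sustained training throughput on any real workload will be lower.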

While Nvidia's H100 accelerators have been widely employed for AI training for years now, the overwhelming majority of these Hopper clusters have employed Nvidia's 8-GPU HGX form factor rather than its Superchips.

With that said, Alps isn't the only supercomputer to use them. The Jupiter supercomputer in Germany and the UK's Isambard AI, both of which came online this spring, also use Nvidia's GH200 Superchips.

"Training this model is only possible because of our strategic investment in 'Alps', a supercomputer purpose-built for AI," Thomas Schulthess, director of the Swiss National Supercomputing Centre (CSCS) and professor at ETH Zürich, said in a blog post.

The researchers have yet to name the models, but we do know they'll be offered in both eight-billion and 70-billion parameter sizes, and have been trained on 15 trillion tokens of data. They're also expected to be fluent in more than 1,000 languages, with roughly 40 percent of the training data being in languages other than English.
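To put those numbers in perspective, here is a quick sketch of what they imply, using only the figures quoted above. The 40 percent share is described as approximate, so treat the outputs as rough; the variable names are ours.

```python
# Rough arithmetic on the training-data figures quoted above.
TOTAL_TOKENS = 15_000_000_000_000          # 15 trillion tokens
NON_ENGLISH_PERCENT = 40                   # "roughly 40 percent"
MODEL_PARAMS = {"8B": 8_000_000_000, "70B": 70_000_000_000}

# Integer arithmetic keeps the token counts exact.
non_english_tokens = TOTAL_TOKENS * NON_ENGLISH_PERCENT // 100
english_tokens = TOTAL_TOKENS - non_english_tokens

# Training tokens seen per model parameter, for each announced size.
tokens_per_param = {name: TOTAL_TOKENS / n for name, n in MODEL_PARAMS.items()}

print(f"non-English: {non_english_tokens:,} tokens")
print(f"English:     {english_tokens:,} tokens")
for name, ratio in tokens_per_param.items():
    print(f"{name}: ~{ratio:.0f} tokens per parameter")
```

In other words, roughly six trillion tokens would be in languages other than English, and the smaller model sees far more data per parameter than the larger one.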

More importantly, the researchers say, the models will be fully open. Rather than releasing only the model weights for the public to scrutinize and tweak, as we've seen with models from Microsoft, Google, Meta, and others, researchers at ETH Zürich also intend to release the source code used to train the models and claim that the "training data will be transparent and reproducible."

"By embracing full openness — unlike commercial models that are developed behind closed doors — we hope that our approach will drive innovation in Switzerland, across Europe, and through multinational collaborations," EPFL professor Martin Jaggi said in the post.

According to Imanol Schlag, a research scientist at the ETH AI Center, this transparency is essential to building high-trust applications and advancing research into AI risks and opportunities.

What's more, researchers contend that for most tasks and general knowledge questions, circumventing web crawling protections wasn't necessary, and that complying with websites' crawling opt-outs showed no sign of degrading performance.
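The researchers haven't said which opt-out mechanism they honored, so what follows is only an illustration of the general idea, not their pipeline. It assumes opt-outs are expressed through robots.txt and uses Python's standard-library `urllib.robotparser` to drop URLs a crawler isn't allowed to fetch. The bot name `ExampleLLMBot`, the sample rules, and the `allowed_urls` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one site opting out of a (made-up) AI crawler
# entirely, while other bots are only barred from /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return only the URLs that robots_txt permits user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())   # parse in memory, no network access
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "https://example.com/articles/1",
    "https://example.com/private/x",
]

kept_llm = allowed_urls(SAMPLE_ROBOTS_TXT, "ExampleLLMBot", candidates)
kept_other = allowed_urls(SAMPLE_ROBOTS_TXT, "OtherBot", candidates)

print("ExampleLLMBot may fetch:", kept_llm)
print("OtherBot may fetch:", kept_other)
```

A crawler built this way simply never collects pages from sites that have opted out, which is the kind of compliance the researchers say cost them nothing measurable.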

The LLMs are expected to make their way into public hands later this summer under a highly permissive Apache 2.0 license. ®
