Open R1: Update #4

Leandro von Werra, Vaibhav Srivastav, Daniel Vila, Yacine Jernit · 2025-03-27 · via Hugging Face - Blog

Welcome DeepSeek-V3 0324

This week, a new model from DeepSeek silently landed on the Hub. It’s an updated version of DeepSeek-V3, the base model underlying the R1 reasoning model. There isn’t much information shared yet on this new model, but we do know a few things!

What we know so far

The model has the same architecture as the original DeepSeek-V3 and now comes with an MIT license, whereas the previous V3 model had a custom model license. This release focuses on improving instruction following as well as code and math capabilities. Let’s have a look!

How good is it?

The DeepSeek team has evaluated the model on a range of math and coding tasks and we can see the model’s strong capabilities compared to other frontier models:

Clearly, the model plays in the top league: often on par with GPT-4.5 and generally stronger than Claude-Sonnet-3.7.

To summarise, the model has seen significant improvements across benchmarks:

MMLU-Pro: 75.9 → 81.2 (+5.3) (A good benchmark for overall understanding)
GPQA: 59.1 → 68.4 (+9.3)
AIME: 39.6 → 59.4 (+19.8) (proxy for MATH capabilities)
LiveCodeBench: 39.2 → 49.2 (+10.0) (indicator of coding abilities)
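
The gains in parentheses are plain differences between the old and new scores. As a small sketch, the snippet below recomputes them from the numbers in the list and also derives a relative gain, which the post does not report and is shown only for illustration.

```python
# Scores copied from the list above: (previous V3, V3-0324).
scores = {
    "MMLU-Pro": (75.9, 81.2),
    "GPQA": (59.1, 68.4),
    "AIME": (39.6, 59.4),
    "LiveCodeBench": (39.2, 49.2),
}


def improvement(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = round(new - old, 1)
    relative = round(100 * (new - old) / old, 1)
    return absolute, relative


for name, (old, new) in scores.items():
    absolute, relative = improvement(old, new)
    print(f"{name}: {old} -> {new} (+{absolute} points, +{relative}% relative)")
```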

Specifically, in the model card the DeepSeek team mentions targeted improvements in the following areas:

Front-End Web Development
- Improved executability of the code
- More aesthetically pleasing web pages and game front-ends
Chinese Writing Proficiency
- Enhanced style and content quality
  - Aligned with the R1 writing style
  - Better quality in medium-to-long-form writing
- Feature Enhancements
  - Improved multi-turn interactive rewriting
  - Optimized translation quality and letter writing
Chinese Search Capabilities
- Enhanced report analysis requests with more detailed outputs
Function Calling Improvements
- Increased accuracy in Function Calling, fixing issues in previous V3 versions

So the question might pop up: how did they actually do this? Let’s speculate a bit!

How did they do it?

Given the naming and architecture, it is fairly safe to assume that the new model is based on the previous V3 model and trained on top of it. There are two areas where they could have improved the model:

Continual pretraining: Starting from the V3 model, it’s possible to continue the pretraining process with a) newer, more up-to-date data and b) data that has been better curated and is thus of higher quality. This improves factuality on recent events and improves capabilities in general.
Improved post-training: Especially for instruction following and style, post-training plays the most important role. They likely improved the post-training data mix and maybe even the algorithm.

Until the team releases a technical report we won’t know for sure what they tweaked, but changes to the post-training pipeline are quite likely, potentially combined with a bit of additional pretraining. Let’s have a look at how to use the model next!

How to use the model

Inference Providers

You can use Hugging Face’s Inference Providers to quickly experiment with this model. It’s available through Fireworks, Hyperbolic, and Novita.

Here’s an example using the huggingface_hub library. You can also use the OpenAI client library like in this example.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fireworks-ai",
    #api_key="your hf or provider token"
)

messages = [
    {
        "role": "user",
        "content": "My first is second in line; I send shivers up your spine; not quite shining bright. I glitter in the light."
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=messages,
    temperature=0.3,
)

print(completion.choices[0].message['content'])
# ...**Final Answer: ice**

Text Generation Inference

TGI supports running DeepSeek V3-0324 with its latest release as well. You can use it directly with the tagged Docker image on a node of H100s:

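volume=$PWD/data  # share a volume with the container so the weights are not re-downloaded on every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.2.1 --model-id deepseek-ai/DeepSeek-V3-0324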
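
Once the container is up, the server can be queried over HTTP. The sketch below uses only the standard library and assumes the server is reachable on localhost:8080 (the port mapped in the command above) and that TGI's OpenAI-compatible `/v1/chat/completions` route is enabled. The helper names `build_chat_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Assumption: the TGI container from the command above listens on localhost:8080.
TGI_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str, url: str = TGI_URL) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the local server."""
    payload = {
        # TGI serves a single model, so this field is treated as a placeholder here.
        "model": "tgi",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.3,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("...")` only works while the server is running; `build_chat_request` can be inspected offline to check the payload.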

SGLang

SGLang supports running DeepSeek V3-0324 out of the box, along with the Multi-head Latent Attention and Data Parallelism optimisations. To use it, run the following on a node of H100s. For more information follow along here.

docker pull lmsysorg/sglang:latest

docker run --gpus all --shm-size 32g -p 30000:30000 -v ~/.cache/huggingface:/root/.cache/huggingface --ipc=host --network=host --privileged lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3-0324 --tp 8 --trust-remote-code --port 30000

Dynamic Quants from Unsloth and Llama.cpp

Running large LLMs like DeepSeek V3-0324 can be quite compute intensive and requires a large amount of GPU VRAM. This is where quantization comes in: it lets the end user run the same model with much lower VRAM consumption, at the cost of a small trade-off in downstream performance.

Unsloth AI created dynamic quantisations that make it possible to run DeepSeek V3 with about half the compute of one node of H100s. They run with llama.cpp and limit the degradation in benchmarks. Read more about it here: https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF
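
To get a feel for why quantization matters at this scale, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It rests on assumptions that are not from this post: a total of 671B parameters (as listed on the DeepSeek-V3 model card), an H100 node made of 8 GPUs with 80 GB each, and a single uniform bit width per weight. Dynamic quants mix bit widths across layers, so treat the numbers as rough orders of magnitude.

```python
# Assumptions (not from this post): 671e9 total parameters, as listed on the
# DeepSeek-V3 model card, and an 8 x 80 GB H100 node. Only weights are counted;
# KV cache, activations and runtime overhead are ignored.
TOTAL_PARAMS = 671e9
NODE_VRAM_GB = 8 * 80


def weight_memory_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4, 2):
    size = weight_memory_gb(bits)
    print(f"{bits:>2} bits/weight: ~{size:,.0f} GB ({size / NODE_VRAM_GB:.2f}x one node)")
```

Under these assumptions, 8-bit weights already slightly exceed one node, while 4 bits per weight lands near half a node.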

Is it safe?

Running language models safely has been at the center of attention ever since the first GPT models were released. With the immense popularity of the DeepSeek models and their origin, the question has found new interest. Let us run down the things that are safe to do and the areas where some caution is a good idea. This is not DeepSeek specific but true for any open model!

First of all - is it safe to even download the model?

Downloading and running the model

Yes, downloading the model is safe. There are a few precautions on the Hub side that make sure it’s safe to download and run models:

Safetensors: The safetensors format is used to store the DeepSeek model weights on the Hub. It ensures that no hidden code execution is possible, which was a risk with the older PyTorch pickle format. Thus no malicious code can be hidden in the weights file. Read more in the Safetensors blog.
Modeling code: To run the model, the modeling code also needs to be downloaded along with the weight files. There are three mechanisms in place to improve safety there: 1. the files are fully visible on the Hub, 2. the user needs to explicitly set trust_remote_code=True to execute any code associated with the model, 3. a security scanner runs over files on the Hub and flags any malicious code files. If you want to be extra careful, you can pin the model version with the revision setting to make sure you download the version of the modeling code that has been reviewed.
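
Pinning with the revision setting can be sketched as follows. This is one possible approach, not an official recipe: the helpers `is_commit_hash` and `load_pinned_config` are hypothetical names, and the check simply insists on a full 40-character commit hash because branch names such as `main` can move to new commits after you have reviewed the code. `revision` and `trust_remote_code` are existing `from_pretrained` arguments in transformers.

```python
import re

MODEL_ID = "deepseek-ai/DeepSeek-V3-0324"


def is_commit_hash(revision: str) -> bool:
    """True if `revision` looks like a full 40-character Git commit SHA."""
    return re.fullmatch(r"[0-9a-f]{40}", revision) is not None


def load_pinned_config(revision: str, model_id: str = MODEL_ID):
    """Load the model config (and its remote code) at one exact, reviewed commit."""
    if not is_commit_hash(revision):
        # Branch names such as "main" can point to new commits after your review.
        raise ValueError("pin to a full commit hash, not a branch or tag name")
    # Imported lazily so the check above works without transformers installed.
    from transformers import AutoConfig

    return AutoConfig.from_pretrained(
        model_id,
        revision=revision,
        trust_remote_code=True,  # explicit opt-in to run code from the repo
    )
```

The commit hash to pass is the one you reviewed in the repository's commit history on the Hub. The same two arguments can be passed to `AutoModelForCausalLM.from_pretrained` when loading the full model.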

So downloading the weights is safe, and upon code review so is executing the modeling code. This means you can run the DeepSeek model locally without the risk of backdoors or malicious code execution.
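
For context on why the weight format matters, the toy example below shows how merely loading a pickle can call a function chosen by whoever created the file. It uses a harmless arithmetic expression, and the class name `NotJustData` is made up for the illustration. Safetensors files, by contrast, hold raw tensor data plus a JSON header, so there is nothing for the loader to call.

```python
import pickle


class NotJustData:
    """A stand-in for an object stored in a pickle-based checkpoint."""

    def __reduce__(self):
        # pickle calls the returned callable with these arguments at load time.
        # A harmless arithmetic expression is used here; the author of a file
        # could put any callable and arguments in its place.
        return (eval, ("6 * 7",))


blob = pickle.dumps(NotJustData())  # the bytes an untrusted file could contain
loaded = pickle.loads(blob)  # loading runs eval("6 * 7")
print(loaded)  # prints 42
```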

So what would be the main risks outside of downloading and running the model? It depends on what you do with the model outputs!

Model outputs

The advice that follows is not specific to any model, and applies to both open and closed models: whether considering risks stemming from built-in secret behaviours in the model or from a model accidentally producing bad outputs.

We’ll cover risks in three areas: alignment, code generation and agents.

Alignment mismatch: Every model provider chooses how and to which values their models are aligned. What these values are and how they are chosen typically remains opaque, and they might also change over time (see this study). The advantage of open models is that the alignment can still be changed at a later stage with custom fine-tuning, as the example of Perplexity’s DeepSeek 1776 shows.

As a rule, users should be aware that any LLM is biased in one way or another and treat the model outputs accordingly.

Code generation: One of the most popular use-cases of LLMs is as coding assistants. However, this is also where indiscriminate usage of the model outputs can have the most negative effects. Models are trained on vast amounts of published code, new and old. This typically includes potentially malicious code or code that contains known vulnerabilities. So models might produce similar vulnerabilities when proposing code solutions.

So, how can you prevent security issues when using LLMs for code development? Run thorough code reviews of the proposed changes and scan the code with appropriate tools for vulnerabilities, as you would with any other code contribution.
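
As a toy illustration of automated scanning, the sketch below parses a piece of generated Python without executing it and flags a handful of call names that usually deserve a closer look. The deny-list and the helper names are assumptions made for this example; it is not a substitute for dedicated security scanners or human review.

```python
import ast

# Illustrative deny-list only; real scanners cover far more patterns.
RISKY_CALLS = {"eval", "exec", "os.system", "subprocess.call", "pickle.loads"}


def call_name(node: ast.Call) -> str:
    """Return a dotted name such as "os.system" for simple call expressions."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def flag_risky_calls(source: str) -> list[tuple[int, str]]:
    """Parse `source` without executing it and list (line, call) findings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and call_name(node) in RISKY_CALLS:
            findings.append((node.lineno, call_name(node)))
    return sorted(findings)


# A made-up snippet standing in for model-generated code.
generated = "import os\nos.system(user_command)\nprint('done')\n"
print(flag_risky_calls(generated))  # prints [(2, 'os.system')]
```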

Agents: In the past few months agent applications have gained significant interest. Giving LLMs more autonomy and agency also bears risks. It’s important to be careful about what kind of system access agents have and which information you provide them. Some good practices:

Sandboxes: don’t run agents on your machine where they have access and control of your computer. This avoids leaking private information or accidentally deleting important files.
Private information: don’t share private information such as logins with the LLM. If you need to give the model access to a system use dedicated access keys with strict access rules.
Human-in-the-loop: for high-stakes processes that you want to automate with agents, make sure there is a human in the loop for final confirmation.
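
The human-in-the-loop practice can be sketched as a small approval gate placed in front of any action an agent proposes. The function names are hypothetical, and a real agent framework would plug in its own tools and approval UI. The action is only executed if the approval step returns True.

```python
from typing import Callable


def ask_on_terminal(description: str) -> bool:
    """Default approval step: a person types 'y' to allow the action."""
    answer = input(f"Agent wants to: {description}. Allow? [y/N] ")
    return answer.strip().lower() == "y"


def run_with_approval(
    description: str,
    action: Callable[[], str],
    approve: Callable[[str], bool] = ask_on_terminal,
) -> str:
    """Run `action` only if the approval step returns True."""
    if not approve(description):
        return f"skipped: {description}"
    return action()


# Example with a scripted reviewer standing in for a person who declines.
result = run_with_approval(
    "delete 3 files in ./tmp",
    action=lambda: "files deleted",
    approve=lambda description: False,
)
print(result)  # prints: skipped: delete 3 files in ./tmp
```

Defaulting to "no" on anything other than an explicit "y" keeps the gate closed when the reviewer is absent or mistypes.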

TL;DR: Is it safe to run the models? Yes, downloading and running the models is safe, but, as with any model, you should apply appropriate safety measures when using the model’s generations.
