Open R1: How to use OlympicCoder locally for coding

Hugging Face - Blog

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs ALTK‑Evolve: On‑the‑Job Learning for AI Agents Safetensors is Joining the PyTorch Foundation Holo3: Breaking the Computer Use Frontier Any Custom Frontend with Gradio's Backend A New Framework for Evaluating Voice Agents (EVA) Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine‑Tuning, and On‑Device Optimizations One-Shot Any Web App with Gradio's gr.HTML CUGA on Hugging Face: Democratizing Configurable AI Agents New in llama.cpp: Model Management Building Deep Research: How we Achieved State of the Art OVHcloud on Hugging Face Inference Providers 🔥 20x Faster TRL Fine-tuning with RapidFire AI Building for an Open Future - our new partnership with Google Cloud Aligning to What? Rethinking Agent Generalization in MiniMax M2 Building a Healthcare Robot from Simulation to Deployment with NVIDIA Isaac Sentence Transformers is joining Hugging Face! Unlock the power of images with AI Sheets Supercharge your OCR Pipelines with Open Models Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face Get your VLM running in 3 simple steps on Intel CPUs Nemotron-Personas-India: Synthesized Data for Sovereign AI Introducing RTEB: A New Standard for Retrieval Evaluation Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models VibeGame: Exploring Vibe Coding Games Nemotron-Personas-Japan: ソブリン AI のための合成データセット Swift Transformers Reaches 1.0 – and Looks to the Future Smol2Operator: Post-Training GUI Agents for Computer Use SyGra: The One-Stop Framework for Building Data for LLMs and SLMs Gaia2 and ARE: Empowering the community to study agents Scaleway on Hugging Face Inference Providers 🔥 Democratizing AI Safety with RiskRubric.ai Public AI on Hugging Face Inference Providers 🔥 `LeRobotDataset:v3.0`: Bringing large-scale datasets to `lerobot` Visible Watermarking with Gradio Introducing the Palmyra-mini family: Powerful, lightweight, and ready to reason! Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers Fine-tune Any LLM from the Hugging Face Hub with Together AI Jupyter Agents: training LLMs to reason with notebooks mmBERT: ModernBERT goes Multilingual Welcome EmbeddingGemma, Google's new efficient embedding model SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence Make your ZeroGPU Spaces go brrr with ahead-of-time compilation NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset Generate Images with Claude and Hugging Face From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels MCP for Research: How to Connect AI to Research Tools Kimina-Prover-RL Arm & ExecuTorch 0.7: Bringing Generative AI to the masses Neural Super Sampling is here! TextQuests: How Good are LLMs at Text-Based Video Games? 🇵🇭 FilBench - Can LLMs Understand and Generate Filipino? Introducing AI Sheets: a tool to work with datasets using open AI models! Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training Vision Language Model Alignment in TRL ⚡️ Welcome GPT OSS, the new open-source model family from OpenAI! Measuring Open-Source Llama Nemotron Models on DeepResearch Bench 📚 3LM: A Benchmark for Arabic LLMs in STEM and Code Implementing MCP Servers in Python: An AI Shopping Assistant with Gradio Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨ Parquet Content-Defined Chunking TimeScope: How Long Can Your Video Large Multimodal Model Go? Fast LoRA inference for Flux with Diffusers and PEFT Accelerate a World of LLMs on Hugging Face with NVIDIA NIM Arc Virtual Cell Challenge: A Primer Consilium: When Multiple LLMs Collaborate Back to The Future: Evaluating AI Agents on Predicting Future Events Five Big Improvements to Gradio MCP Servers Ettin Suite: SoTA Paired Encoders and Decoders Migrating the Hub from Git LFS to Xet Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models Asynchronous Robot Inference: Decoupling Action Prediction and Execution ScreenEnv: Deploy your full stack Desktop Agent Building the Hugging Face MCP Server Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Creating custom kernels for the AMD MI300 Upskill your LLMs With Gradio MCP Servers SmolLM3: smol, multilingual, long-context reasoner Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure Efficient MultiModal Data Pipeline Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models Training and Finetuning Sparse Embedding Models with Sentence Transformers Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub Gemma 3n fully available in the open-source ecosystem! Transformers backend integration in SGLang (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware Groq on Hugging Face Inference Providers 🔥 How Long Prompts Block Other Requests - Optimizing LLM Performance Learn the Hugging Face Kernel Hub in 5 Minutes Convert Transformers to ONNX with Hugging Face Optimum Intel and Hugging Face Partner to Democratize Machine Learning Hardware Acceleration Director of Machine Learning Insights [Part 3: Finance Edition] The Annotated Diffusion Model Deep Q-Learning with Space Invaders Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers Introducing Pull Requests and Discussions 🥳 Efficient Table Pre-training without Real Data: An Introduction to TAPEX An Introduction to Q-Learning Part 2/2 How Sempre Health is leveraging the Expert Acceleration Program to accelerate their ML roadmap

ben burtenshaw, Vaibhav Srivastav, Lewis Tunstall, Edward Beechi · 2025-03-20 · via Hugging Face - Blog

Back to Articles

Everyone’s been using Claude and OpenAI as coding assistants for the last few years, but there’s less appeal if you look at the developments coming out of open source projects like Open R1. If we look at the evaluation on LiveCodeBench below, we can see that the 7B parameter variant outperforms Claude 3.7 Sonnet and GPT-4o. These models are the daily drivers of many engineers in applications like Cursor and VSCode.

Evals are great and all, but I want to get my hands dirty and feel the commits! This blog post focuses on how you can integrate these models in your IDE now. We will set up OlympicCoder 7B, the smaller of the two OlympicCoder variants, and we’ll use a quantized variant for optimum local inference. Here’s the stack we’re going to use:

OlympicCoder 7B. The 4bit GGUF version from the LMStudio Community
LM Studio: A tool that simplifies running AI models
Visual Studio Code (VS Code)
Continue a VS Code extension for local models

It’s important to say that we chose this stack purely for simplicity. You might want to experiment with the larger model and/or different GGUF files. Or even alternative inference engines like llama.cpp.

1. Install LM Studio

LM Studio is like a control panel for AI models. It integrates with the Hugging Face hub to pull models, helps you find the right GGUF file, and exposes an API that other applications can use to interact with the model.

In short, it lets you download and run them without any complicated setup.

Go to the LM Studio website: Open your web browser and go to https://lmstudio.ai/download.
Choose your operating system: Click the download button for your computer (Windows, Mac, or Linux).
Install LM Studio: Run the downloaded file and follow the instructions. It’s just like installing any other program.

2. Get OlympicCoder 7B

The GGUF files that we need are hosted on the hub. We can open the model from the hub in LMStudio, using the ‘Use this model’ button:

This will link to the LMStudio application and open it on your machine. You’ll just need to Choose a Quantization. I went for Q4_K_M because it will perform well on most devices. If you have more compute, you might want to try out one of the options with Q8_*.

If you want to skip the UI, you can also load models with LMStudio via the command line:

lms get lmstudio-community/OlympicCoder-7B-GGUF
lms load olympiccoder-7b
lms server start

3. Connect LM Studio to VS Code

This is the important part. We now need to integrate VScode with the model served by LMStudio.

In LM Studio, activate the server on the ‘Developer’ tab. This will expose the endpoints at http://localhost:1234/v1.

Install the VS Code Extension to connect to our local server. I went for Continue.dev, but there are other options too.
- In VSCode, go to the Extensions view (click the square icon on the left sidebar, or press Ctrl+Shift+X / Cmd+Shift+X).
- Search for “Continue” and install the extension from “Continue Dev”.
Configure a New Model in Continue.dev
- Open the Continue tab and in the models dropdown, select ‘add new chat model’.
- This will open a json configuration file. You’ll need to specify the model name. I.e. olympiccoder-7b

🚀 You’ve got a local coding assistant!

Most of the core AI features in vscode are available via this setup, for example:

Code Completion: Start typing, and the AI will suggest how to finish your code.
Generate Code: Ask it to write a function or a whole block of code. For example, you could type (in a comment or a chat window, depending on the extension): // Write a function to reverse a string in JavaScript
Explain Code: Select some code and ask the AI to explain what it does.
Refactor Code: Ask the AI to make your code cleaner or more efficient.
Write Tests: Ask the AI to create unit tests for your code.

🏋️‍♀️ What’s the vibe of OlympicCoder?

OlympicCoder is not Claude. It’s optimised on the CodeForces-CoTs dataset which is based on competitive coding challenges. That means that you should not expect it to be super friendly and explanatory. Instead, roll up your sleeves and expect a no-holds barred competitive coder ready to deal with tough problems.

You might want to mix up OlympicCoder with other models to get a rounded coding experience. For example, if you’re trying to squeeze milliseconds out of a binary search, try OlympicCoder. If you want to design a user facing API, go for Claude-3.7-sonnet or Qwen-2.5-Coder.

Next Steps

Share your favorite generations in the comments below
Try out another variant of OlympicCoder from the hub.
Experiment with quantization types based on your hardware.
Try out multiple models in LM Studio for different coding vibes! Check out the model catalog https://lmstudio.ai/models
Experiment with other VS Code extensions like Cline which have agentic functionality

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

Hugging Face - Blog

1. Install LM Studio

2. Get OlympicCoder 7B

3. Connect LM Studio to VS Code

🚀 You’ve got a local coding assistant!

🏋️‍♀️ What’s the vibe of OlympicCoder?

Next Steps