The 'Minor Upgrade' That’s Anything But: DeepSeek R1 0528 Deep Dive
2026-05-12 · via Runpod Blog.

Earlier this year, DeepSeek dropped a small, experimental reasoning model in the middle of the night, and it ended up taking the world by storm. It shot to the top of the App Store past closed-model rivals and overloaded DeepSeek's API with so much demand that the company actually had to stop accepting payments while it worked through deployment and technical challenges. All the while it was open-source, so anyone could pull and load the model on any hardware they wished, even on Runpod. Since then, it has held its own as an open-source option even in the face of upgrades to closed-source foundation models.

Again, earlier this week, true to form, an upgrade was released with little fanfare or preparation. Like its predecessor, DeepSeek-R1-0528 uses a Mixture-of-Experts (MoE) architecture, now scaled up to an enormous size. Because only a subset of the experts is activated for each token, the model can draw on specialized expertise in different coding domains while staying efficient. The context window remains at 128k (RoPE scaling or other improvements could extend it further).
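To make the sparse activation idea concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not DeepSeek's actual router: the expert count, the value of k, the softmax gating, and the stand-in "experts" are all assumptions picked for illustration.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns (expert_index, weight) pairs. Every other expert is skipped
    entirely, which is where the compute savings come from.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def moe_layer(x, experts, router_scores, k):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Toy setup (hypothetical sizes): 8 experts, 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
# Stand-in experts: each one just scales its input by a different factor.
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)
router_scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routing = route_top_k(router_scores, TOP_K)
output = moe_layer(1.0, experts, router_scores, TOP_K)
print("active experts:", [i for i, _ in routing])
print("fraction of experts evaluated:", TOP_K / NUM_EXPERTS)
```

In a real model the router scores come from a learned projection of each token's hidden state and the experts are feed-forward networks, but the shape of the trick is the same: the parameter count grows with the number of experts while per-token compute grows only with k.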

The team refers to it as a "minor upgrade", and at first blush this does appear to be an incremental update of some kind. But what is it really?

Mathematics and Complex Reasoning: A Quantum Leap in Analytical Power

DeepSeek R1-0528's mathematical reasoning capabilities represent perhaps the most dramatic improvement in the update. In the AIME 2025 test, the model's accuracy has increased from 70% in the previous version to 87.5% in the current version - a remarkable 17.5 percentage point jump that puts it in direct competition with OpenAI's o3 (88.9%) and well ahead of Gemini 2.5 Pro (83.0%).

The improvements stem from what DeepSeek calls "enhanced thinking depth." In the AIME test set, the previous model used an average of 12K tokens per question versus 23K tokens in the new version, signaling that the model now engages in much longer, more thorough chains of reasoning. This isn't just about brute force computation - it's about systematic, step-by-step logical analysis.
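For a rough sense of scale, the quoted figures can be put side by side with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers above; the variable names are mine, and the "misses removed" line is a net figure that says nothing about which individual questions flipped.

```python
# Figures quoted in the text above.
aime_old, aime_new = 70.0, 87.5            # AIME 2025 accuracy, percent
tokens_old, tokens_new = 12_000, 23_000    # average tokens per question

point_gain = aime_new - aime_old              # absolute gain in percentage points
relative_gain = point_gain / aime_old         # gain relative to the old score
error_cut = point_gain / (100.0 - aime_old)   # net share of previous misses removed
token_ratio = tokens_new / tokens_old         # how much longer each answer runs

print(f"{point_gain:.1f} points gained, {relative_gain:.0%} relative to the old score")
print(f"{error_cut:.0%} net reduction in missed questions")
print(f"{token_ratio:.2f}x tokens per question")
```

The last ratio is worth keeping in mind when budgeting: if generation time and cost scale roughly with output tokens, each hard question takes close to twice as long to answer as it did on the previous version.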

What sets R1-0528 apart is its ability to handle complex, multi-step mathematical problems that require not just computational ability but genuine mathematical insight. This means the model can check its own work, recognize errors, and course-correct during problem-solving.

Perhaps most impressively, the chain-of-thought from DeepSeek-R1-0528 was distilled to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking. This suggests that the mathematical reasoning patterns discovered by R1-0528 can be successfully transferred to smaller, more efficient models - something demonstrated previously in the original R1 model and now confirmed through iteration.

Code Generation and Programming: Rivaling the Best Proprietary Models

LiveCodeBench: The Gold Standard Achievement

On the LiveCodeBench challenge platform, DeepSeek-R1-0528 achieved a Pass@1 score of 73.1, placing 4th overall, just shy of OpenAI's "O3 (High)" model at 75.8 and the "O4-Mini (High)" at 80.2. This performance is particularly significant because LiveCodeBench doesn't just test static performance; it challenges a model's dynamic reasoning and its ability to write, iterate on, and debug code in realistic software development scenarios.
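If Pass@1 is unfamiliar: it is the share of problems a model solves when it gets a single attempt. The sketch below uses the unbiased pass@k estimator commonly used for code benchmarks since HumanEval. Whether LiveCodeBench's harness computes its score exactly this way isn't stated here, so treat that as an assumption; the sample results are made up.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: attempt budget.
    Equals 1 - C(n - c, k) / C(n, k): the chance that a random draw of
    k samples out of the n contains at least one passing sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failing samples to fill a draw of k
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(results, k):
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical results for three problems, 10 samples each.
results = [(10, 10), (10, 5), (10, 0)]
print(f"pass@1 = {benchmark_pass_at_k(results, 1):.3f}")
```

For k = 1 the formula collapses to c / n per problem, so a score of 73.1 reads as "a single attempt solves the problem about 73% of the time, averaged over the benchmark".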

Specialized Coding Applications

For companies building AI agents that handle PR reviews, deep code review, code refactoring, or real-time developer assistance, the release of DeepSeek-R1-0528 is more than just a benchmark win: it's a practical leap forward. The benchmark is designed to mimic real-world dev scenarios, including PR reviews, iterative debugging, and multi-step reasoning.

Reduced Hallucinations: A Major Reliability Breakthrough

One of the most significant practical improvements in R1-0528 is the dramatic reduction in AI hallucinations - those troublesome instances where models generate plausible-sounding but factually incorrect information.

Quantified Hallucination Reduction

DeepSeek said the rate of "hallucinations", meaning false or misleading output, was reduced by about 45-50% in scenarios such as rewriting and summarizing. This isn't just a marginal improvement - it's a fundamental enhancement to the model's reliability and trustworthiness. According to Adina Yakefu, AI researcher at Hugging Face, the upgraded model also has "major improvements in inference and hallucination reduction ... this version shows DeepSeek is not just catching up, it's competing." The improvements appear to stem from more consistent factual grounding and lower error rates in multi-step reasoning.

Technical Implementation

The hallucination reduction appears to be achieved through several mechanisms:

  1. Enhanced verification processes: Beyond its improved reasoning capabilities, this version also offers a reduced hallucination rate, enhanced support for function calling, and better experience for vibe coding.
  2. Improved algorithmic optimization: DeepSeek explains in its new model card on Hugging Face that these enhancements stem from leveraging increased computational resources and applying algorithmic optimizations in post-training.
  3. Better factual grounding: More consistent factual grounding and lower error rates in multi-step reasoning.

Personal Impressions

As someone who has experimented with training and studying how LLMs can be used as creative writing partners since the days of Pygmalion-6b, along with extensive experience using the vanilla R1 for the same, here are my thoughts:

  • R1 was creative – insanely creative for an LLM, more so than any LLM I had used before it. It was wild, unhinged, spontaneous, and funny in ways that no LLM had managed up to that point. However, it had a major flaw: it was not coherent, and for all its creativity it would not make choices that logically followed. It was also often very resistant to prompting and direction. Essentially, it was wild, creative, and unpredictable, but often just kind of did what it wanted.
  • R1-0528 sacrifices a small amount of that unhinged wild humanity to gain a large amount of coherence - a worthwhile tradeoff in my book. To be honest, I have a theory that a lot of its creativity was induced by its willingness to hallucinate, which a human could interpret as being one and the same if the stars happened to align in the output.
  • Claude Opus 4 is currently the reigning king for human-level output in writing - but considering that a dollar gets you a mere 65k input tokens, you're going to need a second mortgage for any large-scale project with it. In terms of actual feasibility, the open-source R1-0528 will definitely outclass it.

Try The New R1 On Runpod

Deploying on Runpod

KoboldCPP remains the quickest way to spin up a pod and run quantizations at a fraction of what the full weights would cost, since this model is just as weighty as the original R1. Running a model at 8-bit incurs next to no performance tradeoffs while requiring only half of the original hardware, and lower-bit quantizations are often just as useful. Check out our previous guide on running this if you'd like to get started, and use the Unsloth GGUF quantizations as a starting point.
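To see where the "half of the original hardware" figure comes from, here is a rough weights-only estimate. The parameter count is an assumption (671 billion, the commonly cited total for the original R1), and a 16-bit baseline is assumed for the comparison. Real GGUF quantizations land at fractional bits per weight, and the KV cache and runtime overhead come on top, so treat the output as a floor rather than a sizing guarantee.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumption: 671 billion total parameters, the commonly cited size of the
# original R1. Substitute the count for the checkpoint you actually download.
NUM_PARAMS = 671e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit: ~{gb:,.0f} GB for weights")
```

Halving the bits per weight halves the weight footprint, which is why each step down in quantization lets you drop to a smaller pod or fewer GPUs.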