How to run gpt-oss locally with LM Studio
2025-08-07 · via OpenAI Developers

LM Studio is a performant and friendly desktop application for running large language models (LLMs) on local hardware. This guide will walk you through how to set up and run gpt-oss-20b or gpt-oss-120b models using LM Studio, including how to chat with them, use MCP servers, or interact with the models through LM Studio’s local development API.

Note that this guide is meant for consumer hardware, like running gpt-oss on a PC or Mac. For server applications with dedicated GPUs like NVIDIA’s H100s, check out our vLLM guide.

LM Studio supports both model sizes of gpt-oss:

  • openai/gpt-oss-20b
    • The smaller model
    • Requires at least 16GB of VRAM
    • Perfect for higher-end consumer GPUs or Apple Silicon Macs
  • openai/gpt-oss-120b
    • Our larger full-sized model
    • Best with ≥60GB VRAM
    • Ideal for multi-GPU or beefy workstation setups
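
As a rough illustration of the sizing guidance above, the sketch below picks a model identifier from the available VRAM. The helper name `pick_gpt_oss_model` and the two constants are hypothetical, and the 60GB figure is the "best with" recommendation from the list rather than a hard requirement.

```python
from typing import Optional

# Thresholds copied from the list above; treat them as rough guidance.
MIN_VRAM_GB_20B = 16   # stated minimum for gpt-oss-20b
MIN_VRAM_GB_120B = 60  # "best with" figure for gpt-oss-120b (assumed as a cutoff here)


def pick_gpt_oss_model(vram_gb: float) -> Optional[str]:
    """Return the largest gpt-oss identifier that fits, or None if neither does."""
    if vram_gb >= MIN_VRAM_GB_120B:
        return "openai/gpt-oss-120b"
    if vram_gb >= MIN_VRAM_GB_20B:
        return "openai/gpt-oss-20b"
    return None


print(pick_gpt_oss_model(24))
```

The returned string can be passed directly to the `lms get` and `lms load` commands shown in this guide.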

LM Studio ships both a llama.cpp inference engine (for GGUF-formatted models) and an Apple MLX engine for Apple Silicon Macs.

  1. Install LM Studio → LM Studio is available for Windows, macOS, and Linux. Get it here.

  2. Download the gpt-oss model

# For 20B
lms get openai/gpt-oss-20b
# or for 120B
lms get openai/gpt-oss-120b
  3. Load the model in LM Studio → Open LM Studio and use the model loading interface to load the gpt-oss model you downloaded. Alternatively, you can use the command line:
# For 20B
lms load openai/gpt-oss-20b
# or for 120B
lms load openai/gpt-oss-120b
  4. Use the model → Once loaded, you can interact with the model directly in LM Studio’s chat interface or through the API.

Use LM Studio’s chat interface to start a conversation with gpt-oss, or use the chat command in the terminal:

lms chat openai/gpt-oss-20b

Note about prompt formatting: LM Studio utilizes OpenAI’s Harmony library to construct the input to gpt-oss models, both when running via llama.cpp and MLX.

LM Studio exposes a Chat Completions-compatible API so you can use the OpenAI SDK without changing much. Here’s a Python example:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"  # LM Studio does not require an API key
)

result = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what MXFP4 quantization is."}
    ]
)

print(result.choices[0].message.content)

If you’ve used the OpenAI SDK before, this will feel familiar: your existing code should work once you change the base URL.
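
Because the API is Chat Completions-compatible, the same call can also be made without the SDK. The following is a minimal sketch using only the Python standard library. It assumes the server is reachable at the same `http://localhost:1234/v1` address as above and that the response follows the usual Chat Completions shape; `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Same local server address as in the SDK example above.
BASE_URL = "http://localhost:1234/v1"


def build_chat_request(prompt: str, model: str = "openai/gpt-oss-20b") -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request to the local server and return the assistant's reply."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # ASSUMPTION: the reply uses the standard Chat Completions response layout.
    return body["choices"][0]["message"]["content"]
```

With the model loaded and the local server running, `print(ask("Explain what MXFP4 quantization is."))` should print the reply. Building the request separately from sending it makes the payload easy to inspect when debugging.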

LM Studio is an MCP client, which means you can connect MCP servers to it. This allows you to provide external tools to gpt-oss models.

LM Studio’s mcp.json file is located in:

~/.lmstudio/mcp.json
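
To check which MCP servers are configured, a short script can read that file. This is a sketch under one stated assumption: that servers are listed under a top-level `mcpServers` object, as in the common `mcp.json` notation. Verify the key against your own file. `list_mcp_servers` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Location given above, expanded for the current user.
MCP_CONFIG_PATH = Path.home() / ".lmstudio" / "mcp.json"


def list_mcp_servers(config_path: Path = MCP_CONFIG_PATH) -> list[str]:
    """Return the sorted names of configured MCP servers, or [] if the file is missing."""
    if not config_path.exists():
        return []
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # ASSUMPTION: servers live under a top-level "mcpServers" object.
    return sorted(config.get("mcpServers", {}))
```

Calling `list_mcp_servers()` with no arguments reads the default location. Passing a different path makes the helper easy to try against a copy of the file.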

LM Studio’s SDK is available in both Python and TypeScript. You can leverage the SDK to implement tool calling and local function execution with gpt-oss.

You do this with the .act() call, which lets you provide tools to gpt-oss and have it alternate between calling tools and reasoning until it completes your task.

The example below shows how to provide a single tool to the model that is able to create files on your local filesystem. You can use this example as a starting point, and extend it with more tools. See docs about tool definitions here for Python and TypeScript.

uv pip install lmstudio

import readline # Enables input line editing
from pathlib import Path

import lmstudio as lms

# Define a function the model can call and pass it to the model as a tool.
# Tools are just regular Python functions. They can be anything at all.
def create_file(name: str, content: str):
    """Create a file with the given name and content."""
    dest_path = Path(name)
    if dest_path.exists():
        return "Error: File already exists."
    try:
        dest_path.write_text(content, encoding="utf-8")
    except Exception as exc:
        return f"Error: {exc!r}"
    return "File created."

def print_fragment(fragment, round_index=0):
    # .act() supplies the round index as the second parameter
    # Setting a default value means the callback is also
    # compatible with .complete() and .respond().
    print(fragment.content, end="", flush=True)

model = lms.llm("openai/gpt-oss-20b")
chat = lms.Chat("You are a helpful assistant running on the user's computer.")

while True:
    try:
        user_input = input("User (leave blank to exit): ")
    except EOFError:
        print()
        break
    if not user_input:
        break
    chat.add_user_message(user_input)
    print("Assistant: ", end="", flush=True)
    model.act(
        chat,
        [create_file],
        on_message=chat.append,
        on_prediction_fragment=print_fragment,
    )
    print()
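
To extend the example with more tools, a second plain Python function can be defined in the same style. The sketch below adds a hypothetical `read_file` tool. The name and the `MAX_CHARS` cap are illustrative choices, not part of the LM Studio SDK.

```python
from pathlib import Path

# Hypothetical cap so that a large file does not flood the model's context.
MAX_CHARS = 4000


def read_file(name: str) -> str:
    """Read a UTF-8 text file and return its content, truncated to MAX_CHARS."""
    src_path = Path(name)
    if not src_path.is_file():
        return "Error: File does not exist."
    try:
        content = src_path.read_text(encoding="utf-8")
    except Exception as exc:
        # Report the failure to the model as text instead of raising.
        return f"Error: {exc!r}"
    return content[:MAX_CHARS]
```

To make it available to the model, pass it alongside the existing tool in the `model.act(...)` call, for example `[create_file, read_file]`. Returning error strings rather than raising mirrors `create_file`, so the model can see what went wrong and react to it.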

For TypeScript developers who want to utilize gpt-oss locally, here’s a similar example using lmstudio-js:

npm install @lmstudio/sdk

import { Chat, LMStudioClient, tool } from "@lmstudio/sdk";
import { existsSync } from "fs";
import { writeFile } from "fs/promises";
import { createInterface } from "readline/promises";
import { z } from "zod";

const rl = createInterface({ input: process.stdin, output: process.stdout });
const client = new LMStudioClient();
const model = await client.llm.model("openai/gpt-oss-20b");
const chat = Chat.empty();

const createFileTool = tool({
  name: "createFile",
  description: "Create a file with the given name and content.",
  parameters: { name: z.string(), content: z.string() },
  implementation: async ({ name, content }) => {
    if (existsSync(name)) {
      return "Error: File already exists.";
    }
    await writeFile(name, content, "utf-8");
    return "File created.";
  },
});

while (true) {
  const input = await rl.question("User: ");
  // Append the user input to the chat
  chat.append("user", input);

  process.stdout.write("Assistant: ");
  await model.act(chat, [createFileTool], {
    // When the model finishes the entire message, push it to the chat
    onMessage: (message) => chat.append(message),
    onPredictionFragment: ({ content }) => {
      process.stdout.write(content);
    },
  });
  process.stdout.write("\n");
}