ISE: An Execution-Grounded Recipe for Multi-Turn OS-Agent Trajectories
Siyuan Luo, Nairong Zheng, Lin Zhou, Tiankuo Yao, Shengyou Yuan, · 2026-06-10 · via cs updates on arXiv.org

Training capable OS agents requires data that simultaneously captures structured user intents, multi-turn task delegation, and grounded tool execution, properties absent from existing datasets. We propose ISE (Intent → Simulate → Execute), a three-stage synthesis paradigm that addresses these gaps jointly. Stage 1 constructs roughly 50,000 structured intents via a 4D framework (Persona × Domain × Task × Complexity); after deduplication, the pool contains 43,956 unique intents and attains a Vendi Score of 61.57 over the entire pool on mpnet-base-v2 embeddings (cosine kernel, q = 1). Stage 2 drives multi-turn user-agent interaction through a role-locked user simulator that grounds each user turn in actual execution outcomes, producing 23,132 complete trajectories with an average of 8.12 user turns and 68.24 total dialogue turns per trajectory. Stage 3 runs every tool call inside a live, isolated OS workspace, yielding authentic failure-recovery dynamics instead of simulated responses. Fine-tuning Qwen3-8B on ISETrace raises its ClawEval pass@1 on agent tool-use tasks from 19.3 to 37.7 under a standard evaluation protocol, outperforming both zero-shot GPT-4o and the Qwen3-32B base model, which is four times larger. An ablation of Stage 2 shows that multi-turn simulation accounts for a large portion of the performance gain. We release all source code and the dataset at https://github.com/Valiere01/ISE-Trace.
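The abstract measures intent diversity with a Vendi Score (cosine kernel, q = 1). At order q = 1 the Vendi Score is the exponential of the Shannon entropy of the eigenvalues of K/n, where K is the n × n matrix of pairwise cosine similarities. It therefore ranges from 1 (all items identical) to n (all items mutually orthogonal) and can be read as an effective number of distinct items. Below is a minimal, dependency-free Python sketch of that definition. It assumes the mpnet-base-v2 embeddings have already been computed elsewhere. The function names (`vendi_score_q1`, `_unit`, `_symmetric_eigenvalues`) are hypothetical illustrations, not code from the ISE repository. The eigenvalues come from a hand-written Jacobi routine only to avoid a NumPy dependency.

```python
import math
from typing import List, Sequence


def _unit(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return [x / norm for x in vector]


def _symmetric_eigenvalues(matrix: List[List[float]], tol: float = 1e-12,
                           max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[p][q]; take the smaller root for stability.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def vendi_score_q1(embeddings: Sequence[Sequence[float]]) -> float:
    """Vendi Score of order q = 1 with a cosine-similarity kernel.

    VS = exp(-sum_i lam_i * log(lam_i)), where lam_i are the eigenvalues of K / n
    and K[i][j] is the cosine similarity between embeddings i and j.
    """
    n = len(embeddings)
    if n == 0:
        raise ValueError("need at least one embedding")
    units = [_unit(e) for e in embeddings]
    # K has ones on its diagonal, so K / n has trace 1 and its eigenvalues sum to 1.
    k_over_n = [[sum(x * y for x, y in zip(u, v)) / n for v in units] for u in units]
    eigenvalues = _symmetric_eigenvalues(k_over_n)
    # Shannon entropy of the spectrum; skip zero eigenvalues (0 * log 0 := 0)
    # and tiny negative values caused by floating-point rounding.
    entropy = -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-12)
    return math.exp(entropy)
```

Two sanity checks follow from the definition: n copies of the same direction score 1, and n mutually orthogonal embeddings score n. The pure-Python Jacobi solver costs O(n³) per sweep, so it only suits small illustrations. For a pool of tens of thousands of intents, a practical route is `numpy.linalg.eigvalsh` on the d × d matrix XᵀX/n, where X is the row-normalized embedding matrix. That matrix shares its nonzero eigenvalues with K/n = XXᵀ/n.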