Reinforcement Learning Progress
Sam Altman · 2018-06-26 · via Sam Altman

Today, OpenAI released a new result. We used Proximal Policy Optimization (PPO), a general reinforcement learning algorithm invented by OpenAI, to train a team of five agents to play Dota 2 and beat semi-pros.
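For readers who have not seen PPO before, here is a minimal sketch of its clipped surrogate objective, as I understand it from the published algorithm. It is not the training code behind this result: the function name, the plain-list inputs, and the `clip_eps=0.2` default are all illustrative assumptions.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for those actions.
    clip_eps: clipping range (0.2 is an illustrative value, an assumption).
    """
    total = 0.0
    for nlp, olp, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(nlp - olp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two estimates.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping is what keeps each update small: once the new policy has moved too far from the old one on a sample, that sample stops contributing any incentive to move further.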

To me, this game feels closer to the real world and to complex decision making (combining strategy, tactics, coordination, and real-time action) than any other game AI has made real progress against so far.

The agents we train consistently outperform two-week-old agents with a win rate of 90-95%. We did this without training on human-played games. We did design the reward functions, of course, but the algorithm figured out how to play by training against itself.
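To make the idea of measuring progress against older agents concrete, here is a toy sketch of estimating a win rate against a saved snapshot. Everything in it is hypothetical: `win_rate`, `toy_match`, and the Elo-style skill numbers are stand-ins for illustration, not how OpenAI evaluates its agents.

```python
import random


def win_rate(play_match, current, opponent, n_games=1000, seed=0):
    """Fraction of games `current` wins against `opponent`.

    play_match(current, opponent, rng) is a hypothetical callback that
    plays one game and returns True when `current` wins.
    """
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_games) if play_match(current, opponent, rng))
    return wins / n_games


def toy_match(skill_a, skill_b, rng):
    """Toy stand-in for a real game: agents are just Elo-style ratings."""
    # Standard Elo expected score for player A.
    p_a_wins = 1.0 / (1.0 + 10 ** ((skill_b - skill_a) / 400.0))
    return rng.random() < p_a_wins


if __name__ == "__main__":
    # Made-up ratings: a current agent versus an older snapshot of itself.
    print(win_rate(toy_match, 2000, 1500))
```

In a real self-play setup the snapshot would be an earlier checkpoint of the same policy and `play_match` would run a full game in the simulator.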

This is a big deal because it shows that deep reinforcement learning can solve extremely hard problems whenever you can throw enough computing scale at them and have a really good simulated environment that captures the problem you're solving. We hope to use this same approach to solve very different problems soon. It's easy to imagine this being applied to environments that look increasingly like the real world.

There are many problems in the world that are far too complex to hand-code solutions for.  I expect this to be a large branch of machine learning, and an important step on the road towards general intelligence.