
Sam Altman

Reinforcement Learning Progress
Sam Altman · 2018-06-26

Today, OpenAI released a new result.  We used PPO (Proximal Policy Optimization), a general reinforcement learning algorithm invented by OpenAI, to train a team of 5 agents to play Dota and beat semi-pros.

Of all the games AI has made real progress against so far, this is the one that to me feels closest to the real world and its complex decision making (combining strategy, tactics, coordination, and real-time action).

The agents we train consistently outperform two-week-old agents with a win rate of 90-95%. We did this without training on human-played games. We did design the reward functions, of course, but the algorithm figured out how to play by training against itself.

This is a big deal because it shows that deep reinforcement learning can solve extremely hard problems whenever you can throw enough computing scale at them and have a really good simulated environment that captures the problem you're solving. We hope to use this same approach to solve very different problems soon. It's easy to imagine this being applied to environments that look increasingly like the real world.

There are many problems in the world that are far too complex to hand-code solutions for.  I expect this to be a large branch of machine learning, and an important step on the road towards general intelligence.