[Reinforcement Learning Plays Super Mario] 05 - The Simplest Possible Super Mario Training Run
Renyi Fan (范仁义) · 2022-03-18 · via cnblogs (博客园)


The simplest possible Super Mario training run

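from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from stable_baselines3 import PPO

# Create the Super Mario Bros environment and restrict it to the SIMPLE_MOVEMENT action set.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

# Directory where training metrics are written for TensorBoard.
tensorboard_log = r'./tensorboard_log/'

# PPO with a CNN policy, since the observations are raw game frames.
model = PPO("CnnPolicy", env, verbose=1,
            tensorboard_log=tensorboard_log)
# Train for about 25000 environment steps, then save the model to mario_model.zip.
model.learn(total_timesteps=25000)
model.save("mario_model")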

Using cuda device
Wrapping the env with a Monitor wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Logging to ./tensorboard_log/PPO_1

D:\software\e_anaconda\envs\pytorch\lib\site-packages\gym_super_mario_bros\smb_env.py:148: RuntimeWarning: overflow encountered in ubyte_scalars
return (self.ram[0x86] - self.ram[0x071c]) % 256
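
The RuntimeWarning above comes from subtracting two unsigned 8-bit RAM values: when the first byte is smaller than the second, the difference wraps around instead of going negative. The sketch below imitates that wrap-around in plain Python (the helper name and the sample byte values are made up for illustration) and suggests why the result after `% 256` is still the intended one, so the warning does not stop training.

```python
def wrapped_u8_sub(a, b):
    """Subtract two bytes the way unsigned 8-bit arithmetic does (wraps modulo 256)."""
    return (a - b) & 0xFF


# Hypothetical byte values where the first operand is smaller than the second.
first_byte, second_byte = 5, 10

wrapped = wrapped_u8_sub(first_byte, second_byte)  # result of the wrapping subtraction
exact = (first_byte - second_byte) % 256           # exact integer arithmetic, then % 256

print(wrapped, wrapped % 256, exact)
```

Both routes give the same value for every pair of bytes, which is consistent with training continuing normally after the warning is printed.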

-----------------------------
| time/ | |
| fps | 116 |
| iterations | 1 |
| time_elapsed | 17 |
| total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/ | |
| fps | 81 |
| iterations | 2 |
| time_elapsed | 50 |
| total_timesteps | 4096 |
| train/ | |
| approx_kl | 0.025405666 |
| clip_fraction | 0.274 |
| clip_range | 0.2 |
| entropy_loss | -1.92 |
| explained_variance | 0.00504 |
| learning_rate | 0.0003 |
| loss | 0.621 |
| n_updates | 10 |
| policy_gradient_loss | 0.0109 |
| value_loss | 17.4 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 73 |
| iterations | 3 |
| time_elapsed | 83 |
| total_timesteps | 6144 |
| train/ | |
| approx_kl | 0.010906073 |
| clip_fraction | 0.109 |
| clip_range | 0.2 |
| entropy_loss | -1.92 |
| explained_variance | 0.0211 |
| learning_rate | 0.0003 |
| loss | 0.101 |
| n_updates | 20 |
| policy_gradient_loss | -0.00392 |
| value_loss | 0.187 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 69 |
| iterations | 4 |
| time_elapsed | 117 |
| total_timesteps | 8192 |
| train/ | |
| approx_kl | 0.009882288 |
| clip_fraction | 0.0681 |
| clip_range | 0.2 |
| entropy_loss | -1.9 |
| explained_variance | 0.101 |
| learning_rate | 0.0003 |
| loss | 0.0738 |
| n_updates | 30 |
| policy_gradient_loss | -0.00502 |
| value_loss | 0.13 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 1.01e+04 |
| ep_rew_mean | 891 |
| time/ | |
| fps | 65 |
| iterations | 5 |
| time_elapsed | 156 |
| total_timesteps | 10240 |
| train/ | |
| approx_kl | 0.008186281 |
| clip_fraction | 0.105 |
| clip_range | 0.2 |
| entropy_loss | -1.87 |
| explained_variance | 0.0161 |
| learning_rate | 0.0003 |
| loss | 0.28 |
| n_updates | 40 |
| policy_gradient_loss | -0.00649 |
| value_loss | 0.811 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 1.01e+04 |
| ep_rew_mean | 891 |
| time/ | |
| fps | 64 |
| iterations | 6 |
| time_elapsed | 190 |
| total_timesteps | 12288 |
| train/ | |
| approx_kl | 0.024062362 |
| clip_fraction | 0.246 |
| clip_range | 0.2 |
| entropy_loss | -1.9 |
| explained_variance | 0.269 |
| learning_rate | 0.0003 |
| loss | 0.54 |
| n_updates | 50 |
| policy_gradient_loss | 0.0362 |
| value_loss | 10.8 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 1.01e+04 |
| ep_rew_mean | 891 |
| time/ | |
| fps | 63 |
| iterations | 7 |
| time_elapsed | 225 |
| total_timesteps | 14336 |
| train/ | |
| approx_kl | 0.024466533 |
| clip_fraction | 0.211 |
| clip_range | 0.2 |
| entropy_loss | -1.89 |
| explained_variance | 0.839 |
| learning_rate | 0.0003 |
| loss | 0.435 |
| n_updates | 60 |
| policy_gradient_loss | 0.023 |
| value_loss | 3.06 |
-----------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 1.01e+04 |
| ep_rew_mean | 891 |
| time/ | |
| fps | 63 |
| iterations | 8 |
| time_elapsed | 259 |
| total_timesteps | 16384 |
| train/ | |
| approx_kl | 0.01970315 |
| clip_fraction | 0.242 |
| clip_range | 0.2 |
| entropy_loss | -1.9 |
| explained_variance | 0.486 |
| learning_rate | 0.0003 |
| loss | 0.526 |
| n_updates | 70 |
| policy_gradient_loss | 0.00486 |
| value_loss | 1.57 |
----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 1.01e+04 |
| ep_rew_mean | 891 |
| time/ | |
| fps | 62 |
| iterations | 9 |
| time_elapsed | 293 |
| total_timesteps | 18432 |
| train/ | |
| approx_kl | 0.012460884 |
| clip_fraction | 0.217 |
| clip_range | 0.2 |
| entropy_loss | -1.87 |
| explained_variance | 0.74 |
| learning_rate | 0.0003 |
| loss | 0.139 |
| n_updates | 80 |
| policy_gradient_loss | -0.000311 |
| value_loss | 0.734 |
-----------------------------------------
----------------------------------------
| rollout/ | |
| ep_len_mean | 1.01e+04 |
| ep_rew_mean | 891 |
| time/ | |
| fps | 62 |
| iterations | 10 |
| time_elapsed | 327 |
| total_timesteps | 20480 |
| train/ | |
| approx_kl | 0.02535792 |
| clip_fraction | 0.298 |
| clip_range | 0.2 |
| entropy_loss | -1.88 |
| explained_variance | 0.405 |
| learning_rate | 0.0003 |
| loss | 1.17 |
| n_updates | 90 |
| policy_gradient_loss | 0.0205 |
| value_loss | 6.6 |
----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 1.01e+04 |
| ep_rew_mean | 891 |
| time/ | |
| fps | 62 |
| iterations | 11 |
| time_elapsed | 361 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.019694094 |
| clip_fraction | 0.243 |
| clip_range | 0.2 |
| entropy_loss | -1.91 |
| explained_variance | 0.952 |
| learning_rate | 0.0003 |
| loss | 0.39 |
| n_updates | 100 |
| policy_gradient_loss | -0.00434 |
| value_loss | 1.31 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 1.19e+04 |
| ep_rew_mean | 884 |
| time/ | |
| fps | 61 |
| iterations | 12 |
| time_elapsed | 398 |
| total_timesteps | 24576 |
| train/ | |
| approx_kl | 0.013096321 |
| clip_fraction | 0.227 |
| clip_range | 0.2 |
| entropy_loss | -1.91 |
| explained_variance | 0.0132 |
| learning_rate | 0.0003 |
| loss | 0.669 |
| n_updates | 110 |
| policy_gradient_loss | -0.000837 |
| value_loss | 1.42 |
-----------------------------------------
-----------------------------------------
| rollout/ | |
| ep_len_mean | 1.19e+04 |
| ep_rew_mean | 884 |
| time/ | |
| fps | 61 |
| iterations | 13 |
| time_elapsed | 432 |
| total_timesteps | 26624 |
| train/ | |
| approx_kl | 0.014833134 |
| clip_fraction | 0.239 |
| clip_range | 0.2 |
| entropy_loss | -1.9 |
| explained_variance | 0.452 |
| learning_rate | 0.0003 |
| loss | 18.1 |
| n_updates | 120 |
| policy_gradient_loss | -7.3e-05 |
| value_loss | 26.3 |
-----------------------------------------
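
The log ends at 26624 timesteps even though `total_timesteps=25000` was requested. Each iteration in the log adds 2048 steps, so training can only stop at a multiple of 2048. A minimal sketch of that arithmetic, assuming a single environment and a 2048-step rollout as the log suggests (the function name is hypothetical, not part of Stable-Baselines3):

```python
import math


def rollout_schedule(total_timesteps, n_steps=2048, n_envs=1):
    """Return (iterations, timesteps actually collected).

    Assumes training stops only at rollout boundaries, which is what the
    log above shows; n_steps=2048 is the per-iteration step count in the log.
    """
    steps_per_iteration = n_steps * n_envs
    iterations = math.ceil(total_timesteps / steps_per_iteration)
    return iterations, iterations * steps_per_iteration


iterations, collected = rollout_schedule(25000)
print(iterations, collected)  # 13 26624
```

This matches the final table in the log: iteration 13 and 26624 total timesteps.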

Test code

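from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from stable_baselines3 import PPO

# Rebuild the same environment that was used for training.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)
# Load the model saved by the training script (mario_model.zip).
model = PPO.load("mario_model")

done = True
while True:
    if done:
        # Start a new episode and keep its first frame as the current observation.
        # copy() hands the policy a contiguous array (nes-py frames can be strided views).
        obs = env.reset().copy()
    action, _states = model.predict(obs)
    # predict() returns a NumPy value; pass a plain int to the JoypadSpace wrapper.
    obs, reward, done, info = env.step(int(action))
    obs = obs.copy()
    env.render()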
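
The test loop above runs until it is interrupted. A bounded variant that plays a fixed number of episodes and records each episode's total reward could look like the sketch below. `CountdownEnv` and `run_episodes` are hypothetical names: the stub environment only stands in for the Mario environment so the loop can run without an emulator, and it uses the same `reset()` / four-value `step()` interface as the code above.

```python
class CountdownEnv:
    """Hypothetical stand-in for the Mario env (old gym API: step returns 4 values)."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.remaining = 0

    def reset(self):
        self.remaining = self.episode_length
        return self.remaining  # observation

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        return self.remaining, 1.0, done, {}  # obs, reward, done, info


def run_episodes(env, choose_action, n_episodes):
    """Play n_episodes to completion and return each episode's total reward."""
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            obs, reward, done, info = env.step(choose_action(obs))
            total += reward
        returns.append(total)
    return returns


print(run_episodes(CountdownEnv(episode_length=3), lambda obs: 0, n_episodes=2))
# [3.0, 3.0]
```

With the real environment, `choose_action` would presumably wrap `model.predict`, for example by returning `int(model.predict(obs.copy())[0])`; that wiring is an assumption and is not taken from the original post.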

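Videos

Reinforcement Learning Plays Super Mario [latest, March 2022] (come hit me if you can't learn it) - bilibili
https://www.bilibili.com/video/BV1iL411A7zo?spm_id_from=333.999.0.0

The Stable-Baselines3 reinforcement learning library - bilibili
https://www.bilibili.com/video/BV1ca41187qB?spm_id_from=333.999.0.0

The Optuna hyperparameter tuning framework - bilibili
https://www.bilibili.com/video/BV1ni4y1C7Sv?spm_id_from=333.999.0.0

Reinforcement Learning Plays Super Mario - 读书编程笔记 (fanrenyi.com)
https://fanrenyi.com/lesson/48

The Optuna hyperparameter tuning framework - 读书编程笔记 (fanrenyi.com)
https://fanrenyi.com/lesson/49

The Stable-Baselines3 reinforcement learning library - 读书编程笔记 (fanrenyi.com)
https://fanrenyi.com/lesson/50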

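The course "Reinforcement Learning Plays Super Mario" explains how to train an agent to play Super Mario with reinforcement learning. It is a hand-holding, easy-to-follow tutorial that walks you through writing the code step by step. The deep learning library is PyTorch, the reinforcement learning library is Stable-Baselines3, and the hyperparameter tuning framework is Optuna. Code and materials are on GitHub, in the 220310_强化学习玩马里奥 folder of https://github.com/fry404006308/fry_course_materials/tree/master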

Code on GitHub

fry_course_materials/220310_强化学习玩马里奥 at master · fry404006308/fry_course_materials · GitHub
https://github.com/fry404006308/fry_course_materials/tree/master/220310_强化学习玩马里奥

Blog posts

More blog content is available in the GitHub repository:

https://github.com/fry404006308/fry_course_materials/tree/master/

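[Reinforcement Learning Plays Super Mario] 05 - The simplest possible Super Mario training run - 范仁义 - cnblogs
https://www.cnblogs.com/Renyi-Fan/p/16021552.html

[Reinforcement Learning Plays Super Mario] 04 - Introduction to the stable-baselines3 library - 范仁义 - cnblogs
https://www.cnblogs.com/Renyi-Fan/p/16021529.html

[Reinforcement Learning Plays Super Mario] 03 - Mario environment code walkthrough - 范仁义 - cnblogs
https://www.cnblogs.com/Renyi-Fan/p/16021518.html

[Reinforcement Learning Plays Super Mario] 02 - Running Super Mario - 范仁义 - cnblogs
https://www.cnblogs.com/Renyi-Fan/p/16021507.html

[Reinforcement Learning Plays Super Mario] 01 - nes-py package installation example - 范仁义 - cnblogs
https://www.cnblogs.com/Renyi-Fan/p/16021496.html

[Reinforcement Learning Plays Super Mario] 01 - Installing the Super Mario environment - 范仁义 - cnblogs
https://www.cnblogs.com/Renyi-Fan/p/16021460.html

[Reinforcement Learning Plays Super Mario] 00 - Course introduction - 范仁义 - cnblogs
https://www.cnblogs.com/Renyi-Fan/p/16021398.html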

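Course contents

[Reinforcement Learning Plays Super Mario] 00 - Course introduction

[Reinforcement Learning Plays Super Mario] 01 - Installing the Super Mario environment

[Reinforcement Learning Plays Super Mario] 01 - nes-py package installation example

[Reinforcement Learning Plays Super Mario] 02 - Running Super Mario

[Reinforcement Learning Plays Super Mario] 03 - Mario environment code walkthrough

[Reinforcement Learning Plays Super Mario] 04 - Introduction to the stable-baselines3 library

[Reinforcement Learning Plays Super Mario] 05 - The simplest possible Super Mario training run

[Reinforcement Learning Plays Super Mario] 06-1 - Preprocessing and vectorized environments: preprocessing

[Reinforcement Learning Plays Super Mario] 06-2 - Preprocessing and vectorized environments: vectorized environments

[Reinforcement Learning Plays Super Mario] 07-1 - Model training parameters: setting the training parameters

[Reinforcement Learning Plays Super Mario] 07-2 - Model training parameters: changing parameters and resuming training

[Reinforcement Learning Plays Super Mario] 07-3 - Model training parameters: printing the model's parameters

[Reinforcement Learning Plays Super Mario] 08 - Saving the best model

[Reinforcement Learning Plays Super Mario] 09-1 - Saving the model every N steps

[Reinforcement Learning Plays Super Mario] 09-2 - Saving the model every N steps: testing the saved models

[Reinforcement Learning Plays Super Mario] 10 - Stage-two training and testing

[Reinforcement Learning Plays Super Mario] 11 - Introduction to the Optuna hyperparameter tuning library

[Reinforcement Learning Plays Super Mario] 12-1 - Choosing hyperparameters with Optuna: choosing hyperparameters with Optuna

[Reinforcement Learning Plays Super Mario] 12-2 - Choosing hyperparameters with Optuna: a concrete hyperparameter selection example

[Reinforcement Learning Plays Super Mario] 12-3 - Choosing hyperparameters with Optuna: testing the model produced by hyperparameter tuning

[Reinforcement Learning Plays Super Mario] 13 - Training with the tuned hyperparameters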