[Playing Super Mario with Reinforcement Learning] 03 - Mario Environment Code Walkthrough
范仁义 · 2022-03-18 · via 博客园 (cnblogs) - 范仁义


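I. Code analysis

The snippet below imports the packages used throughout this post, creates the Super Mario Bros. environment, and wraps it in JoypadSpace so that the agent only chooses from the small SIMPLE_MOVEMENT action list.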

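from nes_py.wrappers import JoypadSpace                   # restricts the controller to a list of button combos
import gym_super_mario_bros                               # registers the SuperMarioBros-* environments
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT  # a 7-entry list of button combos
import time
from matplotlib import pyplot as plt

# Create the full-game environment and limit its actions to SIMPLE_MOVEMENT
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)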

II. Analyzing actions

1. With JoypadSpace

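env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)
print(env.action_space)           # Discrete(7): one index per entry in SIMPLE_MOVEMENT
print(env.action_space.sample())  # a random action index in 0..6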
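Under the hood, JoypadSpace turns each discrete action index into a state of the NES controller. The sketch below illustrates that idea with a hypothetical helper, build_action_map, that ORs one bit per held button into a single byte. The bit assignment in BUTTON_BITS is an assumption modeled on nes_py's approach; check the nes_py source before relying on the exact values.

```python
# Hypothetical re-implementation of the idea behind JoypadSpace:
# each action index maps to one byte whose bits mark the held buttons.
# ASSUMPTION: this bit layout is illustrative, not taken from nes_py's docs.
BUTTON_BITS = {
    'right': 0b10000000,
    'left': 0b01000000,
    'down': 0b00100000,
    'up': 0b00010000,
    'start': 0b00001000,
    'select': 0b00000100,
    'B': 0b00000010,
    'A': 0b00000001,
    'NOOP': 0b00000000,
}


def build_action_map(action_list):
    """Turn a list of button combos into {action_index: controller_byte}."""
    action_map = {}
    for index, buttons in enumerate(action_list):
        byte = 0
        for button in buttons:
            byte |= BUTTON_BITS[button]  # set the bit for each held button
        action_map[index] = byte
    return action_map


moves = [['NOOP'], ['right'], ['right', 'A']]
for index, byte in build_action_map(moves).items():
    print(index, format(byte, '08b'))
```

With this layout, ['right', 'A'] becomes 0b10000001, so the agent only ever picks a small index while the emulator still receives a full controller byte.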

2. What each action actually is

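print(SIMPLE_MOVEMENT)     # [['NOOP'], ['right'], ['right', 'A'], ['right', 'B'], ['right', 'A', 'B'], ['A'], ['left']]
print(SIMPLE_MOVEMENT[1])  # ['right']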
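Because env.step() takes an index rather than button names, it helps to look up the index of a given combo. The sketch below uses a hand-written copy of SIMPLE_MOVEMENT (so it runs without the package; compare it with your installed version) and a hypothetical helper, action_index, that ignores button order.

```python
# Hand-written copy of gym_super_mario_bros.actions.SIMPLE_MOVEMENT
# so this sketch runs on its own; verify it against your installed version.
SIMPLE_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
]


def action_index(buttons, action_list=SIMPLE_MOVEMENT):
    """Return the discrete action index for a button combo (order-insensitive)."""
    wanted = set(buttons)
    for index, combo in enumerate(action_list):
        if set(combo) == wanted:
            return index
    raise ValueError(f'{buttons!r} is not in the action list')


for index, combo in enumerate(SIMPLE_MOVEMENT):
    print(index, combo)
print(action_index(['right']))  # → 1
print(action_index(['left']))   # → 6
```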

3. Without JoypadSpace

env = gym_super_mario_bros.make('SuperMarioBros-v0')
print(env.action_space)  # Discrete(256): every combination of the 8 NES buttons
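The raw action space is large because the NES controller has 8 buttons and each one is either pressed or released, giving 2^8 = 256 controller states, many of them useless (for example left and right together). The short check below enumerates the combinations with itertools.product to confirm the count; the button names are listed for readability only.

```python
from itertools import product

# The 8 buttons of an NES controller; each is either pressed or not.
NES_BUTTONS = ['right', 'left', 'down', 'up', 'start', 'select', 'B', 'A']

# Enumerate every on/off combination explicitly to confirm the count.
all_combos = list(product([False, True], repeat=len(NES_BUTTONS)))
print(len(all_combos))        # → 256
print(2 ** len(NES_BUTTONS))  # → 256
```

Restricting this to the 7 entries of SIMPLE_MOVEMENT is what makes the problem far easier for an RL agent to explore.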

4. Repeating a fixed action

For example, make Mario do nothing but move right:

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
import time

env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

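done = True  # start as True so the first iteration resets the environment
for step in range(5000):
    if done:
        state = env.reset()
    # Action 1 is ['right'] in SIMPLE_MOVEMENT (action 6 would be ['left'])
    state, reward, done, info = env.step(1)
    time.sleep(0.01)
    env.render()

env.close()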
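The loop above follows the old-style Gym API used in this post: reset() returns a state and step() returns (state, reward, done, info), with a reset whenever an episode ends. To see that control flow without the emulator, here is a sketch driven by a hypothetical stub, CountdownEnv, whose episodes always last a fixed number of steps; run_fixed_action is likewise an illustrative helper, not part of any library.

```python
class CountdownEnv:
    """Hypothetical stub with the same old Gym API as the Mario env:
    reset() -> state, step(action) -> (state, reward, done, info)."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, 1.0, done, {'action': action}


def run_fixed_action(env, action, num_steps):
    """Repeat one action, resetting whenever an episode ends; return the episode count."""
    episodes = 0
    done = True  # start as True so the first iteration calls reset()
    for _ in range(num_steps):
        if done:
            env.reset()
            episodes += 1
        _, _, done, _ = env.step(action)
    return episodes


print(run_fixed_action(CountdownEnv(episode_length=3), action=1, num_steps=10))  # → 4
```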

III. Analyzing the state

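# Re-create the environment, since the previous one was closed
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)
state = env.reset()
print(state.shape)  # (240, 256, 3): one RGB frame of the NES screen
plt.imshow(state)
plt.show()
state, reward, done, info = env.step(2)  # action 2 is ['right', 'A']: jump to the right
plt.imshow(state)
plt.show()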
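The state is simply a screenshot: a 240 x 256 grid of RGB pixels stored as uint8. The sketch below uses an all-zero NumPy array of that shape as a stand-in for a real frame (an assumption made so it runs without the emulator) to show how to read its dimensions, its memory size, and a single pixel.

```python
import numpy as np

# Stand-in for one observation: 240 rows x 256 columns x 3 color channels, uint8.
state = np.zeros((240, 256, 3), dtype=np.uint8)

height, width, channels = state.shape
print(height, width, channels)  # → 240 256 3
print(state.dtype)              # → uint8
print(state.nbytes)             # → 184320 (bytes per frame)

# A pixel is indexed as state[row, column] and holds [R, G, B].
print(state[0, 0])              # → [0 0 0]
```

At roughly 180 KB per frame, these raw observations are large, which is why RL pipelines commonly shrink them before training.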

IV. Inspecting the reward

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
import time

env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
for step in range(5000):
    if done:
        state = env.reset()
    state, reward, done, info = env.step(1)
    print(reward)
    time.sleep(0.04)
    env.render()

env.close()
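According to the gym-super-mario-bros README, the reward rewards moving right quickly without dying: r = v + c + d, where v is the change in Mario's x position, c is a penalty when the game clock ticks, d is -15 on death and 0 otherwise, and r is clipped to [-15, 15]. The function below is a minimal restatement of that description, not the library's code; mario_reward is a hypothetical name, and the real implementation may treat edge cases (such as sudden jumps in x position) differently.

```python
def mario_reward(x_prev, x_now, clock_prev, clock_now, died):
    """Sketch of the README's reward: r = v + c + d, clipped to [-15, 15]."""
    v = x_now - x_prev                   # > 0 when Mario moves right
    c = min(clock_now - clock_prev, 0)   # the clock counts down: < 0 when it ticks
    d = -15 if died else 0               # death penalty
    return max(-15, min(15, v + c + d))  # clip into the reward range


print(mario_reward(100, 103, 400, 400, died=False))  # → 3
print(mario_reward(100, 100, 400, 399, died=False))  # → -1
print(mario_reward(100, 98, 400, 399, died=True))    # → -15
```

This explains the pattern in the printed rewards: walking right yields small positive values, standing still costs a little each time the clock ticks, and dying produces the most negative value.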

V. Inspecting info

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
import time

env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

state = env.reset()
state, reward, done, info = env.step(1)
print(info)
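The info dict reports game state beyond the reward. The gym-super-mario-bros README lists keys such as world, stage, x_pos, life, and flag_get (whether the level's flag was reached); print(info) in your own version to confirm them. The hypothetical helper below turns one info dict into a readable progress line; the example dict is hand-written for illustration and is not real emulator output.

```python
def describe_progress(info):
    """Summarize one step's info dict (keys assumed from the gym-super-mario-bros README)."""
    level = f"{info['world']}-{info['stage']}"
    status = 'cleared' if info['flag_get'] else 'in progress'
    return f"level {level}: x={info['x_pos']}, lives={info['life']}, {status}"


# Hand-written example values for illustration only.
example_info = {'world': 1, 'stage': 1, 'x_pos': 40, 'life': 2, 'flag_get': False}
print(describe_progress(example_info))  # → level 1-1: x=40, lives=2, in progress
```

Checking info['flag_get'] is a convenient way to detect that a level was completed, since the reward alone does not say so directly.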

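VI. Changing levels

Single levels have IDs of the form SuperMarioBros-<world>-<stage>-v<version>, so SuperMarioBros-4-2-v1 below is world 4, stage 2, drawn with the v1 (downsampled) graphics. The loop picks random actions with env.action_space.sample().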

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
import time

env = gym_super_mario_bros.make('SuperMarioBros-4-2-v1')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
for step in range(5000):
    if done:
        state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())
    time.sleep(0.01)
    env.render()

env.close()
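Building these IDs by hand is error-prone, so a small helper can validate the parts first. mario_env_id below is hypothetical; the ranges it enforces (worlds 1-8, stages 1-4, versions v0 standard, v1 downsampled, v2 pixel, v3 rectangle) follow the gym-super-mario-bros README, and the resulting string would be passed to gym_super_mario_bros.make().

```python
def mario_env_id(world, stage, version=0):
    """Build an env ID such as 'SuperMarioBros-4-2-v1' (hypothetical helper).

    world: 1-8, stage: 1-4, version: 0-3
    (v0 standard, v1 downsampled, v2 pixel, v3 rectangle).
    """
    if not 1 <= world <= 8:
        raise ValueError('world must be in 1..8')
    if not 1 <= stage <= 4:
        raise ValueError('stage must be in 1..4')
    if not 0 <= version <= 3:
        raise ValueError('version must be in 0..3')
    return f'SuperMarioBros-{world}-{stage}-v{version}'


print(mario_env_id(4, 2, 1))  # → SuperMarioBros-4-2-v1
```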