My Opinion on RL | Hacker News
umjunsik132 · 2026-06-22 · via HN's home page
I see RL as a method that produces its own training data from the model's predictions. This directly leads the model to extend its output range, because the data it trains on becomes more diverse. Fundamentally, however, RL relies on bootstrapping and suffers from a moving-target problem, and these are the reasons for its poor stability. One of the most tractable methods for approximating the value function is temporal-difference (TD) learning, which introduces sampling noise, function-approximation error, and the moving-target problem. I argue that we need to extend pure RL theory at the level of the Bellman equation to achieve more stable RL. Consequently, we need both a better mathematical foundation for value functions and a tractable approximation method that are aligned with each other and free from these problems.
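
To make the bootstrapping and moving-target points concrete, here is a minimal sketch of tabular TD(0). The environment (a five-state random walk with reward 1 for exiting on the right), the function name `td0_random_walk`, and the values of `alpha`, `gamma`, and `episodes` are all assumptions chosen for illustration; none of them come from the post.

```python
import random


def td0_random_walk(episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) on a hypothetical 5-state random walk.

    Stepping off the left end terminates with reward 0,
    stepping off the right end terminates with reward 1.
    """
    rng = random.Random(seed)
    n_states = 5
    V = [0.5] * n_states  # initial value estimates

    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))  # sampled transition: sample noise
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, V[s_next], False

            # Bootstrapped target: it is built from the current estimate
            # V[s_next], so the target itself moves whenever V is updated.
            target = reward + gamma * v_next
            V[s] += alpha * (target - V[s])

            if done:
                break
            s = s_next
    return V


values = td0_random_walk()
print([round(v, 2) for v in values])
```

For this toy chain the exact values are 1/6, 2/6, ..., 5/6, so the estimates can be checked against them. With a constant step size they keep fluctuating around those values instead of settling, which is the sample-noise effect. In the tabular case there is no function-approximation error; that third source only appears once `V` is replaced by a parameterized model.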