
## Q: Implementing Eligibility Traces in SARSA

I am writing a MATLAB implementation of the SARSA algorithm, and have successfully written a one-step implementation. I am now trying to extend it to use eligibility traces, but the results I obtain are worse than with one-step. (I.e.: the algorithm converges at a slower rate, and the final path followed by the agent is longer.)
Essentially, my Q-values are stored in an n×m weights matrix, where n = number of actions and m = number of states. Eligibility-trace values are stored in the e_trace matrix. Depending on whether I want to use one-step or eligibility traces, I use one of two definitions of dw. I am not sure where I am going wrong. The algorithm is implemented as shown here: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node77.html
The chosen dw defines the weight change for all weights in the network (i.e.: the change in value for all Q(s,a) pairs), which is then applied to the network, scaled by the learning rate. I should add that initially my weights and e-values are set to 0. Any advice?
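Since the dw code itself is not shown above, here is a minimal sketch of the tabular Sarsa(λ) update from the linked Sutton & Barto section (written in Python/NumPy for illustration; the function name and toy shapes are mine, and the Q-table follows the question's `w[action, state]` layout). Two easy-to-miss points that commonly cause the slowdown described are decaying the traces *before* the weight update instead of after it, and forgetting to reset the trace matrix to zero between episodes:

```python
import numpy as np

def sarsa_lambda_update(w, e, s, a, r, s_next, a_next,
                        alpha=0.1, gamma=0.9, lam=0.8):
    """One Sarsa(lambda) step with accumulating traces.

    w : Q-values stored as w[action, state] (the question's n x m weights matrix)
    e : eligibility traces, same shape as w (the question's e_trace matrix)
    """
    # TD error for the observed (s, a, r, s', a') transition
    delta = r + gamma * w[a_next, s_next] - w[a, s]
    # Accumulating trace: bump the trace for the visited (s, a) pair
    e[a, s] += 1.0
    # dw = alpha * delta * e updates ALL weights in proportion to their traces
    w += alpha * delta * e
    # Decay ALL traces by gamma*lambda AFTER the update, not before
    e *= gamma * lam
    return w, e
```

The equivalent MATLAB would be `w = w + alpha*delta*e_trace; e_trace = gamma*lambda*e_trace;` at the end of each step, with `e_trace(:) = 0;` at the start of each episode.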

Tags: algorithm, matlab, sarsa