# Q: Implementing eligibility traces in SARSA

I am writing a MATLAB implementation of the SARSA algorithm, and have successfully written a one-step implementation.

I am now trying to extend it to use eligibility traces, but the results I obtain are worse than with one-step SARSA (i.e. the algorithm converges at a slower rate and the final path followed by the agent is longer).

``````
e_trace(action_old, state_old) = e_trace(action_old, state_old) + 1;

% Update weights but only if we are past the first step
if (step > 1)

    delta = (reward + discount*qval_new - qval_old);

    % SARSA-lambda (Eligibility Traces)
    dw = e_trace.*delta;

    % One-step SARSA (used instead of the line above when traces are not wanted)
    % dw = zeros(actions, states);
    % dw(action_old, state_old) = delta;

    weights = weights + learning_rate*dw;

end

% Decay all traces after the update
e_trace = discount*decay*e_trace;
``````

Essentially, my Q-values are stored in an n×m weights matrix, where n is the number of actions and m is the number of states. Eligibility trace values are stored in the e_trace matrix. Depending on whether I want one-step SARSA or eligibility traces, I use one of the two definitions of dw. I am not sure where I am going wrong. The algorithm is implemented as shown here: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node77.html
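
For reference, the tabular Sarsa(λ) update with accumulating traces is commonly written as below. The mapping of symbols to variable names is an assumption based on the code above: α is learning_rate, γ is discount, λ is decay, Q is weights and e is e_trace; (s, a) is the previous state-action pair and (s', a') the new one.

```latex
\begin{aligned}
\delta &= r + \gamma\, Q(s', a') - Q(s, a) \\
e(s, a) &\leftarrow e(s, a) + 1 \\
Q(\tilde{s}, \tilde{a}) &\leftarrow Q(\tilde{s}, \tilde{a}) + \alpha\, \delta\, e(\tilde{s}, \tilde{a})
  \quad \text{for all } \tilde{s}, \tilde{a} \\
e(\tilde{s}, \tilde{a}) &\leftarrow \gamma \lambda\, e(\tilde{s}, \tilde{a})
  \quad \text{for all } \tilde{s}, \tilde{a}
\end{aligned}
```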

The line

``````
dw = e_trace .* delta
``````

defines the weight change for all weights in the network (i.e. the change in value for all Q(s,a) pairs), which is then scaled by the learning rate and added to the weights.

I should add that initially my weights and e-values are set to 0.
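
One sanity check that may help narrow this down: with decay set to 0, the trace-based definition of dw should give exactly the same update as the one-step definition, because every trace is wiped after each step. The sketch below illustrates this for a single step. The sizes, the indices and the value of delta are hypothetical, chosen only for illustration, and it starts from zero weights and traces as described above.

```matlab
% Hypothetical sizes and parameters, for illustration only
actions = 2;
states = 3;
learning_rate = 0.1;
discount = 0.9;
decay = 0;                         % lambda = 0 should reproduce one-step SARSA

weights = zeros(actions, states);  % Q-values, initially 0
e_trace = zeros(actions, states);  % eligibility traces, initially 0

action_old = 1;                    % hypothetical previous action index
state_old = 2;                     % hypothetical previous state index
delta = 1.0;                       % stand-in for the TD error of this step

% Accumulate the trace for the pair that was just visited
e_trace(action_old, state_old) = e_trace(action_old, state_old) + 1;

% Trace-based weight change
dw_trace = e_trace .* delta;

% One-step weight change
dw_onestep = zeros(actions, states);
dw_onestep(action_old, state_old) = delta;

% Apply the trace-based update, then decay the traces
weights = weights + learning_rate * dw_trace;
e_trace = discount * decay * e_trace;

% Displays 1 (true) when the two definitions agree
disp(isequal(dw_trace, dw_onestep));
```

If the two definitions agree with decay = 0 but learning degrades for larger values of decay, the difference is probably in how the traces carry over between steps or episodes rather than in the update line itself.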

algorithm  matlab  sarsa