Pre-knowledge review
DQN
loss function:
$L = \big(r + \gamma \max_{a'} Q(s', a', w^{-}) - Q(s, a, w)\big)^2$
Deep Q-Learning uses a Q-network to approximate the Q function in place of a huge state table. In the training stage, the target values come from a Q-network with the old (frozen) weights $w^{-}$ and are used to update the new weights $w$. This looks very similar to supervised learning, but DQN is not truly supervised, since the targets themselves shift as the network changes.
algorithm
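A minimal sketch of that target computation for a single transition, with stand-in Q-values; names such as `q_s_next_target` are illustrative, not from the original notes:

```python
import numpy as np

def dqn_loss(r, gamma, a, q_s, q_s_next_target, done):
    """One-sample DQN loss: (r + gamma * max_a' Q(s', a'; w-) - Q(s, a; w))^2."""
    # The target is computed with the old / frozen weights w-.
    target = r if done else r + gamma * np.max(q_s_next_target)
    td_error = target - q_s[a]   # q_s holds Q(s, .; w) from the online network
    return td_error ** 2

# Toy usage with stand-in Q-values for two actions:
loss = dqn_loss(r=1.0, gamma=0.99, a=0,
                q_s=np.array([0.2, 0.4]),              # Q(s, .; w)
                q_s_next_target=np.array([0.5, 1.0]),  # Q(s', .; w-)
                done=False)
```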
A2C
loss
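A minimal sketch of the standard per-transition A2C losses (policy term $-\log\pi(a|s)\,A(s,a)$ plus a squared TD error for the critic), assuming a discrete action space; every name here is illustrative:

```python
import numpy as np

def a2c_losses(log_prob_a, v_s, v_s_next, r, gamma, done):
    """Per-transition A2C losses (illustrative sketch).

    policy loss: -log pi(a|s) * A(s, a)
    value  loss: (r + gamma * V(s') - V(s))^2
    """
    target = r if done else r + gamma * v_s_next
    advantage = target - v_s               # A(s, a) approximated by the TD error
    policy_loss = -log_prob_a * advantage  # pushes up log pi of advantageous actions
    value_loss = advantage ** 2            # critic regresses toward the TD target
    return policy_loss, value_loss

policy_loss, value_loss = a2c_losses(log_prob_a=np.log(0.3), v_s=0.5,
                                     v_s_next=0.7, r=1.0, gamma=0.99, done=False)
```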
algorithm
DDPG
A naive analysis may start from decomposing the name into its three parts: Deep + Deterministic + Policy Gradient (from ).
Deep
In fact, DQN keeps two sets of weights, i.e. two networks: a target network with the old weights that predicts the future Q value, and an online network whose weights are updated. DQN also performs memory (experience) replay.
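A minimal sketch of those two ideas, a replay memory plus a separate old-weight copy of the network, with illustrative names and weights represented as plain Python values:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of past transitions for experience replay (sketch)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def sync_target(online_weights, target_weights, tau=1.0):
    """Update the old-weight (target) copy from the online weights.
    tau=1.0 reproduces DQN's hard copy; DDPG uses a small tau (soft update)."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_weights, target_weights)]
```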
Deterministic
The action is taken deterministically (the policy outputs the action directly) rather than being sampled from a distribution.
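A tiny illustration of the difference, with `mu_s` standing in for the policy network's output for some state:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_s = np.array([0.3, -0.7])          # stand-in policy output mu(s) for some state s

a_deterministic = mu_s                # DDPG-style: the policy output *is* the action
a_stochastic = rng.normal(mu_s, 0.1)  # stochastic PG: sample around the policy output
```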
Policy Gradient
actor
Policy Gradient
critic
Value-based
algorithm
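A minimal sketch of the resulting update, with stand-in callables for the actor $\mu$, the critic $Q$, and their target (old-weight) copies; every name here is illustrative:

```python
import numpy as np

def ddpg_losses(s, a, r, s_next, done, gamma, mu, q, mu_target, q_target):
    """One-transition DDPG losses (illustrative sketch; mu/q are callables).

    critic (value-based):   (r + gamma * Q'(s', mu'(s')) - Q(s, a))^2
    actor  (policy grad.):  maximize Q(s, mu(s))  ->  loss = -Q(s, mu(s))
    """
    target = r if done else r + gamma * q_target(s_next, mu_target(s_next))
    critic_loss = (target - q(s, a)) ** 2
    actor_loss = -q(s, mu(s))
    return critic_loss, actor_loss

# Toy usage with plain functions over numpy arrays standing in for the networks:
mu = lambda s: np.tanh(s)             # deterministic policy mu(s)
q = lambda s, a: float(np.dot(s, a))  # critic Q(s, a)
critic_loss, actor_loss = ddpg_losses(
    s=np.array([0.1, 0.2]), a=np.array([0.5, -0.5]), r=1.0,
    s_next=np.array([0.0, 0.3]), done=False, gamma=0.99,
    mu=mu, q=q, mu_target=mu, q_target=q)
```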