accelbrainbase.controllablemodel._mxnet.dqlcontroller package¶
Submodules¶
accelbrainbase.controllablemodel._mxnet.dqlcontroller.dqn_controller module¶
class accelbrainbase.controllablemodel._mxnet.dqlcontroller.dqn_controller.DQNController(function_approximator, policy_sampler, computable_loss, optimizer_name='SGD', learning_rate=1e-05, learning_attenuate_rate=1.0, attenuate_epoch=50, hybridize_flag=True, scale=1.0, ctx=gpu(0), initializer=None, recursive_learning_flag=False, **kwargs)¶
Bases: accelbrainbase.controllablemodel._mxnet.dql_controller.DQLController
Abstract base class to implement the Deep Q-Network (DQN).
The structure of Q-Learning is based on the Epsilon-Greedy Q-Learning algorithm, a typical off-policy algorithm. In this paradigm, stochastic searching and deterministic searching can coexist via the hyperparameter epsilon_greedy_rate, which is the probability that the agent searches greedily. Greedy searching is deterministic in the sense that the agent's policy follows the selection that maximizes the Q-Value.
References
- https://code.accel-brain.com/Reinforcement-Learning/README.html#deep-q-network
- Egorov, M. (2016). Multi-agent deep reinforcement learning. (URL: https://pdfs.semanticscholar.org/dd98/9d94613f439c05725bad958929357e365084.pdf)
- Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017, May). Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multiagent Systems (pp. 66-83). Springer, Cham.
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
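The epsilon-greedy trade-off described above can be illustrated with a short, self-contained NumPy sketch. This is a conceptual re-implementation for illustration only, not the library's internal code; the helper name epsilon_greedy_select is hypothetical, and epsilon_greedy_rate is interpreted, as above, as the probability of greedy searching.

    import numpy as np

    def epsilon_greedy_select(q_arr, epsilon_greedy_rate, rng=None):
        # Hypothetical helper, not part of accelbrainbase: with probability
        # `epsilon_greedy_rate` the greedy (deterministic) action is chosen;
        # otherwise a random (stochastic) action is drawn.
        if rng is None:
            rng = np.random.default_rng()
        if rng.random() < epsilon_greedy_rate:
            # Deterministic searching: maximize the Q-Value.
            return int(np.argmax(q_arr))
        # Stochastic searching: explore uniformly at random.
        return int(rng.integers(q_arr.shape[0]))

    # Five candidate actions with predicted Q-Values; mostly greedy search.
    q_arr = np.array([0.1, 0.5, 0.3, 0.9, 0.2])
    action_key = epsilon_greedy_select(q_arr, epsilon_greedy_rate=0.75)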
epsilon_greedy_rate¶
Getter for the epsilon-greedy rate.
get_epsilon_greedy_rate()¶
Getter for the epsilon-greedy rate.
select_action(possible_action_arr, possible_predicted_q_arr, possible_reward_value_arr, possible_next_q_arr, possible_meta_data_arr=None)¶
Select action by Q(state, action).
Parameters: - possible_action_arr – Tensor of actions.
- possible_predicted_q_arr – Tensor of Q-Values.
- possible_reward_value_arr – Tensor of reward values.
- possible_next_q_arr – Tensor of Q-Values at the next time step.
- possible_meta_data_arr – mxnet.ndarray.NDArray or np.ndarray of meta data of the actions.
Returns:
Tuple(np.ndarray of actions, Q-Value)
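As a hedged sketch of the call signature above: the array shapes (a batch of five candidate actions, each carrying one value) and the pre-built controller instance are illustrative assumptions, not documented requirements.

    import mxnet.ndarray as nd

    # Hypothetical batch of 5 candidate actions with their predicted
    # Q-Values, observed reward values, and predicted Q-Values at the
    # next time step.
    possible_action_arr = nd.array([[0.], [1.], [2.], [3.], [4.]])
    possible_predicted_q_arr = nd.array([[0.1], [0.5], [0.3], [0.9], [0.2]])
    possible_reward_value_arr = nd.array([[0.0], [1.0], [0.0], [1.0], [0.0]])
    possible_next_q_arr = nd.array([[0.2], [0.4], [0.1], [0.8], [0.3]])

    # `controller` is assumed to be an already initialized DQNController.
    action_arr, q = controller.select_action(
        possible_action_arr,
        possible_predicted_q_arr,
        possible_reward_value_arr,
        possible_next_q_arr,
    )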
select_action_key(possible_action_arr, possible_predicted_q_arr)¶
Select action by Q(state, action).
Parameters: - possible_action_arr – np.ndarray of actions.
- possible_predicted_q_arr – np.ndarray of Q-Values.
Returns:
np.ndarray of keys.
set_epsilon_greedy_rate(value)¶
Setter for the epsilon-greedy rate.
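Because the epsilon-greedy rate is exposed both as a property and through the explicit getter/setter pair, a common pattern is to anneal it toward fully greedy (deterministic) searching as training progresses. A minimal sketch, assuming `controller` is an already initialized DQNController and that the schedule values are illustrative.

    # Start half greedy, half random.
    controller.set_epsilon_greedy_rate(0.5)

    for epoch in range(100):
        # ... observe, act, and learn for one epoch ...
        # Gradually raise the probability of greedy searching.
        rate = controller.get_epsilon_greedy_rate()
        controller.set_epsilon_greedy_rate(min(1.0, rate + 0.005))

    # The property form is equivalent to the explicit getter/setter:
    controller.epsilon_greedy_rate = 0.9
    print(controller.epsilon_greedy_rate)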