accelbrainbase.controllablemodel._mxnet.dqlcontroller package

Submodules

accelbrainbase.controllablemodel._mxnet.dqlcontroller.dqn_controller module

class accelbrainbase.controllablemodel._mxnet.dqlcontroller.dqn_controller.DQNController(function_approximator, policy_sampler, computable_loss, optimizer_name='SGD', learning_rate=1e-05, learning_attenuate_rate=1.0, attenuate_epoch=50, hybridize_flag=True, scale=1.0, ctx=gpu(0), initializer=None, recursive_learning_flag=False, **kwargs)

Bases: accelbrainbase.controllablemodel._mxnet.dql_controller.DQLController

Abstract base class to implement the Deep Q-Network (DQN).

The structure of Q-Learning is based on the Epsilon-Greedy Q-Learning algorithm, which is a typical off-policy algorithm. In this paradigm, stochastic searching and deterministic searching can coexist by way of the hyperparameter epsilon_greedy_rate, the probability that the agent searches greedily. Greedy searching is deterministic in the sense that the agent's policy follows the selection that maximizes the Q-Value.
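
As a point of reference, the following is a minimal NumPy sketch of the epsilon-greedy rule described above; it is an illustration, not the library's internal code, and the names q_arr and epsilon_greedy_select are made up for this example.

    import numpy as np

    def epsilon_greedy_select(q_arr, epsilon_greedy_rate, rng=None):
        # With probability `epsilon_greedy_rate`, search greedily: pick the index
        # of the maximum Q-Value. Otherwise draw a random index, which keeps
        # stochastic searching alive alongside the deterministic greedy policy.
        if rng is None:
            rng = np.random.default_rng()
        if rng.random() < epsilon_greedy_rate:
            return int(np.argmax(q_arr))
        return int(rng.integers(q_arr.shape[0]))

    # Toy example: 4 candidate actions with predicted Q-Values.
    q_arr = np.array([0.1, 0.7, 0.3, 0.5])
    action_key = epsilon_greedy_select(q_arr, epsilon_greedy_rate=0.75)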

epsilon_greedy_rate

getter for the epsilon-greedy rate

get_epsilon_greedy_rate()

getter for the epsilon-greedy rate

select_action(possible_action_arr, possible_predicted_q_arr, possible_reward_value_arr, possible_next_q_arr, possible_meta_data_arr=None)

Select action by Q(state, action).

Parameters:
  • possible_action_arr – Tensor of actions.
  • possible_predicted_q_arr – Tensor of Q-Values.
  • possible_reward_value_arr – Tensor of reward values.
  • possible_next_q_arr – Tensor of Q-Values at the next time step.
  • possible_meta_data_arr – mxnet.ndarray.NDArray or np.array of meta data of the actions.
Returns:
Tuple(np.ndarray of the selected action, Q-Value)
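
For orientation, the sketch below reproduces the documented return value, a tuple of the selected action and its Q-Value, from plain NumPy arrays. The array shapes and the purely greedy selection are assumptions for illustration and do not reflect the controller's actual implementation.

    import numpy as np

    # Toy candidate data: 3 possible actions, each encoded as a 2-dimensional
    # action vector, with one predicted Q-Value per action (shapes are assumed).
    possible_action_arr = np.array([[1, 0], [0, 1], [1, 1]])
    possible_predicted_q_arr = np.array([0.2, 0.9, 0.4])

    # Greedy branch of the selection: take the candidate with the highest
    # predicted Q-Value and return it with that Q-Value, matching the
    # documented tuple of (action, Q-Value).
    key = int(np.argmax(possible_predicted_q_arr))
    selected_action_arr = possible_action_arr[key]
    q_value = possible_predicted_q_arr[key]
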
select_action_key(possible_action_arr, possible_predicted_q_arr)

Select action keys by Q(state, action).

Parameters:
  • possible_action_arr – np.ndarray of actions.
  • possible_predicted_q_arr – np.ndarray of Q-Values.
Returns:
np.ndarray of keys.
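
A minimal sketch of greedy key selection over a batch, assuming one row per sample and one column per candidate action; this layout is an assumption for the example and is not taken from the library.

    import numpy as np

    # Assumed layout: one row per sample, one column per candidate action.
    possible_predicted_q_arr = np.array([[0.2, 0.9, 0.4],
                                         [0.8, 0.1, 0.3]])

    # Greedy key selection: the index of the maximum Q-Value in each row.
    key_arr = np.argmax(possible_predicted_q_arr, axis=1)  # array([1, 0])
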
set_epsilon_greedy_rate(value)

setter for the epsilon-greedy rate

Module contents