pyqlearning.samplabledata.policysampler._mxnet package

Submodules

pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy module
class pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy.MazeMultiAgentPolicy(batch_size=25, map_size=(50, 50), moving_max_dist=3, possible_n=10, memory_num=3, repeating_penalty=0.5, enemy_num=2, enemy_init_dist=5, enemy_moving_max_dist=1, ctx=gpu(0))

    Bases: accelbrainbase.samplabledata.policy_sampler.PolicySampler

    Policy sampler for multi-agent Deep Q-learning, used to evaluate the value of an "action".
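    A minimal instantiation sketch (an assumption, not taken from the library's own examples): every keyword below mirrors the signature above, and ctx is assumed to accept any mxnet context, so mx.cpu() is passed for machines without a GPU.

        import mxnet as mx
        from pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy import MazeMultiAgentPolicy

        # Instantiate with the documented defaults, overriding only the context.
        policy_sampler = MazeMultiAgentPolicy(
            batch_size=25,
            map_size=(50, 50),
            moving_max_dist=3,
            possible_n=10,
            memory_num=3,
            repeating_penalty=0.5,
            enemy_num=2,
            enemy_init_dist=5,
            enemy_moving_max_dist=1,
            ctx=mx.cpu(),  # assumption: any mxnet context is accepted here
        )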
    END_STATE = 'running'

    GOAL = 3

    SPACE = 1

    START = 0

    START_POS = (1, 1)

    WALL = -1
    check_the_end_flag(state_arr, meta_data_arr=None)

        Check the end flag. If this method returns True, the learning ends.

        By default, the learning cannot be stopped; override this method for concrete use cases, as in the sketch below.

        Parameters:
            - state_arr – State in self.t.
            - meta_data_arr – Meta data of the state.

        Returns: bool
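    Because the base behavior never ends the learning, a concrete use case supplies its own stopping rule. A minimal sketch (the subclass name and stopping rule are hypothetical, assuming state_arr is an mxnet NDArray of map cells in which GOAL marks the goal):

        from pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy import MazeMultiAgentPolicy

        class GoalStoppingPolicy(MazeMultiAgentPolicy):
            # Hypothetical override: end the episode once any goal cell
            # (self.GOAL == 3) appears in the observed state.
            def check_the_end_flag(self, state_arr, meta_data_arr=None):
                return bool((state_arr == self.GOAL).sum().asscalar() > 0)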
    inferencing_mode

        getter

    map_arr

        getter
    observe_reward_value(state_arr, action_arr, meta_data_arr=None)

        Compute the reward value.

        Parameters:
            - state_arr – Tensor of state.
            - action_arr – Tensor of action.
            - meta_data_arr – Meta data of actions.

        Returns: Reward value.
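    A sketch of calling this hook directly, reusing the policy_sampler built in the sketch above (the tensor shapes are assumptions; in practice the surrounding Q-learning controller supplies the state and action tensors):

        import mxnet as mx

        # Hypothetical shapes: one channel over the 50x50 map for each
        # sample in the batch of 25. Real shapes depend on the controller.
        state_arr = mx.nd.zeros((25, 1, 50, 50), ctx=mx.cpu())
        action_arr = mx.nd.zeros((25, 1, 50, 50), ctx=mx.cpu())

        reward_arr = policy_sampler.observe_reward_value(state_arr, action_arr)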
pyqlearning.samplabledata.policysampler._mxnet.maze_policy module
class pyqlearning.samplabledata.policysampler._mxnet.maze_policy.MazePolicy(batch_size=25, map_size=(50, 50), moving_max_dist=3, possible_n=10, memory_num=3, repeating_penalty=0.5, ctx=gpu(0))

    Bases: accelbrainbase.samplabledata.policy_sampler.PolicySampler

    Policy sampler for Deep Q-learning in a maze, used to evaluate the value of an "action".
    END_STATE = 'running'

    GOAL = 3

    SPACE = 1

    START = 0

    START_POS = (1, 1)

    WALL = -1
    check_the_end_flag(state_arr, meta_data_arr=None)

        Check the end flag. If this method returns True, the learning ends.

        By default, the learning cannot be stopped; override this method for concrete use cases.

        Parameters:
            - state_arr – State in self.t.
            - meta_data_arr – Meta data of the state.

        Returns: bool
    inferencing_mode

        getter

    map_arr

        getter
    observe_reward_value(state_arr, action_arr, meta_data_arr=None)

        Compute the reward value.

        Parameters:
            - state_arr – Tensor of state.
            - action_arr – Tensor of action.
            - meta_data_arr – Meta data of actions.

        Returns: Reward value.
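    The two hooks above compose the same way as for MazeMultiAgentPolicy. A minimal end-to-end sketch (the driver loop and tensor shapes are assumptions; only the constructor and the two documented hooks come from this page):

        import mxnet as mx
        from pyqlearning.samplabledata.policysampler._mxnet.maze_policy import MazePolicy

        maze_policy = MazePolicy(
            batch_size=25,
            map_size=(50, 50),
            moving_max_dist=3,
            possible_n=10,
            memory_num=3,
            repeating_penalty=0.5,
            ctx=mx.cpu(),  # assumption: any mxnet context is accepted here
        )

        # Hypothetical driver loop: feed dummy tensors through the two
        # documented hooks. check_the_end_flag never returns True by
        # default, so the loop is bounded explicitly.
        state_arr = mx.nd.zeros((25, 1, 50, 50), ctx=mx.cpu())
        action_arr = mx.nd.zeros((25, 1, 50, 50), ctx=mx.cpu())
        for _ in range(10):
            reward_arr = maze_policy.observe_reward_value(state_arr, action_arr)
            if maze_policy.check_the_end_flag(state_arr):
                break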