pyqlearning.samplabledata.policysampler._mxnet package

Submodules

pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy module
class pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy.MazeMultiAgentPolicy(batch_size=25, map_size=(50, 50), moving_max_dist=3, possible_n=10, memory_num=3, repeating_penalty=0.5, enemy_num=2, enemy_init_dist=5, enemy_moving_max_dist=1, ctx=gpu(0))

    Bases: accelbrainbase.samplabledata.policy_sampler.PolicySampler

    Policy sampler for multi-agent Deep Q-learning, used to evaluate the value of an "action".
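    A minimal instantiation sketch (an assumption, not taken from the library's own examples): every keyword below mirrors the signature above, and ctx is assumed to accept any mxnet context, so mx.cpu() is passed for machines without a GPU.

        import mxnet as mx
        from pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy import MazeMultiAgentPolicy

        # Instantiate with the documented defaults, overriding only the context.
        policy_sampler = MazeMultiAgentPolicy(
            batch_size=25,
            map_size=(50, 50),
            moving_max_dist=3,
            possible_n=10,
            memory_num=3,
            repeating_penalty=0.5,
            enemy_num=2,
            enemy_init_dist=5,
            enemy_moving_max_dist=1,
            ctx=mx.cpu(),  # assumption: any mxnet context is accepted here
        )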
    END_STATE = 'running'

    GOAL = 3

    SPACE = 1

    START = 0

    START_POS = (1, 1)

    WALL = -1
    check_the_end_flag(state_arr, meta_data_arr=None)

        Check the end flag. If this method returns True, the learning ends.

        By default, the learning cannot be stopped; override this method for concrete use cases, as in the sketch below.

        Parameters:
            - state_arr – State in self.t.
            - meta_data_arr – Meta data of the state.

        Returns: bool
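    Because the base behavior never ends the learning, a concrete use case supplies its own stopping rule. A minimal sketch (the subclass name and stopping rule are hypothetical, assuming state_arr is an mxnet NDArray of map cells in which GOAL marks the goal):

        from pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy import MazeMultiAgentPolicy

        class GoalStoppingPolicy(MazeMultiAgentPolicy):
            # Hypothetical override: end the episode once any goal cell
            # (self.GOAL == 3) appears in the observed state.
            def check_the_end_flag(self, state_arr, meta_data_arr=None):
                return bool((state_arr == self.GOAL).sum().asscalar() > 0)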
    inferencing_mode

        getter

    map_arr

        getter
    observe_reward_value(state_arr, action_arr, meta_data_arr=None)

        Compute the reward value.

        Parameters:
            - state_arr – Tensor of state.
            - action_arr – Tensor of action.
            - meta_data_arr – Meta data of actions.

        Returns: Reward value.
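    A sketch of calling this hook directly, reusing the policy_sampler built in the sketch above (the tensor shapes are assumptions; in practice the surrounding Q-learning controller supplies the state and action tensors):

        import mxnet as mx

        # Hypothetical shapes: one channel over the 50x50 map for each
        # sample in the batch of 25. Real shapes depend on the controller.
        state_arr = mx.nd.zeros((25, 1, 50, 50), ctx=mx.cpu())
        action_arr = mx.nd.zeros((25, 1, 50, 50), ctx=mx.cpu())

        reward_arr = policy_sampler.observe_reward_value(state_arr, action_arr)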
pyqlearning.samplabledata.policysampler._mxnet.maze_policy module
class pyqlearning.samplabledata.policysampler._mxnet.maze_policy.MazePolicy(batch_size=25, map_size=(50, 50), moving_max_dist=3, possible_n=10, memory_num=3, repeating_penalty=0.5, ctx=gpu(0))

    Bases: accelbrainbase.samplabledata.policy_sampler.PolicySampler

    Policy sampler for Deep Q-learning in a maze, used to evaluate the value of an "action".
    END_STATE = 'running'

    GOAL = 3

    SPACE = 1

    START = 0

    START_POS = (1, 1)

    WALL = -1
    check_the_end_flag(state_arr, meta_data_arr=None)

        Check the end flag. If this method returns True, the learning ends.

        By default, the learning cannot be stopped; override this method for concrete use cases.

        Parameters:
            - state_arr – State in self.t.
            - meta_data_arr – Meta data of the state.

        Returns: bool
    inferencing_mode

        getter

    map_arr

        getter
    observe_reward_value(state_arr, action_arr, meta_data_arr=None)

        Compute the reward value.

        Parameters:
            - state_arr – Tensor of state.
            - action_arr – Tensor of action.
            - meta_data_arr – Meta data of actions.

        Returns: Reward value.
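    The two hooks above compose the same way as for MazeMultiAgentPolicy. A minimal end-to-end sketch (the driver loop and tensor shapes are assumptions; only the constructor and the two documented hooks come from this page):

        import mxnet as mx
        from pyqlearning.samplabledata.policysampler._mxnet.maze_policy import MazePolicy

        maze_policy = MazePolicy(
            batch_size=25,
            map_size=(50, 50),
            moving_max_dist=3,
            possible_n=10,
            memory_num=3,
            repeating_penalty=0.5,
            ctx=mx.cpu(),  # assumption: any mxnet context is accepted here
        )

        # Hypothetical driver loop: feed dummy tensors through the two
        # documented hooks. check_the_end_flag never returns True by
        # default, so the loop is bounded explicitly.
        state_arr = mx.nd.zeros((25, 1, 50, 50), ctx=mx.cpu())
        action_arr = mx.nd.zeros((25, 1, 50, 50), ctx=mx.cpu())
        for _ in range(10):
            reward_arr = maze_policy.observe_reward_value(state_arr, action_arr)
            if maze_policy.check_the_end_flag(state_arr):
                break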