pyqlearning.samplabledata.policysampler._mxnet package

Submodules

pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy module

class pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy.MazeMultiAgentPolicy(batch_size=25, map_size=(50, 50), moving_max_dist=3, possible_n=10, memory_num=3, repeating_penalty=0.5, enemy_num=2, enemy_init_dist=5, enemy_moving_max_dist=1, ctx=gpu(0))[source]

Bases: accelbrainbase.samplabledata.policy_sampler.PolicySampler

Policy sampler for multi-agent Deep Q-learning that evaluates the value of the “action”.

END_STATE = 'running'
GOAL = 3
SPACE = 1
START = 0
START_POS = (1, 1)
WALL = -1
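
A minimal instantiation sketch, assuming MXNet is installed; `mx.cpu()` is substituted for the default `gpu(0)` context so the example does not require a GPU, and the remaining keyword values simply restate the documented defaults:

    import mxnet as mx

    from pyqlearning.samplabledata.policysampler._mxnet.maze_multi_agent_policy import MazeMultiAgentPolicy

    # Build a 50x50 maze with two enemy agents on the CPU context.
    policy_sampler = MazeMultiAgentPolicy(
        batch_size=25,
        map_size=(50, 50),
        moving_max_dist=3,
        possible_n=10,
        memory_num=3,
        repeating_penalty=0.5,
        enemy_num=2,
        enemy_init_dist=5,
        enemy_moving_max_dist=1,
        ctx=mx.cpu()
    )
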
check_the_end_flag(state_arr, meta_data_arr=None)[source]

Check the end flag.

If the return value is True, the learning ends.

By default, the learning cannot be stopped; this method should be overridden for concrete use cases.

Parameters:
  • state_arr – state in self.t.
  • meta_data_arr – meta data of the state.
Returns:

bool

create_enemy()[source]

Create enemies.

draw()[source]

Draw samples from distributions.

Returns: Tuple of `mx.nd.array`s.
extract_now_state()[source]

Extract the current map state.

Returns: `np.ndarray` of the state.
get_inferencing_mode()[source]

getter

get_map_arr()[source]

getter

inferencing_mode

getter

map_arr

getter

observe_reward_value(state_arr, action_arr, meta_data_arr=None)[source]

Compute the reward value.

Parameters:
  • state_arr – Tensor of state.
  • action_arr – Tensor of action.
  • meta_data_arr – Meta data of actions.
Returns:

Reward value.

observe_state(state_arr, meta_data_arr)[source]

Observe the states of agents in the last epoch.

Parameters:
  • state_arr – Tensor of state.
  • meta_data_arr – meta data of the state.
reset_agent_pos()[source]
set_inferencing_mode(value)[source]

setter

set_readonly(value)[source]

setter

update_state(action_arr, meta_data_arr=None)[source]

Update state.

This method can be overridden for concrete use cases.

Parameters:
  • action_arr – action in self.t.
  • meta_data_arr – meta data of the action.
Returns:

Tuple data:
  • state in self.t+1.
  • meta data of the state.
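
In normal use these methods are driven by a Deep Q-learning controller rather than called by hand. The loop below is only a sketch of the documented call order; treating the tuple returned by draw() as an (action, meta-data) pair is an assumption, and a real agent would choose each action with its function approximator instead of re-drawing it:

    # Sketch of the interaction cycle normally driven by a DQN controller.
    action_arr, meta_data_arr = policy_sampler.draw()   # assumed (action, meta-data) pair
    for t in range(100):
        # Transition to the next state under the drawn action.
        state_arr, meta_data_arr = policy_sampler.update_state(
            action_arr,
            meta_data_arr=meta_data_arr
        )
        # Reward observed for this state/action pair.
        reward_value = policy_sampler.observe_reward_value(
            state_arr,
            action_arr,
            meta_data_arr=meta_data_arr
        )
        if policy_sampler.check_the_end_flag(state_arr, meta_data_arr=meta_data_arr):
            break
        # Placeholder: sample the next action candidates again.
        action_arr, meta_data_arr = policy_sampler.draw()
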

pyqlearning.samplabledata.policysampler._mxnet.maze_policy module

class pyqlearning.samplabledata.policysampler._mxnet.maze_policy.MazePolicy(batch_size=25, map_size=(50, 50), moving_max_dist=3, possible_n=10, memory_num=3, repeating_penalty=0.5, ctx=gpu(0))[source]

Bases: accelbrainbase.samplabledata.policy_sampler.PolicySampler

Policy sampler for Deep Q-learning that evaluates the value of the “action”.

END_STATE = 'running'
GOAL = 3
SPACE = 1
START = 0
START_POS = (1, 1)
WALL = -1
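
The single-agent maze variant is constructed the same way, just without the enemy-related parameters; a minimal sketch on the CPU context, restating the documented defaults:

    import mxnet as mx

    from pyqlearning.samplabledata.policysampler._mxnet.maze_policy import MazePolicy

    # 50x50 maze, documented defaults, CPU instead of the default gpu(0).
    maze_policy = MazePolicy(
        batch_size=25,
        map_size=(50, 50),
        moving_max_dist=3,
        possible_n=10,
        memory_num=3,
        repeating_penalty=0.5,
        ctx=mx.cpu()
    )
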
check_the_end_flag(state_arr, meta_data_arr=None)[source]

Check the end flag.

If the return value is True, the learning ends.

By default, the learning cannot be stopped; this method should be overridden for concrete use cases.

Parameters:
  • state_arr – state in self.t.
  • meta_data_arr – meta data of the state.
Returns:

bool

draw()[source]

Draw samples from distributions.

Returns: Tuple of `mx.nd.array`s.
extract_now_state()[source]

Extract the current map state.

Returns: `np.ndarray` of the state.
get_inferencing_mode()[source]

getter

get_map_arr()[source]

getter

inferencing_mode

getter

map_arr

getter

observe_reward_value(state_arr, action_arr, meta_data_arr=None)[source]

Compute the reward value.

Parameters:
  • state_arr – Tensor of state.
  • action_arr – Tensor of action.
  • meta_data_arr – Meta data of actions.
Returns:

Reward value.

observe_state(state_arr, meta_data_arr)[source]

Observe the states of agents in the last epoch.

Parameters:
  • state_arr – Tensor of state.
  • meta_data_arr – meta data of the state.
reset_agent_pos()[source]
set_inferencing_mode(value)[source]

setter

set_readonly(value)[source]

setter

update_state(action_arr, meta_data_arr=None)[source]

Update state.

This method can be overridden for concrete use cases.

Parameters:
  • action_arr – action in self.t.
  • meta_data_arr – meta data of the action.
Returns:

Tuple data:
  • state in self.t+1.
  • meta data of the state.
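
The getter/setter pairs above back the listed properties, so the sampler's mode and map can also be inspected directly; a short sketch, assuming the documented setter and getters behave as plain accessors:

    # Switch to inferencing mode via the documented setter.
    maze_policy.set_inferencing_mode(True)

    # Read back through the documented getters/properties.
    print(maze_policy.inferencing_mode)               # True
    map_arr = maze_policy.map_arr                     # underlying maze map
    now_state_arr = maze_policy.extract_now_state()   # np.ndarray snapshot of the current state
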

Module contents