pyqlearning package

Submodules

pyqlearning.annealing_model module

class pyqlearning.annealing_model.AnnealingModel[source]

Bases: object

Abstract class of Annealing.

accepted_pos

getter

annealing()[source]

Annealing.

computed_cost_arr

getter

current_cost_arr

getter

current_dist_arr

getter

fit_dist_mat(dist_mat_arr)[source]

Fit observed data points.

Parameters: dist_mat_arr – Observed data points, given as a distance matrix.
get_accepted_pos()[source]

getter

get_computed_cost_arr()[source]

getter

get_current_cost_arr()[source]

getter

get_current_dist_arr()[source]

getter

get_predicted_log_arr()[source]

getter

get_predicted_log_list()[source]

getter

get_stocked_predicted_arr()[source]

getter

get_var_arr()[source]

getter

get_var_log_arr()[source]

getter

get_x()[source]

getter

predicted_log_arr

getter

predicted_log_list

getter

set_accepted_pos(value)[source]

setter

set_computed_cost_arr(value)[source]

setter

set_current_cost_arr(value)[source]

setter

set_current_dist_arr(value)[source]

setter

set_predicted_log_arr(value)[source]

setter

set_predicted_log_list(value)[source]

setter

set_stocked_predicted_arr(value)[source]

setter

set_var_arr(value)[source]

setter

set_var_log_arr(value)[source]

setter

set_x(value)[source]

setter

stocked_predicted_arr

getter

var_arr

getter

var_log_arr

getter

x

getter
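
For example, with a concrete subclass the intended workflow is to fit a distance matrix and then run the annealing loop. The subclass name MyAnnealing, the numpy import, and the cost values below are assumptions made for illustration only; the abstract AnnealingModel cannot be annealed directly.

import numpy as np

# MyAnnealing is a hypothetical concrete subclass of AnnealingModel
# that implements annealing().
annealing_model = MyAnnealing()

# A symmetric distance/cost matrix between candidate positions (made-up values).
dist_mat_arr = np.array([
    [0.0, 2.0, 9.0],
    [2.0, 0.0, 6.0],
    [9.0, 6.0, 0.0],
])
annealing_model.fit_dist_mat(dist_mat_arr)
annealing_model.annealing()

# Inspect the result through the getters listed above.
print(annealing_model.accepted_pos)
print(annealing_model.computed_cost_arr)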

pyqlearning.beta_dist module

class pyqlearning.beta_dist.BetaDist(default_alpha=1, default_beta=1)[source]

Bases: object

Beta Distribution for Thompson Sampling.

expected_value()[source]

Compute expected value.

Returns: Expected value.
likelihood()[source]

Compute likelihood.

Returns: Likelihood.
observe(success, failure)[source]

Observe data.

Parameters:
  • success – The number of successes.
  • failure – The number of failures.
variance()[source]

Compute variance.

Returns: Variance.
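
For example, a single BetaDist instance can model the success rate of one Bernoulli arm; the observed counts below are made up for illustration:

from pyqlearning.beta_dist import BetaDist

beta_dist = BetaDist(default_alpha=1, default_beta=1)
beta_dist.observe(success=7, failure=3)

# Posterior summaries of the success rate after the observation.
print(beta_dist.expected_value())
print(beta_dist.variance())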

pyqlearning.q_learning module

class pyqlearning.q_learning.QLearning[source]

Bases: object

Abstract base class and Template Method Pattern of Q-Learning.

alpha_value

Learning rate.

gamma_value

Gamma value (discount factor).

q_dict

Q(state, action)

r_dict

R(state)

t

Time.

alpha_value

getter Learning rate.

check_the_end_flag(state_key)[source]

Check the end flag.

If this method returns True, the learning ends.

As a rule, the learning cannot be stopped; this method should be overridden for concrete use cases.

Parameters: state_key – The key of state in self.t.
Returns: bool
extract_possible_actions(state_key)[source]

Extract the list of possible actions in self.t+1.

Abstract method for concrete use cases.

Parameters: state_key – The key of state in self.t+1.
Returns: The list of possible actions in self.t+1.
extract_q_df(state_key, action_key)[source]

Extract Q-Value from self.q_dict.

Parameters:
  • state_key – The key of state.
  • action_key – The key of action.
Returns:

Q-Value.

extract_r_df(state_key, r_value, action_key=None)[source]

Insert or update R-Value in self.r_dict.

Parameters:
  • state_key – The key of state.
  • r_value – R-Value(Reward).
  • action_key – The key of action, if it is necessary for the parameter of the value function.
Exceptions:
TypeError: If the type of r_value is not float.
gamma_value

getter Gamma value.

get_alpha_value()[source]

getter Learning rate.

get_gamma_value()[source]

getter Gamma value.

get_q_df()[source]

getter

get_r_df()[source]

getter

get_t()[source]

getter Time.

learn(state_key, limit=1000)[source]

Learning.

normalize_q_value()[source]

Normalize q-value. This method should be overridden for concrete use cases.

This method is called in each learning step.

For example:
self.q_df.q_value = self.q_df.q_value / self.q_df.q_value.sum()
normalize_r_value()[source]

Normalize r-value. This method should be overridden for concrete use cases.

This method is called in each learning step.

For example:
self.r_df.r_value = self.r_df.r_value / self.r_df.r_value.sum()
observe_reward_value(state_key, action_key)[source]

Compute the reward value.

Parameters:
  • state_key – The key of state.
  • action_key – The key of action.
Returns:

Reward value.

predict_next_action(state_key, next_action_list)[source]

Predict next action by Q-Learning.

Parameters:
  • state_key – The key of state in self.t+1.
  • next_action_list – The list of possible actions in self.t+1.
Returns:

The key of action.

q_df

getter

r_df

getter

save_q_df(state_key, action_key, q_value)[source]

Insert or update Q-Value in self.q_dict.

Parameters:
  • state_key – State.
  • action_key – Action.
  • q_value – Q-Value.
Exceptions:
TypeError: If the type of q_value is not float.
save_r_df(state_key, r_value, action_key=None)[source]

Insert or update R-Value in self.r_dict.

Parameters:
  • state_key – The key of state.
  • r_value – R-Value(Reward).
  • action_key – The key of action, if it is necessary for the parameter of the value function.
Exceptions:
TypeError: If the type of r_value is not float.
select_action(state_key, next_action_list)[source]

Select action by Q(state, action).

Abstract method for concrete use cases.

Parameters:
  • state_key – The key of state.
  • next_action_list – The list of possible actions in self.t+1. If the length of this list is zero, every action should be considered possible.
Returns:
The key of action.
set_alpha_value(value)[source]

setter Learning rate.

set_gamma_value(value)[source]

setter Gamma value.

set_q_df(value)[source]

setter

set_r_df(value)[source]

setter

set_t(value)[source]

setter Time.

t

getter Time.

update_q(state_key, action_key, reward_value, next_max_q)[source]

Update Q-Value.

Parameters:
  • state_key – The key of state.
  • action_key – The key of action.
  • reward_value – R-Value(Reward).
  • next_max_q – Maximum Q-Value.
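
These arguments map onto the standard Q-Learning update rule Q(state, action) <- Q(state, action) + alpha_value * (reward_value + gamma_value * next_max_q - Q(state, action)). A rough sketch in terms of the accessors documented above (illustrative only, not necessarily the library's exact implementation):

For example:
q_value = self.extract_q_df(state_key, action_key)
new_q_value = q_value + self.alpha_value * (reward_value + self.gamma_value * next_max_q - q_value)
self.save_q_df(state_key, action_key, new_q_value)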
update_state(state_key, action_key)[source]

Update state.

This method can be overridden for concrete use cases.

Parameters:
  • state_key – The key of state in self.t.
  • action_key – The key of action in self.t.
Returns:

The key of state in self.t+1.

visualize_learning_result(state_key)[source]

Visualize the learning result. This method should be overridden for concrete use cases.

This method is called in the last learning step.

Parameters: state_key – The key of state in self.t.
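
For example, a minimal concrete subclass might wire the abstract and overridable methods together as follows. The corridor environment, the MazeAgent name, the epsilon value, and the reward scheme are hypothetical and serve only to sketch how the template methods listed above interact:

import random
from pyqlearning.q_learning import QLearning

class MazeAgent(QLearning):
    '''Hypothetical agent walking a 1-D corridor of 10 cells toward cell 9.'''

    GOAL = 9

    def extract_possible_actions(self, state_key):
        # An "action" here is simply the neighbouring cell to move into.
        return [s for s in (state_key - 1, state_key + 1) if 0 <= s <= self.GOAL]

    def observe_reward_value(self, state_key, action_key):
        # Reward 1.0 only when the move reaches the goal cell.
        reward_value = 1.0 if action_key == self.GOAL else 0.0
        self.save_r_df(state_key, reward_value)
        return reward_value

    def select_action(self, state_key, next_action_list):
        # Epsilon-greedy choice over the stored Q-Values.
        if random.random() < 0.1:
            return random.choice(next_action_list)
        q_list = [self.extract_q_df(state_key, a) for a in next_action_list]
        return next_action_list[q_list.index(max(q_list))]

    def update_state(self, state_key, action_key):
        # In this toy setting the chosen action is the next state itself.
        return action_key

    def check_the_end_flag(self, state_key):
        return state_key == self.GOAL

agent = MazeAgent()
agent.alpha_value = 0.1
agent.gamma_value = 0.9
agent.learn(state_key=0, limit=100)

Roughly speaking, learn() then drives the loop: extract the possible actions, select one, observe the reward, update the Q-Value, move to the next state, and stop when check_the_end_flag returns True or limit steps have passed.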

pyqlearning.thompson_sampling module

class pyqlearning.thompson_sampling.ThompsonSampling(arm_id_list)[source]

Bases: object

Thompson Sampling.

pull(arm_id, success, failure)[source]

Pull an arm.

Parameters:
  • arm_id – Arm's master id.
  • success – The number of successes.
  • failure – The number of failures.
recommend(limit=10)[source]

List up arms and their expected values.

Parameters: limit – Length of the list.
Returns: [Tuple(Arm's master id, expected value)]
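
For example, with three hypothetical arms (the arm ids and observed counts below are made up for illustration):

from pyqlearning.thompson_sampling import ThompsonSampling

thompson_sampling = ThompsonSampling(arm_id_list=["arm_a", "arm_b", "arm_c"])
thompson_sampling.pull(arm_id="arm_a", success=12, failure=8)
thompson_sampling.pull(arm_id="arm_b", success=4, failure=16)
thompson_sampling.pull(arm_id="arm_c", success=9, failure=11)

# Each element is a tuple of (arm's master id, expected value).
for arm_id, expected_value in thompson_sampling.recommend(limit=3):
    print(arm_id, expected_value)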

Module contents