accelbrainbase.controllablemodel._mxnet package¶

Subpackages¶

Submodules¶

accelbrainbase.controllablemodel._mxnet.dql_controller module¶

class accelbrainbase.controllablemodel._mxnet.dql_controller.DQLController(function_approximator, policy_sampler, computable_loss, optimizer_name='SGD', learning_rate=1e-05, learning_attenuate_rate=1.0, attenuate_epoch=50, hybridize_flag=True, scale=1.0, ctx=gpu(0), initializer=None, recursive_learning_flag=False, **kwargs)¶

Bases: mxnet.gluon.block.HybridBlock, accelbrainbase.controllable_model.ControllableModel

Abstract base class for implementing Deep Q-Learning.

The structure of Q-Learning is based on the epsilon-greedy Q-Learning algorithm, a typical off-policy algorithm. In this paradigm, stochastic and deterministic searching coexist through the hyperparameter epsilon_greedy_rate, which is the probability that the agent searches greedily. Greedy searching is deterministic in the sense that the agent's policy follows the selection that maximizes the Q-Value.
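The selection rule described above can be sketched in plain Python. This is only an illustration and not the library's implementation: the function name epsilon_greedy_select and its arguments are hypothetical, and epsilon_greedy_rate is treated, as in the description, as the probability of acting greedily.

```python
import random


def epsilon_greedy_select(q_values, epsilon_greedy_rate, rng=random):
    """Return the index of the chosen action (hypothetical helper).

    With probability `epsilon_greedy_rate` the agent acts greedily and
    takes the action with the largest Q-Value; otherwise it picks an
    action uniformly at random.
    """
    if rng.random() < epsilon_greedy_rate:
        # Deterministic (greedy) search: maximize the Q-Value.
        return max(range(len(q_values)), key=lambda i: q_values[i])
    # Stochastic search: explore a random action.
    return rng.randrange(len(q_values))


q_values = [0.1, 0.7, 0.2]
# A rate of 1.0 always takes the greedy branch.
always_greedy = epsilon_greedy_select(q_values, epsilon_greedy_rate=1.0)
# A rate of 0.0 always takes the random branch.
always_random = epsilon_greedy_select(q_values, epsilon_greedy_rate=0.0)
```

Values of epsilon_greedy_rate between 0 and 1 mix the two behaviours, which is how stochastic and deterministic searching coexist.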

References

https://code.accel-brain.com/Reinforcement-Learning/README.html#deep-q-network
Egorov, M. (2016). Multi-agent deep reinforcement learning. (URL: https://pdfs.semanticscholar.org/dd98/9d94613f439c05725bad958929357e365084.pdf)
Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017, May). Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multiagent Systems (pp. 66-83). Springer, Cham.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

alpha_value¶: getter for the learning rate.

collect_params(select=None)¶: Overrides collect_params in mxnet.gluon.HybridBlock.

computable_loss¶: getter for ComputableLoss.

extract_learned_dict()¶

Extract (pre-) learned parameters.

Returns:	dict of the parameters.

function_approximator¶: getter for FunctionApproximator

gamma_value¶: getter for the gamma value.

get_alpha_value()¶: getter for the learning rate.

get_computable_loss()¶: getter for ComputableLoss.

get_function_approximator()¶: getter for FunctionApproximator

get_gamma_value()¶: getter for the gamma value.

get_init_deferred_flag()¶: getter for bool that means initialization in this class will be deferred or not.

get_policy_sampler()¶: getter for PolicySampler

get_q_logs_arr()¶: getter

inference(iter_n=100)¶

Inference.

Parameters:	iter_n – int of the number of training iterations.
Returns:	list of logs of states.

init_deferred_flag¶: getter for bool that means initialization in this class will be deferred or not.

learn(iter_n=100)¶

Learning.

Parameters:	iter_n – int of the number of training iterations.

load_parameters(filename, ctx=None, allow_missing=False, ignore_extra=False)¶

Load parameters from a file.

Parameters:	filename – File name. ctx – mx.cpu() or mx.gpu(). allow_missing – bool of whether to silently skip loading parameters not represented in the file. ignore_extra – bool of whether to silently ignore parameters from the file that are not present in this Block.

policy_sampler¶: getter for PolicySampler

q_logs_arr¶: getter

save_parameters(filename)¶

Save parameters to files.

Parameters:	filename – File name.

select_action(possible_action_arr, possible_predicted_q_arr, possible_reward_value_arr, possible_meta_data_arr=None)¶

Select action by Q(state, action).

Parameters:	possible_action_arr – Tensor of actions. possible_predicted_q_arr – Tensor of Q-Values. possible_reward_value_arr – Tensor of reward values. possible_meta_data_arr – Meta data of the actions.

Returns:	Tuple(np.ndarray of action, Q-Value)

set_alpha_value(value)¶: setter for the learning rate.

set_computable_loss(value)¶: setter for ComputableLoss.

set_function_approximator(value)¶: setter for FunctionApproximator

set_gamma_value(value)¶: setter for the gamma value.

set_init_deferred_flag(value)¶: setter for bool that means initialization in this class will be deferred or not.

set_policy_sampler(value)¶: setter for PolicySampler

set_q_logs_arr(values)¶: setter

set_readonly(value)¶: setter

update_q(reward_value_arr, next_max_q_arr)¶

Update Q.

Parameters:	reward_value_arr – np.ndarray of reward values. next_max_q_arr – np.ndarray of maximum Q-Values in next time step.
Returns:	np.ndarray of real Q-Values.
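For orientation, the textbook Q-Learning target is the reward plus the discounted maximum Q-Value of the next time step. The sketch below computes that quantity element-wise on plain lists. Whether update_q uses exactly this form (for example, whether alpha_value also enters) is an assumption, and td_target is a hypothetical name.

```python
def td_target(reward_values, next_max_q_values, gamma_value):
    """Textbook Q-Learning target: r + gamma * max_a' Q(s', a').

    Hypothetical helper operating on plain lists; it is not confirmed
    to match what update_q computes.
    """
    return [
        reward + gamma_value * next_max_q
        for reward, next_max_q in zip(reward_values, next_max_q_values)
    ]


# Two transitions with rewards 1.0 and 0.0 and a discount of 0.5.
targets = td_target([1.0, 0.0], [2.0, 4.0], gamma_value=0.5)
```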

accelbrainbase.controllablemodel._mxnet.gan_controller module¶

class accelbrainbase.controllablemodel._mxnet.gan_controller.GANController(true_sampler, generative_model, discriminative_model, generator_loss, discriminator_loss, feature_matching_loss=None, optimizer_name='SGD', learning_rate=1e-05, learning_attenuate_rate=1.0, attenuate_epoch=50, hybridize_flag=True, scale=1.0, ctx=gpu(0), initializer=None, **kwargs)¶

Bases: mxnet.gluon.block.HybridBlock, accelbrainbase.controllable_model.ControllableModel

Generative Adversarial Networks (GANs).

The Generative Adversarial Networks (GANs) framework (Goodfellow et al., 2014) establishes a min-max adversarial game between two neural networks: a generative model, G, and a discriminative model, D. The discriminator model, D(x), is a neural network that computes the probability that an observed data point x in data space is a sample from the data distribution (positive samples) that we are trying to model, rather than a sample from our generative model (negative samples).

Concurrently, the generator uses a function G(z) that maps samples z from the prior p(z) to the data space. G(z) is trained to maximally confuse the discriminator into believing that samples it generates come from the data distribution. The generator is trained by leveraging the gradient of D(x) w.r.t. x, and using that to modify its parameters.
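The min-max game described above is usually written as the value function from Goodfellow et al. (2014). Here p_data denotes the data distribution, a symbol introduced only for this formula:

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to increase V by assigning high probability to real samples and low probability to generated ones, while G is trained to decrease it.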

Conditional GANs (cGANs) are a simple extension of the basic GAN model that allows the model to condition on external information. This makes it possible to engage the learned generative model in different “modes” by providing it with different contextual information (Gauthier, J. 2014).

This model can be constructed by simply feeding the data to condition on, y, to both the generator and the discriminator. In an unconditioned generative model, because the samples z from the prior p(z) are drawn from a uniform or normal distribution, there is no control over the modes of the data being generated. On the other hand, it is possible to direct the data generation process by conditioning the model on additional information (Mirza, M., & Osindero, S. 2014).
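One common way to feed y to both networks is to concatenate it with each network's usual input. The sketch below shows only that input construction on plain lists. The helper names one_hot and conditioned_input are hypothetical, and this class may combine the condition differently.

```python
def one_hot(label, n_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


def conditioned_input(values, condition):
    """Append the condition vector y to a network input (hypothetical helper)."""
    return list(values) + list(condition)


y = one_hot(2, n_classes=3)   # condition: class 2 of 3
z = [0.5, -0.3]               # noise sample drawn from the prior p(z)
x = [0.9, 0.1, 0.4, 0.6]      # an observed data point

generator_input = conditioned_input(z, y)      # G sees (z, y)
discriminator_input = conditioned_input(x, y)  # D sees (x, y)
```

Changing y while keeping z fixed is what lets the caller steer which mode the generator produces.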

References

Gauthier, J. (2014). Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester, 2014(5), 2.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., & Frey, B. (2015). Adversarial autoencoders. arXiv preprint arXiv:1511.05644.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training gans. In Advances in neural information processing systems (pp. 2234-2242).
Zhao, J., Mathieu, M., & LeCun, Y. (2016). Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126.
Warde-Farley, D., & Bengio, Y. (2016). Improving generative adversarial networks with denoising feature matching.

collect_params(select=None)¶: Overrides collect_params in mxnet.gluon.HybridBlock.

discriminative_loss_arr¶: getter for Discriminator’s losses.

discriminative_model¶: getter for DiscriminativeModel.

discriminator_loss¶: getter for DiscriminatorLoss.

extract_learned_dict()¶

Extract (pre-) learned parameters.

Returns:	dict of the parameters.

feature_matching_loss¶: getter for FeatureMatchingLoss.

feature_matching_loss_arr¶: getter for logs of feature matching losses.

generative_loss_arr¶: getter for Generator’s losses.

generative_model¶: getter for GenerativeModel.

generator_loss¶: getter for GeneratorLoss.

get_discriminative_loss_arr()¶: getter for Discriminator’s losses.

get_discriminative_model()¶: getter for DiscriminativeModel.

get_discriminator_loss()¶: getter for DiscriminatorLoss.

get_feature_matching_loss()¶: getter for FeatureMatchingLoss.

get_feature_matching_loss_arr()¶: getter for logs of feature matching losses.

get_generative_loss_arr()¶: getter for Generator’s losses.

get_generative_model()¶: getter for GenerativeModel.

get_generator_loss()¶: getter for GeneratorLoss.

get_init_deferred_flag()¶: getter for bool that means initialization in this class will be deferred or not.

get_posterior_logs_arr()¶: getter for logs of posteriors.

get_true_sampler()¶: getter for TrueSampler.

init_deferred_flag¶: getter for bool that means initialization in this class will be deferred or not.

learn(iter_n=1000, k_step=10)¶

Learning.

Parameters:	iter_n – int of the number of training iterations. k_step – int of the number of learning steps for the discriminative_model.
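The sketch below illustrates how iter_n and k_step could interact. It assumes the alternating schedule of Goodfellow et al. (2014), in which each iteration makes k_step discriminator updates followed by one generator update. The source does not spell out the loop, so this is an assumption, and alternating_schedule and its callable arguments are hypothetical stand-ins.

```python
def alternating_schedule(iter_n, k_step, discriminator_step, generator_step):
    """Record the order of updates in an alternating GAN training loop.

    `discriminator_step` and `generator_step` are hypothetical callables
    standing in for one update of each model.
    """
    calls = []
    for _ in range(iter_n):
        # k_step updates of the discriminator per iteration.
        for _ in range(k_step):
            discriminator_step()
            calls.append("D")
        # Then a single update of the generator.
        generator_step()
        calls.append("G")
    return calls


calls = alternating_schedule(
    iter_n=2,
    k_step=3,
    discriminator_step=lambda: None,
    generator_step=lambda: None,
)
```

Under this assumption, a larger k_step gives the discriminator more updates per generator update.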

load_parameters(filename, ctx=None, allow_missing=False, ignore_extra=False)¶

Load parameters from a file.

Parameters:	filename – File name. ctx – mx.cpu() or mx.gpu(). allow_missing – bool of whether to silently skip loading parameters not represented in the file. ignore_extra – bool of whether to silently ignore parameters from the file that are not present in this Block.

posterior_logs_arr¶: getter for logs of posteriors.

save_parameters(filename)¶

Save parameters to files.

Parameters:	filename – File name.

set_discriminative_model(value)¶: setter for DiscriminativeModel.

set_discriminator_loss(value)¶: setter for DiscriminatorLoss.

set_feature_matching_loss(value)¶: setter for FeatureMatchingLoss.

set_generative_model(value)¶: setter for GenerativeModel.

set_generator_loss(value)¶: setter for GeneratorLoss.

set_init_deferred_flag(value)¶: setter for bool that means initialization in this class will be deferred or not.

set_readonly(value)¶: setter

set_true_sampler(value)¶: setter for TrueSampler.

train_by_feature_matching(k_step)¶

train_discriminator(k_step)¶

Training for discriminator.

Parameters:	k_step – int of the number of learning steps for the discriminative_model.
Returns:	Tuple of (discriminative loss, discriminative posterior).

train_generator()¶

Train generator.

Returns:	Tuple of (generative loss, discriminative posterior).

true_sampler¶: getter for TrueSampler.

Module contents¶