pyqlearning.functionapproximator package¶
Submodules¶
pyqlearning.functionapproximator.cnn_fa module¶

class pyqlearning.functionapproximator.cnn_fa.CNNFA(batch_size, layerable_cnn_list, learning_rate=1e-05, computable_loss=None, opt_params=None, verificatable_result=None, pre_learned_path_list=None, fc_w_arr=None, fc_activation_function=None, verbose_mode=False)[source]¶
Bases: pyqlearning.function_approximator.FunctionApproximator
Convolutional Neural Networks (CNNs) as a Function Approximator.
CNNs are hierarchical models whose convolutional layers alternate with subsampling layers, reminiscent of the simple and complex cells in the primary visual cortex.
This class demonstrates that CNNs can solve generalisation problems and learn successful control policies from observed data points in complex Reinforcement Learning environments. The network is trained with a variant of the Q-Learning algorithm, using stochastic gradient descent to update the weights.
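The Q-Learning update with stochastic gradient descent can be sketched as follows. This is a minimal illustration of the idea for a linear function approximator, not this class's internals; all names here are hypothetical.

```python
import numpy as np

def q_learning_target(reward, next_q_values, gamma=0.9):
    """Compute the Q-Learning target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * np.max(next_q_values)

def sgd_update(w, x, target, learning_rate=1e-05):
    """One stochastic gradient descent step on the squared TD error
    for a linear approximator q = w . x."""
    q = w @ x                 # predicted Q-Value
    grad = (q - target) * x   # gradient of 0.5 * (q - target)^2 w.r.t. w
    return w - learning_rate * grad

# Example: one update step on a toy feature vector.
w = np.zeros(3)
x = np.array([1.0, 0.0, 2.0])
target = q_learning_target(reward=1.0, next_q_values=np.array([0.5, 0.2]))
w = sgd_update(w, x, target, learning_rate=0.1)
```

A learning rate such as the class default 1e-05 keeps each update small; the toy value 0.1 above is only to make the step visible.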
Transposed convolutions, also called deconvolutions, “work by swapping the forward and backward passes of a convolution.” (Dumoulin, V., & Visin, F., 2016, p. 20)
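The quoted swap can be illustrated in one dimension: the backward pass of a convolution with respect to its input is a convolution of the (padded) output with the flipped kernel, which is exactly what a transposed convolution computes. A minimal NumPy sketch, not this class's implementation:

```python
import numpy as np

def conv1d_valid(x, k):
    """'valid' 1-D convolution (cross-correlation) of signal x with kernel k."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def conv1d_transposed(y, k):
    """Transposed convolution: the backward pass of conv1d_valid with
    respect to its input, i.e. zero-pad y and apply the flipped kernel."""
    pad = len(k) - 1
    y_padded = np.pad(y, pad)
    return conv1d_valid(y_padded, k[::-1])

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, -1.0])
y = conv1d_valid(x, k)            # forward pass: shape shrinks to 3
x_grad = conv1d_transposed(y, k)  # swapped pass: shape grows back to 4
```

Note how the transposed convolution restores the input's spatial size, which is why it is used for upsampling in deconvolutional layers.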
References
 Dumoulin, V., & Visin, F. (2016). A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285.
 Masci, J., Meier, U., Cireşan, D., & Schmidhuber, J. (2011, June). Stacked convolutional auto-encoders for hierarchical feature extraction. In International Conference on Artificial Neural Networks (pp. 52-59). Springer, Berlin, Heidelberg.
 Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

inference_q(next_action_arr)[source]¶
Infer Q-Value.
Parameters: next_action_arr – np.ndarray of actions.
Returns: np.ndarray of Q-Values.

learn_q(q, new_q)[source]¶
Learn Q-Value.
Parameters:
 q – Predicted Q-Value.
 new_q – Real Q-Value.

q_logs_list¶
getter
pyqlearning.functionapproximator.convolutional_lstm_fa module¶

class pyqlearning.functionapproximator.convolutional_lstm_fa.ConvolutionalLSTMFA(batch_size, conv_lstm_model, seq_len=10, learning_rate=1e-05, computable_loss=None, opt_params=None, verificatable_result=None, pre_learned_path_list=None, verbose_mode=False)[source]¶
Bases: pyqlearning.function_approximator.FunctionApproximator
Convolutional LSTM Networks as a Function Approximator. This model structurally couples convolution operators to LSTM networks, and can be utilized as a component in constructing the Function Approximator.
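Structurally coupling convolution operators to LSTM networks means replacing the dense matrix products in the LSTM gates with convolutions, so the states and gates remain spatial maps. A minimal 1-D NumPy sketch of one such step, with hypothetical shapes and kernels, not this class's implementation:

```python
import numpy as np

def conv_same(x, k):
    """1-D 'same' convolution so gate maps keep the spatial size of x."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_step(x, h, c, kernels):
    """One ConvLSTM step: each gate is a convolution over the input and
    the previous hidden state instead of a dense matrix product."""
    kxi, khi, kxf, khf, kxo, kho, kxg, khg = kernels
    i = sigmoid(conv_same(x, kxi) + conv_same(h, khi))  # input gate
    f = sigmoid(conv_same(x, kxf) + conv_same(h, khf))  # forget gate
    o = sigmoid(conv_same(x, kxo) + conv_same(h, kho))  # output gate
    g = np.tanh(conv_same(x, kxg) + conv_same(h, khg))  # candidate cell
    c_new = f * c + i * g                               # cell accumulates state
    h_new = o * np.tanh(c_new)                          # gated spatial output
    return h_new, c_new

rng = np.random.default_rng(0)
x = rng.normal(size=8)          # spatial input at one time step
h = np.zeros(8)                 # hidden state map
c = np.zeros(8)                 # cell state map
kernels = [rng.normal(size=3) for _ in range(8)]
h, c = conv_lstm_step(x, h, c, kernels)
```

Because every gate output has the same spatial size as the input, the hidden and cell states can carry spatial structure from step to step.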
References
 Sainath, T. N., Vinyals, O., Senior, A., & Sak, H. (2015, April). Convolutional, long short-term memory, fully connected deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 4580-4584). IEEE.
 Xingjian, S. H. I., Chen, Z., Wang, H., Yeung, D. Y., Wong, W. K., & Woo, W. C. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in neural information processing systems (pp. 802-810).

inference_q(next_action_arr)[source]¶
Infer Q-Value.
Parameters: next_action_arr – np.ndarray of actions.
Returns: np.ndarray of Q-Values.

learn_q(q, new_q)[source]¶
Learn Q-Value.
Parameters:
 q – Predicted Q-Value.
 new_q – Real Q-Value.

q_logs_list¶
getter
pyqlearning.functionapproximator.convolutional_lstm_fc_fa module¶

class pyqlearning.functionapproximator.convolutional_lstm_fc_fa.ConvolutionalLSTMFCFA(batch_size, layerable_cnn_list, lstm_model, seq_len=10, learning_rate=1e-05, computable_loss=None, opt_params=None, verificatable_result=None, pre_learned_path_list=None, verbose_mode=False)[source]¶
Bases: pyqlearning.function_approximator.FunctionApproximator
Convolutional LSTM Networks as a Function Approximator, following the CLDNN Architecture (Sainath, T. N., et al., 2015).
This is a model of the function approximator which loosely couples a CNN and an LSTM. Like the CLDNN Architecture (Sainath, T. N., et al., 2015), this model uses CNNs to reduce the spectral variation of the input feature of rewards, then passes this to LSTM layers to perform temporal modeling, and finally outputs this to DNN layers, which produce a feature representation of Q-Values that is more easily separable.
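The CNN → LSTM → DNN flow described above can be sketched at the shape level. This is a toy NumPy pipeline with illustrative dimensions (and the LSTM stage simplified to a plain tanh recurrence), not the class's actual layers:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, feature_dim, hidden_dim, n_actions = 10, 16, 8, 4

def conv_reduce(x, k):
    """CNN stage: a stride-2 1-D convolution halves the spectral dimension."""
    out = []
    for i in range(0, len(x) - len(k) + 1, 2):
        out.append(np.dot(x[i:i + len(k)], k))
    return np.array(out)

def rnn_step(x, h, w_xh, w_hh):
    """Recurrent stage (standing in for the LSTM) models the time axis."""
    return np.tanh(w_xh @ x + w_hh @ h)

kernel = rng.normal(size=2)
w_xh = rng.normal(size=(hidden_dim, feature_dim // 2))
w_hh = rng.normal(size=(hidden_dim, hidden_dim))
w_out = rng.normal(size=(n_actions, hidden_dim))

h = np.zeros(hidden_dim)
for t in range(seq_len):
    spectral = rng.normal(size=feature_dim)       # observed feature of rewards
    features = conv_reduce(spectral, kernel)      # (16,) -> (8,): less variation
    h = rnn_step(features, h, w_xh, w_hh)         # temporal modeling
q_values = w_out @ h                              # DNN stage: one Q-Value per action
```

Each stage hands a smaller, more structured representation to the next, which is the point of the loosely coupled design.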
References
 https://code.accelbrain.com/Deep-Learning-by-means-of-Design-Pattern/pydbm.cnn.html
 Sainath, T. N., Vinyals, O., Senior, A., & Sak, H. (2015, April). Convolutional, long short-term memory, fully connected deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 4580-4584). IEEE.

inference_q(next_action_arr)[source]¶
Infer Q-Value.
Parameters: next_action_arr – np.ndarray of actions.
Returns: np.ndarray of Q-Values.

learn_q(q, new_q)[source]¶
Learn Q-Value.
Parameters:
 q – Predicted Q-Value.
 new_q – Real Q-Value.

q_logs_list¶
getter
pyqlearning.functionapproximator.lstm_fa module¶

class pyqlearning.functionapproximator.lstm_fa.LSTMFA(batch_size, lstm_model, seq_len=10, learning_rate=1e-05, computable_loss=None, opt_params=None, verificatable_result=None, pre_learned_path_list=None, verbose_mode=False)[source]¶
Bases: pyqlearning.function_approximator.FunctionApproximator
LSTM Networks as a Function Approximator.
Long Short-Term Memory (LSTM) networks, a special RNN structure, have proven stable and powerful for modeling long-range dependencies.
The key point of this structural expansion is the memory cell, which essentially acts as an accumulator of the state information. Every time observed data points are given as new information and input to the LSTM’s input gate, that information is accumulated into the cell if the input gate is activated. The past state of the cell can be forgotten in this process if the LSTM’s forget gate is on. Whether the latest cell output is propagated to the final state is further controlled by the output gate.
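The gating mechanism described above (input gate admitting new information into the cell, forget gate clearing past state, output gate controlling propagation) corresponds to the standard LSTM cell equations, sketched here for single steps in NumPy with illustrative names, not this class's API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM step over input x, hidden state h, and cell state c."""
    W_i, U_i, W_f, U_f, W_o, U_o, W_g, U_g = params
    i = sigmoid(W_i @ x + U_i @ h)  # input gate: admit new information
    f = sigmoid(W_f @ x + U_f @ h)  # forget gate: keep or drop past cell state
    o = sigmoid(W_o @ x + U_o @ h)  # output gate: expose the cell or not
    g = np.tanh(W_g @ x + U_g @ h)  # candidate information
    c_new = f * c + i * g           # cell acts as an accumulator of state
    h_new = o * np.tanh(c_new)      # gated output propagated to the next step
    return h_new, c_new

rng = np.random.default_rng(0)
dim = 4
params = [rng.normal(size=(dim, dim)) for _ in range(8)]
h = np.zeros(dim)
c = np.zeros(dim)
for x in rng.normal(size=(5, dim)):  # feed five observed data points in sequence
    h, c = lstm_step(x, h, c, params)
```

Because the cell state is only ever scaled by the forget gate and incremented by the gated input, gradients flow through it more stably than through a plain RNN, which is why LSTMs handle long-range dependencies.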
References
 Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
 Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.
 Zaremba, W., Sutskever, I., & Vinyals, O. (2014). Recurrent neural network regularization. arXiv preprint arXiv:1409.2329.

inference_q(next_action_arr)[source]¶
Infer Q-Value.
Parameters: next_action_arr – np.ndarray of actions.
Returns: np.ndarray of Q-Values.

learn_q(q, new_q)[source]¶
Learn Q-Value.
Parameters:
 q – Predicted Q-Value.
 new_q – Real Q-Value.

q_logs_list¶
getter