pysummarization.vectorizablesentence package

Submodules

pysummarization.vectorizablesentence.encoder_decoder module

class pysummarization.vectorizablesentence.encoder_decoder.EncoderDecoder[source]

Bases: pysummarization.vectorizable_sentence.VectorizableSentence

Vectorize sentences by Encoder/Decoder based on LSTM.

This library provides an Encoder/Decoder based on LSTM, which is a reconstruction model that makes it possible to extract the series features embedded in deeper layers. The LSTM encoder learns a fixed-length vector representation of the time-series of observed data points, and the LSTM decoder uses this representation to reconstruct the time-series from the current hidden state and the value inferred at the previous time-step.
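For intuition, a minimal NumPy sketch of a single LSTM step (illustrative only; the gate layout and toy initialization here are assumptions, not pysummarization's internal implementation). After consuming the whole sequence, the final hidden state plays the role of the fixed-length vector that the decoder reconstructs from:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time-step: returns the new hidden and cell states."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # stacked gate pre-activations
    i = sigmoid(z[0:n])               # input gate
    f = sigmoid(z[n:2 * n])           # forget gate
    o = sigmoid(z[2 * n:3 * n])       # output gate
    g = np.tanh(z[3 * n:4 * n])       # candidate cell update
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c

rng = np.random.default_rng(0)
x_dim, h_dim = 5, 3
W = rng.normal(scale=0.1, size=(4 * h_dim, x_dim))
U = rng.normal(scale=0.1, size=(4 * h_dim, h_dim))
b = np.zeros(4 * h_dim)

h, c = np.zeros(h_dim), np.zeros(h_dim)
for x in rng.normal(size=(7, x_dim)):  # a toy 7-step sequence
    h, c = lstm_step(x, h, c, W, U, b)
# h is now a fixed-length summary of the whole sequence
```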


controller

getter for the controller object.

get_controller()[source]

getter for the controller object.

learn(sentence_list, token_master_list, hidden_neuron_count=200, epochs=100, batch_size=100, learning_rate=1e-05, learning_attenuate_rate=0.1, attenuate_epoch=50, bptt_tau=8, weight_limit=0.5, dropout_rate=0.5, test_size_rate=0.3)[source]

Learn the model from the given sentences.

Parameters:
  • sentence_list – The list of sentences.
  • token_master_list – Unique list of tokens.
  • hidden_neuron_count – The number of units in hidden layer.
  • epochs – The number of epochs of mini-batch training.
  • batch_size – Batch size of mini-batch training.
  • learning_rate – Learning rate.
  • learning_attenuate_rate – Attenuate the learning_rate by a factor of this value every attenuate_epoch.
  • attenuate_epoch – Attenuate the learning_rate by a factor of learning_attenuate_rate every attenuate_epoch. Additionally, as a form of regularization, this class constrains the weight matrices every attenuate_epoch.
  • bptt_tau – The maximum time-step t referred to in Backpropagation Through Time (BPTT).
  • weight_limit – Regularization for the weight matrix: repeatedly multiply the weight matrix by 0.9 until $\sum_{j=0}^{n} w_{ji}^2 < \text{weight\_limit}$ holds.
  • dropout_rate – The probability of dropout.
  • test_size_rate – Size of the test data set. If this value is 0, the validation will not be executed.
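The weight_limit constraint on the weight matrix can be sketched as follows (a hypothetical helper, not the library's own code): the matrix is scaled by 0.9 repeatedly until every column satisfies the bound.

```python
import numpy as np

def constrain_weights(weights_arr, weight_limit=0.5, decay=0.9):
    """Repeatedly multiply the weight matrix by `decay` until
    sum_j w_ji^2 < weight_limit holds for every column i."""
    weights_arr = weights_arr.copy()
    while np.any((weights_arr ** 2).sum(axis=0) >= weight_limit):
        weights_arr *= decay
    return weights_arr

W = np.full((4, 3), 0.8)  # each column's squared sum starts at 4 * 0.64 = 2.56
W = constrain_weights(W, weight_limit=0.5)
```

Because 0.9 < 1, the loop always terminates; the scaling preserves the direction of each weight vector while shrinking its norm.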
set_readonly(value)[source]

setter

vectorize(sentence_list)[source]
Parameters: sentence_list – The list of tokenized sentences: [[token, token, token, …], [token, token, token, …], [token, token, token, …]]
Returns: np.ndarray of token vectors: [vector of token, vector of token, vector of token]
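The input/output contract of vectorize() can be illustrated with a toy one-hot encoding (hypothetical data; one-hot is a simplification standing in for the learned token vectors the model actually returns):

```python
import numpy as np

token_master_list = ["the", "cat", "sat", "mat"]         # hypothetical vocabulary
sentence_list = [["the", "cat"], ["cat", "sat", "mat"]]  # tokenized sentences

def one_hot(token):
    vec = np.zeros(len(token_master_list))
    vec[token_master_list.index(token)] = 1.0
    return vec

# One array of token vectors per sentence, mirroring the documented
# return shape: [vector of token, vector of token, ...]
vectorized = [np.array([one_hot(t) for t in sent]) for sent in sentence_list]
```

Each sentence maps to an array of shape (sentence length, vector dimension); the vector dimension is fixed by the size of token_master_list.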

pysummarization.vectorizablesentence.lstm_rtrbm module

class pysummarization.vectorizablesentence.lstm_rtrbm.LSTMRTRBM[source]

Bases: pysummarization.vectorizable_sentence.VectorizableSentence

Vectorize sentences by LSTM-RTRBM.

The LSTM-RTRBM model integrates the ability of LSTM to memorize and retrieve useful history information with the advantage of RBM in modelling high-dimensional data (Lyu, Q., Wu, Z., Zhu, J., & Meng, H., 2015, June). Like the RTRBM, the LSTM-RTRBM also has recurrent hidden units.
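For intuition on the RBM side, a minimal sketch of sampling the hidden units given a visible vector (toy sizes and random weights are assumptions, not the library's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-hidden weights
b_h = np.zeros(n_hidden)                               # hidden biases

v = rng.integers(0, 2, size=n_visible).astype(float)   # a binary visible vector
p_h = sigmoid(v @ W + b_h)                             # P(h_j = 1 | v)
h = (rng.random(n_hidden) < p_h).astype(float)         # Bernoulli sample
```

In the LSTM-RTRBM, the hidden biases are not static as in this sketch but depend on the LSTM's history, which is what lets the model condition each step on previous time-steps.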

References

  • Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392.
  • Lyu, Q., Wu, Z., Zhu, J., & Meng, H. (2015, June). Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation. In IJCAI (pp. 4138-4139).
  • Lyu, Q., Wu, Z., & Zhu, J. (2015, October). Polyphonic music modelling with LSTM-RTRBM. In Proceedings of the 23rd ACM international conference on Multimedia (pp. 991-994). ACM.
  • Sutskever, I., Hinton, G. E., & Taylor, G. W. (2009). The recurrent temporal restricted boltzmann machine. In Advances in Neural Information Processing Systems (pp. 1601-1608).
learn(sentence_list, token_master_list, hidden_neuron_count=1000, training_count=1, batch_size=100, learning_rate=0.001, seq_len=5)[source]

Learn the model from the given sentences.

Parameters:
  • sentence_list – The list of sentences.
  • token_master_list – Unique list of tokens.
  • hidden_neuron_count – The number of units in hidden layer.
  • training_count – The number of training.
  • batch_size – Batch size of mini-batch training.
  • learning_rate – Learning rate.
  • seq_len – The length of one sequence.
vectorize(sentence_list)[source]
Parameters: sentence_list – The list of tokenized sentences: [[token, token, token, …], [token, token, token, …], [token, token, token, …]]
Returns: np.ndarray of token vectors: [vector of token, vector of token, vector of token]

Module contents