pysummarization.vectorizabletoken package¶
Subpackages¶
Submodules¶
pysummarization.vectorizabletoken.dbm_like_skip_gram_vectorizer module¶
class pysummarization.vectorizabletoken.dbm_like_skip_gram_vectorizer.DBMLikeSkipGramVectorizer(token_list, document_list=[], traning_count=100, batch_size=20, learning_rate=1e-05, feature_dim=100)
Bases: pysummarization.vectorizable_token.VectorizableToken
Vectorize tokens by a Deep Boltzmann Machine (DBM).
Note that this class employs an original method based on this library-specific intuition and analogy about skip-gram, whereby n-grams are still stored to model language, but tokens are allowed to be skipped.
convert_tokens_into_matrix(token_list)
Create a matrix of sentences.
Parameters: token_list – The list of tokens.
Returns: 2-D np.ndarray of sentences. Each row holds the one-hot vectors of one sentence.
token_arr
getter
token_list
getter
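The skip-gram variant described above, where n-grams may skip over intermediate tokens, can be sketched in plain Python. This is a conceptual illustration with a hypothetical helper name (skip_gram_pairs), not the DBM-based implementation used by DBMLikeSkipGramVectorizer:

```python
def skip_gram_pairs(token_list, window=2, skip_n=1):
    """Generate (center, context) pairs, allowing up to `skip_n`
    extra tokens inside the context window to be skipped over."""
    pairs = []
    for i, center in enumerate(token_list):
        # context positions lie within `window + skip_n` of the center
        lo = max(0, i - window - skip_n)
        hi = min(len(token_list), i + window + skip_n + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, token_list[j]))
    return pairs

print(skip_gram_pairs(["a", "b", "c"], window=1, skip_n=0))
# → [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```

With skip_n=1, the pair ("a", "c") also appears, because one intermediate token may be skipped.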
pysummarization.vectorizabletoken.encoder_decoder module¶
class pysummarization.vectorizabletoken.encoder_decoder.EncoderDecoder
Bases: pysummarization.vectorizable_token.VectorizableToken
Vectorize tokens by an Encoder/Decoder based on LSTM.
This library provides an Encoder/Decoder based on LSTM, which is a reconstruction model that makes it possible to extract series features embedded in deeper layers. The LSTM encoder learns a fixed-length vector representation of the time-series of observed data points, and the LSTM decoder uses this representation to reconstruct the time-series from the current hidden state and the value inferred at the previous time-step.
References
- https://github.com/chimera0/accel-brain-code/blob/master/Deep-Learning-by-means-of-Design-Pattern/demo/demo_sine_wave_prediction_by_LSTM_encoder_decoder.ipynb
- https://github.com/chimera0/accel-brain-code/blob/master/Deep-Learning-by-means-of-Design-Pattern/demo/demo_anomaly_detection_by_enc_dec_ad.ipynb
- Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
- Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.
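The encode-then-reconstruct pattern described above can be illustrated with a toy recurrence in plain Python. The recurrence below is a simple tanh unit, not an LSTM, and all names are illustrative; it only shows how a sequence is compressed to a fixed-length state that the decoder unrolls using each previously inferred value:

```python
import math

def step(h, x, w_hh=0.5, w_xh=1.0):
    # one recurrent step: next hidden state from current state and input
    return math.tanh(w_hh * h + w_xh * x)

def encode(sequence):
    # fold the whole sequence into one fixed-length (here scalar) state
    h = 0.0
    for x in sequence:
        h = step(h, x)
    return h

def decode(h, length):
    # reconstruct the series: each output is inferred from the current
    # hidden state and fed back as the input of the next time-step
    outputs, x = [], 0.0
    for _ in range(length):
        h = step(h, x)
        x = h
        outputs.append(x)
    return outputs
```

In the real model, the gap between the original series and decode(encode(series), len(series)) is the reconstruction error that training minimizes.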
controller
getter
learn(sentence_list, token_master_list, hidden_neuron_count=200, epochs=100, batch_size=100, learning_rate=1e-05, learning_attenuate_rate=0.1, attenuate_epoch=50, bptt_tau=8, weight_limit=0.5, dropout_rate=0.5, test_size_rate=0.3)
Init.
Parameters:
- sentence_list – The list of tokenized sentences: [[token, token, token, …], [token, token, token, …], …]
- token_master_list – Unique list of tokens.
- hidden_neuron_count – The number of units in the hidden layer.
- epochs – Number of mini-batch epochs.
- batch_size – Mini-batch size.
- learning_rate – Learning rate.
- learning_attenuate_rate – Attenuate the learning_rate by a factor of this value every attenuate_epoch.
- attenuate_epoch – Attenuate the learning_rate by a factor of learning_attenuate_rate every attenuate_epoch. Additionally, in relation to regularization, this class constrains the weight matrices every attenuate_epoch.
- bptt_tau – Maximum referred step t in Backpropagation Through Time (BPTT).
- weight_limit – Regularization for the weight matrix: the weight matrix is repeatedly multiplied by 0.9 until $\sum_{j=0}^{n} w_{ji}^2 <$ weight_limit.
- dropout_rate – The probability of dropout.
- test_size_rate – Size of the test data set. If this value is 0, validation will not be executed.
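The schedule implied by learning_rate, learning_attenuate_rate, attenuate_epoch, and the weight_limit constraint can be sketched as follows. This is a minimal plain-Python illustration of the documented behaviour, with hypothetical function names, not the library's internal code:

```python
def attenuated_learning_rate(learning_rate, epoch,
                             learning_attenuate_rate=0.1, attenuate_epoch=50):
    # the learning rate is multiplied by learning_attenuate_rate
    # once every attenuate_epoch epochs
    return learning_rate * (learning_attenuate_rate ** (epoch // attenuate_epoch))

def constrain_weights(weight_row, weight_limit=0.5):
    # repeatedly multiply the weights by 0.9 until sum_j w_j^2 < weight_limit
    while sum(w ** 2 for w in weight_row) >= weight_limit:
        weight_row = [w * 0.9 for w in weight_row]
    return weight_row
```

With the defaults above, a learning rate of 1e-05 drops to 1e-06 at epoch 50 and to 1e-07 at epoch 100.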
pysummarization.vectorizabletoken.skip_gram_vectorizer module¶
class pysummarization.vectorizabletoken.skip_gram_vectorizer.SkipGramVectorizer(token_list, epochs=300, skip_n=1, batch_size=50, feature_dim=20, scale=1e-05, learning_rate=1e-05, auto_encoder=None)
Bases: pysummarization.vectorizable_token.VectorizableToken
Vectorize tokens by skip-gram.
auto_encoder
getter
convert_tokens_into_matrix(token_list)
Create a matrix of sentences.
Parameters: token_list – The list of tokens.
Returns: 2-D np.ndarray of sentences. Each row holds the one-hot vectors of one sentence.
token_arr
getter
token_list
getter
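The one-hot conversion performed by convert_tokens_into_matrix can be sketched in plain Python. This is a conceptual illustration with a hypothetical helper name; the library returns a 2-D np.ndarray, and the vocabulary argument here stands in for the vectorizer's internal token list:

```python
def tokens_to_one_hot_matrix(token_list, vocabulary):
    # map each token to a one-hot row vector over the vocabulary
    index = {token: i for i, token in enumerate(vocabulary)}
    matrix = []
    for token in token_list:
        row = [0.0] * len(vocabulary)
        row[index[token]] = 1.0
        matrix.append(row)
    return matrix
```

Each row has exactly one 1.0, at the column of the corresponding vocabulary entry.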
pysummarization.vectorizabletoken.t_hot_vectorizer module¶
class pysummarization.vectorizabletoken.t_hot_vectorizer.THotVectorizer(token_list)
Bases: pysummarization.vectorizable_token.VectorizableToken
Vectorize tokens by the t-hot Vectorizer.
convert_tokens_into_matrix(token_list)
Create a matrix of sentences.
Parameters: token_list – The list of tokens.
Returns: 2-D np.ndarray of sentences. Each row holds the one-hot vectors of one sentence.
token_arr
getter
pysummarization.vectorizabletoken.tfidf_vectorizer module¶
class pysummarization.vectorizabletoken.tfidf_vectorizer.TfidfVectorizer(token_list_list)
Bases: pysummarization.vectorizable_token.VectorizableToken
Vectorize tokens by TF-IDF.
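TF-IDF weighting raises a token's score with its frequency inside a document and lowers it with the number of documents that contain the token. A standard textbook formulation, not necessarily the exact smoothing this class uses, can be sketched as:

```python
import math

def tf_idf(token, document, document_list):
    # term frequency: relative frequency of the token in this document
    tf = document.count(token) / len(document)
    # document frequency: number of documents containing the token
    df = sum(1 for doc in document_list if token in doc)
    # inverse document frequency penalizes tokens common to many documents
    idf = math.log(len(document_list) / df)
    return tf * idf
```

A token that appears in every document gets idf = log(1) = 0, so its TF-IDF score is 0 regardless of its frequency.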