pysummarization.vectorizablesentence package

Submodules

pysummarization.vectorizablesentence.encoder_decoder module

class pysummarization.vectorizablesentence.encoder_decoder.EncoderDecoder[source]

Bases: pysummarization.vectorizable_sentence.VectorizableSentence

Vectorize sentences by Encoder/Decoder based on LSTM.

controller

getter

get_controller()[source]

getter

learn(sentence_list, token_master_list, hidden_neuron_count=200, epochs=100, batch_size=100, learning_rate=1e-05, learning_attenuate_rate=0.1, attenuate_epoch=50, bptt_tau=8, weight_limit=0.5, dropout_rate=0.5, test_size_rate=0.3)[source]

Train the Encoder/Decoder model on the given sentences.

Parameters:
  • sentence_list – The list of sentences.
  • token_master_list – Unique list of tokens.
  • hidden_neuron_count – The number of units in hidden layer.
  • epochs – The number of mini-batch training epochs.
  • batch_size – Batch size of each mini-batch.
  • learning_rate – Learning rate.
  • learning_attenuate_rate – Attenuate the learning_rate by a factor of this value every attenuate_epoch.
  • attenuate_epoch – Attenuate the learning_rate by a factor of learning_attenuate_rate every attenuate_epoch. Additionally, as a form of regularization, this class constrains the weight matrices every attenuate_epoch.
  • bptt_tau – The maximum number of time steps t referred to in Backpropagation Through Time (BPTT).
  • weight_limit – Regularization for the weight matrices: each weight matrix is repeatedly multiplied by 0.9 until $\sum_{j=0}^{n} w_{ji}^2 < weight\_limit$.
  • dropout_rate – The probability of dropout.
  • test_size_rate – Size of Test data set. If this value is 0, the
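
The interplay of learning_rate, learning_attenuate_rate, attenuate_epoch and weight_limit described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, attenuation is assumed to happen whenever the epoch count reaches a multiple of attenuate_epoch, and the weight constraint is read as "scale the whole matrix by 0.9 until every column satisfies the inequality".

```python
def attenuated_learning_rate(learning_rate, learning_attenuate_rate,
                             attenuate_epoch, epoch):
    """Learning rate in effect at `epoch` under the assumed schedule.

    Hypothetical helper: one attenuation step is assumed for every
    completed block of `attenuate_epoch` epochs.
    """
    steps = epoch // attenuate_epoch
    return learning_rate * learning_attenuate_rate ** steps


def column_square_sums(weights):
    """Return sum_j w_ji^2 for each column i of a non-empty matrix."""
    return [sum(row[i] ** 2 for row in weights)
            for i in range(len(weights[0]))]


def constrain_weights(weights, weight_limit, decay=0.9):
    """Scale the matrix by `decay` until every column's squared sum
    is below `weight_limit` (one reading of the documented rule)."""
    if weight_limit <= 0:
        # A non-positive limit could never be satisfied.
        raise ValueError("weight_limit must be positive")
    constrained = [list(row) for row in weights]
    while max(column_square_sums(constrained)) >= weight_limit:
        constrained = [[w * decay for w in row] for row in constrained]
    return constrained


# With the documented defaults, the rate drops by a factor of 0.1
# once 50 epochs have passed.
rate_before = attenuated_learning_rate(1e-05, 0.1, 50, 49)
rate_after = attenuated_learning_rate(1e-05, 0.1, 50, 50)

# A matrix whose first column is too large gets scaled down as a whole.
limited = constrain_weights([[1.0, 0.1], [1.0, 0.1]], weight_limit=0.5)
```

Because the whole matrix is scaled by the same factor, the relative sizes of the weights are preserved; only their overall magnitude shrinks.
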
set_readonly(value)[source]

setter

vectorize(sentence_list)[source]

Vectorize the tokenized sentences.

Parameters:
  • sentence_list – The list of tokenized sentences: [[token, token, token, …], [token, token, token, …], [token, token, token, …]]

Returns: [vector of token, vector of token, vector of token]

Return type: np.ndarray
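
As a rough illustration of the input shapes that learn() and vectorize() expect, the following sketch prepares a sentence_list of tokenized sentences and a token_master_list of unique tokens. The helper build_token_master_list is hypothetical and not part of pysummarization, and the token order it produces is an assumption. The calls at the end are left as comments because they require the library to be installed; they only mirror the signatures documented on this page.

```python
def build_token_master_list(sentence_list):
    """Collect the unique tokens of all sentences in first-seen order.

    Hypothetical helper for illustration only.
    """
    seen = set()
    token_master_list = []
    for sentence in sentence_list:
        for token in sentence:
            if token not in seen:
                seen.add(token)
                token_master_list.append(token)
    return token_master_list


# Each sentence is a list of tokens, as described for sentence_list.
sentence_list = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
]
token_master_list = build_token_master_list(sentence_list)

# Assuming pysummarization is installed, usage would follow the
# signatures documented above:
#
# from pysummarization.vectorizablesentence.encoder_decoder import EncoderDecoder
# encoder_decoder = EncoderDecoder()
# encoder_decoder.learn(
#     sentence_list=sentence_list,
#     token_master_list=token_master_list,
# )
# vector_arr = encoder_decoder.vectorize(sentence_list)
```
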

Module contents