pysummarization.similarityfilter package

Submodules

pysummarization.similarityfilter.dice module

class pysummarization.similarityfilter.dice.Dice[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Calculate similarity with the Dice coefficient.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

Similarity.
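The Dice coefficient compares the two token sets as $2|X \cap Y| / (|X| + |Y|)$. A minimal sketch of that formula (a hypothetical stand-in, not the library's implementation):

```python
def dice_similarity(token_list_x, token_list_y):
    # Dice coefficient over token sets: 2 * |X ∩ Y| / (|X| + |Y|).
    x, y = set(token_list_x), set(token_list_y)
    if len(x) + len(y) == 0:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))
```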

pysummarization.similarityfilter.encoder_decoder_clustering module

class pysummarization.similarityfilter.encoder_decoder_clustering.EncoderDecoderClustering(document=None, tokenizable_doc=None, hidden_neuron_count=200, epochs=100, batch_size=100, learning_rate=1e-05, learning_attenuate_rate=0.1, attenuate_epoch=50, bptt_tau=8, weight_limit=0.5, dropout_rate=0.5, test_size_rate=0.3, cluster_num=10, max_iter=100, debug_mode=False)[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Calculate similarity as the cosine similarity of Tf-Idf vectors.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

Similarity.

get_labeled_arr()[source]

getter

get_sentence_list()[source]

getter

labeled_arr

getter

learn(document, tokenizable_doc=None, hidden_neuron_count=200, epochs=100, batch_size=100, learning_rate=1e-05, learning_attenuate_rate=0.1, attenuate_epoch=50, bptt_tau=8, weight_limit=0.5, dropout_rate=0.5, test_size_rate=0.3, cluster_num=10, max_iter=100)[source]

Learning.

Parameters:
  • document – String of document.
  • tokenizable_doc – is-a TokenizableDoc.
  • hidden_neuron_count – The number of units in hidden layer.
  • epochs – Epochs of mini-batch training.
  • batch_size – Batch size of mini-batch.
  • learning_rate – Learning rate.
  • learning_attenuate_rate – Attenuate the learning_rate by a factor of this value every attenuate_epoch.
  • attenuate_epoch – Attenuate the learning_rate by a factor of learning_attenuate_rate every attenuate_epoch. Additionally, in relation to regularization, this class constrains the weight matrices every attenuate_epoch.
  • bptt_tau – Maximum time step t referred to in Backpropagation Through Time (BPTT).
  • weight_limit – Regularization that repeatedly multiplies the weight matrix by 0.9 until $\sum_{j=0}^{n} w_{ji}^2 <$ weight_limit.
  • dropout_rate – The probability of dropout.
  • test_size_rate – Size of the test data set. If this value is 0, the data set is not split into training and test sets.
  • cluster_num – The number of clusters.
  • max_iter – Maximum number of iterations.
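The weight_limit constraint above can be sketched as follows; constrain_weights is a hypothetical helper (not part of the library's API) that repeatedly scales the weights by 0.9 until every unit's incoming squared weights sum below the limit:

```python
def constrain_weights(weight_rows, weight_limit=0.5):
    # Hypothetical sketch of the weight-limit regularization:
    # multiply the whole weight matrix by 0.9 until, for each unit i,
    # sum_j w_ji ** 2 < weight_limit.
    def col_sq_sums(rows):
        return [sum(row[i] ** 2 for row in rows) for i in range(len(rows[0]))]

    while any(s >= weight_limit for s in col_sq_sums(weight_rows)):
        weight_rows = [[w * 0.9 for w in row] for row in weight_rows]
    return weight_rows
```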
sentence_list

getter

set_readonly(value)[source]

setter

pysummarization.similarityfilter.encoder_decoder_cosine module

class pysummarization.similarityfilter.encoder_decoder_cosine.EncoderDecoderCosine(document, tokenizable_doc=None, hidden_neuron_count=200, epochs=100, batch_size=100, learning_rate=1e-05, learning_attenuate_rate=0.1, attenuate_epoch=50, bptt_tau=8, weight_limit=0.5, dropout_rate=0.5, test_size_rate=0.3, debug_mode=False)[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Calculate similarity as the cosine similarity of Tf-Idf vectors.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

Similarity.
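Cosine similarity itself reduces to the normalized dot product of two feature vectors. A minimal sketch over plain Python lists (a hypothetical helper, not this class's internals):

```python
import math

def cosine_similarity(vec_x, vec_y):
    # Normalized dot product: <x, y> / (||x|| * ||y||).
    dot = sum(a * b for a, b in zip(vec_x, vec_y))
    norm_x = math.sqrt(sum(a * a for a in vec_x))
    norm_y = math.sqrt(sum(b * b for b in vec_y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)
```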

pysummarization.similarityfilter.jaccard module

class pysummarization.similarityfilter.jaccard.Jaccard[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Calculate similarity with the Jaccard coefficient.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

Similarity.
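The Jaccard coefficient divides the overlap of the two token sets by their union: $|X \cap Y| / |X \cup Y|$. A minimal sketch (a hypothetical stand-in, not the library's implementation):

```python
def jaccard_similarity(token_list_x, token_list_y):
    # Jaccard coefficient over token sets: |X ∩ Y| / |X ∪ Y|.
    x, y = set(token_list_x), set(token_list_y)
    if not x | y:
        return 0.0
    return len(x & y) / len(x | y)
```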

pysummarization.similarityfilter.lstm_rtrbm_clustering module

class pysummarization.similarityfilter.lstm_rtrbm_clustering.LSTMRTRBMClustering(document=None, tokenizable_doc=None, hidden_neuron_count=1000, training_count=1, batch_size=10, learning_rate=0.001, seq_len=5, cluster_num=10, max_iter=100, debug_mode=False)[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Check whether token_list_x and token_list_y belong to the same cluster: this method returns 1.0 if they do and 0.0 otherwise.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

0.0 or 1.0.
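The cluster-membership check can be sketched as follows, assuming a labeled_arr-style mapping from sentence index to cluster label (the names and signature here are illustrative, not the class's internals):

```python
def cluster_similarity(labeled_arr, index_x, index_y):
    # labeled_arr maps each sentence index to its cluster label.
    # Two sentences count as similar (1.0) iff their labels match.
    return 1.0 if labeled_arr[index_x] == labeled_arr[index_y] else 0.0
```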

get_labeled_arr()[source]

getter

get_sentence_list()[source]

getter

labeled_arr

getter

learn(document, tokenizable_doc, hidden_neuron_count=1000, training_count=1, batch_size=10, learning_rate=0.001, seq_len=5, cluster_num=10, max_iter=100)[source]

Learning.

Parameters:
  • document – String of document.
  • tokenizable_doc – is-a TokenizableDoc.
  • hidden_neuron_count – The number of units in hidden layer.
  • training_count – The number of training iterations.
  • batch_size – Batch size of mini-batch.
  • learning_rate – Learning rate.
  • seq_len – The length of one sequence.
  • cluster_num – The number of clusters.
  • max_iter – Maximum number of iterations.
sentence_list

getter

set_readonly(value)[source]

setter

pysummarization.similarityfilter.lstm_rtrbm_cosine module

class pysummarization.similarityfilter.lstm_rtrbm_cosine.LSTMRTRBMCosine(document, tokenizable_doc=None, hidden_neuron_count=1000, training_count=1, batch_size=10, learning_rate=0.001, seq_len=5, debug_mode=False)[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Calculate similarity as the cosine similarity of Tf-Idf vectors.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

Similarity.

pysummarization.similarityfilter.simpson module

class pysummarization.similarityfilter.simpson.Simpson[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Calculate similarity with the Simpson coefficient.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

Similarity.
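The Simpson (overlap) coefficient normalizes the intersection by the smaller of the two token sets: $|X \cap Y| / \min(|X|, |Y|)$. A minimal sketch (a hypothetical stand-in, not the library's implementation):

```python
def simpson_similarity(token_list_x, token_list_y):
    # Simpson (overlap) coefficient: |X ∩ Y| / min(|X|, |Y|).
    x, y = set(token_list_x), set(token_list_y)
    if min(len(x), len(y)) == 0:
        return 0.0
    return len(x & y) / min(len(x), len(y))
```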

pysummarization.similarityfilter.tanimoto module

class pysummarization.similarityfilter.tanimoto.Tanimoto[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Calculate similarity with the Tanimoto coefficient.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

Similarity.
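Over plain token sets the Tanimoto coefficient is $|X \cap Y| / (|X| + |Y| - |X \cap Y|)$, which coincides with the Jaccard coefficient. A minimal sketch (a hypothetical stand-in, not the library's implementation):

```python
def tanimoto_similarity(token_list_x, token_list_y):
    # Tanimoto coefficient: |X ∩ Y| / (|X| + |Y| - |X ∩ Y|).
    x, y = set(token_list_x), set(token_list_y)
    inter = len(x & y)
    denom = len(x) + len(y) - inter
    return inter / denom if denom else 0.0
```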

pysummarization.similarityfilter.tfidf_cosine module

class pysummarization.similarityfilter.tfidf_cosine.TfIdfCosine[source]

Bases: pysummarization.similarity_filter.SimilarityFilter

Concrete class for filtering mutually similar sentences.

calculate(token_list_x, token_list_y)[source]

Calculate similarity as the cosine similarity of Tf-Idf vectors.

Concrete method.

Parameters:
  • token_list_x – [token, token, token, …]
  • token_list_y – [token, token, token, …]
Returns:

Similarity.
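Putting the two steps together: build Tf-Idf vectors over the shared vocabulary, then take their cosine. This sketch uses one common weighting (tf = raw count, idf = 1 + log(N/df), with only the two input token lists as the corpus); the library's exact weighting scheme may differ:

```python
import math

def tfidf_cosine(token_list_x, token_list_y):
    # Vectorize both documents with Tf-Idf, then return the cosine
    # of the two vectors. tf = raw count; idf = 1 + log(N / df),
    # where N = 2 (just the two input documents).
    docs = [token_list_x, token_list_y]
    vocab = sorted(set(token_list_x) | set(token_list_y))

    def vectorize(doc):
        vec = []
        for term in vocab:
            df = sum(1 for d in docs if term in d)
            idf = 1.0 + math.log(len(docs) / df)
            vec.append(doc.count(term) * idf)
        return vec

    vec_x, vec_y = vectorize(token_list_x), vectorize(token_list_y)
    dot = sum(a * b for a, b in zip(vec_x, vec_y))
    norm = math.sqrt(sum(a * a for a in vec_x)) * math.sqrt(sum(b * b for b in vec_y))
    return dot / norm if norm else 0.0
```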

Module contents