pysummarization.abstractablesemantics package
Submodules
pysummarization.abstractablesemantics.enc_dec_ad module

class pysummarization.abstractablesemantics.enc_dec_ad.EncDecAD(normal_prior_flag=False, encoder_decoder_controller=None, input_neuron_count=20, hidden_neuron_count=20, weight_limit=0.5, dropout_rate=0.5, pre_learning_epochs=1000, epochs=100, batch_size=20, learning_rate=1e-05, learning_attenuate_rate=0.1, attenuate_epoch=50, seq_len=8, bptt_tau=8, test_size_rate=0.3, tol=0.0, tld=100.0)[source]

Bases: pysummarization.abstractable_semantics.AbstractableSemantics
LSTM-based Encoder/Decoder scheme for Anomaly Detection (EncDec-AD).
This library applies the Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) to text summarization by intuition. In this scheme, an LSTM-based Encoder/Decoder, or the so-called sequence-to-sequence (Seq2Seq) model, learns to reconstruct normal time-series behavior, and thereafter uses the reconstruction error to detect anomalies.
Malhotra, P., et al. (2016) showed that the EncDec-AD paradigm is robust and can detect anomalies from predictable, unpredictable, periodic, aperiodic, and quasi-periodic time-series. Further, they showed that the paradigm is able to detect anomalies from short time-series (length as small as 30) as well as long time-series (length as large as 500).
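To make the scoring intuition concrete, the following is a minimal NumPy sketch of the Mahalanobis-style anomaly score described in Malhotra, P., et al. (2016): reconstruction errors are compared against the error statistics of normal data. This is illustrative only, not this library's API; the function name and array shapes are assumptions.

```python
import numpy as np

def anomaly_scores(observed_arr, reconstructed_arr, mu, sigma_inv):
    """Illustrative sketch of the EncDec-AD scoring rule
    (Malhotra, P., et al., 2016), not this library's API.

    observed_arr, reconstructed_arr: shape (n_samples, seq_len, dim).
    mu, sigma_inv: mean and inverse covariance of reconstruction
                   errors, estimated on *normal* data only.
    """
    # Flatten each sample's reconstruction error over the sequence.
    error_arr = (observed_arr - reconstructed_arr).reshape(
        observed_arr.shape[0], -1
    )
    centered_arr = error_arr - mu
    # a(i) = (e(i) - mu)^T Sigma^{-1} (e(i) - mu): a high score means
    # the sample is badly reconstructed, i.e. anomalous.
    return np.einsum("ij,jk,ik->i", centered_arr, sigma_inv, centered_arr)
```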
This library draws on the intuitive insight above, namely the use of reconstruction error to detect anomalies, in order to apply the model to text summarization. As exemplified by the Seq2Seq paradigm, documents and sentences, which contain tokens of text, can be considered as time-series features. The anomalous data detected by EncDec-AD should then express something about the text.
From the above analogy, this library introduces two conflicting intuitions. On the one hand, the anomalous data may catch the observer’s eye from the viewpoint of rarity or amount of information, as indicators from natural language processing such as TF-IDF suggest. On the other hand, the anomalous data may be ignorable noise, mere outliers.
In any case, this library deduces that the function and potential of EncDec-AD in text summarization is to draw the distinction between normal and anomalous texts and to filter the one from the other.
Note that the model in this library differs in some respects from Malhotra, P., et al. (2016), owing to the specification of the Deep Learning library: [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern). First, the weight matrices of the encoder and decoder are not shared. Second, it is possible to introduce regularization techniques which are not discussed in Malhotra, P., et al. (2016), such as dropout, gradient clipping, and the limitation of weights. Third, the loss function for reconstruction error is not limited to the L2 norm.
References
- Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.

encoder_decoder_controller
getter

inference(observed_arr)[source]
Inference by the model.
Parameters: observed_arr – np.ndarray of observed data points.
Returns: np.ndarray of inferred feature points.

learn(observed_arr, target_arr)[source]
Train the model.
Parameters:
- observed_arr – np.ndarray of observed data points.
- target_arr – np.ndarray of labeled target data.

summarize(test_arr, vectorizable_token, sentence_list, limit=5)[source]
Summarize the input document.
Parameters:
- test_arr – np.ndarray of observed data points.
- vectorizable_token – is-a VectorizableToken.
- sentence_list – list of all sentences.
- limit – The number of abstract sentences to select.
Returns: np.ndarray of scores.
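A minimal usage sketch of EncDecAD, based only on the signatures documented above. The random arrays merely stand in for vectorized token sequences, and the construction of vectorizable_token and sentence_list via the library's tokenizing and vectorizing steps is assumed, so the summarize call is left commented out.

```python
import numpy as np
from pysummarization.abstractablesemantics.enc_dec_ad import EncDecAD

# Stand-in for vectorized token sequences: (batch, seq_len, dim).
observed_arr = np.random.normal(size=(100, 8, 20))
# Reconstruction target: the model learns to reproduce its input.
target_arr = observed_arr.copy()

enc_dec_ad = EncDecAD(
    input_neuron_count=20,
    hidden_neuron_count=20,
    epochs=100,
    batch_size=20,
    seq_len=8,
)
enc_dec_ad.learn(observed_arr, target_arr)

# `vectorizable_token` (is-a VectorizableToken) and `sentence_list`
# are assumed to come from the library's tokenizing/vectorizing steps.
# score_arr = enc_dec_ad.summarize(
#     test_arr, vectorizable_token, sentence_list, limit=5
# )
```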
pysummarization.abstractablesemantics.re_seq_2_seq module

class pysummarization.abstractablesemantics.re_seq_2_seq.ReSeq2Seq(margin_param=0.01, retrospective_lambda=0.5, retrospective_eta=0.5, encoder_decoder_controller=None, retrospective_encoder=None, input_neuron_count=20, hidden_neuron_count=20, weight_limit=0.5, dropout_rate=0.5, pre_learning_epochs=1000, epochs=100, batch_size=20, learning_rate=1e-05, learning_attenuate_rate=0.1, attenuate_epoch=50, grad_clip_threshold=10000000000.0, seq_len=8, bptt_tau=8, test_size_rate=0.3, tol=0.0, tld=100.0)[source]

Bases: pysummarization.abstractable_semantics.AbstractableSemantics
A retrospective sequence-to-sequence learning model (re-seq2seq).
The concept of re-seq2seq (Zhang, K. et al., 2018) provided inspiration for this library. This is a new sequence learning model, mainly in the field of video summarization. “The key idea behind re-seq2seq is to measure how well the machine-generated summary is similar to the original video in an abstract semantic space” (Zhang, K. et al., 2018, p. 3).
The encoder of a seq2seq model observes the original video and outputs feature points which represent the semantic meaning of the observed data points. Then the feature points are observed by the decoder of this model. Additionally, in the re-seq2seq model, the outputs of the decoder are propagated to a retrospective encoder, which infers feature points that represent the semantic meaning of the summary. “If the summary preserves the important and relevant information in the original video, then we should expect that the two embeddings are similar (e.g. in Euclidean distance)” (Zhang, K. et al., 2018, p. 3).
This library refers to this intuitive insight above to apply the model to text summarization. Like videos, semantic feature representation based on the representation learning of manifolds is also possible in text summarization.
The intuition in the design of their loss function is also suggestive. “The intuition behind our modeling is that the outputs should convey the same amount of information as the inputs. For summarization, this is precisely the goal: a good summary should be such that after viewing the summary, users would get about the same amount of information as if they had viewed the original video” (Zhang, K. et al., 2018, p. 7).
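As a rough illustration of this matching idea, the retrospective term can be sketched as a squared Euclidean distance between the encoder's embedding of the original sequence and the retrospective encoder's embedding of the generated summary. This is a simplification for exposition, not this library's exact loss function (which, as noted in the next paragraph, is itself simplified relative to the paper); the function name and array shapes are assumptions.

```python
import numpy as np

def retrospective_matching_loss(encoder_embedding_arr, re_encoder_embedding_arr):
    """Illustrative Euclidean matching term (Zhang, K. et al., 2018):
    how close the summary's embedding is to the original's embedding.

    Both arrays: shape (batch_size, embedding_dim).
    """
    diff_arr = encoder_embedding_arr - re_encoder_embedding_arr
    # Squared Euclidean distance per sample, averaged over the batch.
    return np.mean(np.sum(diff_arr ** 2, axis=1))
```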
However, the model in this library differs in some respects from Zhang, K. et al. (2018), owing to the specification of the Deep Learning library: [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern). First, the Encoder/Decoder based on LSTM is not designed as a hierarchical structure. Second, it is possible to introduce regularization techniques which are not discussed in Zhang, K. et al. (2018), such as dropout, gradient clipping, and the limitation of weights. Third, the regression loss function for matching summaries is simplified in this library for the sake of computational efficiency.
References
- Zhang, K., Grauman, K., & Sha, F. (2018). Retrospective Encoders for Video Summarization. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 383-399).

back_propagation(delta_arr)[source]
Back propagation.
Parameters: delta_arr – np.ndarray of delta.
Returns: Tuple data of
- the decoder's list of gradients,
- the encoder's np.ndarray of delta,
- the encoder's list of gradients.

compute_retrospective_loss()[source]
Compute the retrospective loss.
Returns: Tuple data of
- np.ndarray of delta,
- np.ndarray of losses of each batch,
- float of the loss over all batches.

encoder_decoder_controller
getter

inference(observed_arr)[source]
Inference by the model.
Parameters: observed_arr – np.ndarray of observed data points.
Returns: np.ndarray of inferred feature points.

learn(observed_arr, target_arr)[source]
Train the model.
Parameters:
- observed_arr – np.ndarray of observed data points.
- target_arr – np.ndarray of labeled target data.

learn_generated(feature_generator)[source]
Learn features generated by a FeatureGenerator.
Parameters: feature_generator – is-a FeatureGenerator.

logs_arr
getter

optimize(re_encoder_grads_list, decoder_grads_list, encoder_grads_list, learning_rate, epoch)[source]
Optimize the model's parameters with the computed gradients.
Parameters:
- re_encoder_grads_list – the retrospective encoder's list of gradients.
- decoder_grads_list – the decoder's list of gradients.
- encoder_grads_list – the encoder's list of gradients.
- learning_rate – Learning rate.
- epoch – Current epoch.

retrospective_encoder
getter

summarize(test_arr, vectorizable_token, sentence_list, limit=5)[source]
Summarize the input document.
Parameters:
- test_arr – np.ndarray of observed data points.
- vectorizable_token – is-a VectorizableToken.
- sentence_list – list of all sentences.
- limit – The number of abstract sentences to select.
Returns: list of str of abstract sentences.
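As with EncDecAD above, here is a minimal usage sketch based only on the documented signatures. Note the difference in return type: summarize() yields the abstract sentences themselves rather than an array of scores. The arrays and preprocessing objects are again assumptions.

```python
import numpy as np
from pysummarization.abstractablesemantics.re_seq_2_seq import ReSeq2Seq

re_seq_2_seq = ReSeq2Seq(
    input_neuron_count=20,
    hidden_neuron_count=20,
    epochs=100,
    batch_size=20,
    seq_len=8,
)

# Stand-in for vectorized token sequences: (batch, seq_len, dim).
observed_arr = np.random.normal(size=(100, 8, 20))
re_seq_2_seq.learn(observed_arr, observed_arr)

# `vectorizable_token` (is-a VectorizableToken) and `sentence_list` are
# assumed to come from the library's tokenizing/vectorizing pipeline.
# abstract_sentence_list = re_seq_2_seq.summarize(
#     test_arr, vectorizable_token, sentence_list, limit=5
# )
```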