pysummarization.abstractablesemantics package

Submodules

pysummarization.abstractablesemantics.enc_dec_ad module

class pysummarization.abstractablesemantics.enc_dec_ad.EncDecAD(normal_prior_flag=False, encoder_decoder_controller=None, input_neuron_count=20, hidden_neuron_count=20, weight_limit=10000000000.0, dropout_rate=0.5, pre_learning_epochs=1000, epochs=100, batch_size=20, learning_rate=1e-05, learning_attenuate_rate=1.0, attenuate_epoch=50, seq_len=8, bptt_tau=8, test_size_rate=0.3, tol=0.0, tld=100.0)[source]

Bases: pysummarization.abstractable_semantics.AbstractableSemantics

LSTM-based Encoder/Decoder scheme for Anomaly Detection (EncDec-AD).

This library applies the Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) to text summarization by analogy. In this scheme, an LSTM-based Encoder/Decoder, the so-called sequence-to-sequence (Seq2Seq) model, learns to reconstruct normal time-series behavior and then uses the reconstruction error to detect anomalies.
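
The reconstruction-error idea can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the function name `reconstruction_error`, the toy data, and the (samples, time steps, features) array layout are assumptions made for illustration, and the "reconstruction" is faked rather than produced by a trained model.

```python
import numpy as np


def reconstruction_error(observed_arr, reconstructed_arr):
    """Mean squared error per sample, averaged over time steps and features."""
    diff_arr = observed_arr - reconstructed_arr
    return np.mean(np.square(diff_arr), axis=(1, 2))


# Toy data (assumed layout): 4 sequences, 8 time steps, 3 features.
rng = np.random.default_rng(0)
observed_arr = rng.normal(size=(4, 8, 3))

# Pretend a model reconstructs every sequence perfectly except index 2.
reconstructed_arr = observed_arr.copy()
reconstructed_arr[2] += 1.0

error_arr = reconstruction_error(observed_arr, reconstructed_arr)
# The sequence with the largest error is the anomaly candidate.
most_anomalous = int(np.argmax(error_arr))
```

A sequence that the model reconstructs poorly gets a large error and is treated as the anomaly candidate.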

Malhotra, P., et al. (2016) showed that the EncDec-AD paradigm is robust and can detect anomalies in predictable, unpredictable, periodic, aperiodic, and quasi-periodic time-series. They also showed that the paradigm can detect anomalies in short time-series (length as small as 30) as well as long time-series (length as large as 500).

This library borrows the insight above, that reconstruction error can be used to detect anomalies, and applies the model to text summarization. As the Seq2Seq paradigm exemplifies, documents and sentences, which consist of tokens, can be treated as time-series features. The anomalous data detected by EncDec-AD should therefore express something about the text.
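
To make the time-series view of text concrete, the sketch below slices a matrix of token vectors into overlapping windows. How the library itself builds its inputs is not described in this section, so the helper `to_sequences`, the overlapping-window scheme, and the (samples, seq_len, features) layout are all assumptions; only the default `seq_len=8` is taken from the class signature.

```python
import numpy as np


def to_sequences(token_vector_arr, seq_len=8):
    """Stack overlapping windows of `seq_len` consecutive token vectors."""
    n_tokens = token_vector_arr.shape[0]
    if n_tokens < seq_len:
        raise ValueError("need at least seq_len token vectors")
    return np.stack([
        token_vector_arr[i:i + seq_len]
        for i in range(n_tokens - seq_len + 1)
    ])


# Toy document: 10 tokens, each represented by a 4-dimensional vector.
token_vector_arr = np.arange(10 * 4, dtype=float).reshape(10, 4)
observed_arr = to_sequences(token_vector_arr, seq_len=8)
```

Each window then plays the role of one time-series sample.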

This analogy leads to two conflicting intuitions. On the one hand, the anomalous data may catch the observer’s eye because of its rarity or its amount of information, much as natural language processing indicators such as TF-IDF suggest. On the other hand, the anomalous data may be ignorable noise, a mere outlier.

In either case, this library takes the function and potential of EncDec-AD in text summarization to be drawing the distinction between normal and anomalous texts and filtering one from the other.
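
The filtering step can be sketched as a simple threshold on per-sentence errors. The helper `split_by_error` and the percentile-based threshold are hypothetical choices for illustration; this section does not state how the library decides which side of the split to keep.

```python
import numpy as np


def split_by_error(sentence_list, error_arr, percentile=75.0):
    """Split sentences into (normal, anomalous) by an error threshold."""
    threshold = np.percentile(error_arr, percentile)
    normal_list = [
        s for s, e in zip(sentence_list, error_arr) if e <= threshold
    ]
    anomaly_list = [
        s for s, e in zip(sentence_list, error_arr) if e > threshold
    ]
    return normal_list, anomaly_list


sentence_list = ["a", "b", "c", "d"]
error_arr = np.array([0.1, 0.2, 0.9, 0.15])
normal_list, anomaly_list = split_by_error(sentence_list, error_arr)
```

Either list can serve as the summary, depending on which of the two intuitions above is adopted.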

Note that the model in this library differs from that of Malhotra, P., et al. (2016) in some respects, owing to the specification of the Deep Learning library: [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern). First, the weight matrices of the encoder and the decoder are not shared. Second, it is possible to introduce regularization techniques that are not discussed in Malhotra, P., et al. (2016), such as dropout, gradient clipping, and limitation of weights. Third, the loss function for the reconstruction error is not limited to the L2 norm.

References

  • Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.
encoder_decoder_controller

getter

get_encoder_decoder_controller()[source]

getter

inference(observed_arr)[source]

Inference by the model.

Parameters:observed_arr – np.ndarray of observed data points.
Returns:np.ndarray of inferred feature points.
learn(observed_arr, target_arr)[source]

Training the model.

Parameters:
  • observed_arr – np.ndarray of observed data points.
  • target_arr – np.ndarray of target labeled data.
set_readonly(value)[source]

setter

summarize(test_arr, vectorizable_token, sentence_list, limit=5)[source]

Summarize input document.

Parameters:
  • test_arr – np.ndarray of observed data points.
  • vectorizable_token – is-a VectorizableToken.
  • sentence_list – list of all sentences.
  • limit – The number of abstract sentences to select.
Returns:

np.ndarray of scores.

pysummarization.abstractablesemantics.re_seq_2_seq module

class pysummarization.abstractablesemantics.re_seq_2_seq.ReSeq2Seq(margin_param=0.01, retrospective_lambda=0.5, retrospective_eta=0.5, encoder_decoder_controller=None, retrospective_encoder=None, input_neuron_count=20, hidden_neuron_count=20, weight_limit=10000000000.0, dropout_rate=0.5, pre_learning_epochs=1000, epochs=100, batch_size=20, learning_rate=1e-05, learning_attenuate_rate=1.0, attenuate_epoch=50, grad_clip_threshold=10000000000.0, seq_len=8, bptt_tau=8, test_size_rate=0.3, tol=0.0, tld=100.0)[source]

Bases: pysummarization.abstractable_semantics.AbstractableSemantics

A retrospective sequence-to-sequence learning model (re-seq2seq).

The concept of re-seq2seq (Zhang, K. et al., 2018) inspired this library. It is a sequence learning model proposed mainly for video summarization. “The key idea behind re-seq2seq is to measure how well the machine-generated summary is similar to the original video in an abstract semantic space” (Zhang, K. et al., 2018, p3).

The encoder of a seq2seq model observes the original video and outputs feature points which represent the semantic meaning of the observed data points. The feature points are then observed by the decoder of the model. Additionally, in the re-seq2seq model, the outputs of the decoder are propagated to a retrospective encoder, which infers feature points to represent the semantic meaning of the summary. “If the summary preserves the important and relevant information in the original video, then we should expect that the two embeddings are similar (e.g. in Euclidean distance)” (Zhang, K. et al., 2018, p3).
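
The similarity check quoted above comes down to a distance between two embeddings. The sketch below computes it with NumPy; the function name `embedding_distance` and the toy arrays are assumptions, standing in for the outputs of the encoder and the retrospective encoder.

```python
import numpy as np


def embedding_distance(original_arr, summary_arr):
    """Euclidean distance between paired embeddings, one value per sample."""
    return np.linalg.norm(original_arr - summary_arr, axis=1)


# Assumed stand-ins: rows are samples, columns are embedding dimensions.
original_arr = np.array([[0.0, 0.0], [1.0, 1.0]])
summary_arr = np.array([[3.0, 4.0], [1.0, 1.0]])

distance_arr = embedding_distance(original_arr, summary_arr)
```

A small distance suggests that the summary embedding preserves the information in the original embedding.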

This library draws on the intuition above to apply the model to text summarization. As with videos, semantic feature representation based on representation learning of manifolds is also possible for text.

The intuition in the design of their loss function is also suggestive. “The intuition behind our modeling is that the outputs should convey the same amount of information as the inputs. For summarization, this is precisely the goal: a good summary should be such that after viewing the summary, users would get about the same amount of information as if they had viewed the original video” (Zhang, K. et al., 2018, p7).

But the model in this library differs from that of Zhang, K. et al. (2018) in some respects, owing to the specification of the Deep Learning library: [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern). First, the LSTM-based Encoder/Decoder is not designed as a hierarchical structure. Second, it is possible to introduce regularization techniques that are not discussed in Zhang, K. et al. (2018), such as dropout, gradient clipping, and limitation of weights. Third, the regression loss function for matching summaries is simplified for the sake of computational efficiency.
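
As an illustration of one of the regularization techniques mentioned, the sketch below clips a gradient by its L2 norm. The exact clipping rule behind `grad_clip_threshold` is not documented in this section, so norm-based rescaling and the helper name `clip_by_norm` are assumptions.

```python
import numpy as np


def clip_by_norm(grad_arr, threshold):
    """Rescale the gradient so that its L2 norm does not exceed `threshold`."""
    norm = np.linalg.norm(grad_arr)
    if norm > threshold:
        return grad_arr * (threshold / norm)
    return grad_arr


grad_arr = np.array([3.0, 4.0])  # L2 norm is 5.0
clipped_arr = clip_by_norm(grad_arr, threshold=1.0)
untouched_arr = clip_by_norm(grad_arr, threshold=10.0)
```

With a very large threshold such as the default `10000000000.0`, a rule of this kind would rarely change the gradient at all.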

References

  • Zhang, K., Grauman, K., & Sha, F. (2018). Retrospective Encoders for Video Summarization. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 383-399).
back_propagation(delta_arr)[source]

Back propagation.

Parameters:delta_arr – Delta.
Returns:Tuple of:
  • decoder’s list of gradients,
  • encoder’s np.ndarray of delta,
  • encoder’s list of gradients.
compute_retrospective_loss()[source]

Compute retrospective loss.

Returns:Tuple of:
  • np.ndarray of delta,
  • np.ndarray of losses of each batch,
  • float of loss of all batch.
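
The shape of this return value can be illustrated with a simplified matching loss. This sketch covers only a squared-distance term between the encoder's and the retrospective encoder's feature points; it does not model `margin_param`, `retrospective_lambda`, or `retrospective_eta`, whose exact roles are not described in this section. The function name `matching_loss` is hypothetical.

```python
import numpy as np


def matching_loss(encoded_arr, re_encoded_arr):
    """Squared-distance loss between two sets of feature points.

    Returns (delta, per-sample losses, mean loss).
    """
    diff_arr = re_encoded_arr - encoded_arr
    loss_arr = np.sum(np.square(diff_arr), axis=1)
    loss = float(np.mean(loss_arr))
    # Gradient of the mean loss with respect to re_encoded_arr.
    delta_arr = 2.0 * diff_arr / diff_arr.shape[0]
    return delta_arr, loss_arr, loss


encoded_arr = np.array([[0.0, 0.0], [1.0, 2.0]])
re_encoded_arr = np.array([[1.0, 0.0], [1.0, 2.0]])
delta_arr, loss_arr, loss = matching_loss(encoded_arr, re_encoded_arr)
```

The delta array is what a back-propagation step would consume.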
encoder_decoder_controller

getter

get_encoder_decoder_controller()[source]

getter

get_logs_arr()[source]

getter

get_retrospective_encoder()[source]

getter

inference(observed_arr)[source]

Inference by the model.

Parameters:observed_arr – np.ndarray of observed data points.
Returns:np.ndarray of inferred feature points.
learn(observed_arr, target_arr)[source]

Training the model.

Parameters:
  • observed_arr – np.ndarray of observed data points.
  • target_arr – np.ndarray of target labeled data.
learn_generated(feature_generator)[source]

Learn features generated by FeatureGenerator.

Parameters:feature_generator – is-a FeatureGenerator.
logs_arr

getter

optimize(re_encoder_grads_list, decoder_grads_list, encoder_grads_list, learning_rate, epoch)[source]

Optimize the parameters with the given gradients.

Parameters:
  • re_encoder_grads_list – re-encoder’s list of gradients.
  • decoder_grads_list – decoder’s list of gradients.
  • encoder_grads_list – encoder’s list of gradients.
  • learning_rate – Learning rate.
  • epoch – Current epoch.
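
The interplay of `learning_rate` and `epoch` can be sketched as a step-wise attenuation followed by a plain gradient-descent update. This is an assumption about how `learning_attenuate_rate` and `attenuate_epoch` from the class signature might be used, not a description of the library's optimizer; `attenuate` and `sgd_step` are hypothetical helpers.

```python
import numpy as np


def attenuate(learning_rate, epoch, learning_attenuate_rate=1.0,
              attenuate_epoch=50):
    """Scale the learning rate once every `attenuate_epoch` epochs."""
    if (epoch + 1) % attenuate_epoch == 0:
        return learning_rate * learning_attenuate_rate
    return learning_rate


def sgd_step(params_list, grads_list, learning_rate):
    """Plain gradient descent: subtract the scaled gradient from each array."""
    return [p - learning_rate * g for p, g in zip(params_list, grads_list)]


learning_rate = 0.1
for epoch in range(100):
    learning_rate = attenuate(
        learning_rate, epoch,
        learning_attenuate_rate=0.5, attenuate_epoch=50,
    )

params_list = [np.array([1.0, 2.0])]
grads_list = [np.array([10.0, 10.0])]
updated_list = sgd_step(params_list, grads_list, learning_rate=0.1)
```

With the default `learning_attenuate_rate=1.0`, a rule of this kind leaves the learning rate unchanged.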
retrospective_encoder

getter

set_readonly(value)[source]

setter

summarize(test_arr, vectorizable_token, sentence_list, limit=5)[source]

Summarize input document.

Parameters:
  • test_arr – np.ndarray of observed data points.
  • vectorizable_token – is-a VectorizableToken.
  • sentence_list – list of all sentences.
  • limit – The number of abstract sentences to select.
Returns:

list of str of abstract sentences.
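
The role of `limit` can be illustrated by picking the highest-scoring sentences from a list. How the library scores sentences, and whether it prefers high or low scores, is not stated here; the helper `select_sentences`, the "higher is better" rule, and the document-order output are assumptions.

```python
import numpy as np


def select_sentences(score_arr, sentence_list, limit=5):
    """Return the `limit` highest-scoring sentences in document order."""
    top_idx = np.argsort(score_arr)[::-1][:limit]
    return [sentence_list[i] for i in sorted(top_idx)]


sentence_list = ["s0", "s1", "s2", "s3"]
score_arr = np.array([0.2, 0.9, 0.1, 0.7])
abstract_list = select_sentences(score_arr, sentence_list, limit=2)
```

Restoring document order keeps the selected sentences readable as a summary.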

Module contents