pysummarization.abstractablesemantics._mxnet package

Submodules

pysummarization.abstractablesemantics._mxnet.enc_dec_ad module

class pysummarization.abstractablesemantics._mxnet.enc_dec_ad.EncDecAD(computable_loss=None, normal_prior_flag=False, encoder_decoder_controller=None, hidden_neuron_count=20, output_neuron_count=20, dropout_rate=0.5, epochs=100, batch_size=20, learning_rate=1e-05, learning_attenuate_rate=1.0, attenuate_epoch=50, seq_len=8)[source]

Bases: pysummarization.abstractable_semantics.AbstractableSemantics

LSTM-based Encoder/Decoder scheme for Anomaly Detection (EncDec-AD).

This library applies the Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) to text summarization by an intuitive analogy. In this scheme, an LSTM-based Encoder/Decoder, the so-called sequence-to-sequence (Seq2Seq) model, learns to reconstruct normal time-series behavior and thereafter uses the reconstruction error to detect anomalies.

Malhotra, P., et al. (2016) showed that the EncDec-AD paradigm is robust and can detect anomalies from predictable, unpredictable, periodic, aperiodic, and quasi-periodic time-series. Further, they showed that the paradigm is able to detect anomalies from short time-series (length as small as 30) as well as long time-series (length as large as 500).

This library borrows the insight above, the use of reconstruction error to detect anomalies, and applies the model to text summarization. As the Seq2Seq paradigm exemplifies, documents and sentences, which contain tokens of text, can be considered time-series features. The anomalous data detected by EncDec-AD should therefore express something about the text.

From the above analogy, this library introduces two conflicting intuitions. On the one hand, the anomalous data may catch an observer's eye by virtue of its rarity or amount of information, as natural language processing indicators like TF-IDF suggest. On the other hand, the anomalous data may be ignorable noise, a mere outlier.

In either case, this library deduces that the function and potential of EncDec-AD in text summarization is to draw a distinction between normal and anomalous texts and to filter the one from the other.
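
A minimal numpy sketch of this reconstruction-error scoring, assuming a trained EncDecAD instance enc_dec_ad whose inference output (documented below) has the same shape as its input; the per-sequence error formula is illustrative, not the library's exact anomaly metric:

    import numpy as np

    # observed_arr: rank-3 array of shape (batch, seq_len, dim)
    # holding vectorized token sequences.
    reconstructed_arr = enc_dec_ad.inference(observed_arr)

    # Mean squared reconstruction error per sequence; a high error
    # marks the sequence as an "anomaly" in the EncDec-AD sense.
    score_arr = np.square(
        observed_arr - reconstructed_arr
    ).reshape(observed_arr.shape[0], -1).mean(axis=1)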

References

  • Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.
ctx

getter for mx.gpu() or mx.cpu().

encoder_decoder_controller

getter

get_ctx()[source]

getter for mx.gpu() or mx.cpu().

get_encoder_decoder_controller()[source]

getter

inference(observed_arr)[source]

Inference by the model.

Parameters: observed_arr – np.ndarray of observed data points.

Returns: np.ndarray of inferred feature points.

learn(iteratable_data)[source]

Learn the observed data points for vector representation of the input time-series.

Parameters: iteratable_data – is-a IteratableData.

set_ctx(value)[source]

setter for mx.gpu() or mx.cpu().

set_readonly(value)[source]

setter

summarize(iteratable_data, vectorizable_token, sentence_list, limit=5)[source]

Summarize input document.

Parameters:
  • iteratable_data – is-a IteratableData.
  • vectorizable_token – is-a VectorizableToken.
  • sentence_list – list of all sentences.
  • limit – The number of selected abstract sentences.
Returns:

np.ndarray of scores.
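
A usage sketch of the interface documented above. Here iteratable_data, vectorizable_token, and sentence_list are assumed to be prepared in advance as an IteratableData implementation, a VectorizableToken implementation, and a list of sentences respectively; none of them are defined by this module:

    from pysummarization.abstractablesemantics._mxnet.enc_dec_ad import EncDecAD

    enc_dec_ad = EncDecAD(
        hidden_neuron_count=20,
        output_neuron_count=20,
        epochs=100,
        batch_size=20,
        seq_len=8,
    )

    # Learn to reconstruct "normal" token sequences.
    enc_dec_ad.learn(iteratable_data)

    # Score the sentences; `summarize` returns an np.ndarray of scores.
    score_arr = enc_dec_ad.summarize(
        iteratable_data,
        vectorizable_token,
        sentence_list,
        limit=5,
    )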

pysummarization.abstractablesemantics._mxnet.re_seq_2_seq module

class pysummarization.abstractablesemantics._mxnet.re_seq_2_seq.ReSeq2Seq(initializer=None, computable_loss=None, margin_param=0.01, retrospective_lambda=0.5, retrospective_eta=0.5, encoder_decoder_controller=None, retrospective_encoder=None, hidden_neuron_count=20, output_neuron_count=20, dropout_rate=0.5, batch_size=20, learning_rate=1e-05, learning_attenuate_rate=1.0, attenuate_epoch=50, optimizer_name='sgd', grad_clip_threshold=10000000000.0, seq_len=8, ctx=gpu(0), **kwargs)[source]

Bases: mxnet.gluon.block.HybridBlock, pysummarization.abstractable_semantics.AbstractableSemantics

A retrospective sequence-to-sequence learning model (re-seq2seq).

The concept of re-seq2seq (Zhang, K. et al., 2018) provided inspiration to this library. It is a sequence learning model developed mainly in the field of video summarization. “The key idea behind re-seq2seq is to measure how well the machine-generated summary is similar to the original video in an abstract semantic space” (Zhang, K. et al., 2018, p3).

The encoder of a seq2seq model observes the original video and outputs feature points which represent the semantic meaning of the observed data points. The feature points are then observed by the decoder of this model. Additionally, in the re-seq2seq model, the outputs of the decoder are propagated to a retrospective encoder, which infers feature points representing the semantic meaning of the summary. “If the summary preserves the important and relevant information in the original video, then we should expect that the two embeddings are similar (e.g. in Euclidean distance)” (Zhang, K. et al., 2018, p3).

This library borrows this intuitive insight to apply the model to text summarization. As with videos, semantic feature representation based on representation learning of manifolds is also possible for texts.

The intuition in the design of their loss function is also suggestive. “The intuition behind our modeling is that the outputs should convey the same amount of information as the inputs. For summarization, this is precisely the goal: a good summary should be such that after viewing the summary, users would get about the same amount of information as if they had viewed the original video” (Zhang, K. et al., 2018, p7).

However, the model in this library differs in some respects from Zhang, K. et al. (2018), owing to the specification of the underlying Deep Learning library, [accel-brain-base](https://github.com/accel-brain/accel-brain-code/tree/master/Accel-Brain-Base). First, the LSTM-based Encoder/Decoder is not designed as a hierarchical structure. Second, regularization techniques which are not discussed in Zhang, K. et al. (2018), such as dropout, gradient clipping, and limitation of weights, can be introduced. Third, the regression loss function for matching summaries is simplified for calculation efficiency in this library.

Note that the penalty terms that penalize mismatched pairs are not implemented, due to an implementation issue.
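
As a concrete rendering of the embedding-similarity idea, a minimal numpy sketch; the variable names anticipate compute_retrospective_loss below, and both embeddings are assumed to be arrays of the same shape:

    import numpy as np

    # Euclidean distance between the embedding of the original
    # sequence (encoded_arr) and the retrospective embedding of its
    # summary (re_encoded_arr). A small distance means the summary
    # preserves the semantics of the input.
    distance = np.linalg.norm(encoded_arr - re_encoded_arr)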

References

  • Zhang, K., Grauman, K., & Sha, F. (2018). Retrospective Encoders for Video Summarization. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 383-399).
collect_params(select=None)[source]

Overridden collect_params in mxnet.gluon.HybridBlock.

compute_retrospective_loss(observed_arr, encoded_arr, decoded_arr, re_encoded_arr)[source]

Compute retrospective loss.

Returns: The tuple data:
  • np.ndarray of delta.
  • np.ndarray of losses of each batch.
  • float of loss of all batches.
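
A hedged numpy sketch of how such a tuple could be assembled, combining a reconstruction term and a retrospective term with the constructor's retrospective_lambda and retrospective_eta weights; this is a simplified illustration, and the exact combination computed by this library may differ:

    import numpy as np

    def retrospective_loss_sketch(observed_arr, encoded_arr, decoded_arr,
                                  re_encoded_arr, retrospective_lambda=0.5,
                                  retrospective_eta=0.5):
        # Reconstruction mismatch between decoder output and input.
        delta_arr = decoded_arr - observed_arr
        reconstruction_arr = np.square(delta_arr).reshape(
            delta_arr.shape[0], -1
        ).mean(axis=1)
        # Retrospective mismatch between the input embedding and the
        # summary embedding (shapes assumed to match).
        retrospective_arr = np.square(encoded_arr - re_encoded_arr).reshape(
            encoded_arr.shape[0], -1
        ).mean(axis=1)
        # Weighted per-batch losses and their mean over the batch.
        loss_arr = (retrospective_lambda * reconstruction_arr
                    + retrospective_eta * retrospective_arr)
        return delta_arr, loss_arr, float(loss_arr.mean())
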
ctx

getter for mx.gpu() or mx.cpu().

encoder_decoder_controller

getter

forward_propagation(F, x)[source]

Hybrid forward with Gluon API.

Parameters:
  • F – mxnet.ndarray or mxnet.symbol.
  • x – mxnet.ndarray of observed data points.
Returns:

mxnet.ndarray or mxnet.symbol of inferred feature points.

get_ctx()[source]

getter for mx.gpu() or mx.cpu().

get_encoder_decoder_controller()[source]

getter

get_logs_arr()[source]

getter

get_retrospective_encoder()[source]

getter

hybrid_forward(F, x)[source]

Hybrid forward with Gluon API.

Parameters:
  • F – mxnet.ndarray or mxnet.symbol.
  • x – mxnet.ndarray of observed data points.
Returns:

mxnet.ndarray or mxnet.symbol of inferred feature points.
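
Since ReSeq2Seq is a HybridBlock, the standard Gluon calling convention applies; a minimal sketch with an illustrative input shape, noting that whether the whole graph hybridizes cleanly depends on the block's internals:

    import mxnet as mx

    # Calling the block dispatches to hybrid_forward(F, x), with
    # F = mx.nd imperatively and F = mx.sym after hybridize().
    re_seq2seq.hybridize()
    x = mx.nd.ones((20, 8, 20), ctx=re_seq2seq.ctx)  # (batch, seq_len, dim)
    output = re_seq2seq(x)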

inference(observed_arr)[source]

Inference by the model.

Parameters: observed_arr – np.ndarray of observed data points.

Returns: np.ndarray of inferred feature points.

learn(iteratable_data)[source]

Learn the observed data points for vector representation of the input time-series.

Parameters: iteratable_data – is-a IteratableData.

load_parameters(filename, ctx=None, allow_missing=False, ignore_extra=False)[source]

Load parameters from a file.

Parameters:
  • filename – File name.
  • ctx – mx.cpu() or mx.gpu().
  • allow_missing – bool of whether to silently skip loading parameters not present in the file.
  • ignore_extra – bool of whether to silently ignore parameters in the file that are not present in this Block.

logs_arr

getter

retrospective_encoder

getter

save_parameters(filename)[source]

Save parameters to a file.

Parameters: filename – File name.
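
A round-trip sketch using the two methods documented above; the file name is arbitrary:

    import mxnet as mx

    re_seq2seq.save_parameters("re_seq2seq.params")
    re_seq2seq.load_parameters("re_seq2seq.params", ctx=mx.cpu())
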
set_ctx(value)[source]

setter for mx.gpu() or mx.cpu().

set_readonly(value)[source]

setter

summarize(iteratable_data, vectorizable_token, sentence_list, limit=5)[source]

Summarize input document.

Parameters:
  • iteratable_data – is-a IteratableData.
  • vectorizable_token – is-a VectorizableToken.
  • sentence_list – list of all sentences.
  • limit – The number of selected abstract sentences.
Returns:

list of str of abstract sentences.
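
A closing sketch of the call, with the same assumed inputs as in the EncDecAD example above; for this class the return value is the abstract itself:

    # Returns a list of `limit` abstract sentences as str.
    abstract_sentence_list = re_seq2seq.summarize(
        iteratable_data,
        vectorizable_token,
        sentence_list,
        limit=5,
    )
    print("\n".join(abstract_sentence_list))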

Module contents