pysummarization package¶
Subpackages¶
- pysummarization.abstractabledoc package
- pysummarization.abstractablesemantics package
- pysummarization.computabledistance package
- pysummarization.nlpbase package
- pysummarization.readablewebpdf package
- pysummarization.similarityfilter package
- pysummarization.tokenizabledoc package
- pysummarization.vectorizablesentence package
- pysummarization.vectorizabletoken package
  - Subpackages
  - Submodules
    - pysummarization.vectorizabletoken.dbm_like_skip_gram_vectorizer module
    - pysummarization.vectorizabletoken.encoder_decoder module
    - pysummarization.vectorizabletoken.skip_gram_vectorizer module
    - pysummarization.vectorizabletoken.t_hot_vectorizer module
    - pysummarization.vectorizabletoken.tfidf_vectorizer module
  - Module contents
Submodules¶
pysummarization.abstractable_doc module¶
class pysummarization.abstractable_doc.AbstractableDoc
Bases: object
Automatic abstraction and summarization. This is the filtering approach.
This interface is designed around the Strategy pattern.
References
- Luhn, Hans Peter. “The automatic creation of literature abstracts.” IBM Journal of research and development 2.2 (1958): 159-165.
- http://www.oreilly.co.jp/books/9784873116792/
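The Strategy pattern mentioned above can be illustrated with a minimal, self-contained sketch. It does not use the pysummarization API: the names ScoreFilter, TopNFilter, ThresholdFilter and pick_sentences are hypothetical, and the (index, score) pair format is an assumption made for the example. The point is only that the caller stays the same while the filtering strategy is swapped.

```python
from abc import ABC, abstractmethod


class ScoreFilter(ABC):
    """Hypothetical strategy interface: decides which scored sentences survive."""

    @abstractmethod
    def filter(self, scored_list):
        """Take a list of (sentence_index, score) pairs and return the kept pairs."""


class TopNFilter(ScoreFilter):
    """Keep the N highest-scoring sentences."""

    def __init__(self, top_n=2):
        self.top_n = top_n

    def filter(self, scored_list):
        ranked = sorted(scored_list, key=lambda pair: pair[1], reverse=True)
        # Sort the survivors by index again to restore document order.
        return sorted(ranked[:self.top_n])


class ThresholdFilter(ScoreFilter):
    """Keep every sentence whose score reaches a fixed threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def filter(self, scored_list):
        return [pair for pair in scored_list if pair[1] >= self.threshold]


def pick_sentences(sentences, scored_list, strategy):
    """The caller depends only on the ScoreFilter interface, not on a concrete filter."""
    return [sentences[index] for index, _ in strategy.filter(scored_list)]


sentences = ["First.", "Second.", "Third."]
scored = [(0, 0.1), (1, 0.9), (2, 0.5)]
top_two = pick_sentences(sentences, scored, TopNFilter(top_n=2))
above_threshold = pick_sentences(sentences, scored, ThresholdFilter(threshold=0.6))
```

Swapping TopNFilter for ThresholdFilter changes the selection rule without touching pick_sentences.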
pysummarization.abstractable_semantics module¶
class pysummarization.abstractable_semantics.AbstractableSemantics
Bases: object
Automatic abstraction and summarization with the Neural Network language model approach.
This interface is designed around the Strategy pattern.
References:
- inference(observed_arr)
  Inference by the model.
  Parameters: observed_arr – np.ndarray of observed data points.
  Returns: np.ndarray of inferred feature points.
- learn(iteratable_data)
  Learn the observed data points to build a vector representation of the input time series.
  Parameters: iteratable_data – an instance of IteratableData.
- summarize(test_arr, vectorizable_token, sentence_list, limit=5)
  Summarize the input document.
  Parameters:
  - test_arr – np.ndarray of observed data points.
  - vectorizable_token – an instance of VectorizableToken.
  - sentence_list – list of all sentences.
  - limit – the number of sentences to select for the abstract.
  Returns: np.ndarray of scores.
pysummarization.computable_distance module¶
pysummarization.n_gram module¶
class pysummarization.n_gram.Ngram
Bases: object
N-gram
- generate_ngram_data_set(token_list, n=2)
  Generate pairs of N-grams.
  Parameters:
  - token_list – The list of tokens.
  - n – N, the size of each N-gram.
  Returns: zip of Tuple(Training N-gram data, Target N-gram data)
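A minimal sketch of one plausible reading of the (training, target) pairing described above: each N-gram is paired with the N-gram that immediately follows it. This is an assumption for illustration, not necessarily how Ngram.generate_ngram_data_set is implemented; the helper names ngrams and ngram_pairs are hypothetical.

```python
def ngrams(token_list, n=2):
    """Return the list of N-grams (as tuples) found in token_list."""
    # zip over n shifted copies of the list: (t0, t1, ...), (t1, t2, ...), ...
    return list(zip(*[token_list[i:] for i in range(n)]))


def ngram_pairs(token_list, n=2):
    """Pair each N-gram (training) with the next N-gram (target). Assumed pairing."""
    grams = ngrams(token_list, n)
    return zip(grams, grams[1:])


pairs = list(ngram_pairs(["a", "b", "c", "d"], n=2))
```

With four tokens and n=2 there are three bigrams, so the sketch yields two (training, target) pairs.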
pysummarization.nlp_base module¶
class pysummarization.nlp_base.NlpBase
Bases: object
The base class for NLP.
- delimiter_list
  getter
- listup_sentence(data, counter=0)
  Divide a string into a list of sentences.
  Parameters:
  - data – string.
  - counter – recursive counter.
  Returns: List of sentences.
- token
  getter
- tokenizable_doc
  getter
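The counter parameter of listup_sentence suggests a recursion over the entries of delimiter_list. The following is a minimal sketch under that assumption; split_sentences is a hypothetical stand-alone function, not the library's implementation, and details such as whitespace stripping are choices made for the example.

```python
def split_sentences(data, delimiter_list, counter=0):
    """Recursively split data on each delimiter in turn, keeping the delimiters."""
    if counter >= len(delimiter_list):
        # All delimiters have been applied; emit the fragment if it is non-empty.
        stripped = data.strip()
        return [stripped] if stripped else []

    delimiter = delimiter_list[counter]
    parts = data.split(delimiter)
    sentence_list = []
    for i, part in enumerate(parts):
        # Every part except the trailing remainder was followed by the delimiter.
        if i < len(parts) - 1:
            part = part + delimiter
        # Split each fragment again with the next delimiter in the list.
        sentence_list.extend(split_sentences(part, delimiter_list, counter + 1))
    return sentence_list


result = split_sentences("Hello. How are you? Fine.", [".", "?"])
```

Each level of recursion handles one delimiter, so counter never exceeds the length of the delimiter list.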
pysummarization.readable_web_pdf module¶
pysummarization.similarity_filter module¶
class pysummarization.similarity_filter.SimilarityFilter
Bases: object
Abstract class for filtering mutually similar sentences.
- calculate(token_list_x, token_list_y)
  Calculate similarity.
  Abstract method.
  Parameters:
  - token_list_x – [token, token, token, …]
  - token_list_y – [token, token, token, …]
  Returns: Similarity.
- count(token_list)
  Count the number of tokens in token_list.
  Parameters: token_list – The list of tokens.
  Returns: {token: the numbers}
- nlp_base
  getter
- similar_filter_r(sentence_list)
  Filter mutually similar sentences.
  Parameters: sentence_list – The list of sentences.
  Returns: The list of filtered sentences.
- similarity_limit
  getter
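How the pieces of this class could fit together is sketched below without using the library itself. The functions count_tokens, jaccard and filter_similar are hypothetical; the Jaccard coefficient is just one possible similarity measure, whitespace tokenization is an assumption, and the greedy "keep the first, drop later near-duplicates" rule may differ from what similar_filter_r actually does.

```python
from collections import Counter


def count_tokens(token_list):
    """Map each token to the number of times it occurs."""
    return dict(Counter(token_list))


def jaccard(token_list_x, token_list_y):
    """Jaccard similarity of two token lists: |X & Y| / |X | Y|."""
    x, y = set(token_list_x), set(token_list_y)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def filter_similar(sentence_list, similarity_limit=0.8, tokenize=str.split):
    """Drop every sentence that is too similar to one that was already kept."""
    kept_sentences = []
    kept_tokens = []
    for sentence in sentence_list:
        tokens = tokenize(sentence)
        # Keep the sentence only if it stays below the limit against all kept ones.
        if all(jaccard(tokens, other) < similarity_limit for other in kept_tokens):
            kept_sentences.append(sentence)
            kept_tokens.append(tokens)
    return kept_sentences


filtered = filter_similar(
    ["the cat sat", "the cat sat down", "dogs bark"],
    similarity_limit=0.7,
)
```

Here "the cat sat down" shares three of four distinct tokens with "the cat sat" (similarity 0.75), so it is dropped at a limit of 0.7.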
pysummarization.tokenizable_doc module¶
pysummarization.vectorizable_sentence module¶
pysummarization.vectorizable_token module¶
pysummarization.web_scraping module¶
class pysummarization.web_scraping.WebScraping
Bases: object
Object of Web-scraping.
This is only a demo.
- readable_web_pdf
  getter