Multi-document summarization based on document clustering and neural sentence fusion
Fuad, Tanvir Ahmed
University of Lethbridge. Faculty of Arts and Science
Lethbridge, Alta. : University of Lethbridge, Department of Mathematics and Computer Science
In this thesis, we present a technique for abstractive text summarization that achieves state-of-the-art results, and we propose a novel method to improve multi-document summarization. Two obstacles motivated the design: the lack of large human-authored multi-document summary datasets needed to train seq2seq encoder-decoder models, and the inaccuracy of representing multiple long documents as a single fixed-size vector. To address them, we design complementary models for two tasks, sentence clustering and neural sentence fusion. We minimize the risk of producing incorrect facts by feeding a set of related sentences as input to the encoder. We combine the complementary models into a full abstractive multi-document summarization system that simultaneously considers importance, coverage, and diversity under a desired length limit. We conduct extensive experiments on all the proposed models, which show significant improvements over state-of-the-art methods across different evaluation metrics.
automatic text summarization , sentence fusion , multi-document summarization , text clustering , abstractive text summarization , tensor2tensor , Document clustering , Text processing (Computer science) , Automatic abstracting , Electronic information resources -- Abstracting and indexing , Dissertations, Academic