Toward abstractive multi-document summarization using submodular function-based framework, sentence compression and merging
Tanvee, Moin Mahmud
University of Lethbridge. Faculty of Arts and Science
Lethbridge, Alta : University of Lethbridge, Dept. of Mathematics and Computer Science
Automatic multi-document summarization is the process of generating a summary that contains the most important information from multiple documents. In this thesis, we design an automatic multi-document summarization system using various abstraction-based methods and submodularity. Our proposed model treats summarization as a budgeted submodular function maximization problem. The model integrates three important measures of a summary, namely importance, coverage, and non-redundancy, and we design a submodular function for each of them. In addition, we integrate sentence compression and sentence merging. When evaluated on the DUC 2004 data set, our generic summarizer outperforms state-of-the-art summarization systems in terms of ROUGE-1 recall and F1-measure. For query-focused summarization, we use the DUC 2007 data set, where our system achieves results statistically similar to those of several well-established methods in terms of the ROUGE-2 measure.
automatic text summarization, abstraction-based, submodular function, generic-focused summarization, query-focused summarization, greedy algorithm, Natural language processing (Computer science) -- Research, Querying (Computer science), Database searching, Parsing (Computer grammar), Information retrieval, Question-answering systems -- Research, Computer science -- Mathematics
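The abstract frames summarization as budgeted submodular function maximization, and the keywords mention a greedy algorithm. The following is a minimal sketch of that general idea under stated assumptions. It is not the thesis's actual model. Sentences are assumed to be pre-tokenized word sets, each sentence's cost is a hypothetical length value, and the objective (a modular importance term plus weighted word coverage) is an illustrative stand-in for the three submodular functions the thesis designs. The selection rule is a cost-scaled greedy with a final best-single-sentence check, a common approach to budgeted submodular maximization. Sentence compression and merging are not modeled.

```python
from typing import Dict, List, Optional, Set


def objective(selected: List[int], sentences: List[Set[str]],
              word_weight: Dict[str, float], importance: List[float],
              alpha: float = 1.0) -> float:
    """Hypothetical monotone submodular score (an assumption, not the thesis's).

    alpha * sum of per-sentence importance (modular) plus the total weight of
    distinct words covered. Each word counts once, so repeated content adds
    nothing, which acts as a simple non-redundancy effect.
    """
    covered: Set[str] = set()
    for i in selected:
        covered |= sentences[i]
    coverage = sum(word_weight.get(w, 0.0) for w in covered)
    return alpha * sum(importance[i] for i in selected) + coverage


def budgeted_greedy(sentences: List[Set[str]], costs: List[float], budget: float,
                    word_weight: Dict[str, float], importance: List[float],
                    r: float = 1.0) -> List[int]:
    """Cost-scaled greedy selection under a length budget (costs must be > 0)."""
    selected: List[int] = []
    used = 0.0
    remaining = set(range(len(sentences)))
    current = objective(selected, sentences, word_weight, importance)
    while remaining:
        # Find the candidate with the largest marginal gain per (cost ** r).
        best: Optional[int] = None
        best_ratio, best_gain = float("-inf"), 0.0
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = objective(selected + [i], sentences, word_weight, importance) - current
            ratio = gain / costs[i] ** r
            if ratio > best_ratio:
                best, best_ratio, best_gain = i, ratio, gain
        remaining.discard(best)
        # Keep the candidate only if it fits the remaining budget and helps.
        if used + costs[best] <= budget and best_gain > 0:
            selected.append(best)
            used += costs[best]
            current += best_gain
    # Safeguard: return the best single affordable sentence if it scores higher.
    singles = [i for i in range(len(sentences)) if costs[i] <= budget]
    if singles:
        top = max(singles, key=lambda i: objective([i], sentences, word_weight, importance))
        if objective([top], sentences, word_weight, importance) > current:
            return [top]
    return selected


if __name__ == "__main__":
    # Toy data (invented for illustration): three "sentences" as word sets.
    sents = [{"storm", "coast"}, {"storm", "coast", "flooding"}, {"evacuations", "officials"}]
    weights = {"storm": 1.0, "coast": 1.0, "flooding": 2.0, "evacuations": 1.0, "officials": 1.0}
    print(budgeted_greedy(sents, costs=[2, 3, 2], budget=5,
                          word_weight=weights, importance=[0.5, 0.0, 0.0]))
```

On the toy data, the greedy first takes the sentence that adds "flooding", then the one covering the evacuation words, and then stops because the remaining sentence either adds little or no longer fits the budget. The exponent `r` (a hypothetical parameter here) controls how strongly long sentences are penalized.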