Automatic text summarization in digital libraries

dc.contributor.author: Mlynarski, Angela
dc.contributor.author: University of Lethbridge. Faculty of Arts and Science
dc.contributor.supervisor: Witten, Ian
dc.date.accessioned: 2007-05-23T17:29:48Z
dc.date.available: 2007-05-23T17:29:48Z
dc.date.issued: 2006
dc.degree.level: Masters
dc.description: xiii, 142 leaves ; 28 cm. (en)
dc.description.abstract: A digital library is a collection of services and information objects for storing, accessing, and retrieving digital objects. Automatic text summarization presents salient information in a condensed form suitable for user needs. This thesis amalgamates digital libraries and automatic text summarization by extending the Greenstone Digital Library software suite to include the University of Lethbridge Summarizer. The tool generates summaries, nouns, and noun phrases for use as metadata for searching and browsing digital collections. Digital collections of newspapers, PDFs, and eBooks were created with summary metadata. PDF documents were processed the fastest at 1.8 MB/hr, followed by the newspapers at 1.3 MB/hr, with eBooks being the slowest at 0.9 MB/hr. Qualitative analysis of four genres (newspaper, M.Sc. thesis, novel, and poetry) revealed that narrative newspapers were the most suitable for automatically generated summaries. The other genres suffered from incoherence and information loss. Overall, summaries for digital collections are suitable when used with newspaper documents and unsuitable for other genres. (en)
dc.identifier.uri: https://hdl.handle.net/10133/270
dc.language.iso: en_US (en)
dc.publisher: Lethbridge, Alta. : University of Lethbridge, Faculty of Arts and Science, 2006 (en)
dc.publisher.department: Department of Mathematics and Computer Science
dc.publisher.faculty: Arts and Science
dc.relation.ispartofseries: Thesis (University of Lethbridge. Faculty of Arts and Science) (en)
dc.subject: Dissertations, Academic (en)
dc.subject: Digital libraries (en)
dc.subject: Automatic abstracting (en)
dc.subject: Cluster analysis -- Computer programs (en)
dc.subject: Computational linguistics (en)
dc.title: Automatic text summarization in digital libraries (en)
dc.type: Thesis (en)
Files
Original bundle
Name: MR17413.pdf
Size: 4.27 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.88 KB
Format: Item-specific license agreed upon to submission