Gurmukhi Punjabi (PA) as a low-resource language through the lens of the BLARK model.

Kaur, Kirandeep
Lethbridge, Alta. : University of Lethbridge, Dept. of English
We are venturing into the next phase of the digital divide (unequal access to digital technology), in which languages that are not ready for Natural Language Processing (NLP) are most at risk of losing out on developments in speech and language technologies. This has opened a wide gap between languages in their readiness to take advantage of recent developments in computational technologies. The Common Language Resources and Technology Infrastructure (CLARIN), a large-scale pan-European collaborative effort to create, coordinate, and make language resources and technology available and readily usable, has developed the Basic Language Resource Kit (BLARK) model to assess the readiness of any language for speech and language technology development. Punjabi, despite being a major language with millions of native speakers and a significant diaspora population around the world, has received limited attention in computational technologies. The thesis aims to provide a comprehensive overview of the existing resources, tools, and techniques for Punjabi NLP, and to identify the gaps and opportunities for future research, using the BLARK model as a framework. After describing the current (sorry) state of Punjabi in terms of its readiness for computational technologies, the thesis concludes with suggestions for the directions and efforts needed to make Punjabi ready for the development of speech and language technologies. The thesis contributes to the field of Punjabi language processing by proposing a generic model for comparing and enhancing Punjabi linguistic resources.
Punjabi language, Gurmukhi, Low-resource language, BLARK model, Basic Language Resource Kit, Computational technology, Language processing, Speech technologies, Language technologies