Computer program categorization with machine learning

Rafee, Md Mahmudul Hasan
University of Lethbridge. Faculty of Arts and Science
Lethbridge, Alta. : University of Lethbridge, Department of Mathematics and Computer Science
Machine learning techniques have been applied to improve the learning process and to study how natural languages are used. Previous research has shown that similar techniques can be applied to the analysis of computer programming (artificial) languages. Several studies have demonstrated the influence of sociolinguistic characteristics such as age, gender, region, and social status on natural language use. This research focuses on determining the impact of the author's sociolinguistic characteristics, particularly gender and region, on computer programs. We use machine learning and statistical techniques to identify similarities and dissimilarities in the use of programming language based on the gender and region of the programmer. The results of various experiments are promising. We demonstrate that we can predict the gender of programmers with 83.1% accuracy and the region of the programmer with 92.5% accuracy.
artificial language, linguistics, machine learning, programmer characteristics, sociolinguistic characteristics, text categorization