Computer program categorization with machine learning
Date
2017
Authors
Rafee, Md Mahmudul Hasan
University of Lethbridge. Faculty of Arts and Science
Publisher
Lethbridge, Alta. : University of Lethbridge, Department of Mathematics and Computer Science
Abstract
Machine learning techniques have been applied both to improve the learning process and to
study how natural languages are used. Previous research has shown that similar
techniques can be applied to the analysis of computer programming (artificial) languages.
Several studies have demonstrated the influence of sociolinguistic characteristics such as
age, gender, region, and social status on natural languages. This research focuses on determining
the impact of the author's sociolinguistic characteristics, particularly gender and
region, on computer programs. We use machine learning and statistical techniques to find
similarities and dissimilarities in the use of programming languages based on the
gender and region of the programmer. The results of our experiments are promising:
we can predict the gender of programmers with 83.1% accuracy and
the region of the programmer with 92.5% accuracy.
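The abstract does not say which features or classifiers the thesis used. As a rough illustration of the general text-categorization setup named in the keywords, here is a minimal sketch that tokenizes program text, counts token unigrams and bigrams, and trains a multinomial Naive Bayes classifier with add-one smoothing. The feature set, the choice of Naive Bayes, the names `tokenize`, `features` and `NaiveBayes`, and the toy snippets with labels "A" and "B" are all hypothetical assumptions. They are not the author's method or data. A real study would need a labelled corpus of programs and a held-out evaluation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(source: str) -> list[str]:
    # Split program text into identifiers/keywords, integer literals,
    # and single punctuation characters (a hypothetical tokenizer).
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", source)


def features(source: str) -> Counter:
    # Bag of token unigrams plus adjacent-token bigrams (assumed feature set).
    toks = tokenize(source)
    grams = toks + [a + " " + b for a, b in zip(toks, toks[1:])]
    return Counter(grams)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.feat_counts[label].update(features(doc))
        self.vocab = set()
        for counts in self.feat_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feat_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(self.class_counts[c] / n_docs)
            for f, k in features(doc).items():
                if f not in self.vocab:
                    continue  # skip features never seen in training
                p = (self.feat_counts[c][f] + 1) / (self.totals[c] + v)
                score += k * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy data: two "author groups" with different coding habits.
train_docs = [
    "for (int i = 0; i < n; i++) { sum += a[i]; }",
    "for (int j = 0; j < m; j++) { total += b[j]; }",
    "while (x > 0) { x = x - 1; }",
    "while (y > 0) { y = y - 1; }",
]
train_labels = ["A", "A", "B", "B"]
clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict("for (int k = 0; k < n; k++) { s += c[k]; }"))
print(clf.predict("while (z > 0) { z = z - 1; }"))
```

Naive Bayes is used here only because it is a short, standard baseline for text categorization. The same feature extraction could feed any other classifier.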
Keywords
artificial language, linguistics, machine learning, programmer characteristics, sociolinguistic characteristics, text categorization