Computer program complexity and its correlation with program features and sociolinguistics

Alam, Sowkat
University of Lethbridge. Faculty of Arts and Science
Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science
Machine learning techniques have been widely used to study how sociolinguistic characteristics shape language use. These techniques can also be applied to artificial languages. This research focuses on the influence of social characteristics, especially region and gender, on an artificial language (a programming language). Software complexity features, 103 programming features, and their correlations (measured with the Pearson correlation coefficient) are also explored in this work. Machine learning and statistical techniques are used to determine whether any dissimilarities or similarities exist in the use of the C++ programming language. We show that machine learning models can predict the region of programmers with 78.36% accuracy and the gender of programmers with 62.63% accuracy. We hypothesize that differences in feature frequency may be a reason for the lower accuracy in gender-based program classification. We also demonstrate that some features, such as for-loops and if-else conditions, are closely correlated with the complexity of a computer program.
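To illustrate the kind of feature-to-complexity correlation mentioned in the abstract, the following is a minimal sketch of the Pearson correlation coefficient in Python. The function name `pearson`, the variable names, and the numbers are all hypothetical and chosen only for illustration; they are not data or code from the thesis, and the thesis's actual feature extraction and complexity measures are not reproduced here.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")

    return cov / math.sqrt(var_x * var_y)


# Made-up example: number of for-loops per program and a made-up
# complexity score for the same five programs (illustrative values only).
for_loop_counts = [1, 2, 3, 4, 5]
complexity_scores = [2, 4, 6, 8, 10]

r = pearson(for_loop_counts, complexity_scores)
print(f"Pearson r = {r:.2f}")
```

A value of r near 1 or -1 indicates a strong linear relationship between a feature's frequency and the complexity score, while a value near 0 indicates little linear relationship. In practice a library routine such as `scipy.stats.pearsonr` would typically be used instead of a hand-written function.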
Artificial intelligence; Computer programming; Dissertations, Academic; Machine learning; Programming (Computers); Programming languages (Computers); Sociolinguistics