Machine learning in the classification of computer code

Tasnim, Nazia
University of Lethbridge. Faculty of Arts and Science
Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science
Machine learning is a well-established approach to analyzing natural language. Sociolinguistic characteristics of an author, such as gender, experience, and age, have a marked effect on natural language use. Previous research has shown that computer programs can be analyzed with similar linguistics-based approaches. In this research, we use machine learning techniques to classify computer programs according to the author's programming experience, and we use machine learning and statistical approaches to determine which features are most significant for that classification. Several experiments were carried out on a dataset of computer programs written in C++, and the results are encouraging. The experimental results indicate that the author's programming experience can be predicted with an accuracy of 69%.
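As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch extracts a few surface features from C++ source text and classifies it with a nearest-centroid rule. Everything here is an assumption made for illustration: the abstract does not name the features or the classifiers actually used, so the three features, the function names (`extract_features`, `train`, `predict`), and the nearest-centroid rule are all hypothetical stand-ins.

```python
import re
from statistics import mean


def extract_features(source: str) -> list[float]:
    """Map C++ source text to a small feature vector.

    Hypothetical feature set: the abstract does not list the real features.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n_lines = max(len(lines), 1)
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    n_tokens = max(len(tokens), 1)
    comment_lines = sum(1 for ln in lines if ln.strip().startswith("//"))
    return [
        comment_lines / n_lines,                          # share of comment lines
        mean(len(ln) for ln in lines) if lines else 0.0,  # average line length
        len(set(tokens)) / n_tokens,                      # word-token diversity
    ]


def centroid(rows: list[list[float]]) -> list[float]:
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def train(samples: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Build one centroid per experience label from (source, label) pairs."""
    by_label: dict[str, list[list[float]]] = {}
    for source, label in samples:
        by_label.setdefault(label, []).append(extract_features(source))
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model: dict[str, list[float]], source: str) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance.

    Features are not rescaled here, so average line length dominates the
    distance; a real experiment would normalize the features first.
    """
    feats = extract_features(source)

    def dist(c: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, c))

    return min(model, key=lambda label: dist(model[label]))
```

With labeled programs, `train` would be called on the training split and `predict` on held-out programs; accuracy is then the fraction of held-out programs whose predicted label matches the true one. Comparing accuracy with individual features removed is one simple way to probe which features matter most, though the thesis may well use a different method.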
Artificial intelligence, Classification, Computer programming, Machine learning, Programming languages (Electronic computers), Sociolinguistics