Computer program categorization with machine learning

Publisher

Lethbridge, Alta. : University of Lethbridge, Department of Mathematics and Computer Science

Abstract

Machine learning techniques have been used both to improve the learning process and to study how natural languages are used. Previous research has shown that similar techniques can be applied to the analysis of computer programming (artificial) languages. Several studies have demonstrated the influence of sociolinguistic characteristics such as age, gender, region, and social status on natural language use. This research focuses on determining the impact of the author's sociolinguistic characteristics, particularly gender and region, on computer programs. We use machine learning and statistical techniques to identify similarities and differences in programming language use based on the gender and region of the programmer. The experimental results are promising: we demonstrate that we can predict the gender of programmers with 83.1% accuracy and the region of programmers with 92.5% accuracy.
