Do sociolinguistic variations exist in programming?

Publisher

Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science

Abstract

Machine learning techniques are currently widely used in the analysis of natural language. This thesis focuses on extending these techniques to the analysis of programming languages. In particular, we are interested in determining whether there are differences in the use of programming languages that may be associated with the authors' gender. Few studies currently address possible relationships between linguistics and programming. In this thesis, the samples in our dataset are computer programs written in the C++ programming language. We also acquired sociolinguistic information about the programmers, focusing especially on gender. We use machine learning and statistical techniques to identify patterns in language use that are consistent for male and female programmers. The results of numerous experiments are encouraging: we demonstrate that we can predict the gender of programmers with 71% accuracy and detect similarities or dissimilarities in their programming style.
