Code authorship attribution using content-based and non-content-based features
dc.contributor.author | Bayrami, Parinaz | |
dc.contributor.author | University of Lethbridge. Faculty of Arts and Science | |
dc.contributor.supervisor | Rice, Jacqueline E. | |
dc.date.accessioned | 2021-09-09T15:20:31Z | |
dc.date.available | 2021-09-09T15:20:31Z | |
dc.date.issued | 2021 | |
dc.degree.level | Masters | en_US |
dc.description.abstract | Machine learning approaches are widely used in natural language analysis. Previous research has shown that similar techniques can be applied to the analysis of computer programming (artificial) languages. In this thesis, we focus on identifying the authors of computer programs using machine learning techniques. We extend these techniques to determine which features capture an author's writing style when classifying a computer program by author identity. We then propose a novel approach to computer program author identification in which program features from the text documents are combined with authors' sociological features (gender and region) to develop the classification model. Several experiments have been conducted on two datasets composed of computer programs written in C++, and the results are encouraging. According to the experimental results, the author's identity can be predicted with 75% accuracy. | en_US |
dc.identifier.uri | https://hdl.handle.net/10133/6015 | |
dc.language.iso | en_US | en_US |
dc.proquest.subject | Computer science [0984] | en_US |
dc.proquest.subject | Artificial intelligence [0800] | en_US |
dc.proquest.subject | Mathematics [0405] | en_US |
dc.proquestyes | Yes | en_US |
dc.publisher | Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science | en_US |
dc.publisher.department | Department of Mathematics and Computer Science | en_US |
dc.publisher.faculty | Arts and Science | en_US |
dc.relation.ispartofseries | Thesis (University of Lethbridge. Faculty of Arts and Science) | en_US |
dc.subject | Research Subject Categories::TECHNOLOGY | en_US |
dc.subject | Artificial intelligence | en_US |
dc.subject | Authorship | en_US |
dc.subject | C++/CLI (Computer program language) | en_US |
dc.subject | Code generators | en_US |
dc.subject | Computer programming | en_US |
dc.subject | Dissertations, Academic | en_US |
dc.subject | Machine learning | en_US |
dc.title | Code authorship attribution using content-based and non-content-based features | en_US |
dc.type | Thesis | en_US |