Code authorship attribution using content-based and non-content-based features

dc.contributor.author: Bayrami, Parinaz
dc.contributor.author: University of Lethbridge. Faculty of Arts and Science
dc.contributor.supervisor: Rice, Jacqueline E.
dc.date.accessioned: 2021-09-09T15:20:31Z
dc.date.available: 2021-09-09T15:20:31Z
dc.date.issued: 2021
dc.degree.level: Masters (en_US)
dc.description.abstract: Machine learning approaches are widely used in natural language analysis. Previous research has shown that similar techniques can be applied in the analysis of computer programming (artificial) languages. In this thesis, we focus on identifying the authors of computer programs by using machine learning techniques. We extend these techniques to determine which features capture the writing style of authors in the classification of a computer program according to the author's identity. We then propose a novel approach for computer program author identification. In this method, program features from the text documents are combined with authors' sociological features (gender and region) to develop the classification model. Several experiments have been conducted on two datasets composed of computer programs written in C++, and the results are encouraging. According to the experimental results, the author's identity can be predicted with a 75% accuracy rate. (en_US)
dc.identifier.uri: https://hdl.handle.net/10133/6015
dc.language.iso: en_US (en_US)
dc.proquest.subject: Computer science [0984] (en_US)
dc.proquest.subject: Artificial intelligence [0800] (en_US)
dc.proquest.subject: Mathematics [0405] (en_US)
dc.proquestyes: Yes (en_US)
dc.publisher: Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science (en_US)
dc.publisher.department: Department of Mathematics and Computer Science (en_US)
dc.publisher.faculty: Arts and Science (en_US)
dc.relation.ispartofseries: Thesis (University of Lethbridge. Faculty of Arts and Science) (en_US)
dc.subject: Research Subject Categories::TECHNOLOGY (en_US)
dc.subject: Artificial intelligence (en_US)
dc.subject: Authorship (en_US)
dc.subject: C++/CLI (Computer program language) (en_US)
dc.subject: Code generators (en_US)
dc.subject: Computer programming (en_US)
dc.subject: Dissertations, Academic (en_US)
dc.subject: Machine learning (en_US)
dc.title: Code authorship attribution using content-based and non-content-based features (en_US)
dc.type: Thesis (en_US)
Files
Original bundle
Name: BAYRAMI_PARINAZ_MSC_2021.pdf
Size: 8.41 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.25 KB
Format: Item-specific license agreed upon to submission