Investigating the impact of programming styles to improve code quality using machine learning and sociolinguistic features

dc.contributor.author: Abdullah, Deen Mohammad
dc.contributor.author: University of Lethbridge. Faculty of Arts and Science
dc.contributor.supervisor: Rice, Jacqueline E.
dc.date.accessioned: 2026-02-04T23:08:31Z
dc.date.available: 2026-02-04T23:08:31Z
dc.date.issued: 2025
dc.degree.level: Ph.D
dc.description.abstract: In this research, we investigated whether sociolinguistic factors such as gender, region, and expertise influence programming styles and code quality. We collected and processed over 700,000 C++ programs from GitHub and Codeforces to build data sets for training Random Forest and BERT models to classify programmer groups. Experimental results showed that context-based models outperform metrics-based models at capturing stylistic patterns. To measure code quality, we combined the Maintainability Index and difficulty metrics to label code as compliant or non-compliant. We further fine-tuned the T5 model for code transformation to generate stylistically improved code. However, due to the limitations of encoder-decoder LLMs, the generated code samples were non-executable. To address this, we developed a CodeBERT-based recommendation model that generates targeted, metric-driven guidance to improve code quality. Finally, we implemented a prototype tool that combines programmer classification, code quality assessment, and improvement suggestions, providing pedagogically meaningful feedback for learners and researchers.
dc.embargo: No
dc.identifier.uri: https://hdl.handle.net/10133/7296
dc.language.iso: en
dc.publisher: Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science
dc.publisher.department: Department of Mathematics and Computer Science
dc.publisher.faculty: Arts and Science
dc.relation.ispartofseries: Thesis (University of Lethbridge. Faculty of Arts and Science)
dc.subject: programming styles
dc.subject: code quality
dc.subject: sociolinguistic factors
dc.subject: coding style
dc.subject: software metrics
dc.subject: large language models
dc.subject.lcsh: Dissertations, Academic
dc.subject.lcsh: Computer programmers
dc.subject.lcsh: Computer programming
dc.subject.lcsh: Sociolinguistics
dc.subject.lcsh: Software measurement
dc.title: Investigating the impact of programming styles to improve code quality using machine learning and sociolinguistic features
dc.type: Thesis
Files
Original bundle
Name: ABDULLAH_DEEN_PHD_2025.pdf
Size: 3.01 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.33 KB
Format: Item-specific license agreed upon to submission