Investigating the impact of programming styles to improve code quality using machine learning and sociolinguistic features
| dc.contributor.author | Abdullah, Deen Mohammad | |
| dc.contributor.author | University of Lethbridge. Faculty of Arts and Science | |
| dc.contributor.supervisor | Rice, Jacqueline E. | |
| dc.date.accessioned | 2026-02-04T23:08:31Z | |
| dc.date.available | 2026-02-04T23:08:31Z | |
| dc.date.issued | 2025 | |
| dc.degree.level | Ph.D | |
| dc.description.abstract | In this research, we investigated whether sociolinguistic factors such as gender, region, and expertise influence programming styles and code quality. We collected and processed over 700,000 C++ programs from GitHub and Codeforces to build data sets for training Random Forest and BERT models to classify programmer groups. Experimental results showed that context-based models outperform metrics-based models at capturing stylistic patterns. To measure code quality, we combined the Maintainability Index and difficulty metrics to label code as compliant or non-compliant. We then fine-tuned the T5 model for code transformation to generate stylistically improved code. However, due to the limitations of encoder–decoder LLMs, the generated code samples were non-executable. To address this, we developed a CodeBERT-based recommendation model that generates targeted, metric-driven guidance for improving code quality. Finally, we implemented a prototype tool that combines programmer-group classification, code quality assessment, and improvement suggestions, providing pedagogically meaningful feedback for learners and researchers. | |
| dc.embargo | No | |
| dc.identifier.uri | https://hdl.handle.net/10133/7296 | |
| dc.language.iso | en | |
| dc.publisher | Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science | |
| dc.publisher.department | Department of Mathematics and Computer Science | |
| dc.publisher.faculty | Arts and Science | |
| dc.relation.ispartofseries | Thesis (University of Lethbridge. Faculty of Arts and Science) | |
| dc.subject | programming styles | |
| dc.subject | code quality | |
| dc.subject | sociolinguistic factors | |
| dc.subject | coding style | |
| dc.subject | software metrics | |
| dc.subject | large language models | |
| dc.subject.lcsh | Dissertations, Academic | |
| dc.subject.lcsh | Computer programmers | |
| dc.subject.lcsh | Computer programming | |
| dc.subject.lcsh | Sociolinguistics | |
| dc.subject.lcsh | Software measurement | |
| dc.title | Investigating the impact of programming styles to improve code quality using machine learning and sociolinguistic features | |
| dc.type | Thesis |