Investigating the impact of programming styles to improve code quality using machine learning and sociolinguistic features

Abdullah, Deen Mohammad; University of Lethbridge. Faculty of Arts and Science

Files

ABDULLAH_DEEN_PHD_2025.pdf (3.01 MB)

Date

2025

Authors

Abdullah, Deen Mohammad

University of Lethbridge. Faculty of Arts and Science

Publisher

Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science

Abstract

In this research, we investigated whether sociolinguistic factors such as gender, region, and expertise influence programming styles and code quality. We collected and processed over 700,000 C++ programs from GitHub and Codeforces to build data sets for training Random Forest and BERT models to classify programmer groups. Experimental results showed that context-based models outperform metrics-based models at capturing stylistic patterns. To measure code quality, we combined the Maintainability Index and difficulty metrics to label code as compliant or non-compliant. We then fine-tuned the T5 model for code transformation to generate stylistically improved code. However, due to the limitations of encoder–decoder LLMs, the generated code samples were non-executable. To address this, we developed a CodeBERT-based recommendation model that generates targeted, metric-driven guidance for improving code quality. Finally, we implemented a prototype tool that combines classification, code quality assessment, and improvement suggestions, providing pedagogically meaningful feedback for learners and researchers.

Keywords

programming styles, code quality, sociolinguistic factors, coding style, software metrics, large language models

URI

https://hdl.handle.net/10133/7296

Collections

Arts and Science, Faculty of
University of Lethbridge Theses
