Improving software security via the use of pre-trained code large language models in vulnerabilities detection

dc.contributor.author: Oladokun, Olanrewaju E.
dc.contributor.author: University of Lethbridge. Faculty of Arts and Science
dc.contributor.supervisor: Rice, Jackie
dc.date.accessioned: 2025-10-01T20:18:39Z
dc.date.available: 2025-10-01T20:18:39Z
dc.date.issued: 2025
dc.degree.level: Masters
dc.description.abstract: The ubiquity of software systems, and the dependence of people, businesses and organizations on them in the 21st century, has resulted in an upsurge in cyber-attacks in recent times. These attacks are generally characterized by different levels of sophistication, occurrence and complexity that make it difficult for conventional cybersecurity approaches to adequately mitigate them. Although cybercriminals, including hackers, are usually blamed for most cyber-attacks, the fundamental cause is typically associated with inherent security weaknesses in the software itself. These weaknesses are loopholes in software source code through which hackers exploit systems in ways that constitute cybercrimes. Hence, in recent years, various AI-based approaches have been proposed or explored in studies to address this challenge. These innovative methods are aimed at complementing the conventional approaches (including awareness training, malware scanning, and manual code inspection) that have been adopted over the years. In our research, we experimented with the use of emerging AI models called Large Language Models (LLMs) in the detection of vulnerabilities in a software system. As a case study, we used Android software, since current statistics reveal that over 71 percent of all mobile phones across the world are based on this software. In our experiment, we used LVDAndro, a recently released open-source Android vulnerability dataset, to train our selected LLMs, CodeBERT and GraphCodeBERT. The goal was to detect vulnerabilities in Android code bases. Overall, our approach achieved better performance in Android vulnerability detection (0.99 Accuracy, 0.99 F1) than the classical Machine Learning (ML) model used in the previous study (0.94 Accuracy, 0.94 F1).
dc.embargo: No
dc.identifier.uri: https://hdl.handle.net/10133/7150
dc.language.iso: en
dc.publisher: Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science
dc.publisher.department: Department of Mathematics and Computer Science
dc.publisher.faculty: Arts and Science
dc.relation.ispartofseries: Thesis (University of Lethbridge. Faculty of Arts and Science)
dc.subject: Software security
dc.subject: Large Language Models
dc.subject: Artificial intelligence
dc.subject: AI
dc.subject: Android
dc.subject.lcsh: Dissertations, Academic
dc.title: Improving software security via the use of pre-trained code large language models in vulnerabilities detection
dc.type: Thesis

Files

Original bundle

Name: OLADOKUN_OLANREWAJU__MSC_2025.pdf
Size: 1.72 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.33 KB
Format: Item-specific license agreed upon to submission