Improving software security via the use of pre-trained code large language models in vulnerabilities detection

Oladokun, Olanrewaju E.; University of Lethbridge. Faculty of Arts and Science

dc.contributor.author	Oladokun, Olanrewaju E.
dc.contributor.author	University of Lethbridge. Faculty of Arts and Science
dc.contributor.supervisor	Rice, Jackie
dc.date.accessioned	2025-10-01T20:18:39Z
dc.date.available	2025-10-01T20:18:39Z
dc.date.issued	2025
dc.degree.level	Masters
dc.description.abstract	The ubiquity of software systems, and the dependence of people, businesses and organizations on them in the 21st century, has led to an upsurge in cyber-attacks in recent times. These attacks vary in sophistication, frequency and complexity, which makes it difficult for conventional cybersecurity approaches to mitigate them adequately. Although cybercriminals, including hackers, are usually blamed for most cyber-attacks, the fundamental cause typically lies in inherent security weaknesses in the software itself. These weaknesses are loopholes in software source code that hackers exploit in ways that constitute cybercrimes. Hence, in recent years, various AI-based approaches have been proposed or explored in studies to address this challenge. These methods aim to complement the conventional approaches (including awareness training, malware scanning, and manual code inspection) that have been adopted over the years. In our research, we experimented with the use of emerging AI models called Large Language Models (LLMs) to detect vulnerabilities in a software system. As a case study, we used Android software, since current statistics reveal that over 71 percent of all mobile phones across the world are based on this software. In our experiment, we used LVDAndro, a recently released open-source Android vulnerability dataset, to train our selected LLMs, CodeBERT and GraphCodeBERT. The goal was to detect vulnerabilities in Android code bases. Overall, our approach achieved better performance in Android vulnerability detection (0.99 accuracy, 0.99 F1) than the classical Machine Learning (ML) model used in the previous study (0.94 accuracy, 0.94 F1).
dc.embargo	No
dc.identifier.uri	https://hdl.handle.net/10133/7150
dc.language.iso	en
dc.publisher	Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science
dc.publisher.department	Department of Mathematics and Computer Science
dc.publisher.faculty	Arts and Science
dc.relation.ispartofseries	Thesis (University of Lethbridge. Faculty of Arts and Science)
dc.subject	Software security
dc.subject	Large Language Models
dc.subject	Artificial intelligence
dc.subject	AI
dc.subject	Android
dc.subject.lcsh	Dissertations, Academic
dc.title	Improving software security via the use of pre-trained code large language models in vulnerabilities detection
dc.type	Thesis

Files

Original bundle

Name: OLADOKUN_OLANREWAJU__MSC_2025.pdf
Size: 1.72 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.33 KB
Format: Item-specific license agreed upon to submission

Collections

Arts and Science, Faculty of
University of Lethbridge Theses
