Improving software security via the use of pre-trained code large language models in vulnerabilities detection

Oladokun, Olanrewaju E.; University of Lethbridge. Faculty of Arts and Science

Files

OLADOKUN_OLANREWAJU__MSC_2025.pdf (1.72 MB)

Date

2025

Authors

Oladokun, Olanrewaju E.

University of Lethbridge. Faculty of Arts and Science

Publisher

Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science

Abstract

The ubiquity of software systems, and the dependence of people, businesses and organizations on them in the 21st century, has resulted in an upsurge in cyber-attacks in recent times. These attacks vary in sophistication, frequency and complexity, which makes it difficult for conventional cybersecurity approaches to adequately mitigate them. Although cybercriminals, including hackers, are usually blamed for most cyber-attacks, the fundamental cause is typically the inherent security weaknesses in the software itself. These weaknesses are the loopholes in software source code through which hackers exploit systems in ways that constitute cybercrimes. Hence, in recent years, various AI-based approaches have been proposed or explored in studies to address this challenge. These methods are aimed at complementing the conventional approaches (including awareness training, malware scanning, and manual code inspection) that have been adopted over the years. In our research, we experimented with the use of emerging AI models called Large Language Models (LLMs) in the detection of vulnerabilities in a software system. As a case study, we used Android software, since current statistics reveal that over 71 percent of all mobile phones across the world are based on this software. In our experiment, we utilized LVDAndro, a recently released open-source Android vulnerability dataset, for training our selected LLMs, which were CodeBERT and GraphCodeBERT. The goal was to detect vulnerabilities in Android code bases. Overall, our approach achieved better performance in Android vulnerability detection (0.99 Accuracy, 0.99 F1) than the classical Machine Learning (ML) model used in the previous study (0.94 Accuracy, 0.94 F1).

Keywords

Software security, Large Language Models, Artificial intelligence, AI, Android

URI

https://hdl.handle.net/10133/7150

Collections

Arts and Science, Faculty of
University of Lethbridge Theses
