A Systematic Literature Review on Software Vulnerability Detection Using Machine Learning Approaches

Ramadan, Aya El-Rahman Kamal El-Deen; Bahaa, Ahmed; Ghoneim, Amr

doi:10.21608/fcihib.2022.87660.1058

Document Type : Original Article

Authors

¹ Department of Information Systems, Faculty of Computers and Artificial Intelligence, Helwan University, Cairo, Egypt

² Department of Information Systems, Faculty of Computers and Artificial Intelligence, Helwan University, Helwan 11795, Egypt; Department of Information Systems, Faculty of Computers and Artificial Intelligence, Beni-Suef University, Beni-Suef 62521, Egypt

³ Faculty of Computers and Artificial Intelligence, Helwan University

Abstract

Software vulnerabilities are security flaws, defects, or weaknesses in software architecture, design, or implementation. The rapid growth of open-source code available for analysis offers an opportunity to learn the bug patterns that lead to security vulnerabilities and thereby assist in vulnerability discovery. Recent advances in deep learning for natural language processing, speech recognition, and image processing have demonstrated the great potential of neural models to understand natural language. This has encouraged researchers in cybersecurity and software engineering to use deep learning to learn and understand the code patterns and semantics that characterize vulnerable code. In this paper, we review and analyze recent state-of-the-art research that adopts machine learning and deep learning techniques to detect software vulnerabilities, aiming to investigate how neural techniques can be leveraged to learn and understand code semantics to facilitate vulnerability detection. The search process identified 12 primary studies: 7 were published by IEEE, 2 by ACM, 2 by Springer, and the remainder in other conferences and journals. Most primary studies used the NVD and SARD datasets, while others used open-source projects. The results show that machine learning and deep learning techniques give promising results in the automatic detection of vulnerabilities, but gaps remain in existing models that need to be addressed in future research.

Keywords