Development of Machine Learning Algorithm Based Graphs for Android Malware Classification
Date
2022-04
Publisher
Al-Neelain University
Abstract
The number of users of mobile devices such as smartphones and tablets is
increasing. The invention of the smartphone is one of the most important
achievements of the twenty-first century, and smartphones play a crucial role in
many areas of our daily lives. Android is one of the most widely used operating
systems today. The rapid growth in the use of Android and of free applications
has contributed to a significant increase in applications loaded with malware.
Some of this malware damages devices (adware, bots, Trojan horses), and some
steals users’ sensitive information (spyware) or, in the case of ransomware,
locks the data on the victim’s device through encryption and demands payment to
decrypt the data or restore the victim’s access. These applications request a
number of sensitive permissions during installation and at runtime, and malware
developers exploit this to launch attacks on users.
In this research, an approach based on the most important permissions and API
calls is proposed and developed. The Drebin data set from the Drebin project,
which contains 15,036 applications, was used to identify and extract the most
important graph-based features that are effective in detecting malware
applications. Machine learning techniques were then used to train the malware
detection tool and classify applications. Four machine learning algorithms were
used: Random Forest, K-Nearest Neighbor (KNN), Decision Tree (DT) and Logistic
Regression. The experimental results show that this approach achieves an
accuracy of 96% with the KNN and DT algorithms and an accuracy of up to 95%
with the Logistic Regression algorithm. The best result, an accuracy of 97% and
a recall of 96%, was obtained with the Random Forest algorithm, which
demonstrates the effectiveness and advantages of this approach.
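As a rough illustration of the kind of pipeline the abstract describes, the sketch below encodes each application as a 0/1 vector over permissions and API calls, classifies it with a small K-Nearest Neighbor routine, and reports accuracy and recall. It is not the thesis implementation: the feature vocabulary, the toy samples, the Hamming distance and the value of k are all assumptions made for illustration, and the graph-based feature selection step is not reproduced.

```python
from collections import Counter

# Hypothetical feature vocabulary; which permissions and API calls the
# thesis actually selected is not stated in the abstract.
VOCAB = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",
    "TelephonyManager.getDeviceId",
]

def to_vector(features):
    """Encode the set of features an app uses as a 0/1 vector over VOCAB."""
    return [1 if name in features else 0 for name in VOCAB]

def hamming(a, b):
    """Number of positions in which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_x, train_y, x, k=3):
    """Return the majority label among the k training vectors closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive (malware) samples that were detected."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

SMS, CONTACTS, INTERNET, DEVICE_ID = VOCAB

# Made-up training samples: label 1 = malware, 0 = benign.
train = [
    ({SMS, INTERNET, DEVICE_ID}, 1),
    ({SMS, CONTACTS}, 1),
    ({SMS, CONTACTS, DEVICE_ID}, 1),
    ({INTERNET}, 0),
    (set(), 0),
    ({INTERNET, CONTACTS}, 0),
]
# Made-up held-out samples used only to exercise the metrics.
test = [
    ({SMS, DEVICE_ID}, 1),
    ({INTERNET}, 0),
    ({CONTACTS}, 1),
    (set(), 0),
]

train_x = [to_vector(f) for f, _ in train]
train_y = [label for _, label in train]
y_true = [label for _, label in test]
y_pred = [knn_predict(train_x, train_y, to_vector(f)) for f, _ in test]

print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
```

Accuracy and recall are reported separately because they answer different questions: accuracy counts all correct decisions, while recall counts only how many malicious applications were caught, so a classifier can score well on the first while still missing malware.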
Description
A thesis submitted in fulfillment of the requirements for the
degree of philosophy in Computer Science
Keywords
Machine Learning