Develop Algorithm to deal with Missing Values in Data Mining
Date
2022-06
Publisher
Al-Neelain University
Abstract
Missing values in datasets can cause problems for many
machine learning approaches. As a result, before modeling a prediction
task, it is good practice to find and fix missing values in each column of the
input data. This is known as missing data imputation, or simply imputing.
Using a model to predict missing values is a prominent
approach to missing data imputation. While any of a variety of models can be
used to impute missing values, the k-nearest neighbor (kNN) approach has
been identified as highly successful.
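As a rough illustration of conventional kNN imputation on numerical data, the following minimal sketch fills each missing entry with the mean of that column over the k nearest rows. The function name `knn_impute`, the use of `None` as the missing-value marker, and the choice to measure Euclidean distance only over columns observed in both rows are assumptions made for this example, not details taken from the thesis.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows.

    Distance is Euclidean, normalized over the columns observed in both rows
    (an assumption made for this sketch).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which column j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [9.0, 9.0, 9.0],
    [1.2, 1.9, 3.2],
]
filled = knn_impute(data, k=2)
```

Here the second row is closest to the first and fourth rows, so its missing third value is filled with the mean of their third-column values; the distant third row does not contribute.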
Current kNN imputation approaches for missing data are built around
Minkowski distance, Euclidean distance, or equivalents, and have been
demonstrated to be highly effective for numerical variables. To manage
heterogeneous (mixed) data, we propose WKNN (Weighted kNN)
imputation, an innovative kNN imputation approach for adaptively imputing
missing data. Instead of using classic distance metrics, WKNN
determines the k nearest neighbors for each missing value by measuring the
weighted Hamming distance between the incomplete instance and all the
training data. This type of distance measure can handle both numerical and
categorical attributes. WKNN treats all imputed instances as observed data,
which are combined with complete instances to successively impute further
missing data.
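The idea described above can be sketched as follows. The abstract does not give the exact definition of the weighted Hamming distance, so this example assumes one plausible form: each attribute contributes its weight when the two values differ, and the sum is normalized by the total weight of the attributes observed in both instances. The names `weighted_hamming` and `wknn_impute`, the per-attribute `weights`, the numeric tolerance `tol`, and the mean/mode fill rule are all hypothetical choices for illustration, not the thesis's implementation.

```python
from collections import Counter

def weighted_hamming(a, b, weights, tol=0.5):
    """Weighted share of mismatching attributes among those observed in both."""
    total = mismatch = 0.0
    for x, y, w in zip(a, b, weights):
        if x is None or y is None:
            continue
        total += w
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            differs = abs(x - y) > tol  # assumed tolerance for numeric values
        else:
            differs = x != y
        mismatch += w * differs
    return mismatch / total if total else 1.0

def wknn_impute(rows, weights, k=2):
    """Impute rows in order; each imputed row joins the donor pool."""
    donors = [list(r) for r in rows if None not in r]
    result = []
    for row in rows:
        row = list(row)
        if None in row:
            # Rank donors against the row as observed, before filling anything.
            ranked = sorted(donors, key=lambda d: weighted_hamming(row, d, weights))
            nearest = ranked[:k]
            for j, value in enumerate(row):
                if value is not None or not nearest:
                    continue
                values = [d[j] for d in nearest]
                if all(isinstance(v, (int, float)) for v in values):
                    row[j] = sum(values) / len(values)          # numeric: mean
                else:
                    row[j] = Counter(values).most_common(1)[0][0]  # categorical: mode
            if None not in row:
                donors.append(row)  # imputed instance is treated as observed data
        result.append(row)
    return result

rows = [
    ["red", "small", 1.0],
    ["red", "small", 1.2],
    ["blue", "large", 5.0],
    ["blue", "large", 4.8],
    ["red", None, 1.1],
    [None, "large", 5.2],
]
filled = wknn_impute(rows, weights=[1.0, 1.0, 2.0], k=2)
```

In this toy dataset the fifth row matches the two "red" rows on every observed attribute and so receives "small", while the sixth row matches the two "blue" rows and receives "blue". Once the fifth row is imputed, it is added to the donor pool and is available when the sixth row is processed.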
We evaluate the proposed model using kNN classification
and show that the weighted Hamming distance outperforms the Minkowski
distance and Euclidean distance in identifying the proximity
relation (nearness) between two instances as well as in handling mixed
attributes. Furthermore, experimental results suggest that the WKNN
model is far more efficient than current kNN imputation approaches when it
comes to providing a good dataset for analysis or prediction.
Description
Thesis submitted as a requirement for the Degree of Doctor of
Philosophy in Information Technology
Keywords
Algorithm