PHD theses : Computer Science

Permanent URI for this collectionhttps://repository.neelain.edu.sd/handle/123456789/12169

Browse

Search Results

Now showing 1 - 1 of 1
  • Thumbnail Image
    Item
    Develop Algorithm to deal with Missing Values in Data Mining
    (Al-Neelain University, 2022-06) Moez Mutasim Ali Abdelmagid
    Abstract Missing values in datasets can lead to some problems for many machines learning approaches. As a result, before modeling your prediction task, it's a good idea to find and fix missing values for each column in your input data. This is known as missing data imputation, or simply imputing. Making use of a model to predict missing values is a prominent technique to missing data imputation. While any of a variety of models can be used to impute missing values, the k-nearest neighbor (KNN) approach, has identified to be highly successful. Current kNN imputation approaches for missing data are built around Minkowski distance or Euclidean distance or equivalents, and have also been demonstrated to be highly effective for numerical variables. To manage the heterogeneous (combined) data, we propose WKNN (Weighted kNN) imputation, an innovative kNN imputation approach for adaptively imputing missing data. Instead of using classic distance metric approaches WKNN determines k nearest neighbors for each missing value by measuring the weighted hamming distance between data and all the training data. This type of distance measure can handle either numerical and categorical. WKNN considers all imputed instances as observational data, which is combined alongside completed instances to successively impute further missing data. We examine the suggested model through using the KNN classification and show that the weighted hamming distance exceeds the Minkowski distance and Euclidean distance in aspects of identifying the proximity relation (nearness) between two instances as well as handling the mixed attributes. Furthermore, results from experiments suggest that the WKNN model is far more efficient than current kNN imputation approaches when it comes from providing a good dataset for analysis or predicting.