DESIGNING A PREDICTIVE MODEL FOR DIAGNOSING DIABETES USING MACHINE LEARNING AND DATA MINING TECHNIQUES
DOI:
https://doi.org/10.61212/
Keywords:
Diabetes mellitus, Machine learning, Data mining, Predictive model
Abstract
Diabetes mellitus poses a growing global health burden, demanding timely and accurate diagnostic tools to improve patient outcomes. This research develops and evaluates a predictive model for diagnosing diabetes by leveraging machine learning and data mining techniques. Using a dataset collected from Iraqi medical institutions, the study applied several supervised classification algorithms—including Naïve Bayes, Random Forest, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Tree—across four distinct preprocessing scenarios. These scenarios included steps such as noise filtering, class balancing using SMOTE, and feature selection to enhance model accuracy and robustness. The best-case scenario, which combined all preprocessing techniques, yielded the highest performance: the Random Forest classifier achieved an accuracy of 99.1%, precision of 0.97, F-measure of 0.95 and an AUC of 1. Conversely, the Naïve Bayes algorithm, under the baseline (raw data) scenario, recorded the lowest performance with an accuracy of 87.6%, precision of 0.74, F-measure of 0.75 and an AUC of 0.96. The findings underscore that advanced preprocessing pipelines significantly improve predictive performance and offer a practical framework for early diabetes detection, particularly in low-resource healthcare environments.
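The abstract reports accuracy, precision, and F-measure side by side, and the gap between them (for example 87.6% accuracy against 0.74 precision for Naïve Bayes) is easier to read with the definitions in hand. The following is a minimal sketch of how these three metrics are derived from a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical, imbalanced example: 90 diabetic and 910 non-diabetic cases.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes such as these, accuracy is dominated by the majority class, so it can stay high while precision and F-measure are noticeably lower. This is one reason class balancing (such as SMOTE) is typically evaluated with precision and F-measure rather than accuracy alone.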
License
Copyright (c) 2026 Journal of Scientific Development for Studies and Research (JSD)

This work is licensed under a Creative Commons Attribution 4.0 International License.