A Comparative Study of Distance Metrics in Machine Learning for Credit Card Fraud Detection
Keywords:
Fraud detection, unsupervised learning, k-means clustering, distance metrics, class imbalance
Abstract
Credit card fraud detection is a critical problem because of the growing volume of online transactions and the high cost of fraudulent activity. Detection is difficult because the data are extremely imbalanced: fraudulent transactions are rare compared with legitimate ones. This work explores the use of different distance metrics (Euclidean, Manhattan, and Minkowski) in clustering algorithms such as K-Means to identify fraudulent activity in a credit card dataset. The dataset consists of European credit card transactions and illustrates the imbalance clearly: only 492 of its 284,807 transactions (about 0.17%) are fraudulent.
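As an illustration of how the choice of metric can change a clustering result, the sketch below implements the Minkowski distance, of which Manhattan (p = 1) and Euclidean (p = 2) are special cases, and uses it in the nearest-centroid assignment step of a K-Means-style algorithm. The function names, the toy points, and the centroids are assumptions made for this example; the order p used in the study is not stated here, and this is not the authors' implementation.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Manhattan distance: Minkowski with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance: Minkowski with p = 2."""
    return minkowski(a, b, 2)


def assign_clusters(points, centroids, dist):
    """For each point, return the index of its nearest centroid under dist."""
    return [
        min(range(len(centroids)), key=lambda j: dist(pt, centroids[j]))
        for pt in points
    ]


# Hypothetical toy data: one point and two candidate centroids.
points = [(4.0, 0.0)]
centroids = [(0.0, 0.0), (6.0, 3.0)]

# The same point can land in different clusters depending on the metric.
by_euclidean = assign_clusters(points, centroids, euclidean)
by_manhattan = assign_clusters(points, centroids, manhattan)
print(by_euclidean, by_manhattan)
```

Only the assignment step is shown. In standard K-Means the centroid update takes the mean of each cluster, which minimizes squared Euclidean distance, so pairing that update with another metric is a heuristic rather than an exact optimization.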
The research uses undersampling, oversampling, and SMOTE to address class imbalance. Each sampling technique was evaluated with each distance metric to determine the best clustering strategy. The study found that, with the K-Means algorithm, the Euclidean distance metric produced the best results across all sampling techniques. The findings underline the importance of handling class imbalance and of using unsupervised methods for fraud detection in practical settings. The study also notes limitations, chiefly that the dataset did not allow different types of fraud to be categorized. Future research could improve fraud detection systems, particularly by developing better algorithms and expanding data availability.
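The three resampling strategies named in the abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the study's pipeline: the function names and toy data are hypothetical, and `smote_like` only imitates the core idea of SMOTE (interpolating between a minority point and one of its nearest minority neighbours) rather than reproducing any library implementation.

```python
import random


def random_undersample(majority, minority, rng):
    """Keep every minority row and an equal-sized random subset of majority rows."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng):
    """Duplicate minority rows at random until both classes have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (squared Euclidean distance).
    Requires at least two minority rows."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((x - y) ** 2 for x, y in zip(base, m)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 20 legitimate rows and 3 fraudulent rows.
majority = [(float(i), 0.0) for i in range(20)]
minority = [(0.0, 10.0), (1.0, 11.0), (2.0, 10.0)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
under_major, under_minor = random_undersample(majority, minority, rng)
over_major, over_minor = random_oversample(majority, minority, rng)
synthetic = smote_like(minority, len(majority) - len(minority), 2, rng)
print(len(under_major), len(over_minor), len(minority) + len(synthetic))
```

Undersampling discards most legitimate rows, oversampling repeats the few fraudulent rows, and the SMOTE-style step creates new points that lie on line segments between existing fraudulent rows. All three leave the toy classes balanced.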
License
Copyright (c) 2024 Mr. Vaishnav Menon, Mr. R Jai Akash, Niveditta Batra
This work is licensed under a Creative Commons Attribution 4.0 International License.
Articles in the Graduate Journal of Interdisciplinary Research, Reports and Reviews (Grad. J. InteR3) by Vyom Hans Publications are published and licensed under a Creative Commons Attribution (CC BY) 4.0 International License.