
dc.contributor.author: Nasir, Murtaza
dc.contributor.author: Dag, Ali
dc.contributor.author: Simsek, Serhat
dc.contributor.author: Ivanov, Anton
dc.contributor.author: Oztekin, Asil
dc.date.accessioned: 2023-02-06T21:46:05Z
dc.date.available: 2023-02-06T21:46:05Z
dc.date.issued: 2022-10-02
dc.identifier.citation: Murtaza Nasir, Ali Dag, Serhat Simsek, Anton Ivanov & Asil Oztekin (2022) Improving Imbalanced Machine Learning with Neighborhood-Informed Synthetic Sample Placement, Journal of Management Information Systems, 39:4, 1116-1145, DOI: 10.1080/07421222.2022.2127453
dc.identifier.issn: 0742-1222
dc.identifier.uri: https://doi.org/10.1080/07421222.2022.2127453
dc.identifier.uri: https://soar.wichita.edu/handle/10057/25001
dc.description: Click on the DOI to access this article (may not be free).
dc.description.abstract: Machine learning is widely used in information systems design. Yet, training algorithms on imbalanced datasets may severely affect performance on unseen data. For example, in some cases in healthcare, fintech, or cybersecurity contexts, certain subclasses are difficult to learn because they are underrepresented in training data. Our study offers a flexible and efficient solution based on a new synthetic average neighborhood sampling algorithm (SANSA), which, in contrast to other solutions, introduces a novel "placement" parameter that can be tuned to adapt to each dataset's unique manifestation of the imbalance. This package can be downloaded for R. We tested SANSA against seven existing sampling methods used in conjunction with the four most frequently used machine learning models trained on 14 benchmark datasets. Our results provide suggestive evidence that SANSA offers a feasible solution to the imbalance problem for most datasets. Our findings provide practical recommendations for how SANSA can be effectively implemented while reducing the complexity level of an imbalanced learning pipeline.
dc.language.iso: en_US
dc.publisher: Routledge
dc.relation.ispartofseries: Journal of Management Information Systems
dc.relation.ispartofseries: Volume 39, No. 4
dc.subject: Imbalanced data
dc.subject: Oversampling
dc.subject: Undersampling
dc.subject: Machine learning
dc.subject: Predictive analytics
dc.subject: Classification prediction performance
dc.subject: Algorithm training
dc.title: Improving imbalanced machine learning with neighborhood-informed synthetic sample placement
dc.type: Article
dc.rights.holder: Rights managed by Taylor & Francis


Files in this item

There are no files associated with this item.
