
Conference Paper/Proceeding/Abstract

Modelling imbalanced classes

Humphrey Brydon, Rénette Blignaut, Retha Luus, Isabella M Venter, Desireé Cranfield

SAS Global Forum 2020 conference proceedings, Paper 5021-2020

Swansea University Author: Desireé Cranfield

Published in: SAS Global Forum 2020 conference proceedings
Published: Washington DC 2020
Online Access: https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/5021-2020.pdf
URI: https://cronfa.swan.ac.uk/Record/cronfa53910
Abstract: In this study, separate sampling was applied to various modelling procedures to help identify the most important variables describing smartphone users who are security compliant. Initial analysis of the data found that only 7% of smartphone users reported applying security measures to protect their phones and/or the personal information stored on their devices. Because of this class imbalance in the target variable, predictive modelling procedures failed to produce accurate models. Separate sampling proportions were therefore introduced to establish whether classification accuracy could be improved. This study tested target class over-sampling ratios of 20%, 30%, 40% and 50% and compared the models fitted on these data sets with those fitted on the original data, where no separate sampling was applied. The models fitted were decision trees, 5-fold cross-validated decision trees, logistic regression, neural networks and gradient boosted decision trees. The results showed that the logistic regression and neural network models were unstable regardless of the target class ratio, whereas the decision trees, 5-fold cross-validated decision trees and gradient boosted decision trees were more stable. Variables found to influence mobile security compliance included age, gender and various security/privacy-related behaviors.
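The separate sampling step described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' SAS procedure: it assumes every security-compliant (event) case is kept and non-compliant cases are randomly down-sampled without replacement until events make up the requested share of the sample. The helper `separate_sample`, the `compliant` field and the 1,000-row synthetic data set (70 events, chosen only to mirror the reported 7% rate) are hypothetical.

```python
import random


def separate_sample(rows, target, event=1, event_ratio=0.2, seed=0):
    """Hypothetical helper: build a sample in which rows whose `target`
    equals `event` make up roughly `event_ratio` of all rows.

    Assumed scheme: keep every event row and draw non-event rows without
    replacement, so no row appears twice in the result.
    """
    if not 0 < event_ratio < 1:
        raise ValueError("event_ratio must be strictly between 0 and 1")

    events = [r for r in rows if r[target] == event]
    non_events = [r for r in rows if r[target] != event]

    # Solve events / (events + n_non) == event_ratio for n_non.
    n_non = round(len(events) * (1 - event_ratio) / event_ratio)
    if n_non > len(non_events):
        raise ValueError("not enough non-event rows for the requested ratio")

    rng = random.Random(seed)  # seeded for reproducible samples
    sample = events + rng.sample(non_events, n_non)
    rng.shuffle(sample)  # mix events and non-events
    return sample


if __name__ == "__main__":
    # Synthetic stand-in data: 70 "compliant" rows out of 1,000 (7%).
    rows = [{"id": i, "compliant": 1 if i < 70 else 0} for i in range(1000)]
    # The four over-sampling ratios tested in the study.
    for ratio in (0.2, 0.3, 0.4, 0.5):
        s = separate_sample(rows, "compliant", event_ratio=ratio)
        share = sum(r["compliant"] for r in s) / len(s)
        print(f"target ratio {ratio:.0%}: {len(s)} rows, event share {share:.3f}")
```

Keeping all rare cases and thinning the majority class avoids duplicating rows; an alternative would be to sample the event class with replacement, which keeps more data but repeats observations. Each resulting sample would then be used to fit the models listed above and compared with models fitted on the unsampled data.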
College: Faculty of Humanities and Social Sciences
Paper: 5021-2020