Conference Paper/Proceeding/Abstract

Modelling imbalanced classes

Humphrey Brydon, Rénette Blignaut, Retha Luus, Isabella M Venter, Desireé Cranfield

Proceedings of the SAS® Global Forum 2020 Conference

Swansea University Author: Desireé Cranfield

Abstract

In this study separate sampling was applied to various modelling procedures to assist in the identification of the most important variables describing smartphone users who are security compliant. Initial analysis of the data found that only 7% of smartphone users reported applying security measures to...

Published in: Proceedings of the SAS® Global Forum 2020 Conference
Published: Cary, NC: SAS Institute Inc., 2020
Online Access: https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/5021-2020.pdf
URI: https://cronfa.swan.ac.uk/Record/cronfa53910
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2025-03-04T14:50:37.2874073</datestamp><bib-version>v2</bib-version><id>53910</id><entry>2020-04-07</entry><title>Modelling imbalanced classes</title><swanseaauthors><author><sid>3f8fe4194470d374d18e4738089a6ab1</sid><ORCID>0000-0002-3082-687X</ORCID><firstname>Desire&#xE9;</firstname><surname>Cranfield</surname><name>Desire&#xE9; Cranfield</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2020-04-07</date><deptcode>CBAE</deptcode><abstract>In this study separate sampling was applied to various modelling procedures to assist in the identification of the most important variables describing smartphone users who are security compliant. Initial analysis of the data found that only 7% of smartphone users reported applying security measures to protect their phones and/or their personal information stored on their devices. Due to the class imbalance in the target variable, predictive modelling procedures failed to produce accurate models. Separate sampling proportions were introduced to establish if classification accuracy could be improved. This study tested target class over-sampling ratios of 20%, 30%, 40% and 50% and compared the results of the models fitted on these data sets to those fitted on the original data where no separate sampling was applied. Models fitted included: decision trees, 5-fold cross-validated decision trees, logistic regression, neural networks and gradient boosted decision trees. The results showed that the logistic regression and neural network models produced unstable models regardless of the target class ratios. More stable models were however reported for the decision trees, 5-fold cross-validated decision trees and gradient boosted decision trees. Variables found to influence mobile security compliance included age, gender and various security/privacy related behaviors.</abstract><type>Conference Paper/Proceeding/Abstract</type><journal>Proceedings of the SAS&#xAE; Global Forum 2020 Conference</journal><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher>SAS Institute Inc.</publisher><placeOfPublication>Cary, NC</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords/><publishedDay>1</publishedDay><publishedMonth>4</publishedMonth><publishedYear>2020</publishedYear><publishedDate>2020-04-01</publishedDate><doi/><url>https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/5021-2020.pdf</url><notes/><college>COLLEGE NANME</college><department>Management School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>CBAE</DepartmentCode><institution>Swansea University</institution><apcterm>Not Required</apcterm><funders/><projectreference/><lastEdited>2025-03-04T14:50:37.2874073</lastEdited><Created>2020-04-07T13:16:18.8217285</Created><path><level id="1">Faculty of Humanities and Social Sciences</level><level id="2">School of Management - Business Management</level></path><authors><author><firstname>Humphrey</firstname><surname>Brydon</surname><order>1</order></author><author><firstname>R&#xE9;nette</firstname><surname>Blignaut</surname><order>2</order></author><author><firstname>Retha</firstname><surname>Luus</surname><order>3</order></author><author><firstname>Isabella M</firstname><surname>Venter</surname><order>4</order></author><author><firstname>Desire&#xE9;</firstname><surname>Cranfield</surname><orcid>0000-0002-3082-687X</orcid><order>5</order></author></authors><documents/><OutputDurs/></rfc1807>
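
The abstract reports that the target class made up 7% of the data and that separate sampling was tested at target-class ratios of 20%, 30%, 40% and 50%. The record does not say how the samples were drawn, so the following Python sketch is only an illustration under stated assumptions: it keeps every target-class record and randomly draws just enough non-target records to reach the requested share. The function name `separate_sample`, the `compliant` field and the 1,000-record data set are hypothetical and are not taken from the paper, which used SAS software.

```python
import random


def separate_sample(records, is_target, target_share, seed=0):
    """Keep all target-class records and randomly draw just enough
    non-target records for the target class to form `target_share`.

    Assumption: the target share is raised by down-sampling the majority
    class; the paper's actual procedure may differ.
    """
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    targets = [r for r in records if is_target(r)]
    others = [r for r in records if not is_target(r)]
    # Solve n_targets / (n_targets + n_others) = target_share for n_others.
    n_others = round(len(targets) * (1 - target_share) / target_share)
    n_others = min(n_others, len(others))  # cannot draw more than exist
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return targets + rng.sample(others, n_others)


# Hypothetical data: 1,000 respondents, 7% of them security compliant.
data = [{"id": i, "compliant": i < 70} for i in range(1000)]

for share in (0.20, 0.30, 0.40, 0.50):
    sample = separate_sample(data, lambda r: r["compliant"], share)
    achieved = sum(r["compliant"] for r in sample) / len(sample)
    print(f"target {share:.0%}: {len(sample)} records, achieved {achieved:.1%}")
```

With only 70 target records, the 50% sample in this illustration shrinks to 140 records. Small modelling samples of this kind are one plausible reason why some model types could be less stable than others, although the abstract itself does not give a cause.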