Conference Paper/Proceeding/Abstract
Modelling imbalanced classes
SAS Global Forum 2020 conference proceedings, Paper 5021-2020
Swansea University Author:
Desireé Cranfield
Abstract
In this study separate sampling was applied to various modelling procedures to assist in the identification of the most important variables describing smartphone users who are security compliant. Initial analysis of the data found that only 7% of smartphone users reported applying security measures to...
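The over-sampling idea behind separate sampling can be illustrated with a small sketch. The Python below is not the authors' SAS procedure, which this record does not describe; it assumes the simplest form of over-sampling, in which minority-class rows are duplicated at random until they reach a chosen share of the data. The function name `oversample_to_ratio`, the `(features, label)` row layout and the label coding (1 = security compliant) are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical row layout: (features, label), with label 1 = security compliant.
Row = Tuple[Dict[str, object], int]


def oversample_to_ratio(rows: List[Row], target_ratio: float, seed: int = 0) -> List[Row]:
    """Duplicate label-1 rows at random until they form about target_ratio of the data."""
    if not 0 < target_ratio < 1:
        raise ValueError("target_ratio must be strictly between 0 and 1")
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    if not minority:
        raise ValueError("no minority-class rows to sample from")

    # Solve m / (m + len(majority)) = target_ratio for the minority count m.
    needed = round(target_ratio * len(majority) / (1 - target_ratio))
    if needed <= len(minority):
        return list(rows)  # already at or above the requested share

    rng = random.Random(seed)
    # Sample with replacement from the existing minority rows.
    extra = [rng.choice(minority) for _ in range(needed - len(minority))]
    return majority + minority + extra


if __name__ == "__main__":
    # Toy data with a 7% minority share, mirroring the imbalance in the abstract.
    data = [({"id": i}, 1 if i < 7 else 0) for i in range(100)]
    for ratio in (0.2, 0.3, 0.4, 0.5):
        resampled = oversample_to_ratio(data, ratio)
        share = sum(label for _, label in resampled) / len(resampled)
        print(f"target {ratio:.0%}: {len(resampled)} rows, minority share {share:.3f}")
```

In a sketch like this, only the training data would be resampled; a model is normally still evaluated on data that keeps the original class proportions.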
Published in: | SAS Global Forum 2020 conference proceedings |
---|---|
Published: | Washington DC, 2020 |
Online Access: | https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/5021-2020.pdf |
URI: | https://cronfa.swan.ac.uk/Record/cronfa53910 |
first_indexed | 2020-04-07T20:05:50Z |
---|---|
last_indexed | 2021-08-03T03:10:38Z |
id | cronfa53910 |
recordtype | SURis |
fullrecord | <?xml version="1.0"?><rfc1807><datestamp>2021-08-02T11:22:40.9033134</datestamp><bib-version>v2</bib-version><id>53910</id><entry>2020-04-07</entry><title>Modelling imbalanced classes</title><swanseaauthors><author><sid>3f8fe4194470d374d18e4738089a6ab1</sid><ORCID>0000-0002-3082-687X</ORCID><firstname>Desireé</firstname><surname>Cranfield</surname><name>Desireé Cranfield</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2020-04-07</date><deptcode>BBU</deptcode><abstract>In this study separate sampling was applied to various modelling procedures to assist in the identification of the most important variables describing smartphone users who are security compliant. Initial analysis of the data found that only 7% of smartphone users reported applying security measures to protect their phones and/or their personal information stored on their devices. Due to the class imbalance in the target variable, predictive modelling procedures failed to produce accurate models. Separate sampling proportions were introduced to establish if classification accuracy could be improved. This study tested target class over-sampling ratios of 20%, 30%, 40% and 50% and compared the results of the models fitted on these data sets to those fitted on the original data where no separate sampling was applied. Models fitted included: decision trees, 5-fold cross-validated decision trees, logistic regression, neural networks and gradient boosted decision trees. The results showed that the logistic regression and neural network models produced unstable models regardless of the target class ratios. More stable models were however reported for the decision trees, 5-fold cross-validated decision trees and gradient boosted decision trees. Variables found to influence mobile security compliance included age, gender and various security/privacy related behaviors.</abstract><type>Conference Paper/Proceeding/Abstract</type><journal>SAS Global Forum 2020 conference proceedings</journal><volume/><journalNumber/><paginationStart>5021</paginationStart><paginationEnd>2020</paginationEnd><publisher/><placeOfPublication>Washington DC</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords/><publishedDay>1</publishedDay><publishedMonth>4</publishedMonth><publishedYear>2020</publishedYear><publishedDate>2020-04-01</publishedDate><doi/><url>https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/5021-2020.pdf</url><notes/><college>COLLEGE NANME</college><department>Business</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>BBU</DepartmentCode><institution>Swansea University</institution><apcterm/><lastEdited>2021-08-02T11:22:40.9033134</lastEdited><Created>2020-04-07T13:16:18.8217285</Created><path><level id="1">Faculty of Humanities and Social Sciences</level><level id="2">School of Management - Business Management</level></path><authors><author><firstname>Humphrey</firstname><surname>Brydon</surname><order>1</order></author><author><firstname>Rénette</firstname><surname>Blignaut</surname><order>2</order></author><author><firstname>Retha</firstname><surname>Luus</surname><order>3</order></author><author><firstname>Isabella M</firstname><surname>Venter</surname><order>4</order></author><author><firstname>Desireé</firstname><surname>Cranfield</surname><orcid>0000-0002-3082-687X</orcid><order>5</order></author></authors><documents/><OutputDurs/></rfc1807> |
title | Modelling imbalanced classes |
author_id_str_mv | 3f8fe4194470d374d18e4738089a6ab1 |
author_id_fullname_str_mv | 3f8fe4194470d374d18e4738089a6ab1_***_Desireé Cranfield |
author | Desireé Cranfield |
author2 | Humphrey Brydon; Rénette Blignaut; Retha Luus; Isabella M Venter; Desireé Cranfield |
format | Conference Paper/Proceeding/Abstract |
container_title | SAS Global Forum 2020 conference proceedings |
container_start_page | 5021 |
publishDate | 2020 |
institution | Swansea University |
college_str | Faculty of Humanities and Social Sciences |
hierarchytype | |
hierarchy_top_id | facultyofhumanitiesandsocialsciences |
hierarchy_top_title | Faculty of Humanities and Social Sciences |
hierarchy_parent_id | facultyofhumanitiesandsocialsciences |
hierarchy_parent_title | Faculty of Humanities and Social Sciences |
department_str | School of Management - Business Management{{{_:::_}}}Faculty of Humanities and Social Sciences{{{_:::_}}}School of Management - Business Management |
url | https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/5021-2020.pdf |
document_store_str | 0 |
active_str | 0 |
description | In this study separate sampling was applied to various modelling procedures to assist in the identification of the most important variables describing smartphone users who are security compliant. Initial analysis of the data found that only 7% of smartphone users reported applying security measures to protect their phones and/or their personal information stored on their devices. Due to the class imbalance in the target variable, predictive modelling procedures failed to produce accurate models. Separate sampling proportions were introduced to establish if classification accuracy could be improved. This study tested target class over-sampling ratios of 20%, 30%, 40% and 50% and compared the results of the models fitted on these data sets to those fitted on the original data where no separate sampling was applied. Models fitted included: decision trees, 5-fold cross-validated decision trees, logistic regression, neural networks and gradient boosted decision trees. The results showed that the logistic regression and neural network models produced unstable models regardless of the target class ratios. More stable models were however reported for the decision trees, 5-fold cross-validated decision trees and gradient boosted decision trees. Variables found to influence mobile security compliance included age, gender and various security/privacy related behaviors. |
published_date | 2020-04-01T04:07:08Z |