Journal article

A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation

Jia Ming Yeoh, Fabio Caraffini, Elmina Homapour, Valentino Santucci, Alfredo Milani

Mathematics, Volume: 7, Issue: 12, Start page: 1229

Swansea University Author: Fabio Caraffini

  • 60941_VoR.pdf

    PDF | Version of Record

    Copyright: 2019 by the authors. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license

    Download (493.11KB)

DOI (Published version): 10.3390/math7121229

Abstract

This article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to cluster dynamic data streams. The proposed system displays desirable features, such as a low number of parameters and good scalability capabilities to both high-dimensional data and numbers of clusters in...

Published in: Mathematics
ISSN: 2227-7390
Published: MDPI AG 2019

URI: https://cronfa.swan.ac.uk/Record/cronfa60941
first_indexed 2022-09-21T14:03:00Z
last_indexed 2023-01-13T19:21:26Z
id cronfa60941
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2022-09-21T15:04:26.8828650</datestamp><bib-version>v2</bib-version><id>60941</id><entry>2022-08-28</entry><title>A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation</title><swanseaauthors><author><sid>d0b8d4e63d512d4d67a02a23dd20dfdb</sid><ORCID>0000-0001-9199-7368</ORCID><firstname>Fabio</firstname><surname>Caraffini</surname><name>Fabio Caraffini</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-08-28</date><deptcode>MACS</deptcode><abstract>This article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to cluster dynamic data streams. The proposed system displays desirable features, such as a low number of parameters and good scalability capabilities to both high-dimensional data and numbers of clusters in the dataset, and it is based on a hybrid structure using deterministic clustering methods and stochastic optimisation approaches to optimally centre the clusters. Similar to other state-of-the-art methods available in the literature, it uses &#x201C;microclusters&#x201D; and other established techniques, such as density based clustering. Unlike other methods, it makes use of metaheuristic optimisation to maximise performances during the initialisation phase, which precedes the classic online phase. Experimental results show that OpStream outperforms the state-of-the-art methods in several cases, and it is always competitive against other comparison algorithms regardless of the chosen optimisation method. Three variants of OpStream, each coming with a different optimisation algorithm, are presented in this study. 
A thorough sensitivity analysis is performed by using the best variant to point out OpStream&#x2019;s robustness to noise and resiliency to parameter changes.</abstract><type>Journal Article</type><journal>Mathematics</journal><volume>7</volume><journalNumber>12</journalNumber><paginationStart>1229</paginationStart><paginationEnd/><publisher>MDPI AG</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2227-7390</issnElectronic><keywords>dynamic stream clustering; online clustering; metaheuristics; optimisation; population based algorithms; density based clustering; k-means centroid; concept drift; concept evolution</keywords><publishedDay>12</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2019</publishedYear><publishedDate>2019-12-12</publishedDate><doi>10.3390/math7121229</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm/><funders>This research received no external funding.</funders><projectreference/><lastEdited>2022-09-21T15:04:26.8828650</lastEdited><Created>2022-08-28T20:15:02.0058430</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Jia 
Ming</firstname><surname>Yeoh</surname><order>1</order></author><author><firstname>Fabio</firstname><surname>Caraffini</surname><orcid>0000-0001-9199-7368</orcid><order>2</order></author><author><firstname>Elmina</firstname><surname>Homapour</surname><orcid>0000-0001-9756-2744</orcid><order>3</order></author><author><firstname>Valentino</firstname><surname>Santucci</surname><orcid>0000-0003-1483-7998</orcid><order>4</order></author><author><firstname>Alfredo</firstname><surname>Milani</surname><orcid>0000-0003-4534-1805</orcid><order>5</order></author></authors><documents><document><filename>60941__25187__d36c9908c994446a839861289a4da2b6.pdf</filename><originalFilename>60941_VoR.pdf</originalFilename><uploaded>2022-09-21T15:03:22.0105657</uploaded><type>Output</type><contentLength>504942</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: 2019 by the authors. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>http://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation
author_id_str_mv d0b8d4e63d512d4d67a02a23dd20dfdb
author_id_fullname_str_mv d0b8d4e63d512d4d67a02a23dd20dfdb_***_Fabio Caraffini
author Fabio Caraffini
author2 Jia Ming Yeoh
Fabio Caraffini
Elmina Homapour
Valentino Santucci
Alfredo Milani
format Journal article
container_title Mathematics
container_volume 7
container_issue 12
container_start_page 1229
publishDate 2019
institution Swansea University
issn 2227-7390
doi_str_mv 10.3390/math7121229
publisher MDPI AG
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science; Faculty of Science and Engineering
document_store_str 1
active_str 0
description This article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to clustering dynamic data streams. The proposed system displays desirable features, such as a low number of parameters and good scalability to both high-dimensional data and numbers of clusters in the dataset, and it is based on a hybrid structure that uses deterministic clustering methods and stochastic optimisation approaches to optimally centre the clusters. Similar to other state-of-the-art methods available in the literature, it uses “microclusters” and other established techniques, such as density-based clustering. Unlike other methods, it makes use of metaheuristic optimisation to maximise performance during the initialisation phase, which precedes the classic online phase. Experimental results show that OpStream outperforms the state-of-the-art methods in several cases, and it is always competitive against other comparison algorithms regardless of the chosen optimisation method. Three variants of OpStream, each coming with a different optimisation algorithm, are presented in this study. A thorough sensitivity analysis is performed by using the best variant to point out OpStream’s robustness to noise and resiliency to parameter changes.
published_date 2019-12-12T14:17:15Z