
Journal article

A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation

Jia Ming Yeoh, Fabio Caraffini, Elmina Homapour, Valentino Santucci, Alfredo Milani

Mathematics, Volume: 7, Issue: 12, Start page: 1229

Swansea University Author: Fabio Caraffini

  • 60941_VoR.pdf (PDF, Version of Record, 493.11KB)

    Copyright: 2019 by the authors. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.


DOI (Published version): 10.3390/math7121229

Published in: Mathematics
ISSN: 2227-7390
Published: MDPI AG 2019

URI: https://cronfa.swan.ac.uk/Record/cronfa60941
Abstract: This article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to clustering dynamic data streams. The proposed system displays desirable features, such as a low number of parameters and good scalability with respect to both the dimensionality of the data and the number of clusters in the dataset. It is based on a hybrid structure that uses deterministic clustering methods and stochastic optimisation approaches to optimally centre the clusters. Similar to other state-of-the-art methods available in the literature, it uses “microclusters” and other established techniques, such as density-based clustering. Unlike other methods, it makes use of metaheuristic optimisation to maximise performance during the initialisation phase, which precedes the classic online phase. Experimental results show that OpStream outperforms the state-of-the-art methods in several cases, and it is always competitive against other comparison algorithms regardless of the chosen optimisation method. Three variants of OpStream, each coming with a different optimisation algorithm, are presented in this study. A thorough sensitivity analysis is performed using the best variant to point out OpStream’s robustness to noise and resilience to parameter changes.
Keywords: dynamic stream clustering; online clustering; metaheuristics; optimisation; population based algorithms; density based clustering; k-means centroid; concept drift; concept evolution
College: Faculty of Science and Engineering
Funders: This research received no external funding.
Issue: 12
Start Page: 1229