
Conference Paper/Proceeding/Abstract

Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

Morteza Rohanian, Julian Hough, Matthew Purver

Interspeech 2021

Swansea University Author: Julian Hough (ORCID: 0000-0002-4345-6759)

Full text not available from this repository.

DOI (Published version): 10.21437/interspeech.2021-1633

Abstract

We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluating the ADReSSo challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 on MMSE cognitive score prediction. While predicting cognitive decline is more challenging, our models show improvement using the multimodal approach and word probabilities, disfluency and pause information over word-only models. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses.
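To illustrate the kind of components the abstract names (a BiLSTM encoder per modality, a highway layer, and gated multimodal fusion feeding an AD classification head and an MMSE regression head), here is a minimal sketch. It is not the authors' released implementation: the feature dimensions, layer sizes, and module names are assumptions chosen only for illustration.

```python
# Minimal sketch (not the authors' code): gated fusion of lexical and
# acoustic sequences with BiLSTM encoders and a highway layer, loosely
# following the components named in the abstract. Dimensions are illustrative.
import torch
import torch.nn as nn


class Highway(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x, with a sigmoid transform gate T."""

    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))       # transform gate
        h = torch.relu(self.transform(x))     # candidate transformation
        return t * h + (1.0 - t) * x          # carry the input where t is low


class GatedFusionClassifier(nn.Module):
    """One BiLSTM per modality, a learned gate to weight the two summaries,
    and two heads: AD vs. non-AD classification and MMSE score regression."""

    def __init__(self, lex_dim=354, ac_dim=79, hidden=128):
        super().__init__()
        self.lex_lstm = nn.LSTM(lex_dim, hidden, batch_first=True, bidirectional=True)
        self.ac_lstm = nn.LSTM(ac_dim, hidden, batch_first=True, bidirectional=True)
        d = 2 * hidden                         # BiLSTM output size
        self.gate = nn.Linear(2 * d, d)        # z = sigmoid(W [lex; ac])
        self.highway = Highway(d)
        self.cls_head = nn.Linear(d, 2)        # AD / non-AD logits
        self.mmse_head = nn.Linear(d, 1)       # MMSE score regression

    def forward(self, lex_seq, ac_seq):
        # Use the final hidden state of each encoder as a fixed-size summary.
        lex_out, _ = self.lex_lstm(lex_seq)    # (B, T_lex, 2*hidden)
        ac_out, _ = self.ac_lstm(ac_seq)       # (B, T_ac, 2*hidden)
        lex_vec, ac_vec = lex_out[:, -1], ac_out[:, -1]
        # Gating lets the model down-weight a noisy modality per utterance.
        z = torch.sigmoid(self.gate(torch.cat([lex_vec, ac_vec], dim=-1)))
        fused = z * torch.tanh(lex_vec) + (1.0 - z) * torch.tanh(ac_vec)
        fused = self.highway(fused)
        return self.cls_head(fused), self.mmse_head(fused)


# Example with random tensors: a batch of 4 sessions, 50 word-level steps of
# lexical features and 200 frames of acoustic features (hypothetical sizes).
model = GatedFusionClassifier()
logits, mmse = model(torch.randn(4, 50, 354), torch.randn(4, 200, 79))
print(logits.shape, mmse.shape)  # torch.Size([4, 2]) torch.Size([4, 1])
```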


Published in: Interspeech 2021
Published: ISCA, 30 August 2021
URI: https://cronfa.swan.ac.uk/Record/cronfa64932