
Conference Paper/Proceeding/Abstract

Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

Morteza Rohanian, Julian Hough, Matthew Purver

Interspeech 2021

Swansea University Author: Julian Hough (ORCID: 0000-0002-4345-6759)

Full text not available from this repository.

DOI (Published version): 10.21437/interspeech.2021-1633

Abstract

We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluating the ADReSSo challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 on MMSE cognitive score prediction. While predicting cognitive decline is more challenging, our models show improvement using the multimodal approach and word probabilities, disfluency and pause information over word-only models. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses.
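To illustrate the kind of components the abstract names (a BiLSTM encoder per modality, a highway layer, and gated multimodal fusion feeding an AD classification head and an MMSE regression head), here is a minimal sketch. It is not the authors' released implementation: the feature dimensions, layer sizes, and module names are assumptions chosen only for illustration.

```python
# Minimal sketch (not the authors' code): gated fusion of lexical and
# acoustic sequences with BiLSTM encoders and a highway layer, loosely
# following the components named in the abstract. Dimensions are illustrative.
import torch
import torch.nn as nn


class Highway(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x, with a sigmoid transform gate T."""

    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))       # transform gate
        h = torch.relu(self.transform(x))     # candidate transformation
        return t * h + (1.0 - t) * x          # carry the input where t is low


class GatedFusionClassifier(nn.Module):
    """One BiLSTM per modality, a learned gate to weight the two summaries,
    and two heads: AD vs. non-AD classification and MMSE score regression."""

    def __init__(self, lex_dim=354, ac_dim=79, hidden=128):
        super().__init__()
        self.lex_lstm = nn.LSTM(lex_dim, hidden, batch_first=True, bidirectional=True)
        self.ac_lstm = nn.LSTM(ac_dim, hidden, batch_first=True, bidirectional=True)
        d = 2 * hidden                         # BiLSTM output size
        self.gate = nn.Linear(2 * d, d)        # z = sigmoid(W [lex; ac])
        self.highway = Highway(d)
        self.cls_head = nn.Linear(d, 2)        # AD / non-AD logits
        self.mmse_head = nn.Linear(d, 1)       # MMSE score regression

    def forward(self, lex_seq, ac_seq):
        # Use the final hidden state of each encoder as a fixed-size summary.
        lex_out, _ = self.lex_lstm(lex_seq)    # (B, T_lex, 2*hidden)
        ac_out, _ = self.ac_lstm(ac_seq)       # (B, T_ac, 2*hidden)
        lex_vec, ac_vec = lex_out[:, -1], ac_out[:, -1]
        # Gating lets the model down-weight a noisy modality per utterance.
        z = torch.sigmoid(self.gate(torch.cat([lex_vec, ac_vec], dim=-1)))
        fused = z * torch.tanh(lex_vec) + (1.0 - z) * torch.tanh(ac_vec)
        fused = self.highway(fused)
        return self.cls_head(fused), self.mmse_head(fused)


# Example with random tensors: a batch of 4 sessions, 50 word-level steps of
# lexical features and 200 frames of acoustic features (hypothetical sizes).
model = GatedFusionClassifier()
logits, mmse = model(torch.randn(4, 50, 354), torch.randn(4, 200, 79))
print(logits.shape, mmse.shape)  # torch.Size([4, 2]) torch.Size([4, 1])
```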


Published in: Interspeech 2021
Published: ISCA, 30 August 2021
URI: https://cronfa.swan.ac.uk/Record/cronfa64932