E-Thesis (352 views, 111 downloads)

Objective assessment of speech intelligibility. / Wei Ming Liu

Swansea University Author: Wei Ming Liu

Abstract

This thesis addresses the topic of objective speech intelligibility assessment. Speech intelligibility is becoming an important issue, most likely due to the rapid growth in digital communication systems in recent decades, as well as the increasing demand for security-based applications where intelligibility, rather than overall quality, is the priority. After all, the loss of intelligibility means that communication does not exist. This research sets out to investigate the potential of automatic speech recognition (ASR) in intelligibility assessment, the motivation being the obvious link between word recognition and intelligibility.

As a precursor, quality measures are first considered, since intelligibility is an attribute encompassed in overall quality. Nine prominent quality measures, including the state-of-the-art Perceptual Evaluation of Speech Quality (PESQ), are assessed over a wide range of degradations, including additive noise and those introduced by coding and enhancement schemes. Experimental results show that, apart from the Weighted Spectral Slope (WSS), the scores from the quality measures considered here generally correlate poorly with intelligibility. Poor correlations are observed especially for speech-like noises and for degradations introduced by enhancement processes.

ASR is then considered, where various word recognition statistics, namely word accuracy, percentage correct, deletions, substitutions and insertions, are assessed as potential intelligibility measures. One critical contribution is the observation that there are links between different ASR statistics and different forms of degradation; such links enable suitable statistics to be chosen for intelligibility assessment in different applications. Overall, word accuracy from an ASR system trained on clean signals has the highest correlation with intelligibility. However, as with the quality measures, none of the ASR scores correlate well in the context of enhancement schemes, since such processes are known to improve machine-based scores without necessarily improving intelligibility. This demonstrates the limitation of ASR in intelligibility assessment.

As an extension to word modelling in ASR, one major contribution of this work is the novel use of a data-driven (DD) classifier in this context. The classifier is trained on intelligibility information, and its output scores relate directly to intelligibility rather than indirectly through quality or ASR scores as in earlier attempts. A critical obstacle in developing such a DD classifier is establishing the large amount of ground truth needed for training. This leads to the next significant contribution: a convenient strategy to generate potentially unlimited amounts of synthetic ground truth, based on the well-supported hypothesis that speech processing rarely improves intelligibility.

Subsequent contributions concern the search for features that improve classification accuracy. Scores given by quality measures and by ASR are indicative of intelligibility and could therefore serve as features for the data-driven intelligibility classifier; both are investigated in this research, and the results show ASR-based features to be superior. A final contribution is a novel feature set based on the concept of anchor models, where each anchor represents a chosen degradation and signal intelligibility is characterised by the similarity between the degradation under test and a cohort of degradation anchors. The anchoring feature set leads to an average classification accuracy of 88% with synthetic ground truth and 82% with human ground truth evaluation sets. The latter compares favourably with the 69% achieved by WSS (the best quality measure) and the 68% achieved by word accuracy from a clean-trained ASR (the best ASR-based measure), assessed on identical test sets.
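
The word recognition statistics named in the abstract are the standard scores produced by ASR alignment tools. As a rough illustration (not the thesis's own code), the sketch below computes them from a minimum edit-distance alignment between a reference transcript and an ASR hypothesis, using the usual definitions: percentage correct = (N - S - D)/N and word accuracy = (N - S - D - I)/N, for N reference words with S substitutions, D deletions and I insertions.

```python
# Illustrative only: standard word-level alignment statistics of the kind the
# abstract lists (word accuracy, percentage correct, substitutions, deletions,
# insertions). Ties between equal-cost alignments are broken arbitrarily.

def word_recognition_stats(reference, hypothesis):
    """Align two word sequences by minimum edit distance and return the
    counts and scores commonly reported by ASR scoring tools."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (edit distance, substitutions, deletions, insertions)
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # delete all reference words
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # insert all hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if reference[i - 1] == hypothesis[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]   # exact match, no cost
                continue
            d_sub = cost[i - 1][j - 1]
            d_del = cost[i - 1][j]
            d_ins = cost[i][j - 1]
            cost[i][j] = min(
                (d_sub[0] + 1, d_sub[1] + 1, d_sub[2], d_sub[3]),
                (d_del[0] + 1, d_del[1], d_del[2] + 1, d_del[3]),
                (d_ins[0] + 1, d_ins[1], d_ins[2], d_ins[3] + 1),
            )
    _, subs, dels, ins = cost[n][m]
    n_ref = max(n, 1)
    return {
        "percent_correct": 100.0 * (n - subs - dels) / n_ref,
        "word_accuracy": 100.0 * (n - subs - dels - ins) / n_ref,
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
    }

print(word_recognition_stats("the cat sat on the mat".split(),
                             "the cat sat mat on a mat".split()))
```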
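
The synthetic ground-truth strategy rests on the stated hypothesis that speech processing rarely improves intelligibility. A minimal sketch of that idea follows, with an assumed pairing scheme rather than the thesis's actual procedure: chain processing or degradation stages and label each earlier stage as at least as intelligible as any later one.

```python
# Sketch of automatically labelled intelligibility pairs under the assumption
# that a further processing/degradation stage does not raise intelligibility.
# The function names and pairing scheme are illustrative assumptions.
import itertools

def synthetic_pairs(clean_signal, degrade_fns):
    """Yield (more_intelligible, less_intelligible) signal pairs by chaining
    degradation functions, e.g. additive noise at decreasing SNR or a codec."""
    current = clean_signal
    chain = [current]
    for degrade in degrade_fns:
        current = degrade(current)
        chain.append(current)
    # every earlier stage is assumed at least as intelligible as a later one
    for earlier, later in itertools.combinations(chain, 2):
        yield earlier, later
```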
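
For the anchor-model feature set, the abstract describes characterising a test signal by its similarity to a cohort of degradation anchors. One plausible reading is sketched below, with Gaussian mixture models and average log-likelihood used as assumed stand-ins for whatever similarity measure the thesis actually employs; the resulting fixed-length vector would then feed the data-driven intelligibility classifier.

```python
# Illustrative anchor-model features: one statistical model per reference
# ("anchor") degradation, with the test signal described by its similarity to
# each anchor. GMMs and mean log-likelihood are assumptions for this sketch.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_anchor_models(features_per_degradation, n_components=8):
    """Fit one GMM per anchor degradation on its acoustic feature frames."""
    return {name: GaussianMixture(n_components).fit(frames)
            for name, frames in features_per_degradation.items()}

def anchor_feature_vector(anchor_models, test_frames):
    """Similarity of the test signal to each anchor, as a fixed-length vector
    suitable as input to a data-driven intelligibility classifier."""
    return np.array([gmm.score(test_frames)          # mean log-likelihood
                     for gmm in anchor_models.values()])
```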

Published: 2008
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
URI: https://cronfa.swan.ac.uk/Record/cronfa42738