Conference Paper/Proceeding/Abstract

Best of Both Worlds: Making High Accuracy Non-incremental Transformer-based Disfluency Detection Incremental

Morteza Rohanian, Julian Hough

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Volume: 1 (Long Papers)

Swansea University Author: Julian Hough

Full text not available from this repository: check for access using links below.

DOI (Published version): 10.18653/v1/2021.acl-long.286

Abstract

While Transformer-based text classifiers pre-trained on large volumes of text have yielded significant improvements on a wide range of computational linguistics tasks, their implementations have been unsuitable for live incremental processing thus far, operating only on the level of complete sentence inputs. We address the challenge of introducing methods for word-by-word left-to-right incremental processing to Transformers such as BERT, models without an intrinsic sense of linear order. We modify the training method and live decoding of non-incremental models to detect speech disfluencies with minimum latency and without pre-segmentation of dialogue acts. We experiment with several decoding methods to predict the rightward context of the word currently being processed using a GPT-2 language model and apply a BERT-based disfluency detector to sequences, including predicted words. We show our method of incrementalising Transformers maintains most of their high non-incremental performance while operating strictly incrementally. We also evaluate our models' incremental performance to establish the trade-off between incremental performance and final performance, using different prediction strategies. We apply our system to incremental speech recognition results as they arrive into a live system and achieve state-of-the-art results in this setting.
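The core idea described in the abstract — predict a few words of right context with a GPT-2 language model, then run a BERT-based disfluency tagger over the observed prefix plus the predicted words, word by word — can be sketched roughly as follows. This is a minimal illustration, not the authors' released implementation: the Hugging Face checkpoint names, the two-label tag set, and the greedy single-continuation lookahead are all assumptions made for the example.

```python
# Rough sketch of incremental disfluency tagging with predicted right context.
# Checkpoints, label set and greedy lookahead are illustrative assumptions,
# not the paper's actual configuration.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    AutoModelForTokenClassification,
)

lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder tagger: in practice this would be a BERT model fine-tuned for
# disfluency detection (e.g. edit vs. fluent token labels).
tag_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
tagger = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

def predict_right_context(prefix_words, n_future=2):
    """Greedily predict up to n_future words continuing the prefix."""
    ids = lm_tok(" ".join(prefix_words), return_tensors="pt").input_ids
    out = lm.generate(
        ids,
        max_new_tokens=n_future * 2,
        do_sample=False,
        pad_token_id=lm_tok.eos_token_id,
    )
    continuation = lm_tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    return continuation.split()[:n_future]

def tag_prefix(prefix_words, n_future=2):
    """Tag the words observed so far, padding the right edge with predicted words."""
    words = prefix_words + predict_right_context(prefix_words, n_future)
    enc = tag_tok(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = tagger(**enc).logits
    token_labels = logits.argmax(-1)[0].tolist()
    # Keep one label per observed word (first sub-token), discard predicted words.
    word_ids, prev, word_labels = enc.word_ids(), None, []
    for i, w in enumerate(word_ids):
        if w is None or w >= len(prefix_words) or w == prev:
            continue
        word_labels.append(token_labels[i])
        prev = w
    return word_labels

# Word-by-word, left-to-right processing of an utterance prefix.
utterance = "i want a flight uh i mean a ticket to boston".split()
for t in range(1, len(utterance) + 1):
    print(utterance[:t], tag_prefix(utterance[:t]))
```

In a live setting the loop above would instead consume words as they arrive from an incremental speech recogniser, and earlier labels may be revised as more context becomes available.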

Published in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
ISBN: 978-1-954085-52-7
Published: Stroudsburg, PA, USA: Association for Computational Linguistics, 2021
URI: https://cronfa.swan.ac.uk/Record/cronfa64934