Journal article

Re-Engineered Word Embeddings for Improved Document-Level Sentiment Analysis

Scott Yang, Farzin Deravi

Applied Sciences, Volume: 12, Issue: 18, Start page: 9287

Swansea University Author: Scott Yang

  • 61289.VOR.pdf

    PDF | Version of Record

    Copyright: © 2022 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

    Download (1.77MB)

DOI (Published version): 10.3390/app12189287

Abstract

In this paper, a novel re-engineering mechanism for the generation of word embeddings is proposed for document-level sentiment analysis. Current approaches to sentiment analysis often integrate feature engineering with classification, without optimizing the feature vectors explicitly. Engineering fe...

Published in: Applied Sciences
ISSN: 2076-3417
Published: MDPI AG 2022

URI: https://cronfa.swan.ac.uk/Record/cronfa61289
first_indexed 2022-09-20T15:34:18Z
last_indexed 2023-01-13T19:21:58Z
id cronfa61289
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2022-10-12T14:26:59.5698930</datestamp><bib-version>v2</bib-version><id>61289</id><entry>2022-09-20</entry><title>Re-Engineered Word Embeddings for Improved Document-Level Sentiment Analysis</title><swanseaauthors><author><sid>81dc663ca0e68c60908d35b1d2ec3a9b</sid><ORCID>0000-0002-6618-7483</ORCID><firstname>Scott</firstname><surname>Yang</surname><name>Scott Yang</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-09-20</date><deptcode>SCS</deptcode><abstract>In this paper, a novel re-engineering mechanism for the generation of word embeddings is proposed for document-level sentiment analysis. Current approaches to sentiment analysis often integrate feature engineering with classification, without optimizing the feature vectors explicitly. Engineering feature vectors to match the data between the training set and query sample as proposed in this paper could be a promising way for boosting the classification performance in machine learning applications. The proposed mechanism is designed to re-engineer the feature components from a set of embedding vectors for greatly increased between-class separation, hence better leveraging the informative content of the documents. The proposed mechanism was evaluated using four public benchmarking datasets for both two-way and five-way semantic classifications. The resulting embeddings have demonstrated substantially improved performance for a range of sentiment analysis tasks. 
Tests using all the four datasets achieved by far the best classification results compared with the state-of-the-art.</abstract><type>Journal Article</type><journal>Applied Sciences</journal><volume>12</volume><journalNumber>18</journalNumber><paginationStart>9287</paginationStart><paginationEnd/><publisher>MDPI AG</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2076-3417</issnElectronic><keywords>sentiment analysis; semantic classification; feature re-engineering; NLP</keywords><publishedDay>16</publishedDay><publishedMonth>9</publishedMonth><publishedYear>2022</publishedYear><publishedDate>2022-09-16</publishedDate><doi>10.3390/app12189287</doi><url/><notes/><college>COLLEGE NANME</college><department>Computer Science</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>SCS</DepartmentCode><institution>Swansea University</institution><apcterm>SU College/Department paid the OA fee</apcterm><funders>Swansea University</funders><projectreference/><lastEdited>2022-10-12T14:26:59.5698930</lastEdited><Created>2022-09-20T16:28:03.8981891</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Scott</firstname><surname>Yang</surname><orcid>0000-0002-6618-7483</orcid><order>1</order></author><author><firstname>Farzin</firstname><surname>Deravi</surname><order>2</order></author></authors><documents><document><filename>61289__25165__0dbe035591024cd1b167a0c610d0441d.pdf</filename><originalFilename>61289.VOR.pdf</originalFilename><uploaded>2022-09-20T16:32:20.4680138</uploaded><type>Output</type><contentLength>1861110</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: &#xA9; 2022 by the authors. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title Re-Engineered Word Embeddings for Improved Document-Level Sentiment Analysis
author_id_str_mv 81dc663ca0e68c60908d35b1d2ec3a9b
author_id_fullname_str_mv 81dc663ca0e68c60908d35b1d2ec3a9b_***_Scott Yang
author Scott Yang
author2 Scott Yang
Farzin Deravi
format Journal article
container_title Applied Sciences
container_volume 12
container_issue 18
container_start_page 9287
publishDate 2022
institution Swansea University
issn 2076-3417
doi_str_mv 10.3390/app12189287
publisher MDPI AG
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science; Faculty of Science and Engineering
document_store_str 1
active_str 0
description In this paper, a novel re-engineering mechanism for the generation of word embeddings is proposed for document-level sentiment analysis. Current approaches to sentiment analysis often integrate feature engineering with classification, without optimizing the feature vectors explicitly. Engineering feature vectors to match the data between the training set and query sample as proposed in this paper could be a promising way for boosting the classification performance in machine learning applications. The proposed mechanism is designed to re-engineer the feature components from a set of embedding vectors for greatly increased between-class separation, hence better leveraging the informative content of the documents. The proposed mechanism was evaluated using four public benchmarking datasets for both two-way and five-way semantic classifications. The resulting embeddings have demonstrated substantially improved performance for a range of sentiment analysis tasks. Tests using all the four datasets achieved by far the best classification results compared with the state-of-the-art.
published_date 2022-09-16T04:20:01Z