Book chapter

R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

Ben Blamey, Tom Crick, Giles Oatley

Research and Development in Intelligent Systems XXIX, Pages: 207 - 212

Swansea University Author: Tom Crick

DOI (Published version): 10.1007/978-1-4471-4739-8_16

Abstract

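Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model.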
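
The distinction between word-gram and character-gram features in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not state their tokenisation, n-gram sizes, or classifier, so the whitespace tokeniser, the choice of n, the function names `char_ngrams` and `word_ngrams`, and the two example comments are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every contiguous run of n characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count every run of n tokens, using plain whitespace tokenisation (an assumption)."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Two invented comments that differ only in how far one word is elongated.
comment_a = "soooo cute :-)"
comment_b = "sooooo cute :-)"

word_a, word_b = word_ngrams(comment_a, 1), word_ngrams(comment_b, 1)
char_a, char_b = char_ngrams(comment_a, 3), char_ngrams(comment_b, 3)

# Word unigrams treat "soooo" and "sooooo" as unrelated features.
shared_words = set(word_a) & set(word_b)

# Character trigrams yield the same feature set for both comments;
# only the count of the repeated trigram "ooo" differs.
shared_chars = set(char_a) & set(char_b)

print(len(shared_words), "shared word features out of", len(set(word_a) | set(word_b)))
print(len(shared_chars), "shared character features out of", len(set(char_a) | set(char_b)))
```

In this toy case the character trigram sets coincide while the word unigram sets do not, which is the kind of flexibility the abstract hoped would help with unnatural language. The abstract itself reports little improvement over the word n-gram baselines in practice.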

Published in: Research and Development in Intelligent Systems XXIX
ISBN: 978-1-4471-4738-1 978-1-4471-4739-8
Published: Cambridge, UK: Springer, 2012
Online Access: https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16
URI: https://cronfa.swan.ac.uk/Record/cronfa43404
first_indexed 2018-08-14T15:01:05Z
last_indexed 2023-01-11T14:20:04Z
id cronfa43404
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2022-12-18T17:47:51.9290759</datestamp><bib-version>v2</bib-version><id>43404</id><entry>2018-08-14</entry><title>R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora</title><swanseaauthors><author><sid>200c66ef0fc55391f736f6e926fb4b99</sid><ORCID>0000-0001-5196-9389</ORCID><firstname>Tom</firstname><surname>Crick</surname><name>Tom Crick</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2018-08-14</date><deptcode>EDUC</deptcode><abstract>Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. 
Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model.</abstract><type>Book chapter</type><journal>Research and Development in Intelligent Systems XXIX</journal><volume/><journalNumber/><paginationStart>207</paginationStart><paginationEnd>212</paginationEnd><publisher>Springer</publisher><placeOfPublication>Cambridge, UK</placeOfPublication><isbnPrint>978-1-4471-4738-1</isbnPrint><isbnElectronic>978-1-4471-4739-8</isbnElectronic><issnPrint/><issnElectronic/><keywords/><publishedDay>11</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2012</publishedYear><publishedDate>2012-12-11</publishedDate><doi>10.1007/978-1-4471-4739-8_16</doi><url>https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16</url><notes>32nd SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence (AI-2012)</notes><college>COLLEGE NANME</college><department>Education</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>EDUC</DepartmentCode><institution>Swansea University</institution><apcterm/><funders/><projectreference/><lastEdited>2022-12-18T17:47:51.9290759</lastEdited><Created>2018-08-14T15:45:23.6969221</Created><path><level id="1">Faculty of Humanities and Social Sciences</level><level id="2">School of Social Sciences - Education and Childhood 
Studies</level></path><authors><author><firstname>Ben</firstname><surname>Blamey</surname><order>1</order></author><author><firstname>Tom</firstname><surname>Crick</surname><orcid>0000-0001-5196-9389</orcid><order>2</order></author><author><firstname>Giles</firstname><surname>Oatley</surname><order>3</order></author></authors><documents><document><filename>0043404-12092018063905.pdf</filename><originalFilename>blamey-et-al-2012.pdf</originalFilename><uploaded>2018-09-12T06:39:05.2830000</uploaded><type>Output</type><contentLength>97959</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2018-09-12T00:00:00.0000000</embargoDate><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora
author_id_str_mv 200c66ef0fc55391f736f6e926fb4b99
author_id_fullname_str_mv 200c66ef0fc55391f736f6e926fb4b99_***_Tom Crick
author Tom Crick
author2 Ben Blamey
Tom Crick
Giles Oatley
format Book chapter
container_title Research and Development in Intelligent Systems XXIX
container_start_page 207
publishDate 2012
institution Swansea University
isbn 978-1-4471-4738-1
978-1-4471-4739-8
doi_str_mv 10.1007/978-1-4471-4739-8_16
publisher Springer
college_str Faculty of Humanities and Social Sciences
hierarchytype
hierarchy_top_id facultyofhumanitiesandsocialsciences
hierarchy_top_title Faculty of Humanities and Social Sciences
hierarchy_parent_id facultyofhumanitiesandsocialsciences
hierarchy_parent_title Faculty of Humanities and Social Sciences
department_str School of Social Sciences - Education and Childhood Studies, Faculty of Humanities and Social Sciences
url https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16
document_store_str 1
active_str 0
published_date 2012-12-11T03:54:40Z