R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

Blamey, Ben; Crick, Tom; Oatley, Giles

doi:10.1007/978-1-4471-4739-8_16

Book chapter

Ben Blamey, Tom Crick, Giles Oatley

Research and Development in Intelligent Systems XXIX, Pages: 207 - 212

Swansea University Author: Tom Crick

DOI (Published version): 10.1007/978-1-4471-4739-8_16

Abstract

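Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model.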

Published in:	Research and Development in Intelligent Systems XXIX
ISBN:	978-1-4471-4738-1 978-1-4471-4739-8
Published:	Cambridge, UK Springer 2012
Online Access:	https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16
URI:	https://cronfa.swan.ac.uk/Record/cronfa43404

first_indexed	2018-08-14T15:01:05Z
last_indexed	2023-01-11T14:20:04Z
id	cronfa43404
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2022-12-18T17:47:51.9290759</datestamp><bib-version>v2</bib-version><id>43404</id><entry>2018-08-14</entry><title>R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora</title><swanseaauthors><author><sid>200c66ef0fc55391f736f6e926fb4b99</sid><ORCID>0000-0001-5196-9389</ORCID><firstname>Tom</firstname><surname>Crick</surname><name>Tom Crick</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2018-08-14</date><deptcode>SOSS</deptcode><abstract>Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. 
Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model.</abstract><type>Book chapter</type><journal>Research and Development in Intelligent Systems XXIX</journal><volume/><journalNumber/><paginationStart>207</paginationStart><paginationEnd>212</paginationEnd><publisher>Springer</publisher><placeOfPublication>Cambridge, UK</placeOfPublication><isbnPrint>978-1-4471-4738-1</isbnPrint><isbnElectronic>978-1-4471-4739-8</isbnElectronic><issnPrint/><issnElectronic/><keywords/><publishedDay>11</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2012</publishedYear><publishedDate>2012-12-11</publishedDate><doi>10.1007/978-1-4471-4739-8_16</doi><url>https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16</url><notes>32nd SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence (AI-2012)</notes><college>COLLEGE NANME</college><department>Social Sciences School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>SOSS</DepartmentCode><institution>Swansea University</institution><apcterm/><funders/><projectreference/><lastEdited>2022-12-18T17:47:51.9290759</lastEdited><Created>2018-08-14T15:45:23.6969221</Created><path><level id="1">Faculty of Humanities and Social Sciences</level><level id="2">School of Social Sciences - Education and Childhood 
Studies</level></path><authors><author><firstname>Ben</firstname><surname>Blamey</surname><order>1</order></author><author><firstname>Tom</firstname><surname>Crick</surname><orcid>0000-0001-5196-9389</orcid><order>2</order></author><author><firstname>Giles</firstname><surname>Oatley</surname><order>3</order></author></authors><documents><document><filename>0043404-12092018063905.pdf</filename><originalFilename>blamey-et-al-2012.pdf</originalFilename><uploaded>2018-09-12T06:39:05.2830000</uploaded><type>Output</type><contentLength>97959</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2018-09-12T00:00:00.0000000</embargoDate><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title	R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora
author_id_str_mv	200c66ef0fc55391f736f6e926fb4b99
author_id_fullname_str_mv	200c66ef0fc55391f736f6e926fb4b99_***_Tom Crick
author	Tom Crick
author2	Ben Blamey Tom Crick Giles Oatley
format	Book chapter
container_title	Research and Development in Intelligent Systems XXIX
container_start_page	207
publishDate	2012
institution	Swansea University
isbn	978-1-4471-4738-1 978-1-4471-4739-8
doi_str_mv	10.1007/978-1-4471-4739-8_16
publisher	Springer
college_str	Faculty of Humanities and Social Sciences
hierarchytype
hierarchy_top_id	facultyofhumanitiesandsocialsciences
hierarchy_top_title	Faculty of Humanities and Social Sciences
hierarchy_parent_id	facultyofhumanitiesandsocialsciences
hierarchy_parent_title	Faculty of Humanities and Social Sciences
department_str	School of Social Sciences - Education and Childhood Studies{{{_:::_}}}Faculty of Humanities and Social Sciences{{{_:::_}}}School of Social Sciences - Education and Childhood Studies
url	https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16
document_store_str	1
active_str	0
description	Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model.
published_date	2012-12-11T04:24:47Z

Similar Items