Book Chapter

R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora / Ben Blamey; Tom Crick; Giles Oatley

Research and Development in Intelligent Systems XXIX, Pages: 207 - 212

Swansea University Author: Crick, Tom

DOI (Published version): 10.1007/978-1-4471-4739-8_16

Abstract

Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most technique...

Published in: Research and Development in Intelligent Systems XXIX
ISBN: 978-1-4471-4738-1 978-1-4471-4739-8
Published: Cambridge, UK Springer 2012
Online Access: https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16
URI: https://cronfa.swan.ac.uk/Record/cronfa43404
first_indexed 2018-08-14T15:01:05Z
last_indexed 2018-10-15T19:17:24Z
id cronfa43404
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2018-10-15T14:56:34Z</datestamp><bib-version>v2</bib-version><id>43404</id><entry>2018-08-14</entry><title>R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora</title><alternativeTitle></alternativeTitle><author>Tom Crick</author><firstname>Tom</firstname><surname>Crick</surname><active>true</active><ORCID>0000-0001-5196-9389</ORCID><ethesisStudent>false</ethesisStudent><sid>200c66ef0fc55391f736f6e926fb4b99</sid><email>9971fd6d74987b78a0d7fce128f8c721</email><emailaddr>z93Ri4T5hwMLTfh+6XG11n2HZhUyFASdV1DFdgIIhKs=</emailaddr><date>2018-08-14</date><deptcode>EDUC</deptcode><abstract>Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. 
Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model.</abstract><type>Chapter in book</type><journal>Research and Development in Intelligent Systems XXIX</journal><volume/><journalNumber/><paginationStart>207</paginationStart><paginationEnd>212</paginationEnd><publisher>Springer</publisher><placeOfPublication>Cambridge, UK</placeOfPublication><isbnPrint>978-1-4471-4738-1</isbnPrint><isbnElectronic>978-1-4471-4739-8</isbnElectronic><issnPrint/><issnElectronic/><keywords></keywords><publishedDay>11</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2012</publishedYear><publishedDate>2012-12-11</publishedDate><doi>10.1007/978-1-4471-4739-8_16</doi><url>https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16</url><notes>32nd SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence (AI-2012)</notes><college>College of Arts and Humanities</college><department>School of Education</department><CollegeCode>CAAH</CollegeCode><DepartmentCode>EDUC</DepartmentCode><institution/><researchGroup>None</researchGroup><supervisor/><sponsorsfunders/><grantnumber/><degreelevel/><degreename></degreename><lastEdited>2018-10-15T14:56:34Z</lastEdited><Created>2018-08-14T15:45:23Z</Created><path><level id="1">College of Science</level><level id="2">Computer 
Science</level></path><authors><author><firstname>Ben</firstname><surname>Blamey</surname><orcid/><order>1</order></author><author><firstname>Tom</firstname><surname>Crick</surname><orcid>0000-0001-5196-9389</orcid><order>2</order></author><author><firstname>Giles</firstname><surname>Oatley</surname><orcid/><order>3</order></author></authors><documents><document><filename>0043404-12092018063905.pdf</filename><originalFilename>blamey-et-al-2012.pdf</originalFilename><uploaded>2018-09-12T06:39:05Z</uploaded><type>Output</type><contentLength>97959</contentLength><contentType>application/pdf</contentType><version>AM</version><cronfaStatus>true</cronfaStatus><action>Updated Copyright</action><actionDate>15/10/2018</actionDate><embargoDate>2018-09-12T00:00:00</embargoDate><documentNotes/><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents></rfc1807>
title R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora
author_id_str_mv 200c66ef0fc55391f736f6e926fb4b99
author_id_fullname_str_mv 200c66ef0fc55391f736f6e926fb4b99_***_Crick, Tom
author Crick, Tom
author2 Ben Blamey
Tom Crick
Giles Oatley
format Book Chapter
container_title Research and Development in Intelligent Systems XXIX
container_start_page 207
publishDate 2012
institution Swansea University
isbn 978-1-4471-4738-1
978-1-4471-4739-8
doi_str_mv 10.1007/978-1-4471-4739-8_16
publisher Springer
college_str College of Science
hierarchytype
hierarchy_top_id collegeofscience
hierarchy_top_title College of Science
hierarchy_parent_id collegeofscience
hierarchy_parent_title College of Science
department_str Computer Science{{{_:::_}}}College of Science{{{_:::_}}}Computer Science
url https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16
document_store_str 1
active_str 1
description Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model.
published_date 2012-12-11T22:20:15Z