Book chapter
R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora
Research and Development in Intelligent Systems XXIX, Pages: 207 - 212
Swansea University Author: Tom Crick
PDF | Accepted Manuscript
DOI (Published version): 10.1007/978-1-4471-4739-8_16
Abstract
Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most technique...
Published in: | Research and Development in Intelligent Systems XXIX |
---|---|
ISBN: | 978-1-4471-4738-1 978-1-4471-4739-8 |
Published: | Cambridge, UK: Springer, 2012 |
Online Access: | https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16 |
URI: | https://cronfa.swan.ac.uk/Record/cronfa43404 |
first_indexed | 2018-08-14T15:01:05Z |
---|---|
last_indexed | 2023-01-11T14:20:04Z |
id | cronfa43404 |
recordtype | SURis |
fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2022-12-18T17:47:51.9290759</datestamp><bib-version>v2</bib-version><id>43404</id><entry>2018-08-14</entry><title>R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora</title><swanseaauthors><author><sid>200c66ef0fc55391f736f6e926fb4b99</sid><ORCID>0000-0001-5196-9389</ORCID><firstname>Tom</firstname><surname>Crick</surname><name>Tom Crick</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2018-08-14</date><deptcode>EDUC</deptcode><abstract>Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. 
Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model.</abstract><type>Book chapter</type><journal>Research and Development in Intelligent Systems XXIX</journal><volume/><journalNumber/><paginationStart>207</paginationStart><paginationEnd>212</paginationEnd><publisher>Springer</publisher><placeOfPublication>Cambridge, UK</placeOfPublication><isbnPrint>978-1-4471-4738-1</isbnPrint><isbnElectronic>978-1-4471-4739-8</isbnElectronic><issnPrint/><issnElectronic/><keywords/><publishedDay>11</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2012</publishedYear><publishedDate>2012-12-11</publishedDate><doi>10.1007/978-1-4471-4739-8_16</doi><url>https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16</url><notes>32nd SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence (AI-2012)</notes><college>COLLEGE NANME</college><department>Education</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>EDUC</DepartmentCode><institution>Swansea University</institution><apcterm/><funders/><projectreference/><lastEdited>2022-12-18T17:47:51.9290759</lastEdited><Created>2018-08-14T15:45:23.6969221</Created><path><level id="1">Faculty of Humanities and Social Sciences</level><level id="2">School of Social Sciences - Education and Childhood 
Studies</level></path><authors><author><firstname>Ben</firstname><surname>Blamey</surname><order>1</order></author><author><firstname>Tom</firstname><surname>Crick</surname><orcid>0000-0001-5196-9389</orcid><order>2</order></author><author><firstname>Giles</firstname><surname>Oatley</surname><order>3</order></author></authors><documents><document><filename>0043404-12092018063905.pdf</filename><originalFilename>blamey-et-al-2012.pdf</originalFilename><uploaded>2018-09-12T06:39:05.2830000</uploaded><type>Output</type><contentLength>97959</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2018-09-12T00:00:00.0000000</embargoDate><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
title | R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora |
author_id_str_mv | 200c66ef0fc55391f736f6e926fb4b99 |
author_id_fullname_str_mv | 200c66ef0fc55391f736f6e926fb4b99_***_Tom Crick |
author | Tom Crick |
author2 | Ben Blamey, Tom Crick, Giles Oatley |
format | Book chapter |
container_title | Research and Development in Intelligent Systems XXIX |
container_start_page | 207 |
publishDate | 2012 |
institution | Swansea University |
isbn | 978-1-4471-4738-1 978-1-4471-4739-8 |
doi_str_mv | 10.1007/978-1-4471-4739-8_16 |
publisher | Springer |
college_str | Faculty of Humanities and Social Sciences |
hierarchytype | |
hierarchy_top_id | facultyofhumanitiesandsocialsciences |
hierarchy_top_title | Faculty of Humanities and Social Sciences |
hierarchy_parent_id | facultyofhumanitiesandsocialsciences |
hierarchy_parent_title | Faculty of Humanities and Social Sciences |
department_str | School of Social Sciences - Education and Childhood Studies{{{_:::_}}}Faculty of Humanities and Social Sciences{{{_:::_}}}School of Social Sciences - Education and Childhood Studies |
url | https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16 |
document_store_str | 1 |
active_str | 0 |
description |
Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomenon, we find little improvement over the baseline algorithms employing the word n-gram language model. |
published_date | 2012-12-11T03:54:40Z |