Book chapter
R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora
Research and Development in Intelligent Systems XXIX, Pages: 207 - 212
Swansea University Author:
Tom Crick
PDF | Accepted Manuscript
DOI (Published version): 10.1007/978-1-4471-4739-8_16
Abstract
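Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model.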
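As a rough illustration of the two feature types named in the title, the sketch below extracts word n-grams and character n-grams from short strings. It is not the authors' implementation: the function names `word_ngrams` and `char_ngrams`, the whitespace tokenisation, and the sample strings are all assumptions made for this example.

```python
def word_ngrams(text, n):
    """Return word n-grams, assuming simple whitespace tokenisation."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return overlapping character n-grams, spaces and punctuation included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical spelling variants of the kind found in informal comments.
a, b = "sooo", "soooo"

# As word unigrams the two variants are unrelated features.
shared_words = set(word_ngrams(a, 1)) & set(word_ngrams(b, 1))

# As character trigrams they share most of their features.
shared_chars = set(char_ngrams(a, 3)) & set(char_ngrams(b, 3))

print(shared_words)
print(shared_chars)
```

In this toy case the word-level view shares nothing between the two spellings, while the character-level view shares the trigrams `soo` and `ooo`. That overlap is the kind of flexibility the abstract hoped would help with informal text.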
| Published in: | Research and Development in Intelligent Systems XXIX |
|---|---|
| ISBN: | 978-1-4471-4738-1 978-1-4471-4739-8 |
| Published: | Cambridge, UK: Springer, 2012 |
| Online Access: | https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16 |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa43404 |
| first_indexed |
2018-08-14T15:01:05Z |
|---|---|
| last_indexed |
2023-01-11T14:20:04Z |
| id |
cronfa43404 |
| recordtype |
SURis |
| fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2022-12-18T17:47:51.9290759</datestamp><bib-version>v2</bib-version><id>43404</id><entry>2018-08-14</entry><title>R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora</title><swanseaauthors><author><sid>200c66ef0fc55391f736f6e926fb4b99</sid><ORCID>0000-0001-5196-9389</ORCID><firstname>Tom</firstname><surname>Crick</surname><name>Tom Crick</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2018-08-14</date><deptcode>SOSS</deptcode><abstract>Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model.</abstract><type>Book chapter</type><journal>Research and Development in Intelligent Systems XXIX</journal><volume/><journalNumber/><paginationStart>207</paginationStart><paginationEnd>212</paginationEnd><publisher>Springer</publisher><placeOfPublication>Cambridge, UK</placeOfPublication><isbnPrint>978-1-4471-4738-1</isbnPrint><isbnElectronic>978-1-4471-4739-8</isbnElectronic><issnPrint/><issnElectronic/><keywords/><publishedDay>11</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2012</publishedYear><publishedDate>2012-12-11</publishedDate><doi>10.1007/978-1-4471-4739-8_16</doi><url>https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16</url><notes>32nd SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence (AI-2012)</notes><college>COLLEGE NANME</college><department>Social Sciences School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>SOSS</DepartmentCode><institution>Swansea University</institution><apcterm/><funders/><projectreference/><lastEdited>2022-12-18T17:47:51.9290759</lastEdited><Created>2018-08-14T15:45:23.6969221</Created><path><level id="1">Faculty of Humanities and Social Sciences</level><level id="2">School of Social Sciences - Education and Childhood Studies</level></path><authors><author><firstname>Ben</firstname><surname>Blamey</surname><order>1</order></author><author><firstname>Tom</firstname><surname>Crick</surname><orcid>0000-0001-5196-9389</orcid><order>2</order></author><author><firstname>Giles</firstname><surname>Oatley</surname><order>3</order></author></authors><documents><document><filename>0043404-12092018063905.pdf</filename><originalFilename>blamey-et-al-2012.pdf</originalFilename><uploaded>2018-09-12T06:39:05.2830000</uploaded><type>Output</type><contentLength>97959</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2018-09-12T00:00:00.0000000</embargoDate><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
| title |
R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora |
| author_id_str_mv |
200c66ef0fc55391f736f6e926fb4b99 |
| author_id_fullname_str_mv |
200c66ef0fc55391f736f6e926fb4b99_***_Tom Crick |
| author |
Tom Crick |
| author2 |
Ben Blamey Tom Crick Giles Oatley |
| format |
Book chapter |
| container_title |
Research and Development in Intelligent Systems XXIX |
| container_start_page |
207 |
| publishDate |
2012 |
| institution |
Swansea University |
| isbn |
978-1-4471-4738-1 978-1-4471-4739-8 |
| doi_str_mv |
10.1007/978-1-4471-4739-8_16 |
| publisher |
Springer |
| college_str |
Faculty of Humanities and Social Sciences |
| hierarchytype |
|
| hierarchy_top_id |
facultyofhumanitiesandsocialsciences |
| hierarchy_top_title |
Faculty of Humanities and Social Sciences |
| hierarchy_parent_id |
facultyofhumanitiesandsocialsciences |
| hierarchy_parent_title |
Faculty of Humanities and Social Sciences |
| department_str |
School of Social Sciences - Education and Childhood Studies{{{_:::_}}}Faculty of Humanities and Social Sciences{{{_:::_}}}School of Social Sciences - Education and Childhood Studies |
| url |
https://link.springer.com/chapter/10.1007%2F978-1-4471-4739-8_16 |
| document_store_str |
1 |
| active_str |
0 |
| description |
Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well-suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model. |
| published_date |
2012-12-11T04:28:25Z |

