Journal article
Inferring Attention Shifts for Salient Instance Ranking
International Journal of Computer Vision, Volume: 132, Issue: 3, Pages: 964 - 986
Swansea University Authors: Avishek Siris, Gary Tam , Xianghua Xie , Rynson Lau
PDF | Version of Record
© The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License.
DOI (Published version): 10.1007/s11263-023-01906-7
Abstract
The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and behavioural observations show that humans prioritise a...
Published in: | International Journal of Computer Vision |
---|---|
ISSN: | 0920-5691 (print), 1573-1405 (electronic) |
Published: | Springer Science and Business Media LLC, 2024 |
URI: | https://cronfa.swan.ac.uk/Record/cronfa64287 |
first_indexed |
2023-09-01T09:11:41Z |
---|---|
last_indexed |
2024-11-25T14:13:47Z |
id |
cronfa64287 |
recordtype |
SURis |
fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2024-07-11T15:13:54.6975541</datestamp><bib-version>v2</bib-version><id>64287</id><entry>2023-09-01</entry><title>Inferring Attention Shifts for Salient Instance Ranking</title><swanseaauthors><author><sid>896b738a2b485a166c052d94bca5fa68</sid><firstname>Avishek</firstname><surname>Siris</surname><name>Avishek Siris</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>e75a68e11a20e5f1da94ee6e28ff5e76</sid><ORCID>0000-0001-7387-5180</ORCID><firstname>Gary</firstname><surname>Tam</surname><name>Gary Tam</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>b334d40963c7a2f435f06d2c26c74e11</sid><ORCID>0000-0002-2701-8660</ORCID><firstname>Xianghua</firstname><surname>Xie</surname><name>Xianghua Xie</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>8d230434b6eadb1be5928241b0beecd0</sid><firstname>Rynson</firstname><surname>Lau</surname><name>Rynson Lau</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-09-01</date><deptcode>MACS</deptcode><abstract>The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and behavioural observations show that humans prioritise and sequentially shift attention among multiple visual stimuli. In this paper, we propose to predict the saliency rank of multiple objects by inferring human attention shift. We first construct a new large-scale salient object ranking dataset, with the saliency rank of objects defined by the order that an observer attends to these objects via attention shift. We then propose a new deep learning based model to leverage both bottom-up and top-down attention mechanisms for saliency rank prediction. 
Our model includes three novel modules: Spatial Mask Module (SMM), Selective Attention Module (SAM) and Salient Instance Edge Module (SIEM). SMM integrates bottom-up and semantic object properties to enhance contextual object features, from which SAM learns the dependencies between object features and image features for saliency reasoning. SIEM is designed to improve segmentation of salient objects, which helps further improve their rank predictions. Experimental results show that our proposed network achieves state-of-the-art performances on the salient object ranking task across multiple datasets.</abstract><type>Journal Article</type><journal>International Journal of Computer Vision</journal><volume>132</volume><journalNumber>3</journalNumber><paginationStart>964</paginationStart><paginationEnd>986</paginationEnd><publisher>Springer Science and Business Media LLC</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>0920-5691</issnPrint><issnElectronic>1573-1405</issnElectronic><keywords>Attention Shift, Saliency, Saliency Ranking, Salient Object Detection.</keywords><publishedDay>1</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2024</publishedYear><publishedDate>2024-03-01</publishedDate><doi>10.1007/s11263-023-01906-7</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>SU Library paid the OA fee (TA Institutional Deal)</apcterm><funders>This work was funded by a Swansea University Doctoral Training Postgraduate Research Scholarship 0301[164]. For the purpose of Open Access the author has applied a CC BY copyright licence to any Author Accepted Manuscript version arising from this submission. Jianbo Jiao is supported by the EPSRC Programme Grant Visual AI EP/T028572/1. Gary Tam is supported by the Royal Society grant IEC/NSFC/211159. 
This work was supported by the Research Grants Council of Hong Kong (Grant No.: 11205620), and a Strategic Research Grant from City University of Hong Kong (Ref.: 7005674).</funders><projectreference/><lastEdited>2024-07-11T15:13:54.6975541</lastEdited><Created>2023-09-01T10:01:59.2765646</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Avishek</firstname><surname>Siris</surname><order>1</order></author><author><firstname>Jianbo</firstname><surname>Jiao</surname><order>2</order></author><author><firstname>Gary</firstname><surname>Tam</surname><orcid>0000-0001-7387-5180</orcid><order>3</order></author><author><firstname>Xianghua</firstname><surname>Xie</surname><orcid>0000-0002-2701-8660</orcid><order>4</order></author><author><firstname>Rynson</firstname><surname>Lau</surname><order>5</order></author></authors><documents><document><filename>64287__30034__a515b9124a2c47c1b4370771d3d23bc6.pdf</filename><originalFilename>64287.VOR.pdf</originalFilename><uploaded>2024-04-16T13:13:37.1189188</uploaded><type>Output</type><contentLength>4434052</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>© The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>http://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807> |
title |
Inferring Attention Shifts for Salient Instance Ranking |
author_id_str_mv |
896b738a2b485a166c052d94bca5fa68 e75a68e11a20e5f1da94ee6e28ff5e76 b334d40963c7a2f435f06d2c26c74e11 8d230434b6eadb1be5928241b0beecd0 |
author_id_fullname_str_mv |
896b738a2b485a166c052d94bca5fa68_***_Avishek Siris e75a68e11a20e5f1da94ee6e28ff5e76_***_Gary Tam b334d40963c7a2f435f06d2c26c74e11_***_Xianghua Xie 8d230434b6eadb1be5928241b0beecd0_***_Rynson Lau |
author |
Avishek Siris Gary Tam Xianghua Xie Rynson Lau |
author2 |
Avishek Siris Jianbo Jiao Gary Tam Xianghua Xie Rynson Lau |
format |
Journal article |
container_title |
International Journal of Computer Vision |
container_volume |
132 |
container_issue |
3 |
container_start_page |
964 |
publishDate |
2024 |
institution |
Swansea University |
issn |
0920-5691 1573-1405 |
doi_str_mv |
10.1007/s11263-023-01906-7 |
publisher |
Springer Science and Business Media LLC |
college_str |
Faculty of Science and Engineering |
hierarchytype |
|
hierarchy_top_id |
facultyofscienceandengineering |
hierarchy_top_title |
Faculty of Science and Engineering |
hierarchy_parent_id |
facultyofscienceandengineering |
hierarchy_parent_title |
Faculty of Science and Engineering |
department_str |
School of Mathematics and Computer Science - Computer Science (Faculty of Science and Engineering) |
document_store_str |
1 |
active_str |
0 |
description |
The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and behavioural observations show that humans prioritise and sequentially shift attention among multiple visual stimuli. In this paper, we propose to predict the saliency rank of multiple objects by inferring human attention shift. We first construct a new large-scale salient object ranking dataset, with the saliency rank of objects defined by the order that an observer attends to these objects via attention shift. We then propose a new deep learning based model to leverage both bottom-up and top-down attention mechanisms for saliency rank prediction. Our model includes three novel modules: Spatial Mask Module (SMM), Selective Attention Module (SAM) and Salient Instance Edge Module (SIEM). SMM integrates bottom-up and semantic object properties to enhance contextual object features, from which SAM learns the dependencies between object features and image features for saliency reasoning. SIEM is designed to improve segmentation of salient objects, which helps further improve their rank predictions. Experimental results show that our proposed network achieves state-of-the-art performances on the salient object ranking task across multiple datasets. |
published_date |
2024-03-01T08:29:32Z |