Conference Paper/Proceeding/Abstract
Using Runahead Execution to Hide Memory Latency in High Level Synthesis
Shane Fleming,
David B. Thomas
2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Swansea University Author: Shane Fleming
PDF | Accepted Manuscript
DOI (Published version): 10.1109/fccm.2017.33
Abstract
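Reads and writes to global data in off-chip RAM can limit the performance achieved with HLS tools, as each access takes multiple cycles and usually blocks progress in the application state machine. This can be combated by using data prefetchers, which hide access time by predicting the next memory access and loading the data into a cache before it is required. Unfortunately, current prefetchers are only useful for memory accesses with known regular patterns, such as walking arrays, and are ineffective for those that use irregular patterns over application-specific data structures. In this work, we demonstrate prefetchers that are tailor-made for applications, even if they have irregular memory accesses. This is achieved through program slicing, a static analysis technique that extracts the memory structure of the input code and automatically constructs an application-specific prefetcher. Both our analysis and tool are fully automated and are implemented as a new compiler flag in LegUp, an open-source HLS tool. We also create a theoretical model showing that the speedup must lie between 1x and 2x, and we evaluate five benchmarks, achieving an average speedup of 1.38x with an average resource overhead of 1.15x.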
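The paper reports a theoretical model in which the speedup must lie between 1x and 2x. The following is not the paper's model or code; it is a minimal sketch under an assumed timing model. Each loop iteration performs one off-chip access taking `mem[i]` cycles followed by computation taking `comp[i]` cycles, and a prefetcher built from the address-generating slice of the loop issues the accesses back to back (one at a time, with unbounded buffering towards the datapath) while the main datapath consumes each value once it has arrived. The function names `baseline_cycles`, `runahead_cycles` and `speedup` are hypothetical.

```python
# Illustrative timing model (an assumption, not the paper's analysis).
# Iteration i issues one off-chip access taking mem[i] cycles and then
# computes for comp[i] cycles. Accesses are issued one at a time, as in
# pointer chasing, where each address depends on the previous load.


def baseline_cycles(mem, comp):
    """Blocking HLS state machine: every access stalls the datapath."""
    return sum(m + c for m, c in zip(mem, comp))


def runahead_cycles(mem, comp):
    """Hypothetical decoupled schedule: a prefetcher derived from the
    address-generating slice of the loop issues the accesses back to back
    (never stalled by the datapath, i.e. unbounded buffering), and the
    datapath consumes each value once it has arrived."""
    mem_done = 0   # cycle at which access i has returned
    comp_done = 0  # cycle at which iteration i has finished computing
    for m, c in zip(mem, comp):
        mem_done += m
        # Iteration i needs its own data and the result of iteration i - 1.
        comp_done = max(comp_done, mem_done) + c
    return comp_done


def speedup(mem, comp):
    """Ratio of baseline to runahead cycles (assumes some nonzero cost)."""
    return baseline_cycles(mem, comp) / runahead_cycles(mem, comp)


if __name__ == "__main__":
    balanced = [10] * 4
    print(speedup(balanced, balanced))  # 80 / 50 = 1.6
```

In this model the decoupled schedule is never slower than the blocking one, because each iteration still finishes no later than it would sequentially. It is also never more than twice as fast, because it still needs at least max(sum(mem), sum(comp)) cycles, which is at least half of sum(mem) + sum(comp). The ratio approaches 2 only when memory and compute time are balanced and the loop is long.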
Published in: | 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) |
---|---|
ISBN: | 978-1-5386-4038-8 (print), 978-1-5386-4037-1 (electronic) |
Published: | IEEE, 2017 |
URI: | https://cronfa.swan.ac.uk/Record/cronfa57993 |
fullrecord |
<?xml version="1.0"?>
<rfc1807>
  <datestamp>2021-11-24T16:38:09.7105760</datestamp>
  <bib-version>v2</bib-version>
  <id>57993</id>
  <entry>2021-09-20</entry>
  <title>Using Runahead Execution to Hide Memory Latency in High Level Synthesis</title>
  <swanseaauthors>
    <author>
      <sid>fe23ad3ebacc194b4f4c480fdde55b95</sid>
      <firstname>Shane</firstname>
      <surname>Fleming</surname>
      <name>Shane Fleming</name>
      <active>true</active>
      <ethesisStudent>false</ethesisStudent>
    </author>
  </swanseaauthors>
  <date>2021-09-20</date>
  <deptcode>FGSEN</deptcode>
  <abstract>Reads and writes to global data in off-chip RAM can limit the performance achieved with HLS tools, as each access takes multiple cycles and usually blocks progress in the application state machine. This can be combated by using data prefetchers, which hide access time by predicting the next memory access and loading it into a cache before it's required. Unfortunately, current prefetchers are only useful for memory accesses with known regular patterns, such as walking arrays, and are ineffective for those that use irregular patterns over application-specific data structures. In this work, we demonstrate prefetchers that are tailor-made for applications, even if they have irregular memory accesses. This is achieved through program slicing, a static analysis technique that extracts the memory structure of the input code and automatically constructs an application-specific prefetcher. Both our analysis and tool are fully automated and implemented as a new compiler flag in LegUp, an open source HLS tool. In this work we create a theoretical model showing that speedup must be between 1x and 2x, we also evaluate five benchmarks, achieving an average speedup of 1.38x with an average resource overhead of 1.15x.</abstract>
  <type>Conference Paper/Proceeding/Abstract</type>
  <journal>2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)</journal>
  <volume/>
  <journalNumber/>
  <paginationStart/>
  <paginationEnd/>
  <publisher>IEEE</publisher>
  <placeOfPublication/>
  <isbnPrint>978-1-5386-4038-8</isbnPrint>
  <isbnElectronic>978-1-5386-4037-1</isbnElectronic>
  <issnPrint/>
  <issnElectronic/>
  <keywords/>
  <publishedDay>3</publishedDay>
  <publishedMonth>7</publishedMonth>
  <publishedYear>2017</publishedYear>
  <publishedDate>2017-07-03</publishedDate>
  <doi>10.1109/fccm.2017.33</doi>
  <url/>
  <notes/>
  <college>COLLEGE NANME</college>
  <department>Science and Engineering - Faculty</department>
  <CollegeCode>COLLEGE CODE</CollegeCode>
  <DepartmentCode>FGSEN</DepartmentCode>
  <institution>Swansea University</institution>
  <apcterm>Another institution paid the OA fee</apcterm>
  <funders>EPSRC</funders>
  <lastEdited>2021-11-24T16:38:09.7105760</lastEdited>
  <Created>2021-09-20T20:55:23.6695589</Created>
  <path>
    <level id="1">Faculty of Science and Engineering</level>
    <level id="2">School of Mathematics and Computer Science - Computer Science</level>
  </path>
  <authors>
    <author>
      <firstname>Shane</firstname>
      <surname>Fleming</surname>
      <order>1</order>
    </author>
    <author>
      <firstname>David B.</firstname>
      <surname>Thomas</surname>
      <order>2</order>
    </author>
  </authors>
  <documents>
    <document>
      <filename>57993__20949__dd2996edcd07412fb4de12d9a8f41c31.pdf</filename>
      <originalFilename>relish_fccm2017.pdf</originalFilename>
      <uploaded>2021-09-20T21:14:33.0074104</uploaded>
      <type>Output</type>
      <contentLength>2847092</contentLength>
      <contentType>application/pdf</contentType>
      <version>Accepted Manuscript</version>
      <cronfaStatus>true</cronfaStatus>
      <copyrightCorrect>true</copyrightCorrect>
      <language>eng</language>
    </document>
  </documents>
  <OutputDurs/>
</rfc1807> |