No Cover Image

Journal article 169 views 111 downloads

Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code

Jianyi Cheng Orcid Logo, Shane Fleming, Yu Ting Chen, Jason Anderson, John Wickerson Orcid Logo, George A. Constantinides Orcid Logo

IEEE Transactions on Computers, Volume: 71, Issue: 4, Pages: 933 - 946

Swansea University Author: Shane Fleming

Abstract

High-level synthesis (HLS) is an increasingly popular method for generating hardware from a description written in a software language like C/C++. Traditionally, HLS tools have operated on sequential code, however, in recent years there has been a drive to synthesise multi-threaded code. In this con...

Full description

Published in: IEEE Transactions on Computers
ISSN: 0018-9340 1557-9956
Published: Institute of Electrical and Electronics Engineers (IEEE) 2022
Online Access: Check full text

URI: https://cronfa.swan.ac.uk/Record/cronfa56451
Tags: Add Tag
No Tags, Be the first to tag this record!
first_indexed 2021-03-16T09:53:44Z
last_indexed 2022-04-08T03:25:38Z
id cronfa56451
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2022-04-07T12:46:33.2649454</datestamp><bib-version>v2</bib-version><id>56451</id><entry>2021-03-16</entry><title>Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code</title><swanseaauthors><author><sid>fe23ad3ebacc194b4f4c480fdde55b95</sid><firstname>Shane</firstname><surname>Fleming</surname><name>Shane Fleming</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2021-03-16</date><deptcode>FGSEN</deptcode><abstract>High-level synthesis (HLS) is an increasingly popular method for generating hardware from a description written in a software language like C/C++. Traditionally, HLS tools have operated on sequential code, however, in recent years there has been a drive to synthesise multi-threaded code. In this context, a major challenge facing HLS tools is how to automatically partition memory among parallel threads to fully exploit the bandwidth available on an FPGA device and minimise memory contention. Existingpartitioning approaches require inefficient arbitration circuitry to serialise accesses to each bank because they make conservative assumptions about which threads might access which memory banks. In this article, we design a static analysis that can prove certain memory banks are only accessed by certain threads, and use this analysis to simplify or even remove the arbiters while preserving correctness. We show how this analysis can be implemented using the Microsoft Boogie verifier on top of satisfiability modulo theories (SMT) solver, and propose a tool named EASY using automatic formal verification. Our work supports arbitrary input code with any irregular memory access patterns and indirect array addressing forms. We implement our approach in LLVM and integrate it into the LegUp HLS tool. For a set of typical application benchmarks our results have shown that EASY can achieve 0.13&#xD7;(avg. 0.43&#xD7;) of area and 1.64&#xD7;(avg. 1.28&#xD7;) of performance compared to the baseline, with little additional compilation time relative to the long time in hardware synthesis.</abstract><type>Journal Article</type><journal>IEEE Transactions on Computers</journal><volume>71</volume><journalNumber>4</journalNumber><paginationStart>933</paginationStart><paginationEnd>946</paginationEnd><publisher>Institute of Electrical and Electronics Engineers (IEEE)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>0018-9340</issnPrint><issnElectronic>1557-9956</issnElectronic><keywords/><publishedDay>1</publishedDay><publishedMonth>4</publishedMonth><publishedYear>2022</publishedYear><publishedDate>2022-04-01</publishedDate><doi>10.1109/tc.2021.3066466</doi><url/><notes/><college>COLLEGE NANME</college><department>Science and Engineering - Faculty</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>FGSEN</DepartmentCode><institution>Swansea University</institution><apcterm/><lastEdited>2022-04-07T12:46:33.2649454</lastEdited><Created>2021-03-16T09:43:03.4419052</Created><path><level id="1">College of Science</level><level id="2">College of Science</level></path><authors><author><firstname>Jianyi</firstname><surname>Cheng</surname><orcid>0000-0003-2791-2555</orcid><order>1</order></author><author><firstname>Shane</firstname><surname>Fleming</surname><order>2</order></author><author><firstname>Yu Ting</firstname><surname>Chen</surname><order>3</order></author><author><firstname>Jason</firstname><surname>Anderson</surname><order>4</order></author><author><firstname>John</firstname><surname>Wickerson</surname><orcid>0000-0001-6735-5533</orcid><order>5</order></author><author><firstname>George A.</firstname><surname>Constantinides</surname><orcid>0000-0002-0201-310x</orcid><order>6</order></author></authors><documents><document><filename>56451__19590__b98976cce3e742f28f31716234b1d45c.pdf</filename><originalFilename>56451.pdf</originalFilename><uploaded>2021-03-29T15:30:17.6267430</uploaded><type>Output</type><contentLength>1370792</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
spelling 2022-04-07T12:46:33.2649454 v2 56451 2021-03-16 Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code fe23ad3ebacc194b4f4c480fdde55b95 Shane Fleming Shane Fleming true false 2021-03-16 FGSEN High-level synthesis (HLS) is an increasingly popular method for generating hardware from a description written in a software language like C/C++. Traditionally, HLS tools have operated on sequential code, however, in recent years there has been a drive to synthesise multi-threaded code. In this context, a major challenge facing HLS tools is how to automatically partition memory among parallel threads to fully exploit the bandwidth available on an FPGA device and minimise memory contention. Existingpartitioning approaches require inefficient arbitration circuitry to serialise accesses to each bank because they make conservative assumptions about which threads might access which memory banks. In this article, we design a static analysis that can prove certain memory banks are only accessed by certain threads, and use this analysis to simplify or even remove the arbiters while preserving correctness. We show how this analysis can be implemented using the Microsoft Boogie verifier on top of satisfiability modulo theories (SMT) solver, and propose a tool named EASY using automatic formal verification. Our work supports arbitrary input code with any irregular memory access patterns and indirect array addressing forms. We implement our approach in LLVM and integrate it into the LegUp HLS tool. For a set of typical application benchmarks our results have shown that EASY can achieve 0.13×(avg. 0.43×) of area and 1.64×(avg. 1.28×) of performance compared to the baseline, with little additional compilation time relative to the long time in hardware synthesis. Journal Article IEEE Transactions on Computers 71 4 933 946 Institute of Electrical and Electronics Engineers (IEEE) 0018-9340 1557-9956 1 4 2022 2022-04-01 10.1109/tc.2021.3066466 COLLEGE NANME Science and Engineering - Faculty COLLEGE CODE FGSEN Swansea University 2022-04-07T12:46:33.2649454 2021-03-16T09:43:03.4419052 College of Science College of Science Jianyi Cheng 0000-0003-2791-2555 1 Shane Fleming 2 Yu Ting Chen 3 Jason Anderson 4 John Wickerson 0000-0001-6735-5533 5 George A. Constantinides 0000-0002-0201-310x 6 56451__19590__b98976cce3e742f28f31716234b1d45c.pdf 56451.pdf 2021-03-29T15:30:17.6267430 Output 1370792 application/pdf Accepted Manuscript true true eng
title Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code
spellingShingle Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code
Shane Fleming
title_short Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code
title_full Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code
title_fullStr Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code
title_full_unstemmed Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code
title_sort Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code
author_id_str_mv fe23ad3ebacc194b4f4c480fdde55b95
author_id_fullname_str_mv fe23ad3ebacc194b4f4c480fdde55b95_***_Shane Fleming
author Shane Fleming
author2 Jianyi Cheng
Shane Fleming
Yu Ting Chen
Jason Anderson
John Wickerson
George A. Constantinides
format Journal article
container_title IEEE Transactions on Computers
container_volume 71
container_issue 4
container_start_page 933
publishDate 2022
institution Swansea University
issn 0018-9340
1557-9956
doi_str_mv 10.1109/tc.2021.3066466
publisher Institute of Electrical and Electronics Engineers (IEEE)
college_str College of Science
hierarchytype
hierarchy_top_id collegeofscience
hierarchy_top_title College of Science
hierarchy_parent_id collegeofscience
hierarchy_parent_title College of Science
department_str College of Science{{{_:::_}}}College of Science{{{_:::_}}}College of Science
document_store_str 1
active_str 0
description High-level synthesis (HLS) is an increasingly popular method for generating hardware from a description written in a software language like C/C++. Traditionally, HLS tools have operated on sequential code, however, in recent years there has been a drive to synthesise multi-threaded code. In this context, a major challenge facing HLS tools is how to automatically partition memory among parallel threads to fully exploit the bandwidth available on an FPGA device and minimise memory contention. Existingpartitioning approaches require inefficient arbitration circuitry to serialise accesses to each bank because they make conservative assumptions about which threads might access which memory banks. In this article, we design a static analysis that can prove certain memory banks are only accessed by certain threads, and use this analysis to simplify or even remove the arbiters while preserving correctness. We show how this analysis can be implemented using the Microsoft Boogie verifier on top of satisfiability modulo theories (SMT) solver, and propose a tool named EASY using automatic formal verification. Our work supports arbitrary input code with any irregular memory access patterns and indirect array addressing forms. We implement our approach in LLVM and integrate it into the LegUp HLS tool. For a set of typical application benchmarks our results have shown that EASY can achieve 0.13×(avg. 0.43×) of area and 1.64×(avg. 1.28×) of performance compared to the baseline, with little additional compilation time relative to the long time in hardware synthesis.
published_date 2022-04-01T04:12:02Z
_version_ 1737027719545225216
score 10.898474