Journal article 169 views 111 downloads
Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code
IEEE Transactions on Computers, Volume: 71, Issue: 4, Pages: 933 - 946
Swansea University Author: Shane Fleming
PDF | Accepted ManuscriptDownload (1.31MB)
High-level synthesis (HLS) is an increasingly popular method for generating hardware from a description written in a software language like C/C++. Traditionally, HLS tools have operated on sequential code, however, in recent years there has been a drive to synthesise multi-threaded code. In this con...
|Published in:||IEEE Transactions on Computers|
Institute of Electrical and Electronics Engineers (IEEE)
Check full text
No Tags, Be the first to tag this record!
High-level synthesis (HLS) is an increasingly popular method for generating hardware from a description written in a software language like C/C++. Traditionally, HLS tools have operated on sequential code, however, in recent years there has been a drive to synthesise multi-threaded code. In this context, a major challenge facing HLS tools is how to automatically partition memory among parallel threads to fully exploit the bandwidth available on an FPGA device and minimise memory contention. Existingpartitioning approaches require inefficient arbitration circuitry to serialise accesses to each bank because they make conservative assumptions about which threads might access which memory banks. In this article, we design a static analysis that can prove certain memory banks are only accessed by certain threads, and use this analysis to simplify or even remove the arbiters while preserving correctness. We show how this analysis can be implemented using the Microsoft Boogie verifier on top of satisfiability modulo theories (SMT) solver, and propose a tool named EASY using automatic formal verification. Our work supports arbitrary input code with any irregular memory access patterns and indirect array addressing forms. We implement our approach in LLVM and integrate it into the LegUp HLS tool. For a set of typical application benchmarks our results have shown that EASY can achieve 0.13×(avg. 0.43×) of area and 1.64×(avg. 1.28×) of performance compared to the baseline, with little additional compilation time relative to the long time in hardware synthesis.
College of Science