Conference Paper/Proceeding/Abstract
General hardware multicasting for fine-grained message-passing architectures
2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Pages: 126 - 133
Swansea University Author: Shane Fleming
PDF | Accepted Manuscript
DOI (Published version): 10.1109/pdp52278.2021.00028
Abstract
Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patter...
| Published in: | 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) |
|---|---|
| ISBN: | 978-1-6654-4764-5 (print), 978-1-6654-1455-5 (electronic) |
| ISSN: | 1066-6192 (print), 2377-5750 (electronic) |
| Published: | IEEE 2021 |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa56452 |

| first_indexed | 2021-03-16T10:02:35Z |
|---|---|
| last_indexed | 2025-10-14T07:00:38Z |
| id | cronfa56452 |
| recordtype | SURis |
| fullrecord | <?xml version="1.0"?><rfc1807><datestamp>2025-10-13T16:50:04.1954130</datestamp><bib-version>v2</bib-version><id>56452</id><entry>2021-03-16</entry><title>General hardware multicasting for fine-grained message-passing architectures</title><swanseaauthors><author><sid>fe23ad3ebacc194b4f4c480fdde55b95</sid><firstname>Shane</firstname><surname>Fleming</surname><name>Shane Fleming</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2021-03-16</date><deptcode>MACS</deptcode><abstract>Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.</abstract><type>Conference Paper/Proceeding/Abstract</type><journal>2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)</journal><volume/><journalNumber/><paginationStart>126</paginationStart><paginationEnd>133</paginationEnd><publisher>IEEE</publisher><placeOfPublication/><isbnPrint>978-1-6654-4764-5</isbnPrint><isbnElectronic>978-1-6654-1455-5</isbnElectronic><issnPrint>1066-6192</issnPrint><issnElectronic>2377-5750</issnElectronic><keywords>Scalability, Computer architecture, Multicast communication, System recovery, Hardware, Software, Topology</keywords><publishedDay>1</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2021</publishedYear><publishedDate>2021-03-01</publishedDate><doi>10.1109/pdp52278.2021.00028</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>Not Required</apcterm><funders>This work was supported by UK EPSRC grant EP/N031768/1 (POETS project).</funders><projectreference/><lastEdited>2025-10-13T16:50:04.1954130</lastEdited><Created>2021-03-16T09:57:33.4741858</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Matthew</firstname><surname>Naylor</surname><order>1</order></author><author><firstname>Simon W.</firstname><surname>Moore</surname><order>2</order></author><author><firstname>David</firstname><surname>Thomas</surname><order>3</order></author><author><firstname>Jonathan R.</firstname><surname>Beaumont</surname><order>4</order></author><author><firstname>Shane</firstname><surname>Fleming</surname><order>5</order></author><author><firstname>Mark</firstname><surname>Vousden</surname><order>6</order></author><author><firstname>A. Theodore</firstname><surname>Markettos</surname><order>7</order></author><author><firstname>Thomas</firstname><surname>Bytheway</surname><order>8</order></author><author><firstname>Andrew</firstname><surname>Brown</surname><order>9</order></author></authors><documents><document><filename>56452__19490__b5dc3b7b98bd4556a097a04cab8d08ce.pdf</filename><originalFilename>pdp2021-mcast-draft.pdf</originalFilename><uploaded>2021-03-16T10:01:48.8252523</uploaded><type>Output</type><contentLength>235400</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
| spelling | 2025-10-13T16:50:04.1954130 v2 56452 2021-03-16 General hardware multicasting for fine-grained message-passing architectures fe23ad3ebacc194b4f4c480fdde55b95 Shane Fleming Shane Fleming true false 2021-03-16 MACS Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model. Conference Paper/Proceeding/Abstract 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 126 133 IEEE 978-1-6654-4764-5 978-1-6654-1455-5 1066-6192 2377-5750 Scalability, Computer architecture, Multicast communication, System recovery, Hardware, Software, Topology 1 3 2021 2021-03-01 10.1109/pdp52278.2021.00028 COLLEGE NANME Mathematics and Computer Science School COLLEGE CODE MACS Swansea University Not Required This work was supported by UK EPSRC grant EP/N031768/1 (POETS project). 2025-10-13T16:50:04.1954130 2021-03-16T09:57:33.4741858 Faculty of Science and Engineering School of Mathematics and Computer Science - Computer Science Matthew Naylor 1 Simon W. Moore 2 David Thomas 3 Jonathan R. Beaumont 4 Shane Fleming 5 Mark Vousden 6 A. Theodore Markettos 7 Thomas Bytheway 8 Andrew Brown 9 56452__19490__b5dc3b7b98bd4556a097a04cab8d08ce.pdf pdp2021-mcast-draft.pdf 2021-03-16T10:01:48.8252523 Output 235400 application/pdf Accepted Manuscript true true eng |
| title | General hardware multicasting for fine-grained message-passing architectures |
| spellingShingle | General hardware multicasting for fine-grained message-passing architectures Shane Fleming |
| title_short | General hardware multicasting for fine-grained message-passing architectures |
| title_full | General hardware multicasting for fine-grained message-passing architectures |
| title_fullStr | General hardware multicasting for fine-grained message-passing architectures |
| title_full_unstemmed | General hardware multicasting for fine-grained message-passing architectures |
| title_sort | General hardware multicasting for fine-grained message-passing architectures |
| author_id_str_mv | fe23ad3ebacc194b4f4c480fdde55b95 |
| author_id_fullname_str_mv | fe23ad3ebacc194b4f4c480fdde55b95_***_Shane Fleming |
| author | Shane Fleming |
| author2 | Matthew Naylor, Simon W. Moore, David Thomas, Jonathan R. Beaumont, Shane Fleming, Mark Vousden, A. Theodore Markettos, Thomas Bytheway, Andrew Brown |
| format | Conference Paper/Proceeding/Abstract |
| container_title | 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) |
| container_start_page | 126 |
| publishDate | 2021 |
| institution | Swansea University |
| isbn | 978-1-6654-4764-5 978-1-6654-1455-5 |
| issn | 1066-6192 2377-5750 |
| doi_str_mv | 10.1109/pdp52278.2021.00028 |
| publisher | IEEE |
| college_str | Faculty of Science and Engineering |
| hierarchytype | |
| hierarchy_top_id | facultyofscienceandengineering |
| hierarchy_top_title | Faculty of Science and Engineering |
| hierarchy_parent_id | facultyofscienceandengineering |
| hierarchy_parent_title | Faculty of Science and Engineering |
| department_str | School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science |
| document_store_str | 1 |
| active_str | 0 |
| description | Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model. |
| published_date | 2021-03-01T04:52:48Z |
| _version_ | 1851367430486491136 |
| score | 11.089572 |

