Journal article
A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics
Computer Physics Communications
Swansea University Author: Antonio Gil
PDF | Accepted Manuscript
Download (1.73MB)
DOI (Published version): 10.1016/j.cpc.2017.02.016
Abstract
The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain s...
Published in: | Computer Physics Communications |
---|---|
ISSN: | 0010-4655 |
Published: | 2017 |
Online Access: | Check full text |
URI: | https://cronfa.swan.ac.uk/Record/cronfa32097 |
first_indexed | 2017-03-15T20:01:45Z |
---|---|
last_indexed | 2018-02-09T05:19:40Z |
id | cronfa32097 |
recordtype | SURis |
fullrecord | <?xml version="1.0"?><rfc1807><datestamp>2017-02-24T15:08:57.5092029</datestamp><bib-version>v2</bib-version><id>32097</id><entry>2017-02-24</entry><title>A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics</title><swanseaauthors><author><sid>1f5666865d1c6de9469f8b7d0d6d30e2</sid><ORCID>0000-0001-7753-1414</ORCID><firstname>Antonio</firstname><surname>Gil</surname><name>Antonio Gil</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2017-02-24</date><deptcode>ACEM</deptcode><abstract>The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electromechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates are shown to provide a significant speed-up over the classical low-level style programming techniques.</abstract><type>Journal Article</type><journal>Computer Physics Communications</journal><publisher/><issnPrint>0010-4655</issnPrint><keywords>Tensor contraction; Data parallelism; Domain-aware expression templates; Nonlinear coupled electromechanics</keywords><publishedDay>31</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2017</publishedYear><publishedDate>2017-12-31</publishedDate><doi>10.1016/j.cpc.2017.02.016</doi><url/><notes/><college>COLLEGE NANME</college><department>Aerospace, Civil, Electrical, and Mechanical Engineering</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>ACEM</DepartmentCode><institution>Swansea University</institution><apcterm/><lastEdited>2017-02-24T15:08:57.5092029</lastEdited><Created>2017-02-24T15:06:16.2647937</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Aerospace, Civil, Electrical, General and Mechanical Engineering - Civil Engineering</level></path><authors><author><firstname>Roman</firstname><surname>Poya</surname><order>1</order></author><author><firstname>Antonio</firstname><surname>Gil</surname><orcid>0000-0001-7753-1414</orcid><order>2</order></author><author><firstname>Rogelio</firstname><surname>Ortigosa</surname><order>3</order></author></authors><documents><document><filename>0032097-24022017150828.pdf</filename><originalFilename>poya2017.pdf</originalFilename><uploaded>2017-02-24T15:08:28.3670000</uploaded><type>Output</type><contentLength>1780790</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2018-02-24T00:00:00.0000000</embargoDate><copyrightCorrect>false</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
spelling | 2017-02-24T15:08:57.5092029 v2 32097 2017-02-24 A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics 1f5666865d1c6de9469f8b7d0d6d30e2 0000-0001-7753-1414 Antonio Gil Antonio Gil true false 2017-02-24 ACEM The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electromechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates are shown to provide a significant speed-up over the classical low-level style programming techniques. Journal Article Computer Physics Communications 0010-4655 Tensor contraction; Data parallelism; Domain-aware expression templates; Nonlinear coupled electromechanics 31 12 2017 2017-12-31 10.1016/j.cpc.2017.02.016 COLLEGE NANME Aerospace, Civil, Electrical, and Mechanical Engineering COLLEGE CODE ACEM Swansea University 2017-02-24T15:08:57.5092029 2017-02-24T15:06:16.2647937 Faculty of Science and Engineering School of Aerospace, Civil, Electrical, General and Mechanical Engineering - Civil Engineering Roman Poya 1 Antonio Gil 0000-0001-7753-1414 2 Rogelio Ortigosa 3 0032097-24022017150828.pdf poya2017.pdf 2017-02-24T15:08:28.3670000 Output 1780790 application/pdf Accepted Manuscript true 2018-02-24T00:00:00.0000000 false eng |
title | A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics |
spellingShingle | A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics Antonio Gil |
title_short | A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics |
title_full | A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics |
title_fullStr | A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics |
title_full_unstemmed | A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics |
title_sort | A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics |
author_id_str_mv | 1f5666865d1c6de9469f8b7d0d6d30e2 |
author_id_fullname_str_mv | 1f5666865d1c6de9469f8b7d0d6d30e2_***_Antonio Gil |
author | Antonio Gil |
author2 | Roman Poya Antonio Gil Rogelio Ortigosa |
format | Journal article |
container_title | Computer Physics Communications |
publishDate | 2017 |
institution | Swansea University |
issn | 0010-4655 |
doi_str_mv | 10.1016/j.cpc.2017.02.016 |
college_str | Faculty of Science and Engineering |
hierarchytype | |
hierarchy_top_id | facultyofscienceandengineering |
hierarchy_top_title | Faculty of Science and Engineering |
hierarchy_parent_id | facultyofscienceandengineering |
hierarchy_parent_title | Faculty of Science and Engineering |
department_str | School of Aerospace, Civil, Electrical, General and Mechanical Engineering - Civil Engineering{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Aerospace, Civil, Electrical, General and Mechanical Engineering - Civil Engineering |
document_store_str | 1 |
active_str | 0 |
description | The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electromechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates are shown to provide a significant speed-up over the classical low-level style programming techniques. |
published_date | 2017-12-31T04:13:43Z |
_version_ | 1822283345173676032 |
score | 11.357488 |