Journal article

A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics

Roman Poya, Antonio Gil, Rogelio Ortigosa

Computer Physics Communications

Swansea University Author: Antonio Gil

Abstract

The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain s...

Published in: Computer Physics Communications
ISSN: 0010-4655
Published: 2017

URI: https://cronfa.swan.ac.uk/Record/cronfa32097
first_indexed 2017-03-15T20:01:45Z
last_indexed 2018-02-09T05:19:40Z
id cronfa32097
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2017-02-24T15:08:57.5092029</datestamp><bib-version>v2</bib-version><id>32097</id><entry>2017-02-24</entry><title>A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics</title><swanseaauthors><author><sid>1f5666865d1c6de9469f8b7d0d6d30e2</sid><ORCID>0000-0001-7753-1414</ORCID><firstname>Antonio</firstname><surname>Gil</surname><name>Antonio Gil</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2017-02-24</date><deptcode>ACEM</deptcode><abstract>The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. 
Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electromechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates are shown to provide a significant speed-up over the classical low-level style programming techniques.</abstract><type>Journal Article</type><journal>Computer Physics Communications</journal><publisher/><issnPrint>0010-4655</issnPrint><keywords>Tensor contraction; Data parallelism; Domain-aware expression templates; Nonlinear coupled electromechanics</keywords><publishedDay>31</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2017</publishedYear><publishedDate>2017-12-31</publishedDate><doi>10.1016/j.cpc.2017.02.016</doi><url/><notes/><college>COLLEGE NANME</college><department>Aerospace, Civil, Electrical, and Mechanical Engineering</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>ACEM</DepartmentCode><institution>Swansea University</institution><apcterm/><lastEdited>2017-02-24T15:08:57.5092029</lastEdited><Created>2017-02-24T15:06:16.2647937</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Aerospace, Civil, Electrical, General and Mechanical Engineering - Civil 
Engineering</level></path><authors><author><firstname>Roman</firstname><surname>Poya</surname><order>1</order></author><author><firstname>Antonio</firstname><surname>Gil</surname><orcid>0000-0001-7753-1414</orcid><order>2</order></author><author><firstname>Rogelio</firstname><surname>Ortigosa</surname><order>3</order></author></authors><documents><document><filename>0032097-24022017150828.pdf</filename><originalFilename>poya2017.pdf</originalFilename><uploaded>2017-02-24T15:08:28.3670000</uploaded><type>Output</type><contentLength>1780790</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2018-02-24T00:00:00.0000000</embargoDate><copyrightCorrect>false</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics
author_id_str_mv 1f5666865d1c6de9469f8b7d0d6d30e2
author_id_fullname_str_mv 1f5666865d1c6de9469f8b7d0d6d30e2_***_Antonio Gil
author Antonio Gil
author2 Roman Poya
Antonio Gil
Rogelio Ortigosa
format Journal article
container_title Computer Physics Communications
publishDate 2017
institution Swansea University
issn 0010-4655
doi_str_mv 10.1016/j.cpc.2017.02.016
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Aerospace, Civil, Electrical, General and Mechanical Engineering - Civil Engineering, Faculty of Science and Engineering
document_store_str 1
active_str 0
description The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electromechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates are shown to provide a significant speed-up over the classical low-level style programming techniques.
published_date 2017-12-31T04:13:43Z