Journal article
Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA
IEEE Access, Volume: 7, Pages: 2782 - 2798
Swansea University Author: Matheus Torquato
PDF: Accepted Manuscript (6.15 MB)
DOI (Published version): 10.1109/ACCESS.2018.2885950
Abstract
Q-learning is an off-policy reinforcement learning technique whose main advantage is that it can obtain an optimal policy while interacting with an environment whose model is unknown. This paper proposes a parallel fixed-point Q-learning architecture implemented on field-programmable gate arrays (FPGAs), focusing on optimizing the system's processing time. Convergence results are presented, and processing time and occupied area are analyzed for scenarios with different numbers of states and actions and for various fixed-point formats. Studies of the accuracy of the Q-learning response, and of the resolution error associated with reducing the number of bits, were also carried out for the hardware implementation. Details of the architecture implementation are presented. The entire project was developed using the System Generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.
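For context, the tabular Q-learning update the abstract refers to can be sketched in software. This is a minimal floating-point sketch with a toy rounding step to mimic fixed-point storage; the chain environment, the parameter values, and the `quantize` helper are illustrative assumptions, not the paper's parallel FPGA architecture.

```python
import numpy as np

def quantize(x, frac_bits=12):
    # Model fixed-point storage: round to a grid with 2**-frac_bits resolution.
    # frac_bits is a stand-in for the various fixed-point formats studied.
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def q_learning(n_states, n_actions, step, alpha=0.1, gamma=0.9,
               epsilon=0.1, episodes=500, frac_bits=12, seed=0):
    """Tabular off-policy Q-learning against an unknown environment model.

    `step(s, a) -> (next_state, reward, done)` is the only interface to
    the environment; the agent never sees the transition model itself.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy; the learning target below
            # takes a max over actions, which is what makes this off-policy
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] = quantize(Q[s, a] + alpha * (target - Q[s, a]), frac_bits)
            s = s2
    return Q

# Toy 5-state chain: action 1 moves right, action 0 moves left;
# reaching state 4 ends the episode with reward 1.
def chain_step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == 4 else 0.0
    return s2, reward, s2 == 4

Q = q_learning(n_states=5, n_actions=2, step=chain_step)
policy = Q.argmax(axis=1)  # greedy policy over the learned Q-table
```

Shrinking `frac_bits` coarsens the grid until small updates round to zero, which is the resolution-error trade-off the paper analyzes for its hardware formats.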
Published in: IEEE Access
ISSN: 2169-3536
Published: 2019
Online Access: Check full text
URI: https://cronfa.swan.ac.uk/Record/cronfa49022
Authors: Lucileide M. D. Da Silva; Matheus Torquato (ORCID 0000-0001-6356-3538); Marcelo A. C. Fernandes
Department: Computer Science, Faculty of Science and Engineering, Swansea University
Published date: 2019-12-31