Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA

Da Silva, Lucileide M. D.; Torquato, Matheus; Fernandes, Marcelo A. C.

doi:10.1109/ACCESS.2018.2885950

Journal article


Lucileide M. D. Da Silva, Matheus Torquato, Marcelo A. C. Fernandes

IEEE Access, Volume: 7, Pages: 2782 - 2798

Swansea University Author: Matheus Torquato

PDF | Accepted Manuscript (6.15 MB)


DOI (Published version): 10.1109/ACCESS.2018.2885950

Abstract

Q-learning is an off-policy reinforcement learning technique whose main advantage is that it can obtain an optimal policy while interacting with an environment whose model is unknown. This paper proposes a parallel fixed-point architecture for the Q-learning algorithm, implemented on field-programmable gate arrays (FPGAs), with a focus on optimizing system processing time. Convergence results are presented, and the processing time and occupied area are analyzed for scenarios with different numbers of states and actions and for various fixed-point formats. The accuracy of the Q-learning response and the resolution error caused by reducing the number of bits are also studied for the hardware implementation, and the details of the architecture implementation are described. The entire project was developed using the Xilinx System Generator platform, with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.

Published in:	IEEE Access
ISSN:	2169-3536
Published:	2019
URI:	https://cronfa.swan.ac.uk/Record/cronfa49022
Institution:	Swansea University
College:	Faculty of Science and Engineering
Department:	School of Engineering and Applied Sciences - Uncategorised
Author ORCID:	Matheus Torquato, 0000-0001-6356-3538
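The off-policy, fixed-point Q-learning update summarized in the abstract can be illustrated with a small software model. This is a sketch under stated assumptions, not the authors' FPGA architecture: the 16-bit format with 10 fractional bits, the truncating multiply, the five-state chain environment, and the helper names (`to_fixed`, `fixed_mul`, `q_update`, `chain_step`, `train`) are all hypothetical choices made here for illustration. It trains a fixed-point Q-table and a floating-point Q-table on the same transitions from a uniformly random behavior policy, so the gap between the two tables gives a rough feel for the resolution error of a given bit width.

```python
import numpy as np

# Hypothetical fixed-point format (an assumption, not one of the paper's formats):
# signed TOTAL_BITS-bit integers with FRAC_BITS fractional bits, saturating on overflow.
TOTAL_BITS = 16
FRAC_BITS = 10
Q_MIN = -(1 << (TOTAL_BITS - 1))
Q_MAX = (1 << (TOTAL_BITS - 1)) - 1


def to_fixed(x):
    """Quantize real values to saturated fixed-point integers (round to nearest)."""
    scaled = np.round(np.asarray(x, dtype=np.float64) * (1 << FRAC_BITS))
    return np.clip(scaled, Q_MIN, Q_MAX).astype(np.int64)


def to_float(q):
    """Convert fixed-point integers back to real values."""
    return np.asarray(q, dtype=np.float64) / (1 << FRAC_BITS)


def fixed_mul(a, b):
    """Fixed-point product; the arithmetic right shift truncates the extra fractional bits."""
    return np.clip((a * b) >> FRAC_BITS, Q_MIN, Q_MAX)


def q_update(Q, s, a, r, s_next, alpha, gamma):
    """One step of Q[s, a] += alpha * (r + gamma * max(Q[s_next]) - Q[s, a]).

    Q is an int64 table of shape (states, actions); r, alpha and gamma are
    fixed-point integers. In hardware, the max over actions could be computed
    in parallel, for example with a comparator tree.
    """
    td_error = r + fixed_mul(gamma, Q[s_next].max()) - Q[s, a]
    Q[s, a] = np.clip(Q[s, a] + fixed_mul(alpha, td_error), Q_MIN, Q_MAX)


def chain_step(s, a, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.
    Reaching the last (terminal) state pays reward 1."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward


def train(n_states=5, episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Train fixed-point and float Q-tables on identical random-behavior transitions."""
    rng = np.random.default_rng(seed)
    Q_fix = np.zeros((n_states, 2), dtype=np.int64)
    Q_flt = np.zeros((n_states, 2))
    a_fx, g_fx, r_fx = to_fixed(alpha), to_fixed(gamma), to_fixed(1.0)
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:  # the terminal row of each table stays at zero
            a = int(rng.integers(2))  # off-policy: uniformly random behavior policy
            s_next, r = chain_step(s, a, n_states)
            q_update(Q_fix, s, a, r_fx if r else 0, s_next, a_fx, g_fx)
            Q_flt[s, a] += alpha * (r + gamma * Q_flt[s_next].max() - Q_flt[s, a])
            s = s_next
    return Q_fix, Q_flt
```

For example, `Q_fix, Q_flt = train()` followed by `np.max(np.abs(to_float(Q_fix) - Q_flt))` measures the gap between the two tables, while `np.argmax(Q_fix, axis=1)` reads off the learned greedy policy even though the data came from a random policy. Lowering `FRAC_BITS` should generally widen the gap, which mirrors, in a much simplified form, the accuracy-versus-bit-width trade-off that the paper studies in hardware.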
