April 16, 2019 · Open Access

Reinforcement Learning, Fast and Slow

Abstract

Recent AI research has given rise to powerful techniques for deep reinforcement learning. In its combination of representation learning with reward-driven behavior, deep reinforcement learning would appear to have inherent interest for psychology and neuroscience. One reservation has been that deep reinforcement learning procedures demand large amounts of training data, suggesting that these algorithms may differ fundamentally from those underlying human learning. While this concern applies to the initial wave of deep RL techniques, subsequent AI work has established methods that allow deep RL systems to learn more quickly and efficiently. Two particularly interesting and promising techniques center, respectively, on episodic memory and meta-learning. Alongside their interest as AI techniques, deep RL methods leveraging episodic memory and meta-learning have direct and interesting implications for psychology and neuroscience. One subtle but critically important insight that these techniques bring into focus is the fundamental connection between fast and slow forms of learning.

Deep reinforcement learning (RL) methods have driven impressive advances in artificial intelligence in recent years, exceeding human performance in domains ranging from Atari to Go to no-limit poker. This progress has drawn the attention of cognitive scientists interested in understanding human learning. However, the concern has been raised that deep RL may be too sample-inefficient – that is, it may simply be too slow – to provide a plausible model of how humans learn. In the present review, we counter this critique by describing recently developed techniques that allow deep RL to operate more nimbly, solving problems much more quickly than previous methods. Although these techniques were developed in an AI context, we propose that they may have rich implications for psychology and neuroscience.
A key insight, arising from these AI methods, concerns the fundamental connection between fast RL and slower, more incremental forms of learning.

Over just the past few years, revolutionary advances have occurred in artificial intelligence (AI) research, where a resurgence in neural network or ‘deep learning’ methods [1: LeCun Y. et al. Deep learning. Nature. 2015; 521: 436] et al.Deep has in understanding et with deep neural et representation and PubMed et by learning to and et a model for and have interest from and in AI human and et an of deep learning and PubMed et training of neural for cognitive and PubMed deep learning to PubMed et neural network that a for the of 2015; PubMed Deep but may it PubMed One of AI research that particularly from this is deep RL Deep RL neural network with reinforcement a of methods for learning from and than from more as an than deep RL has the past into of the of AI research, performance in from et deep reinforcement learning.Nature. 
2015; PubMed to et artificial intelligence in no-limit PubMed et performance in with deep reinforcement and and et the of with deep neural and PubMed et and by with a reinforcement learning et the of human PubMed et reinforcement learning that and PubMed on the of learning a a from or to which In the be as a the for In this of is and the be as a work in the how this be a neural network learning and learning to from to A and However, the of deep neural with RL work how deep RL be to work in domains as Atari et deep reinforcement learning.Nature. 2015; PubMed and progress has been and deep RL et in deep reinforcement to domains as Go et the of with deep neural and PubMed and the et training of neural In the advances have deep RL with and as et the of with deep neural and PubMed or memory et a neural network with PubMed and have on the of learning deep RL to progress on just a few as in the the of deep RL methods, in A with learning and This on a neural network which as a representation of the and to an of the as simply to the of a from the the network by and et deep reinforcement learning.Nature. 2015; PubMed a neural network et with deep neural as and to a representation of a deep RL by and et memory in a A of the of this RL is the of the present be in et memory in a However, as the the a neural network that an memory to which to a that on the to in as in RL on the of learning a a from or to which In the be as a the for In this of is and the be as a work in the how this be a neural network learning and learning to from to A and However, the of deep neural with RL work how deep RL be to work in domains as Atari et deep reinforcement learning.Nature. 
2015; PubMed and progress has been and deep RL et in deep reinforcement to domains as Go et the of with deep neural and PubMed and the et training of neural In the advances have deep RL with and as et the of with deep neural and PubMed or memory et a neural network with PubMed and have on the of learning deep RL to progress on just a few as in the the of deep RL methods, in A with learning and This on a neural network which as a representation of the and to an of the as simply to the of a from the the network by and et deep reinforcement learning.Nature. 2015; PubMed a neural network et with deep neural as and to a representation of a deep RL by and et memory in a A of the of this RL is the of the present be in et memory in a However, as the the a neural network that an memory to which to a that on the to in as in inherent interest as an AI deep RL would appear to interest for psychology and neuroscience. that learning in deep RL were by research a of and PubMed and to to neural for learning on et neural of and PubMed the deep RL neural to learn powerful that and key of these deep RL would appear to a rich of and for interested in human and the and have to et an of deep learning and PubMed et training of neural for cognitive and PubMed the on the wave of deep RL research has a of it that deep RL systems learn in a from of this it has been in the of human learning deep to the of for a learning to of this the initial wave of deep RL systems appear from human performance on as Atari or deep RL systems have of more training than human et learning in In deep in initial much too slow to a plausible model for human learning. 
the has et learning 2015; PubMed Deep a critique is to the wave of deep RL methods, et with deep reinforcement However, in the important have occurred in deep RL research, which how the of deep RL be methods the by deep RL for amounts of training data, deep RL to be of these techniques deep RL as a model of human learning and a of insight for psychology and neuroscience. In the present review, we key deep RL methods that the episodic deep RL and how these techniques fast deep RL and their implications for psychology and neuroscience. A key for techniques for fast RL is to initial methods for deep RL were in we of the of the of the we to how the of by these in of in deep RL is the for incremental deep RL methods in AI to the of a deep neural network from to has been in AI but in psychology et learning systems learning systems PubMed the this of learning be in to et of 2015; and the of learning to as This demand for in learning is of in the methods for deep A is A of learning is that learning a the the initial the learning the to be the the initial of the learning the be for learning to be the initial in the A learning with be to a of but in be and and In is fast learning. A learning that a of in on the more than a with the that initial neural learning they have and of these to a of by the this that neural in the in the deep RL to be large amounts of to learn. 
these and the of deep RL However, subsequent research has that of these be deep RL to in a much more In we techniques, of which the incremental and the of which the of In to their implications the AI of these AI techniques with psychology and as we incremental is of in deep to learn be to incremental the learning to the of However, recent research that is to the which is to an of past and this as a of in This to as episodic RL et episodic learning and episodic memory in humans and an PubMed to the in learning and and and or of learning in psychology an of and a is and a be to the is to an representation of the with of past is the with the on the of the past that to the the representation is by a neural we to the as deep A more of the of episodic deep RL is in Deep RL algorithms the of and episodic learning and episodic memory in humans and an PubMed et of past for in PubMed episodic for PubMed for the episodic in the with the of the an episodic memory of the and the that the of a the a of the by the between and the This be to by the with the and in the memory the to in which the In et episodic an episodic RL to performance on Atari of episodic RL on the to In a to et episodic et et episodic that performance be by these learning. performance and of the on the in the Atari et episodic the of slow learning and fast learning. RL algorithms the of and episodic learning and episodic memory in humans and an PubMed et of past for in PubMed episodic for PubMed for the episodic in the with the of the an episodic memory of the and the that the of a the a of the by the between and the This be to by the with the and in the memory the to in which the In et episodic an episodic RL to performance on Atari of episodic RL on the to In a to et episodic et et episodic that performance be by these learning. performance and of the on the in the Atari et episodic the of slow learning and fast learning. 
In episodic deep the incremental the be to However, episodic deep RL is to where methods for deep RL is a to this the fast learning of episodic deep RL critically on slow incremental learning. This is the learning of the connection that the to or of of these is the of incremental that forms the of deep the of episodic deep RL is by this of learning. is, fast learning is by slow learning. This of fast learning on slow learning is we it is a fundamental to psychology and than to a of this we in the recently developed AI for deep a key of in deep incremental is in the of the fast learning the to in with a of the of the that it the the learning However, as is a a learning it the While they the the to with the to be a of a learning how the to One to this is to on past this the in for the of learning to a In this context, past with and the the work and of the initial to the in the and they for the to quickly learn how to the A these with would a of the the of learning leveraging of past to learning is to in learning as meta-learning However, the from where it has been to In the to this of learning PubMed an that the were with and to of a or an were the and the for a of Two and were and with these of and the were to that a and the of with a of were to learn in which the a but of learning to learn learning the of a learning that the with and et to learn on to and to of learning to to et of in reinforcement PubMed This is in of of an that to that the of an that the to to the where the and RL learning from and that in the of this a of neural network in memory et to learn on PubMed In this the of learning of the connection by a deep RL Over the of this rise to an learning which is in the of the network et to reinforcement Y. 
et fast reinforcement learning slow reinforcement et with neural on et to learn on PubMed A training a neural network on a the between the or and is on that the for a of problems is with a of with for a of and on to for the in is to of the with of to to this learning to more on a with that more with of training on a of the network with connection a and on the a that with algorithms the learning that in the is of the RL that the it work than that it that with the on which the is is in et et as a learning PubMed performance is training on a of which the the training on a in which the they to the training on the to in on the more This an in the a of in the of learning PubMed the of to a where between of learning on in the for of is with of to an on and the to for In recent and et as a learning PubMed that a neural rise to the of learning learning the of a learning that the with and et to learn on to and to of learning to to et of in reinforcement PubMed This is in of of an that to that the of an that the to to the where the and RL learning from and that in the of this a of neural network in memory et to learn on PubMed In this the of learning of the connection by a deep RL Over the of this rise to an learning which is in the of the network et to reinforcement Y. 
et fast reinforcement learning slow reinforcement et with neural on et to learn on PubMed A training a neural network on a the between the or and is on that the for a of problems is with a of with for a of and on to for the in is to of the with of to to this learning to more on a with that more with of training on a of the network with connection a and on the a that with algorithms the learning that in the is of the RL that the it work than that it that with the on which the is is in et et as a learning PubMed performance is training on a of which the the training on a in which the they to the training on the to in on the more This an in the a of in the of learning PubMed the of to a where between of learning on in the for of is with of to an on and the to for In recent and et as a learning PubMed that a neural rise to the of learning to recent work has how learning to learn be to learning in deep This has been in a of et to learn by by et meta-learning for fast of deep However, that has to and psychology by et to reinforcement and Y. et fast reinforcement learning slow reinforcement and their a neural network is on a of RL in the network they is but fast to the of In this of the network to their RL which for quickly solving on from past RL to and the with episodic deep an connection between fast and slow learning. in the network that to be the of the network a learning which problems they have been with by the underlying of slow learning fast learning and is slow learning. 
the techniques we have recent work has an to meta-learning and episodic on their et meta-learning with episodic on et as In episodic meta-learning a neural as in the previous and However, on this is an episodic memory the of which is to of in the in episodic deep the episodic memory a of past which be on the However, than with episodic with from the or important they to the has from with for In episodic the a that to in the it the from the previous to the In episodic memory the to work in and et et meta-learning with episodic on that episodic just that it to with a episodic and the it the to the with a the from the of on the and it from the learning by episodic we the the of has been as a for the of deep RL to learning in humans and et learning 2015; PubMed Deep a One important of episodic deep RL and from the of of psychology and is that they this by that deep RL in be This deep RL as a model of human and learning. this the of episodic deep RL and to interesting in psychology and neuroscience. with episodic deep we have the interesting connection between this and of human where previous an of and RL a for how reward-driven learning. recent work on RL in and humans has the of episodic with that of and on for past learning and episodic memory in humans and an PubMed to the et of past for in PubMed episodic for PubMed deep RL a for how this to learning more it the important that representation learning and learning in RL on episodic deep RL that it may be to the that fast episodic RL in humans and may with and learning While this between fast and slow learning has been in work on memory work on memory systems et learning systems learning systems PubMed et and memory PubMed et is a cognitive for et learning systems in the and from the and of of learning and PubMed in learning has been et reinforcement to this has interesting implications for psychology and neuroscience. 
and et as a learning PubMed have a direct from the of to neural and they propose that may to the of in a that the to an of learning procedures a of and et as a learning PubMed how in this for a of from the and and in the and et as a learning PubMed that as in model learning in that on the of and that this is by an of focus on the is to learning et is a cognitive for et and learning and PubMed and in for that learning. et et for in PubMed from a in which that for the of a but for the previous the previous and the of previous and previous key for an learning in this et et as a learning PubMed on the the of the artificial network to a fast learning with that in this network to et et as a learning PubMed that meta-learning the of an the in to have that for learning to the to the of learning et for systems on PubMed and reinforcement the PubMed et of PubMed In this incremental to and these the to This learning is as to a in and in the PubMed et between and systems for PubMed and learning and their PubMed this is by the that et on and PubMed et and in a PubMed et and for of PubMed in a to and et et on and PubMed in to be the of et et as a learning PubMed on a of this et to PubMed In of behavior, they that the network in a learning the that the training of they the of they that the work has been to understanding how the and et is a cognitive for et neural underlying and reinforcement PubMed et and model in PubMed memory a model of learning in the and PubMed a that learning to a learning may be an important of In this incremental learning into algorithms that in the to learn in reinforcement PubMed et of in reinforcement PubMed et and the of reinforcement learning in and et as a learning PubMed that as in model learning in that on the of and that this is by an of focus on the is to learning et is a cognitive for et and learning and PubMed and in for that learning. 
et et for in PubMed from a in which that for the of a but for the previous the previous and the of previous and previous key for an learning in this et et as a learning PubMed on the the of the artificial network to a fast learning with that in this network to et et as a learning PubMed that meta-learning the of an the in to have that for learning to the to the of learning et for systems on PubMed and reinforcement the PubMed et of PubMed In this incremental to and these the to This learning is as to a in and in the PubMed et between and systems for PubMed and learning and their PubMed However, this is by the that et on and PubMed et and in a PubMed et and for of PubMed in a to and et et on and PubMed in to be the of et et as a learning PubMed on a of this et to PubMed In of behavior, they that the network in a learning the that the training of they the of they that the work has been to understanding how the and et is a cognitive for et neural underlying and reinforcement PubMed et and model in PubMed memory a model of learning in the and PubMed a that learning to a learning In may be an important of In this incremental learning into algorithms that in the to learn in reinforcement PubMed et of in reinforcement PubMed et and the of reinforcement learning in direct episodic with psychology and neuroscience. the in episodic by that episodic memory to of in memory et as and et meta-learning with episodic on how a be rise to a that et with neural on et memory in a et a neural network with PubMed In to the initial it from this work to by a for recently between episodic and in human learning et to reinforcement on a the work by and et meta-learning with episodic on an of how meta-learning operate memory systems et learning systems learning systems PubMed their that they learning. In episodic RL and we have the of learning in learning. 
In as we have the of learning is to that and to of incremental learning in episodic RL be in RL on between or learning the that and in a of which episodic RL more that is an into the learning In episodic RL a of for than this is into the of the learning that episodic In AI this is a of or in to as in A of AI research is on to this is learning or the direct of or the has been for the resurgence of neural in neural which the for this in a with in However, the past few years, an large of AI research has been or on the of et deep and et deep reinforcement and et for a these of concerns in we have the that may be learning in from psychology of learning PubMed and has an of research et learning 2015; PubMed However, meta-learning in neural may provide a to the and of in the RL and in has the that that is, PubMed However, the more of and of into neural network learning have been in and in the a model of PubMed et and cognitive PubMed in a of PubMed methods for deep learning and deep RL provide a that may be in for in the implications of et and of PubMed of network PubMed of the in the PubMed is AI work a between that learning and that by in a a more is and as arising a learning driven by is the learning and that allow learning. this meta-learning a but this that for a learning but for an which the in the in which have In this context, recent in AI may in implications for and Recent AI work has into methods for as as by et performance in with deep reinforcement neural PubMed et learning by the learning et for it on or AI work on and with a for how the to learning. by AI research on the initial of network et meta-learning for fast of deep of learning et to learn by by Y. 
et a learning from on et neural and the of or et deep and et learning with a on et in reinforcement and et memory in a the of and and a in which learning is from to that learning and with of this and a that is by the of the et and the of 2015; this and which these which on this as the learning in a with a by to that this is to cognitive the of learning to learn has a in psychology for of learning PubMed et that learn and PubMed and of learning have with the for et learning 2015; PubMed et of in et of and PubMed et to a and PubMed et meta-learning as on has on learning these the recent in AI research the of slow and fast learning in neural and in a of and a of of deep RL interest for psychology and given focus on representation learning and In the present review, we have recently forms of deep RL that the of deep RL to work techniques the of deep RL to psychology and recent they the by to as episodic memory and learning to learn. arising from deep RL research and for research in psychology and neuroscience. we have a key of recent work on deep RL is that where fast learning it on slow which the and that fast learning. This a for memory systems in the as as their However, human learning those in this review, and we that deep RL model to of these in to learning a understanding the between fast and slow in RL a for psychology and neuroscience. 
this may be a key where and psychology as has been the in cognitive AI methods for deep RL to the of rich humans In these methods rich of the that human of training be for human learning from to those in is their neural important for AI techniques, in the or is the by important for learning in the that human were these and or and to they that human is that we the that and human and how we those in AI AI methods for deep RL to the of rich humans In these methods rich of the that human of training be for human learning from to those in is their neural important for AI techniques, in the or is the by important for learning in the that human were these and or and to they One that human is that we the that and human and how we those in AI were by a neural network with or more a representation in a of a neural a of a neural network in between the and a of and in which an in to an learn and in A of and in a the of is and as more is to the a neural network that in a from to the
