The Bank of England has published Staff Working Paper No. 1,142, which proposes deep reinforcement learning (DRL) as a general method for representing bounded rationality in dynamic stochastic general equilibrium (DSGE) models. Agents are modelled as deep neural networks that learn by interacting with an environment that is a priori unknown to them (a minimal sketch of this learning loop appears below). In an application drawn from the adaptive learning literature, the authors find that DRL agents can learn all equilibria irrespective of their local stability properties, but that learning can be slow and unstable without early stopping criteria.

The paper applies DRL to a monetary-fiscal policy model featuring multiple steady states, including an inflation-target equilibrium and a low-inflation “liquidity trap”, and compares outcomes under DRL and adaptive learning. Unlike adaptive learning agents, DRL agents can learn equilibria that are explosive or indeterminate under rational expectations, which the authors attribute to DRL’s reliance on global utility maximisation rather than on linearised local dynamics. The paper also introduces “first-order condition distances” as a metric for how close learned behaviour is to the rational expectations solution (illustrated below).

As a Bank of England staff working paper, the publication is presented as research in progress intended to elicit comments and debate, and it does not represent Bank of England policy.
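To make the learning-by-interaction setup concrete, the sketch below trains a single agent with REINFORCE, the simplest policy-gradient method, in a stylised consumption-saving environment. This is an illustrative stand-in, not the paper’s implementation: the environment, the discrete consumption shares, the log-utility reward and all hyperparameters are assumptions chosen here for brevity.

```python
# Hypothetical minimal sketch: a neural-network agent learns a consumption
# rule by trial and error in an environment it does not know in advance.
# NOT the paper's method; every name and parameter here is illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

R, T, EPISODES = 1.03, 50, 500               # gross return, horizon, training episodes
SHARES = torch.tensor([0.2, 0.4, 0.6, 0.8])  # discrete consumption shares (assumed)

# Small policy network: wealth in, logits over consumption shares out.
policy = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, len(SHARES)))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for ep in range(EPISODES):
    w = torch.tensor([1.0])                  # initial wealth
    logps, rewards = [], []
    for t in range(T):
        logits = policy(w.unsqueeze(0))
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample()                    # sample a consumption share
        logps.append(dist.log_prob(a))
        c = SHARES[a] * w                    # consume a share of wealth
        rewards.append(torch.log(c))         # log utility as the reward
        w = R * (w - c) + 0.05 * torch.rand(1)  # save the rest, plus an income shock
    # REINFORCE update: weight each log-probability by its return-to-go.
    rew = torch.stack(rewards).squeeze()
    returns = torch.flip(torch.cumsum(torch.flip(rew, [0]), 0), [0])
    loss = -(torch.stack(logps).squeeze() * returns.detach()).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The paper’s agents would use richer environments and more sophisticated DRL algorithms, but the loop is the same in spirit: the agent never sees the model’s equations, only states, actions and realised utility.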
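The “first-order condition distance” metric can be illustrated with a common special case: the absolute Euler-equation residual, which is zero along a rational expectations path. The paper’s exact definition may differ; the log-utility residual below, and the helper `foc_distance`, are one plausible instance chosen here for illustration.

```python
# Hypothetical illustration of a "first-order condition distance".
# Under log utility the Euler equation u'(c_t) = beta * R * u'(c_{t+1})
# implies residual_t = | beta * R * (c_t / c_{t+1}) - 1 | = 0 on a
# rational expectations path; learned behaviour sits at positive distance.
import numpy as np

beta, R = 0.97, 1.03  # discount factor and gross return (assumed)

def foc_distance(consumption_path: np.ndarray) -> float:
    """Mean absolute Euler residual along a simulated consumption path."""
    c_t, c_next = consumption_path[:-1], consumption_path[1:]
    residuals = np.abs(beta * R * (c_t / c_next) - 1.0)  # u'(c) = 1/c
    return residuals.mean()

# A path satisfying the Euler equation exactly: c_{t+1} = beta * R * c_t.
exact = 1.0 * (beta * R) ** np.arange(50)
print(foc_distance(exact))   # ~0.0

# A noisy "learned" path registers a positive distance.
noise = 1 + 0.05 * np.random.default_rng(0).standard_normal(50)
print(foc_distance(exact * noise))
```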