Glen Berseth

I am an assistant professor at the Université de Montréal and Mila. My research explores how to use deep learning and reinforcement learning to develop generalist robots.

Publications


Feedback Control for Cassie with Deep Reinforcement Learning

Zhaoming Xie, Glen Berseth, Patrick Clary, Jonathan Hurst, Michiel van de Panne



Model-Based Action Exploration for Learning Dynamic Motion Skills

Glen Berseth, Alex Kyriazis, Ivan Zinin, William Choi, Michiel van de Panne

Deep reinforcement learning has made great strides in solving challenging motion control tasks. Recently, there has been significant work on methods for exploiting the data gathered during training, but there has been less work on how to best generate the data to learn from. For continuous action domains, the most common method for generating exploratory actions involves sampling from a Gaussian distribution centred around the mean action output by a policy. Although these methods can be quite capable, they do not scale well with the dimensionality of the action space, and can be dangerous to apply on hardware. We consider learning a forward dynamics model to predict the result, \(x_{t+1}\), of taking a particular action, \(u_{t}\), given a specific observation of the state, \(x_{t}\). With this model we perform internal look-ahead predictions of outcomes and seek actions we believe have a reasonable chance of success. This method alters the exploratory action space, thereby increasing learning speed and enabling higher-quality solutions to difficult problems, such as robotic locomotion and juggling.
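The look-ahead idea can be illustrated with a small toy sketch. It is not the paper's implementation: `forward_model` and `value` are hypothetical hand-written stand-ins for a learned dynamics model and a learned value estimate, and the candidate-sampling scheme is an assumption chosen for simplicity. The sketch compares candidate actions around the policy mean by their predicted outcomes, rather than executing a single Gaussian sample blindly.

```python
import random

def forward_model(x, u):
    """Hypothetical stand-in for a learned model: x_{t+1} = x_t + 0.1 * u_t."""
    return [xi + 0.1 * ui for xi, ui in zip(x, u)]

def value(x):
    """Hypothetical stand-in for a value estimate: states near the origin score higher."""
    return -sum(xi * xi for xi in x)

def gaussian_action(mean_u, sigma, rng):
    """Standard exploration: one Gaussian sample centred on the policy mean."""
    return [rng.gauss(m, sigma) for m in mean_u]

def lookahead_action(x, mean_u, sigma, rng, n_candidates=32):
    """Sample candidates, predict each outcome with the model, keep the best one."""
    best_u = list(mean_u)
    best_v = value(forward_model(x, best_u))
    for _ in range(n_candidates):
        u = gaussian_action(mean_u, sigma, rng)
        v = value(forward_model(x, u))  # internal look-ahead, nothing is executed
        if v > best_v:
            best_u, best_v = u, v
    return best_u

rng = random.Random(0)
x_t = [1.0, -2.0]      # toy state observation
mean_u = [0.0, 0.0]    # toy policy mean action
u_t = lookahead_action(x_t, mean_u, sigma=0.5, rng=rng)
```

Because the search starts from the policy mean, the selected action is never predicted to be worse than the mean action under the model. Whether that prediction holds on the real system depends on model accuracy.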


TerrainRL Sim

Glen Berseth, Xue Bin Peng, Michiel van de Panne

We provide 88 challenging simulation environments that range in difficulty. The difficulty in these environments is linked not only to the number of dimensions in the action space but also to the task complexity. Using more complex and accurate simulations will help push the field closer to creating human-level intelligence. Therefore, we are releasing a number of simulation environments that include local egocentric visual perception. These environments include randomly generated terrain which the agent needs to learn to interpret via visual features. The library also provides simple mechanisms to create new environments with different agent morphologies and the option to modify the distribution of generated terrain.


Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control

Glen Berseth, Cheng Xie, Paul Cernek, Michiel van de Panne

Deep reinforcement learning has demonstrated increasing capabilities for continuous control problems, including agents that can move with skill and agility through their environment. An open problem in this setting is that of developing good strategies for integrating or merging policies for multiple skills, where each individual policy is a specialist in a specific skill and its associated state distribution. We extend policy distillation methods to the continuous action setting and leverage this technique to combine expert policies, as evaluated in the domain of simulated bipedal locomotion across different classes of terrain. We also introduce an input injection method for augmenting an existing policy network to exploit new input features. Lastly, our method uses transfer learning to assist in the efficient acquisition of new skills. The combination of these methods allows a policy to be incrementally augmented with new skills. We compare our progressive learning and integration via distillation (PLAID) method against three alternative baselines.
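As a rough illustration of distillation with continuous actions, the toy sketch below trains one student policy to reproduce the actions of two experts, each queried only on its own state distribution. Everything here is an assumption made for illustration: the scalar state, the two hand-written experts, the three-feature student, and the plain squared-error SGD update. It does not reproduce PLAID itself.

```python
import random

def expert_flat(s):
    """Hypothetical expert, trusted only on states in [0, 2]."""
    return 0.5 * s

def expert_slope(s):
    """Hypothetical expert, trusted only on states in [2, 4]."""
    return 1.0 + 2.0 * (s - 2.0)

def features(s):
    """Student features: bias, state, and a hinge that switches on at s = 2."""
    return [1.0, s, max(0.0, s - 2.0)]

def student(w, s):
    """Student action: linear in the features."""
    return sum(wi * fi for wi, fi in zip(w, features(s)))

def distill(steps=20000, lr=0.02, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        # Draw a state from one expert's own state distribution.
        if rng.random() < 0.5:
            s = rng.uniform(0.0, 2.0)
            target = expert_flat(s)
        else:
            s = rng.uniform(2.0, 4.0)
            target = expert_slope(s)
        # Squared-error regression onto the expert's continuous action.
        err = student(w, s) - target
        w = [wi - lr * err * fi for wi, fi in zip(w, features(s))]
    return w

w = distill()
```

After training, `student(w, 1.0)` should be close to `expert_flat(1.0)` and `student(w, 3.0)` close to `expert_slope(3.0)`, so a single policy covers both regions.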


Evaluating and Optimizing Evacuation Plans for Crowd Egress

Vinícius J Cassol, Estêvão Smania Testa, Cláudio Rosito Jung, Muhammad Usman, Petros Faloutsos, Glen Berseth, Mubbasir Kapadia, Norman I Badler, Soraia Raupp Musse

Evacuation planning is an important and difficult task in building design. The proposed framework can identify optimal evacuation plans using decision points, which control the ratio of agents that select a particular route at a specific spatial location. The authors optimize these ratios to achieve the best evacuation based on a quantitatively validated metric for evacuation performance. This metric captures many of the important aspects of an evacuation: total evacuation time, average evacuation time, agent speed, and local agent density. The proposed approach was validated using a night club model that incorporates real data from an actual evacuation.
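To make the role of a decision point concrete, here is a deliberately simplified sketch, not the framework from the paper. It assumes a single decision point that splits agents between two exits, a made-up timing model (walking time plus queueing at a fixed flow rate), and total evacuation time as the only objective. The paper's metric also accounts for average evacuation time, agent speed, and local density, which are omitted here.

```python
def evacuation_time(ratio, n_agents, exits):
    """Time until the last agent is out, for a hypothetical two-exit model.

    ratio: fraction of agents sent to the first exit at the decision point.
    exits: list of (walk_time_seconds, flow_rate_agents_per_second) pairs.
    """
    counts = [ratio * n_agents, (1.0 - ratio) * n_agents]
    times = []
    for count, (walk_time, flow_rate) in zip(counts, exits):
        if count > 0:
            # Walk to the exit, then pass through it at a fixed flow rate.
            times.append(walk_time + count / flow_rate)
        else:
            times.append(0.0)  # unused exit contributes no time
    return max(times)

# Hypothetical numbers: a near, narrow exit and a far, wide exit.
exits = [(10.0, 2.0), (25.0, 4.0)]
n_agents = 120

# Grid search over the routing ratio at the decision point.
candidates = [i / 100 for i in range(101)]
best_ratio = min(candidates, key=lambda r: evacuation_time(r, n_agents, exits))
best_time = evacuation_time(best_ratio, n_agents, exits)
```

In this toy model the best ratio is the one at which both exits clear at the same moment. A grid search is used only because the model has one parameter; with many decision points a proper optimizer would be needed.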