Glen Berseth

I am an assistant professor at the Université de Montréal and Mila. My research explores how to use deep learning and reinforcement learning to develop generalist robots.

Publications


DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning

Xue Bin Peng, Glen Berseth, KangKang Yin, Michiel van de Panne

Learning physics-based locomotion skills is a difficult problem, leading to solutions that typically exploit prior knowledge of various forms. In this paper, we aim to learn a variety of environment-aware locomotion skills with a limited amount of prior knowledge. We adopt a two-level hierarchical control framework. First, low-level controllers are learned that operate at a fine timescale and achieve robust walking gaits satisfying stepping-target and style objectives. Second, high-level controllers are learned that plan at the timescale of steps by invoking desired step targets for the low-level controller. The high-level controller makes decisions directly based on high-dimensional inputs, including terrain maps or other suitable representations of the surroundings. Both levels of the control policy are trained using deep reinforcement learning. Results are demonstrated on a simulated 3D biped. Low-level controllers are learned for a variety of motion styles and demonstrate robustness with respect to force-based disturbances, terrain variations, and style interpolation. High-level controllers are demonstrated that are capable of following trails through terrains, dribbling a soccer ball towards a target location, and navigating through static or dynamic obstacles.
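
The two-level structure can be summarized as a short control-loop sketch. This is a minimal illustration only, assuming hypothetical `HighLevelPolicy`/`LowLevelPolicy` classes and a generic environment interface; it is not the paper's implementation.

```python
import numpy as np

class LowLevelPolicy:
    """Fine-timescale controller: maps (character state, step target) to joint actions."""
    def act(self, character_state, step_target):
        # Placeholder: a trained network would output joint actions here.
        return np.zeros(21)

class HighLevelPolicy:
    """Coarse-timescale planner: maps terrain and goal features to a footstep target."""
    def act(self, terrain_map, goal):
        # Placeholder: a trained network would output an (x, y) step target here.
        return np.zeros(2)

def run_episode(env, high, low, steps_per_plan=30, max_steps=1000):
    obs = env.reset()
    step_target = high.act(obs["terrain_map"], obs["goal"])
    for t in range(max_steps):
        # The high level re-plans only at the timescale of steps.
        if t % steps_per_plan == 0:
            step_target = high.act(obs["terrain_map"], obs["goal"])
        # The low level acts at every fine-grained simulation step.
        action = low.act(obs["character"], step_target)
        obs, reward, done = env.step(action)
        if done:
            break
```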


Towards Computer Assisted Crowd Aware Architectural Design

Brandon Haworth, Muhammad Usman, Glen Berseth, Mahyar Khayatkhoei, Mubbasir Turab Kapadia, Petros Faloutsos

We present a preliminary exploration of an architectural optimization process towards a computational tool for designing environments (e.g., building floor plans). Using dynamic crowd simulators, we derive the fitness of architectural layouts. The results of the simulation are used to provide feedback to a user in terms of crowd animation, aggregate statistics, and heat maps. Our approach automatically optimizes the placement of environment elements to maximize the flow of the crowd while satisfying constraints imposed by the user (e.g., immovable walls or load-bearing structures). We take steps towards user-in-the-loop optimization and design of an environment by applying an adaptive refinement approach to reduce the search space of the optimization. We perform a small-scale user study to obtain early feedback on the performance and quality of our method in contrast with a manual approach.
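
To illustrate the idea of simulation-in-the-loop layout optimization, here is a toy hill-climbing sketch. `simulate_crowd_flow` is a hypothetical stand-in for a real crowd simulator, and the element coordinates and bounds are illustrative assumptions, not the paper's setup.

```python
import random

def simulate_crowd_flow(pillar_positions):
    """Hypothetical stand-in for a crowd simulator; returns a flow score (higher is better)."""
    return -sum(abs(x - 5.0) + abs(y - 5.0) for x, y in pillar_positions)

def optimize_layout(initial, iterations=200, step=0.5, bounds=(0.0, 10.0)):
    layout, best = list(initial), simulate_crowd_flow(initial)
    for _ in range(iterations):
        # Perturb one movable element; fixed walls would simply be excluded.
        candidate = list(layout)
        i = random.randrange(len(candidate))
        x, y = candidate[i]
        nx = min(max(x + random.uniform(-step, step), bounds[0]), bounds[1])
        ny = min(max(y + random.uniform(-step, step), bounds[0]), bounds[1])
        candidate[i] = (nx, ny)
        score = simulate_crowd_flow(candidate)
        if score > best:
            # Keep the perturbation only if the simulated crowd flow improves.
            layout, best = candidate, score
    return layout, best

print(optimize_layout([(2.0, 3.0), (8.0, 7.0)]))
```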


Using synthetic crowds to inform building pillar placements

Brandon Haworth, Muhammad Usman, Glen Berseth, Mahyar Khayatkhoei, Mubbasir Turab Kapadia, Petros Faloutsos

We present a preliminary exploration of synthetic crowds towards computational tools for informing the design of environments (e.g., building floor plans). Feedback and automatic design processes are developed by exploring crowd behaviours and metrics derived from simulations of environments in density-stressed scenarios, such as evacuations. Computational approaches for crowd analysis and environment design benefit from measures characterizing the relationships between environments and crowd flow behaviours. We investigate the optimization of environment elements to maximize crowd flow under a range of Level of Service (LoS) conditions; LoS is a standard indicator, widely used in crowd management and urban design, of the service an environment affords to crowds of specific densities. The steering algorithm, the number of optimized environment elements, the scenario configuration, and the LoS conditions all affect the optimal configuration of environment elements. From the insights gained exploring optimizations under LoS conditions, we take steps towards user-in-the-loop optimization and design of an environment by applying an adaptive refinement approach to reduce the search space of the optimization. We derive the fitness of architectural layouts from background simulations. We perform a ground-truth study to gauge the performance and quality of our method.
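
As a rough illustration of the adaptive refinement idea (coarse-to-fine search over one element's placement), here is a sketch assuming a stub fitness function in place of a real background crowd simulation; the numbers are arbitrary.

```python
def fitness(x, y):
    """Stub standing in for a crowd-flow score from a background simulation."""
    return -((x - 3.7) ** 2 + (y - 6.2) ** 2)

def refine_search(lo=0.0, hi=10.0, levels=4, grid=5):
    x_lo, x_hi, y_lo, y_hi = lo, hi, lo, hi
    best = (lo, lo)
    for _ in range(levels):
        # Evaluate a coarse grid of candidate placements...
        cells = [(x_lo + (x_hi - x_lo) * i / (grid - 1),
                  y_lo + (y_hi - y_lo) * j / (grid - 1))
                 for i in range(grid) for j in range(grid)]
        best = max(cells, key=lambda p: fitness(*p))
        # ...then shrink the search window around the best cell and repeat.
        dx, dy = (x_hi - x_lo) / grid, (y_hi - y_lo) / grid
        x_lo, x_hi = best[0] - dx, best[0] + dx
        y_lo, y_hi = best[1] - dy, best[1] + dy
    return best

print(refine_search())  # converges near the stub optimum (3.7, 6.2)
```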


Dynamic terrain traversal skills using reinforcement learning

Xue Bin Peng, Glen Berseth, Michiel van de Panne

The locomotion skills developed for physics-based characters most often target flat terrain. However, much of their potential lies with the creation of dynamic, momentum-based motions across more complex terrains. In this paper, we learn controllers that allow simulated characters to traverse terrains with gaps, steps, and walls using highly dynamic gaits. This is achieved using reinforcement learning, with careful attention given to the action representation, the non-parametric approximation of both the value function and the policy, epsilon-greedy exploration, and the learning of a good state distance metric. The methods enable a 21-link planar dog and a 7-link planar biped to navigate challenging sequences of terrain using bounding and running gaits. We evaluate the impact of the key features of our skill learning pipeline on the resulting performance.
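
A hedged sketch of the kind of non-parametric, epsilon-greedy action selection described above: actions are reused from the highest-valued of the k nearest stored experiences under a state distance metric (fixed Euclidean here, whereas the paper learns the metric). The class and method names are illustrative, not the paper's code.

```python
import random
import numpy as np

class KNNPolicy:
    """Non-parametric value/policy estimate over stored experience tuples."""
    def __init__(self, k=5, epsilon=0.1):
        self.k, self.epsilon = k, epsilon
        self.states, self.actions, self.values = [], [], []

    def add_experience(self, state, action, value):
        self.states.append(np.asarray(state, dtype=float))
        self.actions.append(action)
        self.values.append(float(value))

    def act(self, state, random_action):
        # Epsilon-greedy exploration: occasionally try a random action.
        if not self.states or random.random() < self.epsilon:
            return random_action()
        # Otherwise, reuse the action of the highest-valued nearby experience.
        state = np.asarray(state, dtype=float)
        dists = [np.linalg.norm(state - s) for s in self.states]
        nearest = np.argsort(dists)[: self.k]
        best = max(nearest, key=lambda i: self.values[i])
        return self.actions[best]

policy = KNNPolicy()
policy.add_experience(state=[0.0, 0.0], action=[1.0], value=5.0)
print(policy.act([0.1, -0.1], random_action=lambda: [0.0]))
```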


Modelling Dynamic Brachiation

Glen Berseth, Michiel van de Panne

Significant progress has been made on simulating motions such as walking and running, as well as specific motions such as falling and rolling. However, we still have difficulty simulating the agile motions we see in nature, for example, brachiation by gibbons. Gibbons are among the most agile primates and can leap remarkable distances. In this work we discuss the advantages of skill learning with explicit planning for creating motion controllers for more complex and dynamic navigation tasks. Skill learning cannot be solved directly with supervised learning alone, because generating good data plays a key role in learning good skills. Here we construct an FSM controller to model the motion and capabilities of a gibbon. We endeavour to give this controller motion skills using reinforcement learning, and we use this dynamics model to intelligently sample good actions.
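
A toy finite-state-machine sketch in the spirit of the controller described above; the states, observation keys, and thresholds are assumptions for illustration, not the actual gibbon controller.

```python
class BrachiationFSM:
    """Cycle: swing -> release -> flight -> grab -> swing."""
    def __init__(self):
        self.state = "swing"

    def step(self, obs):
        if self.state == "swing" and obs["swing_phase"] >= 1.0:
            self.state = "release"   # let go of the bar at the end of the swing
        elif self.state == "release":
            self.state = "flight"    # ballistic phase between handholds
        elif self.state == "flight" and obs["hand_to_target"] < 0.05:
            self.state = "grab"      # close the hand on the next handhold
        elif self.state == "grab":
            self.state = "swing"     # begin the next swing cycle
        return self.state

fsm = BrachiationFSM()
print(fsm.step({"swing_phase": 1.0, "hand_to_target": 0.4}))  # -> "release"
```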