- Sun 14 May 2017
- ReinforcementLeanring
- Glen Berseth
- #RL, #Python, #SWIG, #Authors
Intro
These days, many people want to use one of many new libraries written in python to train deep learning models. In general, python has many powerful and easy to use libraries for performing machine learning. However, many applications that generate data are written in other languages. In particular, many simulators for RL applications are written in C++. Here I will focus on an interface I have found useful for wrapping physics-based simulators in C++.
The Interface
We want to able to support two types of control flow, one for learning and one for simulation. I find it helpful to have these two different methods as we often want to simulate things faster than we would typically render them. Also, during rendering, we might want to interact with the simulation, for example, to give an agent in the simulation a little shove.
Examples of The Flow of Control
For Rendering the simulation:
animate()
// Might run the physics in the simulation faster than the frame rendering.
For n = 1 .. num_substeps
if simulation needs new action
s <= get state()
a <= get new action
apply the action
sim.updateAction(a)
update simulation
postredisplay()
For Training:
act(action)
sim.updateAction(a)
While (not needsNewAction())
update sim()
simEpoch(actor, env)
n=0
tuples=[]
While (not endOfEpisode() or n < 100)
S = env.getState()
A = policy(s)
R = env.act(a)
tuples.append((S, A, R))
n+=1
return tuples
The code
Being able to use the above pseudo code is made possible by implementing the following interface.
/*
* SimulationWrapper.h
*
* Created on: Dec 9, 2016
* Author: Glen
*/
#include <vector>
class SimulationWrapper {
public:
/// Create the wrapper given some configuration
SimulationWrapper(Configuration * config);
virtual ~SimulationWrapper();
/// Get the Current observation the agent can 'see'
virtual std::vector<double> getObservation();
/// Has the simulation reached the end of an epoch/episode
virtual bool endOfEpoch();
/*
Step the simulation through to the end of the action
*/
virtual double act(std::vector<double> action);
/*
Update the current agent/controller parameters
*/
virtual double updateAction(std::vector<double> action);
/// Perform one simulation update
virtual void update();
/// check whether or not the last action has completed and a new action is needed
virtual bool needUpdatedAction();
/*
Gets the state of the simuation not just the observation.
This state should be detailed enough to set the entire state of the simulation back to this state.
*/
virtual std::vector<double> getSimState();
/*
Sets the state of the simulation to the given state.
*/
virtual void setSimState(std::vector<double> state_);
/*
Computes the observation from the given simulation state.
*/
virtual std::vector<double> getObservationFromSimState(std::vector<double> state_);
/// Initialize the agent in the simulation, prepares all datastructures to begin simulation
virtual void init();
/// clears the data for the old epoch and creates a new epoch to simulate
virtual void initEpoch();
/// Generates data for a new epoch
virtual void generateEnvironmentSample();
/// Gets a good measure to evaluate the performance of agent cross an entire epoch. For example average reward over epoch/episode
virtual std::vector<double> getEvaluationData();
/// unloads everything from the simulation in preparation for termination.
virtual void finish();
/// Clears the data for the current epoch/episode
virtual void clear();
/// Get a pointer to the current actor. Can help with computing reward function values
virtual SimulationActor * getActor() const {return _actor;}
/// Has the agent fallen (into a non recoverable state)
virtual bool agentHasFallen();
/// Calculate the current reward. This is often computed over the simulation since the beging of an action.
virtual double calcReward() const ;
/// Interactive functions to doing things in the simulation, like resetting and throwing objects at the character
virtual void onKeyEvent(int key, int mouseX, int mouseY);
private:
SimulationAgent * _actor;
};