Runner
The Runner manages the episode loop, connecting an environment with a policy and an optional training monitor.
from tinyrl import GridWorld, Runner
env = GridWorld()
runner = Runner(env)
Constructor
Runner(env, monitor=None)
Args:
env: an Environment instance
monitor: a TrainingMonitor instance. If None, a default one is created automatically.
Methods
run_episode(policy_fn, visualize=False, delay=0.1, log=True, return_trajectory=False)
Roll out a single episode.
Args:
policy_fn: a callable that takes an observation and returns either
    int or np.ndarray: the action to take
    PolicyOutput: the action plus an optional logprob and entropy
visualize: if True, calls env.render() at each step
delay: seconds between frames when visualizing
log: if True, logs stats to the monitor. Set False for eval runs.
return_trajectory: if True, populates result.trajectory
Returns: EpisodeResult
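As an illustration of the simplest policy_fn form (a callable returning an int action), here is a minimal sketch. The action count N_ACTIONS, the q_table layout, and both helper names are assumptions made for this example and are not part of tinyrl. The greedy variant also assumes observations are hashable.

```python
import random

N_ACTIONS = 4  # assumption: four discrete actions; check your environment


def random_policy(observation):
    """Ignore the observation and return a uniformly random action index."""
    return random.randrange(N_ACTIONS)


def make_greedy_policy(q_table, n_actions=N_ACTIONS):
    """Build a policy that picks the highest-valued action per observation.

    q_table is a hypothetical dict mapping (observation, action) -> value.
    Missing entries count as 0.0; ties resolve to the lowest action index.
    """
    def policy(observation):
        return max(range(n_actions),
                   key=lambda a: q_table.get((observation, a), 0.0))
    return policy
```

Either callable could then be passed wherever the examples use policy, for instance runner.run_episode(random_policy).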
# basic
result = runner.run_episode(policy)
result.reward # total reward
result.steps # number of steps
# with trajectory
result = runner.run_episode(policy, return_trajectory=True)
result.trajectory # Trajectory object
# eval (no logging)
result = runner.run_episode(policy, log=False)
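A single unlogged episode gives a noisy estimate, so evaluation usually averages over several. The helper below is a sketch of that pattern. evaluate is a hypothetical function, not part of tinyrl, and it relies only on the run_episode(policy_fn, log=False) call and the result.reward and result.steps attributes described above.

```python
def evaluate(runner, policy_fn, episodes=10):
    """Return (mean reward, mean steps) over several unlogged episodes."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    total_reward = 0.0
    total_steps = 0
    for _ in range(episodes):
        # log=False keeps eval runs out of the monitor's training stats
        result = runner.run_episode(policy_fn, log=False)
        total_reward += result.reward
        total_steps += result.steps
    return total_reward / episodes, total_steps / episodes
```

Because these runs bypass the monitor, they should not show up in the training curves.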
plot()
Calls self.monitor.plot() to display training curves.
for _ in range(500):
    runner.run_episode(policy)
runner.plot()