Types

Data classes used throughout tinyrl.

PolicyOutput

Structured return type for policy functions.

from tinyrl import PolicyOutput

# just an action
PolicyOutput(action=2)

# with training info
PolicyOutput(action=2, logprob=-0.3, entropy=1.2)

Field	Type	Default	Description
`action`	`int | np.ndarray`	required	The action to take
`logprob`	`float | None`	`None`	Log-probability of the action
`entropy`	`float | None`	`None`	Entropy of the policy distribution
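
To show how the optional fields might be filled in, here is a minimal sketch of a policy function that samples from a softmax over logits. The `PolicyOutput` dataclass below is a stand-in that mirrors the table above so the snippet runs without tinyrl installed, and `softmax_policy` is a hypothetical helper, not part of the library.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PolicyOutput:
    """Stand-in mirroring the field table above; not tinyrl's actual source."""
    action: int  # the table also allows np.ndarray actions
    logprob: Optional[float] = None
    entropy: Optional[float] = None


def softmax_policy(logits: Sequence[float], rng: random.Random) -> PolicyOutput:
    """Hypothetical helper: sample from softmax(logits) and report training info."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Entropy of the full distribution, skipping zero-probability entries.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return PolicyOutput(action=action, logprob=math.log(probs[action]), entropy=entropy)


out = softmax_policy([0.0, 0.0, 0.0, 0.0], random.Random(0))
print(out)
```

With four equal logits the distribution is uniform, so `logprob` equals `log(0.25)` and `entropy` equals `log(4)` regardless of which action is sampled.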

Step

A single environment transition, stored in trajectories.

Field	Type	Default	Description
`obs`	`np.ndarray`	required	Observation before the action
`action`	`int | np.ndarray`	required	Action taken
`reward`	`float`	required	Reward received
`next_obs`	`np.ndarray`	required	Observation after the action
`done`	`bool`	required	Whether the episode ended
`logprob`	`float | None`	`None`	Log-probability (if policy provided it)
`entropy`	`float | None`	`None`	Entropy (if policy provided it)
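
The "if policy provided it" rows suggest that `logprob` and `entropy` are carried over from the policy's output. The sketch below illustrates that idea with stand-in dataclasses built from the tables on this page; `make_step` is a hypothetical helper, and tinyrl may assemble steps differently.

```python
from dataclasses import dataclass
from typing import Optional, Union

import numpy as np

Action = Union[int, np.ndarray]


@dataclass
class PolicyOutput:
    """Stand-in mirroring the PolicyOutput table; not tinyrl's source."""
    action: Action
    logprob: Optional[float] = None
    entropy: Optional[float] = None


@dataclass
class Step:
    """Stand-in mirroring the Step table; not tinyrl's source."""
    obs: np.ndarray
    action: Action
    reward: float
    next_obs: np.ndarray
    done: bool
    logprob: Optional[float] = None
    entropy: Optional[float] = None


def make_step(obs: np.ndarray, out: PolicyOutput, reward: float,
              next_obs: np.ndarray, done: bool) -> Step:
    """Hypothetical helper: copy the policy's optional fields onto the transition."""
    return Step(obs=obs, action=out.action, reward=reward, next_obs=next_obs,
                done=done, logprob=out.logprob, entropy=out.entropy)


obs, next_obs = np.zeros(4), np.ones(4)
bare = make_step(obs, PolicyOutput(action=2), 1.0, next_obs, False)
rich = make_step(obs, PolicyOutput(action=2, logprob=-0.3, entropy=1.2), 1.0, next_obs, True)
print(bare.logprob, rich.logprob)
```

A policy that returns only an action yields a step whose `logprob` and `entropy` stay `None`.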

Trajectory

A full episode trajectory. Supports indexing, iteration, and batched property access.

# Assumes `runner` (a Runner) and `policy` (a policy function) are already defined.
result = runner.run_episode(policy, return_trajectory=True)
traj = result.trajectory

traj[0]            # first Step
len(traj)          # number of steps
for step in traj:  # iterate
    ...

Batched properties

Property	Shape	Description
`traj.obs`	`(length, state_dim)`	All observations
`traj.actions`	`(length,)` or `(length, action_dim)`	All actions
`traj.rewards`	`(length,)`	All rewards
`traj.logprobs`	`(length,)` or `None`	All log-probs (if provided)
`traj.entropies`	`(length,)` or `None`	All entropies (if provided)
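
One way such a container could work is sketched below: a list of steps wrapped in a class that implements the sequence protocol and stacks per-step fields into arrays on access. Both classes are stand-ins written from the tables on this page, and the rule that a missing value makes the whole property `None` is an assumption, not confirmed tinyrl behaviour.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Union

import numpy as np


@dataclass
class Step:
    """Stand-in mirroring the Step table; not tinyrl's source."""
    obs: np.ndarray
    action: Union[int, np.ndarray]
    reward: float
    next_obs: np.ndarray
    done: bool
    logprob: Optional[float] = None
    entropy: Optional[float] = None


class Trajectory:
    """Hypothetical container; tinyrl's real class may differ."""

    def __init__(self, steps: List[Step]) -> None:
        self._steps = list(steps)

    def __getitem__(self, index: int) -> Step:
        return self._steps[index]

    def __len__(self) -> int:
        return len(self._steps)

    def __iter__(self) -> Iterator[Step]:
        return iter(self._steps)

    @property
    def obs(self) -> np.ndarray:
        # Stacks to (length, state_dim); assumes at least one step.
        return np.stack([s.obs for s in self._steps])

    @property
    def actions(self) -> np.ndarray:
        # (length,) for int actions, (length, action_dim) for array actions.
        return np.asarray([s.action for s in self._steps])

    @property
    def rewards(self) -> np.ndarray:
        return np.asarray([s.reward for s in self._steps], dtype=float)

    def _optional(self, name: str) -> Optional[np.ndarray]:
        values = [getattr(s, name) for s in self._steps]
        # Assumption: any missing value means the policy did not provide the field.
        if any(v is None for v in values):
            return None
        return np.asarray(values, dtype=float)

    @property
    def logprobs(self) -> Optional[np.ndarray]:
        return self._optional("logprob")

    @property
    def entropies(self) -> Optional[np.ndarray]:
        return self._optional("entropy")


steps = [
    Step(obs=np.full(4, float(t)), action=t % 2, reward=1.0,
         next_obs=np.full(4, float(t + 1)), done=(t == 2), logprob=-0.5)
    for t in range(3)
]
traj = Trajectory(steps)
print(traj.obs.shape, traj.actions.shape, traj.rewards.shape)
```

In this example the steps carry a `logprob` but no `entropy`, so `traj.logprobs` is an array of shape `(3,)` while `traj.entropies` is `None`.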

EpisodeResult

Return type for `Runner.run_episode()`.

Field	Type	Description
`reward`	`float`	Total episode reward
`steps`	`int`	Number of steps taken
`trajectory`	`Trajectory | None`	Full trajectory (if `return_trajectory=True`)
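
To make the relationship between the three fields concrete, the following toy rollout returns a stand-in `EpisodeResult`. It is not tinyrl's `Runner`: the environment, the `max_steps` parameter, the `False` default for `return_trajectory`, and the use of a plain list in place of `Trajectory` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Stand-in for Trajectory: a list of (state, action, reward) tuples.
Record = List[Tuple[int, int, float]]


@dataclass
class EpisodeResult:
    """Stand-in mirroring the table above; not tinyrl's source."""
    reward: float
    steps: int
    trajectory: Optional[Record] = None


def run_episode(policy: Callable[[int], int], return_trajectory: bool = False,
                max_steps: int = 5) -> EpisodeResult:
    """Toy rollout: the state counts up and matching its parity earns reward 1.0."""
    state, total, record = 0, 0.0, []
    for _ in range(max_steps):
        action = policy(state)
        reward = 1.0 if action == state % 2 else 0.0
        record.append((state, action, reward))
        total += reward
        state += 1
    return EpisodeResult(reward=total, steps=len(record),
                         trajectory=record if return_trajectory else None)


full = run_episode(lambda s: s % 2, return_trajectory=True)
summary = run_episode(lambda s: 0)
print(full.reward, full.steps, summary.reward, summary.trajectory)
```

Here `reward` is the sum of per-step rewards and `steps` is the number of recorded transitions; the trajectory is attached only when requested.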