GridWorld
A simple grid-based MDP. The agent starts at the top-left corner and must navigate to the bottom-right goal.
from tinyrl import GridWorld
env = GridWorld(size=5)
MDP specification
| Component | Value |
|---|---|
| States | (row, col) positions, normalized to [0, 1] |
| Actions | 0=up, 1=right, 2=down, 3=left |
| Transitions | Deterministic. Moves in the chosen direction, clipped at walls |
| Rewards | -1 per step, +10 when reaching the goal |
| Start | (0, 0) — top-left |
| Goal | (size-1, size-1) — bottom-right |
| Horizon | 50 steps max |
Constructor
GridWorld(size=5)
Args:
size— grid dimension (creates asize x sizegrid). Default:5.
Example
from tinyrl import GridWorld
env = GridWorld()
obs = env.reset() # array([0., 0.])
obs, r, done = env.step(2) # move down -> array([0.25, 0.]), r=-1.0
obs, r, done = env.step(1) # move right -> array([0.25, 0.25]), r=-1.0
Optimal policy
The shortest path from (0,0) to (4,4) takes 8 steps (4 down + 4 right), giving a total reward of +2 (7 steps × -1 + 1 step with +10).