Skip to content

GridWorld

A simple grid-based MDP. The agent starts at the top-left corner and must navigate to the bottom-right goal.

from tinyrl import GridWorld

env = GridWorld(size=5)

MDP specification

Component Value
States (row, col) positions, normalized to [0, 1]
Actions 0=up, 1=right, 2=down, 3=left
Transitions Deterministic. Moves in the chosen direction, clipped at walls
Rewards -1 per step, +10 when reaching the goal
Start (0, 0) — top-left
Goal (size-1, size-1) — bottom-right
Horizon 50 steps max

Constructor

GridWorld(size=5)

Args:

  • size — grid dimension (creates a size x size grid). Default: 5.

Example

from tinyrl import GridWorld

env = GridWorld()
obs = env.reset()          # array([0., 0.])
obs, r, done = env.step(2) # move down -> array([0.25, 0.]), r=-1.0
obs, r, done = env.step(1) # move right -> array([0.25, 0.25]), r=-1.0

Optimal policy

The shortest path from (0,0) to (4,4) takes 8 steps (4 down + 4 right), giving a total reward of +2 (7 steps × -1 + 1 step with +10).