Week 3, Lecture 1 - Search I

Date: 10/10/22

I. Overview

We now move into the next unit of the course: Search. This is a state-based paradigm, in contrast to the reflex-based models we have seen so far. We will still abide by the principles of the “model, learning, and inference” paradigm. To introduce search, take the following canonical problem:

Example 1 (Route planning) Given as input a map, a source point, and a destination point, output a sequence of actions (e.g., go straight, turn left, or turn right) that will take us from the source to the destination. Evaluate based on a custom objective (e.g., most scenic route, shortest path, etc.).

  • There are many more examples we can think of: robot motion planning, solving puzzles, etc. Many traditional (non-ML) algorithmic paradigms (e.g., dynamic programming) make their way into AI through this medium. The main idea, though, is that:
    • Search problems are defined by their objective and the actions that can be taken to get there.

Beyond reflex

picture 1

  • Reflex models perform a quick, fixed computation: e.g., classifiers which output a single “yes” or “no”, without depending on any future or past predictions.
  • However, some applications of AI (such as solving puzzles), demand more.
  • Search problems are an instance of state-based models.
  • We are still building a predictor \(f\) which takes an input \(x\), but will now return an entire action sequence, not just a single action (i.e., a “reflex”).
  • You can think of the difference as defined by the task. Solving a puzzle is not the same as classifying cats vs. monkeys.
  • Can’t you solve a search problem just by iteratively chaining reflex-based predictions to get an action sequence?
    • Not really… future actions should be conditioned on past actions!
    • But… think of this: “if you’re walking around Stanford for the first time, you might have to really plan things out, but eventually it kind of becomes reflex.”.
  • We have looked at many real-world examples of this new “search” paradigm. For each example, the key is to decompose the output solution into a sequence of primitive actions. In addition, we need to think about how to evaluate different possible outputs.

Roadmap

  • In the ML unit, we had the “model, learning, and inference” paradigm. We adapt this for search too.
    • Machine learning:
      • Modeling: feature extractor + neural architecture
      • Inference: model forward pass (\(f(x)\))
      • Learning: algorithms such as stochastic gradient descent to find optimal model weights
    • Search: find out down below! We will start with modeling and inference, then learning next lecture.

Here is a roadmap of what we will cover:

picture 2

The main idea is to cover how we define (“model”) search problems, and come up with algorithms to solve them.

Note

(State-based vs. reflex-based models) The main TL;DR is to understand the following difference and what problems we can now solve with this new tool:

Reflex-based modeling:

\[ x \rightarrow f \rightarrow \text { single action } y \in\{-1,+1\} \]

(Search) State-based modeling:

\[ x \rightarrow f \rightarrow \text { action sequence }\left(a_1, a_2, a_3, a_4, \ldots\right) \]

This allows us to not only solve things like classification problems with reflex-based modeling, but also now solve things like route planning with state-based modeling.

II. Modeling

Ok… cool! We now have a general intuition for what search/state-based modeling is. Let’s get formal now with definitions.

We will introduce modeling via the following example.

Example 2 (Farmer search problem) A farmer wants to get his cabbage, goat, and wolf across a river. He has a boat that only holds two (himself plus at most one item). He cannot leave the cabbage and goat alone together, or the goat and wolf alone together. How many river crossings does he need?

  • You probably thought of each scenario playing out in your head. How can we build a system to do this automatically?
  • Search problems define the possibilities, and search algorithms explore these possibilities.
  • For this problem, we have eight possible actions, which will be denoted by a concise set of symbols.

picture 4

  • We can start to build a search tree of each “what-if” scenario. Playing out each scenario, we get the following:

picture 5

  • Each scenario is represented by a “branch” of the search tree.
  • We terminate a branch (at a leaf) once we reach a final state (win or lose).

Let’s formalize this.

Defining the search problem

Note

Definition 1 (Search Problem) A search problem is built upon the following:

  • \(s_{\text{start}}\): starting state
  • \(\operatorname{Actions}(s)\): possible actions
  • \(\operatorname{Cost}(s, a)\): action cost
  • \(\operatorname{Succ}(s, a)\): successor state
  • \(\operatorname{IsEnd}(s)\): reached an end state?

This can be visually represented/solved with a search tree. The root of the tree is the start state \(s_{\text {start }}\), and the leaves are the end states (IsEnd \((s)\) is true). Each edge leaving a node \(s\) corresponds to a possible action \(a \in \operatorname{Actions}(s)\) that could be performed in state \(s\). The edge is labeled with the action and its cost, written \(a: \operatorname{Cost}(s, a)\). The action leads deterministically to the successor state \(\operatorname{Succ}(s, a)\), represented by the child node.

In the example, each root-to-leaf path represents a possible action sequence, and the sum of the costs of the edges is the cost of that path. The goal is to find the root-to-leaf path that ends in a valid end state with minimum cost.
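
To make Definition 1 concrete, here is a minimal sketch of Example 2 written against that interface. The representation (the state is the set of things still on the starting bank) and all of the names below are illustrative choices made here, not notation from the lecture; the safety constraint is folded into Actions(s) by only offering safe crossings.

ITEMS = {"farmer", "cabbage", "goat", "wolf"}

def startState():
    return frozenset(ITEMS)              # everyone begins on the starting bank

def isEnd(state):
    return len(state) == 0               # everyone has crossed the river

def safe(bank):
    # Without the farmer, cabbage+goat and goat+wolf cannot be left together.
    if "farmer" in bank:
        return True
    return not ({"cabbage", "goat"} <= bank or {"goat", "wolf"} <= bank)

def succ(state, crossing):
    # Whoever is in the boat switches banks (state = who is on the start bank).
    return frozenset(state - crossing if "farmer" in state else state | crossing)

def actions(state):
    # The farmer crosses alone or with one passenger from his side of the river.
    side = state if "farmer" in state else ITEMS - state
    moves = []
    for passenger in [None] + sorted(side - {"farmer"}):
        crossing = frozenset({"farmer"} | ({passenger} if passenger else set()))
        nxt = succ(state, crossing)
        if safe(nxt) and safe(ITEMS - nxt):
            moves.append(crossing)
    return moves

def cost(state, action):
    return 1                             # each river crossing costs one

Expanding actions(s) and succ(s, a) starting from startState() reproduces the search tree sketched above; the answer to “how many crossings?” is the cost of the cheapest root-to-leaf path.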

IV. Dynamic Programming

  • Time complexity was pretty bad for the tree-traversal algorithms for solving search problems; much of it is simply too brute-force to be reasonable in the worst case.
  • Time complexity grew exponentially because we’d have to recurse through the whole maze.
  • Let’s see how DP can speed this up.

Here is the general idea:

Example 3 Take the following graph:

picture 11

In tree search (take backtracking), we’d have to recurse through each branch, for every branch: exponential time complexity! However, notice that, for example, once we hit a \(5\), the rest of the branch is the same. This is the key idea. We can store the solution of the subproblem starting from \(5\) in a lookup table, and the next time we are at a \(5\), we just look it up! We save tons of time and sacrifice a bit of space while we’re at it (which is ok!). This concept is called memoization.

This slide formalizes the concept of dynamic programming.

Note
Definition 2 (Dynamic programming)

picture 12
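
In symbols, the recurrence at the heart of dynamic programming here is the minimum future cost of a state, call it \(\operatorname{FutureCost}(s)\), written with the quantities from Definition 1:

\[ \operatorname{FutureCost}(s) = \begin{cases} 0 & \text{if } \operatorname{IsEnd}(s) \\ \min_{a \in \operatorname{Actions}(s)} \left[ \operatorname{Cost}(s, a) + \operatorname{FutureCost}(\operatorname{Succ}(s, a)) \right] & \text{otherwise} \end{cases} \]

Memoization caches \(\operatorname{FutureCost}(s)\) the first time it is computed, so each state is solved only once.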

In code, this would be something like:

table = {}   # cache: state -> minimum future cost (memoization)

# isEnd, actions, cost, succ come from the search problem (Definition 1)
def futureCost(state):
    # Already solved this subproblem? Look up the cached solution.
    if state in table:
        return table[state]

    # Base case: an end state has no future cost.
    if isEnd(state):
        solution = 0
    else:
        # Recursive step: try every action and take the cheapest continuation.
        solution = min(cost(state, action) + futureCost(succ(state, action))
                       for action in actions(state))

    table[state] = solution   # memoize before returning
    return solution
  • The key observation is that the future cost only depends on the current city… not on what we did in the past to get to that city.
    • What we really mean is that the future cost only depends on the state.
    • The state goes from “the past sequence of actions” (e.g., 1 → 2 → 5 → 6) to just the current city (e.g., 6).
    • Time complexity drops from exponential to polynomial: it scales with the number of states, which is \(O(n)\) here.

Idea of “state”:

Note
Definition 3 (State) A state is a summary of all the past actions sufficient to choose future actions optimally. e.g.,:

picture 15

Adding a constraint

Cool… but what happens if we add a constraint that depends on the past, not just the current city?

Example 4 (Route finding) Find the minimum cost path from city 1 to city \(n\), only moving forward. It costs \(c_{i j}\) to go from \(i\) to \(j\).

Constraint: Can’t visit three odd cities in a row.

  • Note that we choose what the “state” is in these DP problems. However, time and space complexity depend on the size of the state space, so choose wisely. It is a bit of a balance.
    • This is why the definition of “state” from above makes sense: keep the minimal amount of information that is sufficient to choose future actions optimally.

With this constraint in mind, we can now define a new state definition for this problem.

\[ \text{state = (previous city, current city)} \]

For example, in this context:

\[ S_0: (\text{n/a}, 1) \qquad S_1: (1, 3) \qquad S_2: (3, 7) \]

The size of the state space is then \(|S| = n \times n = n^2\). We can do better. Define the state as:

\[ \text{state = (if previous city was odd, current city)} \]

this gives us \(|S| = 2n\).
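
As a small sketch of this more compact state (the names are placeholders; the number of cities \(n\) and the cost table \(c[(i, j)]\) are assumed given, as in Example 4):

# state = (was the previous city odd?, current city)

def startState():
    return (False, 1)                  # no previous city yet

def isEnd(state):
    return state[1] == n

def actions(state):
    prevOdd, city = state
    forward = range(city + 1, n + 1)   # forward moves only
    if prevOdd and city % 2 == 1:
        # Two odd cities in a row already: the next one must be even.
        forward = [j for j in forward if j % 2 == 0]
    return list(forward)

def succ(state, j):
    return (state[1] % 2 == 1, j)      # keep only the parity we need

def cost(state, j):
    return c[(state[1], j)]

The memoized futureCost sketched earlier works unchanged on this richer state; only the definition of the search problem changed.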

The whole example here is to drill in the idea of a “state”: how flexibly it can be defined, but also how important it is to define a good one.

Acyclic assumption

Note

Definition 4 (Acyclic assumption) The state graph defined by \(\operatorname{Actions}(s)\) and \(\operatorname{Succ}(s, a)\) is acyclic.

Here is a quick primer on the relevant graph algorithm background.

  • Cycle: a path that starts from a given vertex and ends at the same vertex is called a cycle (ref)2.
  • Acyclic vs. cyclic:

picture 16

  • That is, we need an ordering of the states such that we never have to go “back to a state” to achieve the optimal path.
  • So basically, our state graph has to be acyclic; a tiny illustration of what goes wrong otherwise follows below.
  • Uniform cost search resolves this.
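
As a tiny, self-contained illustration of what goes wrong when the assumption fails (the two-state graph and the names below are made up for this note, not from the lecture):

# "A" and "B" point at each other; "end" is the only end state.
succAndCost = {
    "A": [("B", 1), ("end", 10)],
    "B": [("A", 1)],
    "end": [],
}
table = {}

def futureCost(state):
    if state in table:
        return table[state]
    if not succAndCost[state]:          # end state: no future cost
        result = 0
    else:
        result = min(c + futureCost(s) for s, c in succAndCost[state])
    table[state] = result
    return result

try:
    futureCost("A")                     # recurses A -> B -> A -> ... forever
except RecursionError:
    print("cycle between A and B: the recursion never reaches a base case")

Uniform cost search does not rely on acyclicity, which is why it can handle graphs like this one.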

TL;DR

Note
  • DP: backtracking + memoization
  • Memoization: cache all solutions to each subproblem. Next time you encounter subproblem, lookup solution in cache instead of redoing the recursion.
  • DP offers speed-up in time complexity: exponential to polynomial.
  • State: how you define your state plays a huge role in performance. Keep the minimal amount of information that is sufficient to choose future actions optimally.
  • However, DP depends on the assumption that the graph being traversed is acyclic. That is, there exists some sequential ordering in the problem (no going backwards!).
  • Uniform cost search addresses this issue.

Footnotes

  1. can’t this vary though depending on the depth that we are at?↩︎

  2. this site actually has some good info.↩︎

  3. note that this is an important thing to understand as it gives rise to the motivation for \(A^*\). In UCS, we uniformly align the nodes in order. However, as we will see, \(A^*\) has a bias knowing the end goal.↩︎