Adversarial Search

Game Theory in AI

Games introduce a multi-agent environment where agents have conflicting goals.

Formalizing a Game

  1. Initial State: Board position and whose turn it is.
  2. Player(s): Which player has the move in state s.
  3. Actions(s): Returns legal moves in state s.
  4. Result(s, a): The transition model.
  5. Terminal-Test(s): Is the game over?
  6. Utility(s, p): Final numeric value for a game that ends in state s for player p (e.g., +1 for win, 0 for draw, -1 for loss).

The Minimax Algorithm

Idea: Player MAX wants to maximize the utility. Player MIN wants to minimize it.

We build a search tree down to the terminal nodes, compute the utility, and back up the values to the root.

function MINIMAX-DECISION(state) returns an action
    return argmax_{a \in ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v \leftarrow -\infty
    for each a in ACTIONS(state) do
        v \leftarrow MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v \leftarrow \infty
    for each a in ACTIONS(state) do
        v \leftarrow MIN(v, MAX-VALUE(RESULT(state, a)))
    return v

Analysis — Minimax requires O(b^m) time and O(bm) space. For chess, b \approx 35, m \approx 100. This is completely infeasible to compute fully!

Alpha-Beta Pruning

Idea: We don’t need to look at every branch. If we find a move that is worse than a move we’ve already found, we can stop exploring that branch.

function ALPHA-BETA-SEARCH(state) returns an action
    v \leftarrow MAX-VALUE(state, -\infty, \infty)
    return the action in ACTIONS(state) that has value v

// Inside MAX-VALUE loop:
    v \leftarrow MAX(v, MIN-VALUE(RESULT(state, a), \alpha, \beta))
    if v \ge \beta then return v // Prune!
    \alpha \leftarrow MAX(\alpha, v)
Example: Alpha-Beta Intuition

You are picking a door to walk through (MAX). Behind each door is an opponent (MIN) who gets to choose what item to hand you.

You open Door 1, and the worst thing MIN can hand you is $10. So you are guaranteed at least $10.

You open Door 2, and the first item you see MIN holding is a live grenade. You don’t need to see what else MIN has behind Door 2; you immediately close it, because a live grenade is strictly worse than $10.

Note — Alpha-Beta pruning does not affect the final result! It just computes the exact same minimax decision much faster. With perfect move ordering, time complexity drops to O(b^{m/2}).

Real-Time Decisions

Because we still can’t search all the way to the end of most games, we:

  1. Cut the search off early (using a depth limit).
  2. Apply an Evaluation Function to non-terminal nodes to guess the utility (e.g., Material value in chess: Queen = 9, Rook = 5, etc.).11. A major flaw of cutting off search early is the “horizon effect,” where the agent pushes an inevitable bad event just past the search depth limit, falsely thinking it has avoided it.

  1. A major flaw of cutting off search early is the “horizon effect,” where the agent pushes an inevitable bad event just past the search depth limit, falsely thinking it has avoided it.↩︎