Games introduce a multi-agent environment where agents have conflicting goals.
Idea: Player MAX wants to maximize the utility. Player MIN wants to minimize it.
We build a search tree down to the terminal nodes, compute the utility, and back up the values to the root.
function MINIMAX-DECISION(state) returns an actionreturn argmax_{a \in ACTIONS(state)} MIN-VALUE(RESULT(state, a))function MAX-VALUE(state) returns a utility valueif TERMINAL-TEST(state) then return UTILITY(state)v \leftarrow -\inftyfor each a in ACTIONS(state) dov \leftarrow MAX(v, MIN-VALUE(RESULT(state, a)))return vfunction MIN-VALUE(state) returns a utility valueif TERMINAL-TEST(state) then return UTILITY(state)v \leftarrow \inftyfor each a in ACTIONS(state) dov \leftarrow MIN(v, MAX-VALUE(RESULT(state, a)))return v
Analysis — Minimax requires O(b^m) time and O(bm) space. For chess, b \approx 35, m \approx 100. This is completely infeasible to compute fully!
Idea: We don’t need to look at every branch. If we find a move that is worse than a move we’ve already found, we can stop exploring that branch.
function ALPHA-BETA-SEARCH(state) returns an actionv \leftarrow MAX-VALUE(state, -\infty, \infty)return the action in ACTIONS(state) that has value v// Inside MAX-VALUE loop:v \leftarrow MAX(v, MIN-VALUE(RESULT(state, a), \alpha, \beta))if v \ge \beta then return v // Prune!\alpha \leftarrow MAX(\alpha, v)
You are picking a door to walk through (MAX). Behind each door is an opponent (MIN) who gets to choose what item to hand you.
You open Door 1, and the worst thing MIN can hand you is $10. So you are guaranteed at least $10.
You open Door 2, and the first item you see MIN holding is a live grenade. You don’t need to see what else MIN has behind Door 2; you immediately close it, because a live grenade is strictly worse than $10.
Note — Alpha-Beta pruning does not affect the final result! It just computes the exact same minimax decision much faster. With perfect move ordering, time complexity drops to O(b^{m/2}).
Because we still can’t search all the way to the end of most games, we:
A major flaw of cutting off search early is the “horizon effect,” where the agent pushes an inevitable bad event just past the search depth limit, falsely thinking it has avoided it.↩︎