Background — So far we’ve systematically explored the search space with memory to get a path. But the path is irrelevant in many optimization problems (e.g., N-Queens, we just want the final board).
Local Search Algorithms: Keep a single “current” state and try to improve it.11. Because they don’t maintain a frontier or explored set, they operate in O(1) space.
aka: greedy local search
“Like climbing Everest in thick fog with amnesia”22. just like me for real.
function HILL-CLIMBING(problem) returns a state that is a local maximumcurrent \leftarrow MAKE-NODE(problem.INITIAL-STATE)loop doneighbor \leftarrow a highest-valued successor of currentif VALUE[neighbor] \le VALUE[current] then return STATE[current]current \leftarrow neighbor
Note — Gets stuck easily in local maxima, ridges, and plateaus.33. A ridge is a sequence of local maxima that is very difficult for greedy algorithms to navigate, like walking along the top of a mountain range.
Sideways moves (allowing neighbor value == current value) can help escape plateaus, but might cause infinite loops.
Stochastic Hill Climbing: Chooses randomly from among the uphill moves (probability can vary with steepness).
First-Choice Hill Climbing: Randomly generates successors until one is better than current (good when there are thousands of successors).
Random-Restart Hill Climbing: “If at first you don’t succeed, try, try again.” Runs HC from randomly generated initial states until goal found.
Idea: Allow some bad (downward) moves to escape local maxima, but gradually decrease their frequency.
function SIMULATED-ANNEALING(problem, schedule) returns a solution statecurrent \leftarrow MAKE-NODE(problem.INITIAL-STATE)for t = 1 to infinity doT \leftarrow schedule(t)if T = 0 then return currentnext \leftarrow a randomly selected successor of current\Delta E \leftarrow VALUE[next] \text{$-$} VALUE[current]if \Delta E \gt 0 then current \leftarrow nextelse current \leftarrow next only with probability e^{\Delta E / T}
Why? — Temperature (T) controls the probability of downward steps. High T means more random exploration. Low T means it acts like normal Hill Climbing.
Idea: Track k states simultaneously. Start with k randomly generated states.44. It’s called “beam” search because instead of illuminating the whole tree (BFS) or one deep shaft (DFS), you shine a “beam” of width k down the most promising paths.
Recall — This isn’t random-restart! Random-restart runs k independent searches. Beam search passes information between the branches since the k best states could all come from one superior parent.
A variant of stochastic beam search where successor states are generated by combining two parent states.
Parent 1:
1010|111
Parent 2:
0000|000
Child:
1010|000 (First half from parent 1, second half from parent 2)
When an agent operates in an environment where information is incomplete, the search problem fundamentally changes. There are four main problem types based on the level of observability and determinism:
The key insight for dealing with missing information is that the agent must reason about belief states rather than physical states.55. A belief state represents the agent’s current knowledge about the world, essentially grouping together all physical states that are indistinguishable given the sensor history.
A sensorless problem requires finding a pure sequence of actions that will coerce the world into the goal state no matter the starting position.
A contingency problem requires a tree-structured or conditional plan that branches based on sensor readings.
[Right, Suck][Right, Suck, Left, Suck][Left, if Dirty then Suck, Right, if Dirty then Suck]Because they don’t maintain a frontier or explored set, they operate in O(1) space.↩︎
just like me for real.↩︎
A ridge is a sequence of local maxima that is very difficult for greedy algorithms to navigate, like walking along the top of a mountain range.↩︎
It’s called “beam” search because instead of illuminating the whole tree (BFS) or one deep shaft (DFS), you shine a “beam” of width k down the most promising paths.↩︎
A belief state represents the agent’s current knowledge about the world, essentially grouping together all physical states that are indistinguishable given the sensor history.↩︎