人工智能（四）

Game Tree Search

So far: our search problems have assumed agent has complete control of environment
- state does not change unless the agent (robot) changes it
Assumption not always reasonable
- other agents whose interests conﬂict with yours
In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent

Players have their own interests
Each player tries to alter the world so as to best beneﬁt itself
They are hard because: How you should play depends on how you think the other person will play; but how they play depends on how they think you will play

Two-player
Discrete: Game states or decisions can be mapped on discrete values.
Finite: There are only a ﬁnite number of states and possible decisions that can be made
Zero-sum (“): Fully competitive
- if one player wins, the other loses an equal amount
- note that some games don’t have this property
Deterministic: no chance involved
- no dice, or random deals of cards, or coin ﬂips, etc.
Perfect information: all aspects of the state are fully observable
- e.g., no hidden cards.

But R,P,S is a simple /one shot0(一次性) game
- single move each
- in game theory: a strategic or normal form game (策略或范式博弈)
Many games extend over multiple moves
- turn-taking: players act alternatively
- e.g., chess, checkers, etc.
- in game theory: extensive form games (扩展形式博弈)
We’ll focus on the extensive form
- that’s where the computational questions emerges

Two players A (Max) and B (Min)
Set of states S (a finite set of states of the game)
An initial stateI∈S(where game begins)
Terminal positionsT⊆S(Terminal states of the game:states where the game is over)
Successors (or Succs - a function that takes a state as inputand returns a set of possible next states to whomever is dueto move)
Utility (效益) or payoff (收益) functionV:T→R. (amapping from terminal states to real numbers that show howgood is each terminal state for player A and bad for player B.)Y. LiuIntro to AI8 / 48

Players alternate moves (starting with A, or Max)
- Game ends when some terminal t ∊T is reached
A game state: a state-player pair
- Tells us what state we’re in and whose move it is
Utility function and terminals replace goals
- A, or Max, wants to maximize the terminal payoff
- B, or Min, wants to minimize the terminal payoff
Think of it as:
- A, or Max, gets V(t)and B, or Min, gets –V(t) for terminal node t
- This is why it’s called zero (or constant) sum

Assume that the other player will always play their best move
- you always play a move that will minimize the payoff thatcould be gained by the other player.
- By minimizing the other player’s payoff, you maximize yourown.
Note that if you know that Min will play poorly in somecircumstances, there might be a better strategy than MiniMax(i.e., a strategy that gives you a better payoff)
Build full game tree (all leaves are terminals)
- Root is start state, edges are possible moves, etc.
- Label terminal nodes with utilities
Back values upthe tree
- U(t)is defined for all terminals (part of input)
- U(n)= min {U(c) : c isa child of n} if nis a Min node
- U(n)= max {U(c) : cis a child of n} if nis a Max node

Building the entire game tree and backing up values gives each player their strategy.
However, the game tree is exponential in size.
Furthermore, as we will see later it is not necessary to know all of the tree.
To solve these problems we find a depth-firstimplementation of minimax.
We run the depth-first search after each move to compute what is the next move for the MAXplayer. (We could do the same for the MINplayer).
This avoids explicitly representing the exponentially sized game tree: we just compute each move as it is needed.

It is not necessary to examine entire tree to make correct MiniMax decision
Assume depth-first generation of tree
- After generating value for only someof n’s children we can prove that we’ll never reach n in a MiniMax strategy.
- So we needn’t generate or evaluate any further children of n!
Two types of pruning (cuts):
- pruning of max nodes (α-cuts)
- pruning of min nodes (β-cuts)

At a Max node n:
- Let βbe the lowest value of n’s siblings examined so far (siblings to the left of n that have already been searched)
- Letαbe the highest value of n’s children examined so far (changes as children examined)
While at a Max node n, if αbecomes ≥ βwe can stop expanding the children of n
- Min will never choose to move from n’s parent to nsince it would choose one of n’s lower valued siblings first