10 Sep 2019
3889字
13分
次
BY-SA 4.0(除特别声明或转载文章外)
Game Tree Search
Generalizing search problems
- So far: our search problems have assumed agent has complete control of environment
- state does not change unless the agent (robot) changes it
- Assumption not always reasonable
- other agents whose interests conflict with yours
- In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent
What are key features of a game
- Players have their own interests
- Each player tries to alter the world so as to best benefit itself
- They are hard because: How you should play depends on how you think the other person will play; but how they play depends on how they think you will play
Game properties
- Two-player
- Discrete: Game states or decisions can be mapped on discrete values.
- Finite: There are only a finite number of states and possible decisions that can be made
- Zero-sum (“): Fully competitive
- if one player wins, the other loses an equal amount
- note that some games don’t have this property
- Deterministic: no chance involved
- no dice, or random deals of cards, or coin flips, etc.
- Perfect information: all aspects of the state are fully observable
- But R,P,S is a simple /one shot0(一次性) game
- single move each
- in game theory: a strategic or normal form game (策略或范式博弈)
- Many games extend over multiple moves
- turn-taking: players act alternatively
- e.g., chess, checkers, etc.
- in game theory: extensive form games (扩展形式博弈)
- We’ll focus on the extensive form
- that’s where the computational questions emerges
Two-Player Zero-Sum Game – Definition
- Two players A (Max) and B (Min)
- Set of states S (a finite set of states of the game)
- An initial stateI∈S(where game begins)
- Terminal positionsT⊆S(Terminal states of the game:states where the game is over)
- Successors (or Succs - a function that takes a state as inputand returns a set of possible next states to whomever is dueto move)
- Utility (效益) or payoff (收益) functionV:T→R. (amapping from terminal states to real numbers that show howgood is each terminal state for player A and bad for player B.)Y. LiuIntro to AI8 / 48
Two-Player Zero-Sum Game – Intuition
- Players alternate moves (starting with A, or Max)
- Game ends when some terminal t ∊T is reached
- A game state: a state-player pair
- Tells us what state we’re in and whose move it is
- Utility function and terminals replace goals
- A, or Max, wants to maximize the terminal payoff
- B, or Min, wants to minimize the terminal payoff
- Think of it as:
- A, or Max, gets V(t)and B, or Min, gets –V(t) for terminal node t
- This is why it’s called zero (or constant) sum
The MiniMax Strategy
- Assume that the other player will always play their best move
- you always play a move that will minimize the payoff thatcould be gained by the other player.
- By minimizing the other player’s payoff, you maximize yourown.
- Note that if you know that Min will play poorly in somecircumstances, there might be a better strategy than MiniMax(i.e., a strategy that gives you a better payoff)
- Build full game tree (all leaves are terminals)
- Root is start state, edges are possible moves, etc.
- Label terminal nodes with utilities
- Back values upthe tree
- U(t)is defined for all terminals (part of input)
- U(n)= min {U(c) : c isa child of n} if nis a Min node
- U(n)= max {U(c) : cis a child of n} if nis a Max node
DFS Minmax
- Building the entire game tree and backing up values gives each player their strategy.
- However, the game tree is exponential in size.
- Furthermore, as we will see later it is not necessary to know all of the tree.
- To solve these problems we find a depth-firstimplementation of minimax.
- We run the depth-first search after each move to compute what is the next move for the MAXplayer. (We could do the same for the MINplayer).
- This avoids explicitly representing the exponentially sized game tree: we just compute each move as it is needed.
Pruning
- It is not necessary to examine entire tree to make correct MiniMax decision
- Assume depth-first generation of tree
- After generating value for only someof n’s children we can prove that we’ll never reach n in a MiniMax strategy.
- So we needn’t generate or evaluate any further children of n!
- Two types of pruning (cuts):
- pruning of max nodes (α-cuts)
- pruning of min nodes (β-cuts)
Cutting Max Nodes (Alpha Cuts)
- At a Max node n:
- Let βbe the lowest value of n’s siblings examined so far (siblings to the left of n that have already been searched)
- Letαbe the highest value of n’s children examined so far (changes as children examined)
- While at a Max node n, if αbecomes ≥ βwe can stop expanding the children of n
- Min will never choose to move from n’s parent to nsince it would choose one of n’s lower valued siblings first