10 Sep 2019
    
    
    
    3889字
    
    
    
    13分
    
    
    
      
      次
    
    
    
    
     BY-SA 4.0(除特别声明或转载文章外)
  
  
  
  
    Game Tree Search
Generalizing search problems
  - So far: our search problems have assumed agent has complete control of environment
    
      - state does not change unless the agent (robot) changes it
 
- Assumption not always reasonable
    
      - other agents whose interests conflict with yours
 
- In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent
What are key features of a game
  - Players have their own interests
- Each player tries to alter the world so as to best benefit itself
- They are hard because: How you should play depends on how you think the other person will play; but how they play depends on how they think you will play
Game properties
  - Two-player
- Discrete: Game states or decisions can be mapped on discrete values.
- Finite: There are only a finite number of states and possible decisions that can be made
- Zero-sum (“): Fully competitive
    
      - if one player wins, the other loses an equal amount
- note that some games don’t have this property
 
- Deterministic: no chance involved
    
      - no dice, or random deals of cards, or coin flips, etc.
 
- Perfect information: all aspects of the state are fully observable
    
  
  - But R,P,S is a simple /one shot0(一次性) game
    
      - single move each
- in game theory: a strategic or normal form game (策略或范式博弈)
 
- Many games extend over multiple moves
    
      - turn-taking: players act alternatively
- e.g., chess, checkers, etc.
- in game theory: extensive form games (扩展形式博弈)
 
- We’ll focus on the extensive form
    
      - that’s where the computational questions emerges
 
Two-Player Zero-Sum Game – Definition
  - Two players A (Max) and B (Min)
- Set of states S (a finite set of states of the game)
- An initial stateI∈S(where game begins)
- Terminal positionsT⊆S(Terminal states of the game:states where the game is over)
- Successors (or Succs - a function that takes a state as inputand returns a set of possible next states to whomever is dueto move)
- Utility (效益) or payoff (收益) functionV:T→R.  (amapping from terminal states to real numbers that show howgood is each terminal state for player A and bad for player B.)Y. LiuIntro to AI8 / 48
Two-Player Zero-Sum Game – Intuition
  - Players alternate moves (starting with A, or Max)
    
      - Game ends when some terminal t ∊T is reached
 
- A game state: a state-player pair
    
      - Tells us what state we’re in and whose move it is
 
- Utility function and terminals replace goals
    
      - A, or Max, wants to maximize the terminal payoff
- B, or Min, wants to minimize the terminal payoff
 
- Think of it as:
    
      - A, or Max, gets V(t)and B, or Min, gets –V(t) for terminal node t
- This is why it’s called zero (or constant) sum
 
The MiniMax Strategy
  - Assume that the other player will always play their best move
    
      - you always play a move that will minimize the payoff thatcould be gained by the other player.
- By minimizing the other player’s payoff, you maximize yourown.
 
- Note that if you know that Min will play poorly in somecircumstances, there might be a better strategy than MiniMax(i.e., a strategy that gives you a better payoff)
- Build full game tree (all leaves are terminals)
    
      - Root is start state, edges are possible moves, etc.
- Label terminal nodes with utilities
 
- Back values upthe tree
    
      - U(t)is defined for all terminals (part of input)
- U(n)= min {U(c) : c isa child of n} if nis a Min node
- U(n)= max {U(c) : cis a child of n} if nis a Max node
 
DFS Minmax
  - Building the entire game tree and backing up values gives each player their strategy.
- However, the game tree is exponential in size.
- Furthermore, as we will see later it is not necessary to know all of the tree.
- To solve these problems we find a depth-firstimplementation of minimax.
- We run the depth-first search after each move to compute what is the next move for the MAXplayer. (We could do the same for the MINplayer).
- This avoids explicitly representing the exponentially sized game tree: we just compute each move as it is needed.
Pruning
  - It is not necessary to examine entire tree to make correct MiniMax decision
- Assume depth-first generation of tree
    
      - After generating value for only someof n’s children we can prove that we’ll never reach n in a MiniMax strategy.
- So we needn’t generate or evaluate any further children of n!
 
- Two types of pruning (cuts):
    
      - pruning of max nodes (α-cuts)
- pruning of min nodes (β-cuts)
 
Cutting Max Nodes (Alpha Cuts)
  - At a Max node n:
    
      - Let βbe the lowest value of n’s siblings examined so far (siblings to the left of n that have already been searched)
- Letαbe the highest value of n’s children examined so far (changes as children examined)
 
- While at a Max node n, if αbecomes ≥ βwe can stop expanding the children of n
    
      - Min will never choose to move from n’s parent to nsince it would choose one of n’s lower valued siblings first