Generalizing search problems

  • So far: our search problems have assumed agent has complete control of environment
    • state does not change unless the agent (robot) changes it
  • Assumption not always reasonable
    • other agents whose interests conflict with yours
  • In these cases, we need to generalize our view of search to handle state changes that are not in the control of the agent

What are key features of a game

  • Players have their own interests
  • Each player tries to alter the world so as to best benefit itself
  • They are hard because: How you should play depends on how you think the other person will play; but how they play depends on how they think you will play

Game properties

  • Two-player
  • Discrete: Game states or decisions can be mapped on discrete values.
  • Finite: There are only a finite number of states and possible decisions that can be made
  • Zero-sum (“): Fully competitive
    • if one player wins, the other loses an equal amount
    • note that some games don’t have this property
  • Deterministic: no chance involved
    • no dice, or random deals of cards, or coin flips, etc.
  • Perfect information: all aspects of the state are fully observable
    • e.g., no hidden cards.

Extensive Form Two-Player Zero-Sum Games

  • But R,P,S is a simple /one shot0(一次性) game
    • single move each
    • in game theory: a strategic or normal form game (策略或范式博弈)
  • Many games extend over multiple moves
    • turn-taking: players act alternatively
    • e.g., chess, checkers, etc.
    • in game theory: extensive form games (扩展形式博弈)
  • We’ll focus on the extensive form
    • that’s where the computational questions emerges

Two-Player Zero-Sum Game – Definition

  • Two players A (Max) and B (Min)
  • Set of states S (a finite set of states of the game)
  • An initial stateI∈S(where game begins)
  • Terminal positionsT⊆S(Terminal states of the game:states where the game is over)
  • Successors (or Succs - a function that takes a state as inputand returns a set of possible next states to whomever is dueto move)
  • Utility (效益) or payoff (收益) functionV:T→R. (amapping from terminal states to real numbers that show howgood is each terminal state for player A and bad for player B.)Y. LiuIntro to AI8 / 48

Two-Player Zero-Sum Game – Intuition

  • Players alternate moves (starting with A, or Max)
    • Game ends when some terminal t ∊T is reached
  • A game state: a state-player pair
    • Tells us what state we’re in and whose move it is
  • Utility function and terminals replace goals
    • A, or Max, wants to maximize the terminal payoff
    • B, or Min, wants to minimize the terminal payoff
  • Think of it as:
    • A, or Max, gets V(t)and B, or Min, gets –V(t) for terminal node t
    • This is why it’s called zero (or constant) sum

The MiniMax Strategy

  • Assume that the other player will always play their best move
    • you always play a move that will minimize the payoff thatcould be gained by the other player.
    • By minimizing the other player’s payoff, you maximize yourown.
  • Note that if you know that Min will play poorly in somecircumstances, there might be a better strategy than MiniMax(i.e., a strategy that gives you a better payoff)
  • Build full game tree (all leaves are terminals)
    • Root is start state, edges are possible moves, etc.
    • Label terminal nodes with utilities
  • Back values upthe tree
    • U(t)is defined for all terminals (part of input)
    • U(n)= min {U(c) : c isa child of n} if nis a Min node
    • U(n)= max {U(c) : cis a child of n} if nis a Max node

DFS Minmax

  • Building the entire game tree and backing up values gives each player their strategy.
  • However, the game tree is exponential in size.
  • Furthermore, as we will see later it is not necessary to know all of the tree.
  • To solve these problems we find a depth-firstimplementation of minimax.
  • We run the depth-first search after each move to compute what is the next move for the MAXplayer. (We could do the same for the MINplayer).
  • This avoids explicitly representing the exponentially sized game tree: we just compute each move as it is needed.

Pruning

  • It is not necessary to examine entire tree to make correct MiniMax decision
  • Assume depth-first generation of tree
    • After generating value for only someof n’s children we can prove that we’ll never reach n in a MiniMax strategy.
    • So we needn’t generate or evaluate any further children of n!
  • Two types of pruning (cuts):
    • pruning of max nodes (α-cuts)
    • pruning of min nodes (β-cuts)

Cutting Max Nodes (Alpha Cuts)

  • At a Max node n:
    • Let βbe the lowest value of n’s siblings examined so far (siblings to the left of n that have already been searched)
    • Letαbe the highest value of n’s children examined so far (changes as children examined)
  • While at a Max node n, if αbecomes ≥ βwe can stop expanding the children of n
    • Min will never choose to move from n’s parent to nsince it would choose one of n’s lower valued siblings first