UCSC-CRL-05-03: Average Reward Timed Games

06/25/2005 09:00 AM
Computer Engineering
We consider real-time games where the goal consists, for each player,
in maximizing the average amount of reward he or she receives per time unit. We
consider zero-sum rewards, so that a reward of +r to one player corresponds to
a reward of ?r to the other player. The games are played on discrete-time game
structures which can be specified using a two-player version of timed automata
whose locations are labeled by rewards. Even though the rewards themselves are
zero-sum, the games are not, due to the requirement that time must progress along
a play of the game.
Since we focus on control applications, we define the value of the game to a
player to be the maximal average reward per time unit that the player can ensure.
We show that in general the values to players 1 and 2 do not sum to zero. We
provide algorithms for computing the value of the game for either player; the al-
gorithms are based on the relationship between the original, infinite-round, game,
and a derived game that is played for only finitely many rounds. As positional op-
timal strategies exist for both players in both games, we show that the problem of
computing the value of the game is in NP¿coNP.

UCSC-CRL-05-03