Gabor Lugosi - October 14, 2008

Randomized sequential prediction: performance bounds and algorithms

We discuss sequential prediction problems in which a decision maker 
repeatedly chooses an action and suffers a loss related to the chosen
action. The goal of the decision maker is to accumulate a total loss
that is not much larger than the best fixed action that could have
been chosen if all losses had been known in advance. This problem and
its variants have been thoroughly studied in game theory, learning
theory, and information theory and various well-performing randomized 
strategies exist. We consider cases in which the set of actions is
very large and has some combinatorial structure. Computing randomized 
strategies with small loss often raises nontrivial algorithmic 
problems. We present some examples and study the corresponding 
multi-armed bandit problems, that is, when the decision maker
only observes the loss of the selected action.