Jun Abstract: Adaptive data collection for accelerating discovery rates

Standard machine learning methods, known as "supervised learning", passively take in a dataset and build a model that makes accurate predictions on future data. In many situations, however, we can actively choose which data to collect, or we want to do so to make the most of a limited budget. By collecting data wisely (e.g., through adaptive experiments), we can use significantly less data while achieving the same performance (e.g., in identifying interesting genes). At the same time, adaptive data collection breaks the standard i.i.d. assumption on the data, which poses a significant challenge because the theorems and principles developed for supervised learning no longer apply. In this talk, I will present novel adaptive data collection and learning algorithms arising from the multi-armed bandit framework, and show their theoretical guarantees and their effectiveness in real-world applications, including biological experiments. I will also make connections to the sequential design of experiments proposed by Chernoff in 1959.