Zhang: Efficient and noise-tolerant active learning of sparse halfspaces

Abstract: We study the problem of efficient, noise-tolerant PAC active learning of homogeneous linear classifiers (halfspaces) in R^d, where the goal is to learn a halfspace with low excess error using as few label queries as possible. We consider the eta-bounded label noise model, in which the label y of every instance x is generated by predicting with an underlying s-sparse halfspace (s << d) and then flipped with a (potentially instance-dependent) probability at most eta < 1/2. We would like our active learning algorithm to be attribute-efficient, i.e., to have a label requirement sublinear in d.
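
To make the noise model concrete, here is a minimal Python sketch of the label-generation process. The dimension d, sparsity s, noise bound eta, and the particular flip-probability choice are all illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: ambient dimension, sparsity, noise bound.
d, s, eta = 1000, 10, 0.3

# Underlying s-sparse homogeneous halfspace: w_star has s nonzero coordinates.
w_star = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
w_star[support] = rng.standard_normal(s)
w_star /= np.linalg.norm(w_star)

def noisy_label(x, flip_prob):
    """Label x by sign(<w_star, x>), then flip with probability flip_prob.

    flip_prob may depend on x (instance-dependent), as long as it never
    exceeds the bound eta < 1/2.
    """
    y = np.sign(w_star @ x)
    if rng.random() < flip_prob:
        y = -y
    return y

x = rng.standard_normal(d)
y = noisy_label(x, flip_prob=0.5 * eta)  # any value in [0, eta] is allowed
```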

In this talk, we present an efficient algorithm that achieves the above goal, with a label requirement of O(s / (1-2 eta)^4 polylog(d, 1/epsilon)). Our result is novel even in the non-sparse setting. In contrast, existing algorithms are either computationally inefficient or have label requirements exponential in 1/(1-2 eta).
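
To see why the polynomial dependence on 1/(1-2 eta) matters as eta approaches 1/2, here is a back-of-the-envelope comparison. Constants and polylog factors are dropped, and exp(1/(1-2 eta)) is used as a stand-in for "exponential in 1/(1-2 eta)"; the numbers are purely illustrative:

```python
import math

s = 10  # illustrative sparsity
for eta in (0.3, 0.45, 0.49):
    inv_gap = 1.0 / (1.0 - 2.0 * eta)  # blows up as eta -> 1/2
    poly = s * inv_gap ** 4            # our bound's dependence (polylog factors dropped)
    expo = math.exp(inv_gap)           # exponential dependence of prior efficient algorithms
    print(f"eta={eta}: 1/(1-2eta)={inv_gap:.1f}, poly~{poly:.1e}, exp~{expo:.1e}")
```

For eta = 0.49, 1/(1-2 eta) = 50, so the polynomial dependence is on the order of 10^7 while the exponential one is on the order of 10^21.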

Joint work with Jie Shen (Stevens Institute of Technology) and Pranjal Awasthi (Rutgers). Paper link: https://arxiv.org/abs/2002.04840.