Variable Selection is Hard
Who: Howard Karloff
When: Feb 9, 1-2pm
Where: Computer Science Building, room 150/151
Consider the task of a machine-learning system faced with voluminous data on m individuals. There may be p = 10^6 features describing each individual. How can such a system find a small set of features that "best" describes the individuals? People usually seek small feature sets both because models with few features are understandable and because simple models usually generalize better.
We study the simple case of linear regression, in which a user has an m x p matrix B and an m-vector y, and seeks a p-vector x *with as few nonzeroes as possible* such that Bx is approximately equal to y; we call this problem SPARSE REGRESSION. There are numerous algorithms in the statistical literature for SPARSE REGRESSION, such as Forward Selection, Backward Elimination, LASSO, and Ridge Regression.
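To make the problem concrete, here is a minimal numpy sketch of one of the heuristics named above, Forward Selection: greedily add whichever column of B most reduces the least-squares residual. This is only an illustration of the problem setup, not the algorithms or constructions analyzed in the talk; the function name and the stopping rule (a fixed budget of k columns) are illustrative choices.

```python
import numpy as np

def forward_selection(B, y, k):
    """Greedy Forward Selection heuristic for SPARSE REGRESSION.

    Repeatedly adds the column of B that most reduces the residual
    least-squares error, until k columns have been chosen. Returns a
    p-vector x with at most k nonzero entries.
    """
    m, p = B.shape
    chosen = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(p):
            if j in chosen:
                continue
            cols = chosen + [j]
            # Best least-squares fit using only the candidate columns.
            coef, *_ = np.linalg.lstsq(B[:, cols], y, rcond=None)
            err = np.linalg.norm(B[:, cols] @ coef - y)
            if err < best_err:
                best_j, best_err = j, err
        chosen.append(best_j)
    # Refit on the final support and embed into a sparse p-vector.
    coef, *_ = np.linalg.lstsq(B[:, chosen], y, rcond=None)
    x = np.zeros(p)
    x[chosen] = coef
    return x
```

The hardness result below says that, in the worst case, no polynomial-time procedure of this kind can be guaranteed to find a good sparse x (subject to a complexity assumption), even though such greedy heuristics often work well on benign inputs.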
We give a general hardness proof that (subject to a complexity assumption) no polynomial-time algorithm can give good performance (in the worst case) for SPARSE REGRESSION, even if it is allowed to include more variables than necessary, and even if it need only find an x such that Bx is relatively far from y.
This is joint work with Dean Foster and Justin Thaler and was done when all coauthors were at Yahoo Labs.
Bio: After receiving a PhD from UC Berkeley, Howard Karloff taught at the University of Chicago and Georgia Tech before leaving Georgia Tech to join AT&T Labs–Research in 1999. He left AT&T Labs in 2013 to join Yahoo Labs in New York, where he stayed until February 2015. Now he does data science for Goldman Sachs in New York.
A fellow of the ACM, he has served on the program committees of numerous conferences and chaired the 1998 SODA program committee. He is the author of numerous journal and conference articles and of the Birkhäuser book "Linear Programming." His interests include data science, machine learning, algorithms, and optimization.