To handle big data, shrink it
As anyone who’s ever used a spreadsheet can attest, it’s often convenient to organize data into tables. But in the age of big data, those tables can be enormous, with millions or even hundreds of millions of rows.

One way to make big-data analysis computationally practical is to reduce the size of data tables — or matrices, to use the mathematical term — by leaving out a bunch of rows. The trick is that the remaining rows have to be in some sense representative of the ones that were omitted, so that computations performed on them yield approximately the right results.

At the ACM Symposium on Theory of Computing in June, MIT researchers will present a new algorithm that finds the smallest possible approximation of the original matrix that guarantees reliable computations. For a class of problems important in engineering and machine learning, this is a significant improvement over previous techniques.
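To see the general idea in miniature, consider the sketch below. It is not the researchers’ algorithm — just a standard illustration of row sampling: pick rows of a matrix at random (here with probability proportional to their squared norms, a common importance-sampling choice), reweight the kept rows, and check that a computation on the much smaller matrix — in this case the Gram matrix that underlies least-squares fitting — comes out close to the answer on the full matrix. The matrix sizes and sample count are arbitrary assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 100_000, 20                      # a tall matrix: many rows, few columns
A = rng.standard_normal((n, d))

# Sampling probabilities proportional to squared row norms
# (a simple, standard scheme; the paper's method is more refined).
row_norms_sq = np.einsum("ij,ij->i", A, A)
probs = row_norms_sq / row_norms_sq.sum()

k = 2_000                               # keep only k of the n rows
idx = rng.choice(n, size=k, replace=True, p=probs)

# Rescale each kept row by 1 / sqrt(k * p_i) so the small matrix's
# Gram matrix is an unbiased estimate of the full one.
S = A[idx] / np.sqrt(k * probs[idx])[:, None]

exact = A.T @ A                          # computation on the full matrix
approx = S.T @ S                         # same computation on the shrunken matrix
rel_err = np.linalg.norm(exact - approx, 2) / np.linalg.norm(exact, 2)
print(f"relative spectral-norm error: {rel_err:.3f}")
```

The point of schemes like this is that downstream computations — regression, for instance — can be run on the few thousand reweighted rows instead of all hundred thousand, at the cost of a small, controllable error. The open question such work addresses is how few rows one can keep while still guaranteeing that the answers stay reliable.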