Brown receives $3.1M to improve data analysis

ELI UPFAL is a professor of computer science at Brown University. He will lead the school's $3.1 million project to make data-analysis software more statistically reliable. /COURTESY BROWN UNIVERSITY/JESSE POLHEMUS

PROVIDENCE – Computer scientists at Brown University have received $3.1 million from the Defense Advanced Research Projects Agency to build new software for big-data exploration and analysis, the school announced April 26.

The software aims to help users determine whether the patterns they see in data sets are real, relevant correlations or simply coincidental. Current commercial data exploration tools don’t provide sufficiently rigorous tests to ensure results are statistically significant, according to the Brown team.

“The goal is to build a user-friendly system that can easily explore data and produce useful visualizations, but also continuously controls for the statistical validity of the results,” said Eli Upfal, professor of computer science at Brown and the project’s principal investigator, in a release.

Upfal, an expert in computational theory, will work mainly on the statistical side of the project. Tim Kraska, an assistant professor, and Carsten Binnig, an adjunct associate professor, will work on the data management side of the project as experts in machine learning and databases, while computer graphics expert Andy van Dam will work on the user interface and visualizations.

One of the major problems their software will tackle is known as the “multiple comparisons problem”: the more questions and filters an analyst applies to a dataset, the more likely it is that random fluctuations will turn up that look like genuine correlations but are not statistically valid.

Without a proper test to verify the statistical significance of a correlation, these fluctuations can lead to false discoveries.
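
To make the arithmetic behind that concrete, here is a minimal Python sketch. It is an illustration only and not the Brown team's method: the function names are hypothetical, the tests are assumed to be independent, and the Bonferroni correction shown is just one standard textbook remedy. If each test uses a significance threshold of 0.05, the chance of at least one false positive across 100 such tests is roughly 99.4 percent.

```python
import random


def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes every test is independent and that no real effect exists.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_discoveries(num_tests: int, alpha: float, seed: int = 0):
    """Count chance 'discoveries' with and without a Bonferroni correction.

    When no real effect exists, p-values are uniformly distributed on
    [0, 1], so random draws stand in for the results of num_tests queries.
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(num_tests)]
    # Naive approach: compare every p-value against the same threshold.
    uncorrected = sum(p < alpha for p in p_values)
    # Bonferroni: divide the threshold by the number of tests performed.
    corrected = sum(p < alpha / num_tests for p in p_values)
    return uncorrected, corrected


if __name__ == "__main__":
    for m in (1, 10, 100):
        print(m, round(family_wise_error_rate(0.05, m), 3))
    print(simulate_false_discoveries(num_tests=1000, alpha=0.05))
```

The simulation contains no real patterns at all, so every uncorrected "hit" it reports is a false discovery of the kind described above.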

“To some extent it’s our fault here in computer science that we have made analysis of data so easy,” Upfal said. “If I give you a huge database and let you simply push a button to ask question after question, you’ll eventually reach something that’s there purely by chance.”

Upfal and the rest of the Brown team plan to create a system that continually monitors the questions a user asks while exploring data and warns them when they’re on shaky statistical ground, thus helping data analysts avoid making false discoveries.

The project will be part of Brown’s recently launched Data Science Initiative, which aims to develop novel approaches to dealing with data.

“Ultimately, we want to promote data science and see it be successful,” Upfal said. “We hope this project will be a step toward that.”

Kaylen Auer is a PBN contributing writer.
