

Big Data research at the University took a big leap forward last week when the National Institutes of Health and National Science Foundation awarded $1.5 million to further the research of three computer science professors.

For the last four years, Professor of Computer Science Eli Upfal, Assistant Professor of Computer Science Fabio Vandin and Associate Professor of Computer Science Ben Raphael have been developing algorithms to analyze particularly large data sets, with a specific emphasis on their application to genes and their functions. With the help of this grant, they hope to develop analysis tools that are "rigorous, reliable and statistically sound," Raphael said.

"Big Data is the biggest trend in technology these days," Upfal said. "Tools we develop here will feed into a lot of different areas."  Upfal, who has a background in probability and statistics and is the chief algorithms specialist of the research team, said he believes Big Data is the result of developments in computer science that provide the ability to process, store and generate enormous amounts of data from sources ranging from Facebook profiles to gene sequences. 

Correct interpretation of these data can reveal "associations between what you see in the data and outside phenomena, which can help explain a lot about these phenomena," Upfal said.

Big Data "upholds an interdisciplinary problem since it has implications in the social sciences as well as life sciences and physical sciences," Raphael said. "Across all disciplines, people are being flooded with data."

The research team chose genomics as its first application for the algorithms it is developing on Big Data analysis because data in this field is already available to the University through Raphael, whose main research is in the field of computational biology. Raphael has spent the past six years looking at biological data sets to identify groups of mutations in DNA sequences that are responsible for cancer. "This grant is more about the algorithm itself," Raphael said. "We want to develop tools that will help us analyze networks of interactions, tools that will help us mine Big Data sets from various fields." 

The researchers also aim to use this grant to develop algorithms that can be efficiently scaled to large data sets and answer the question of how much data is needed to have confidence in the algorithm's prediction, Raphael said. 

There are several steps involved in Big Data analysis. "As usual in research, one of the hardest things to do is to identify interesting problems to work with," Upfal said, adding that this is partly why they chose to work with biological data - it is analytical and emphasizes the use of mathematics and probability, allowing proofs to be scientifically valid. 

The process of modeling and developing algorithms is the "creative part" of the research, Upfal said. He acknowledged that this can be hard because it is not always clear which mathematical model should be used on which data set, and using different ones can change the results.

A handful of students are working on this project, including Patrick Clay '13 and PhD candidates Matteo Riondato GS and Max Leiserson GS. Riondato got involved with the research as part of his master's project after meeting Upfal in 2008. "I began working on data mining, which is the extraction of specific golden nuggets of information that's hidden in large data sets, without having to actually look at the entire data set," he said. He said he has found working on Big Data rewarding because it demands expertise in many areas, forcing him to develop different kinds of skills and combine them for research.

Clay began working with Upfal in his sophomore year after taking a class with him on probability algorithms and is currently exploring how a given analysis would be affected if the data set grew. He was intrigued by the math-heavy part of the research and by the possibility of analyzing "large data sets with simpler, less intelligent processing techniques, which Big Data enables you to do," he said. "The more data you have, the greater potential there is to finding interesting results with simpler algorithms." He said he is excited by the prospect of contributing to a scientific achievement - "a small one, but a real one" - and by the possibility of having something published.


All Content © 2022 The Brown Daily Herald, Inc.