Samuel Scarpino

Director of AI + Life Sciences and Professor of the Practice, Northeastern University.

Big Data alone can't solve our problems

This deceptively simple principle guides my research and is motivated by two key insights. First, data are worthless in the absence of testable hypotheses, quantitative theory, and statistical methods. Second, the appropriateness of the data for a given question is far more important than the quantity gathered. From these insights, I have developed a scientific research framework: investigating pressing scientific questions by integrating mathematical models and data with powerful statistical methods.

Perhaps the greatest achievement of the past two decades has been our ability to gather and store vast amounts of data. For example, during the past five years, the number of unique DNA sequences on GenBank doubled from 80 million to 160 million, and according to IBM, more than 90% of all accessible data today was created in the past two years. While there have been successes, this explosion in data has not ushered in a golden age of discovery. This seeming failure of big data led to my two key insights and motivates my philosophy of science.