The past, the present and the future of data science
IN 1994, BusinessWeek published a cover story on database marketing by Jonathan Berry. “Companies are collecting mountains of information about you, crunching it to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so,” Berry wrote. The story was arguably a turning point in how society perceived what data, and the technology to process it, could do.
Data has always been crucial to the advancement of science and the expansion of human knowledge, and statistics, in particular, is by its nature a data-driven discipline. Over the past 60 years, however, a distinct field of data science has slowly emerged.
A major breakthrough came in 1962, when statistician John Tukey, in a seminal paper, stunned his readers (academic statisticians) by pointing to the existence of an as-yet-unrecognised science whose focus was learning from data, or ‘data analysis’. In 1977, Tukey wrote that exploratory data analysis and confirmatory data analysis “can and should proceed side by side”, arguing that more emphasis should be placed on using data to suggest hypotheses worth testing. Danish computer science pioneer and Turing Award recipient Peter Naur introduced the phrase ‘data science’ as an alternative to ‘computer science’ in the 1960s, describing it as “the science of dealing with data”.
Thanks to the advent of computers and the Internet, a massive amount of data is continuously generated in today’s world, and its volume is now growing exponentially. The Covid pandemic has accelerated that growth further. In recent years, there has been a lot of hype around ‘big data’. People are captivated by it and hope to mine it for effective strategies in every sphere of life, whether business, industry, sports, healthcare, elections or national policymaking. Michael Lewis’ 2003 book Moneyball, and its 2011 film adaptation starring Brad Pitt, told how Billy Beane, the general manager of the Oakland Athletics, used historical data and analytics to achieve enormous success in Major League Baseball on a tight budget. Since then, the Moneyball culture has permeated many aspects of our lives, and big-data analytics from Silicon Valley has arrived on the scene.
A new class of professionals has emerged: data scientists. A 2012 Harvard Business Review article by Thomas Davenport and DJ Patil famously called data scientist “the sexiest job of the 21st century”. Data science programmes are springing up at numerous prestigious universities and institutes around the world. Given the ocean of data at hand, the potential of data science may appear boundless. What, though, is data science’s future? Will it shape our lifestyles and the direction of data-centric scientific research? And how easy is it to leverage the huge volume of data we are collecting these days?
Society has slowly embraced data science as an important area to nurture. In a 2001 paper, William S Cleveland of Purdue University introduced the notion of data science as an independent discipline, extending the field of statistics to incorporate “advances in computing with data”. The phrase ‘data scientist’ was possibly coined in 2008 by DJ Patil and Jeff Hammerbacher to refer to “high-ranking professionals with the training and curiosity to make discoveries in the world of big data”.
Kenneth Cukier stated in his report titled ‘Data, Data Everywhere’, published in The Economist in February 2010, that “a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician, and storyteller/artist to extract the nuggets of gold hidden under mountains of data.”
In general, data science today is a synthesis of statistics, mathematics, algorithms and engineering prowess, combined with communication and management skills. According to the 2012 Harvard Business Review article referenced above, a good data scientist is a cross between a data hacker, analyst, communicator and trusted adviser.
Has the hype lasted? In 2022, Davenport and Patil published a follow-up article in the Harvard Business Review to reassess their decade-old narrative and determine whether the central premise of their 2012 article, that data science is one of the world’s fastest-growing professions, still holds. In 2012, they had written, “More than anything, what data scientists do is make discoveries while swimming in data.” In 2022, they observed that “the job is more in demand than ever with employers and recruiters. AI is increasingly popular in business, and companies of all sizes and locations feel they need data scientists to develop AI models”. They concluded that data scientist was still a very attractive job, although its scope might have shifted somewhat with the AI revolution.
Let’s see how data science actually performs in the real world. Anne Milgram, former Attorney General of New Jersey, has shown how smarter statistics can help fight crime. After becoming Attorney General in 2007, she brought data analytics and statistical analysis into New Jersey’s criminal justice system, an approach she called “moneyballing criminal justice”. Yes, moneyball has truly become a buzzword, and the goal of data science may be to moneyball a situation by making discoveries while swimming in data. Still, let’s not forget that statistics remains in its infancy when it comes to dealing with millions of data points on thousands of variables.
In 2017, David Donoho, a Stanford professor, published a fascinating paper titled ‘50 Years of Data Science’. Donoho made a prediction as well. “In the future, scientific methodology will be validated empirically,” he wrote. “In 2065, mathematical derivation and proof will not trump conclusions derived from state-of-the-art empiricism.”
Well, will it really be that simple some four decades from now? In this rapidly changing, technology-driven world, that is the distant future, and a shade of uncertainty remains.