Big data, and how to manage it all : The Tribune India

Big data, and how to manage it all


Atanu Biswas & Bimal Roy

In the era of Big Data, we are like the blind men of the parable standing in front of an elephant, trying to guess what the giant animal looks like. However, unless we understand what Big Data is, we will never get it right. 

It’s important to know why the fancy analytics tools that we have used for small data sets cannot simply be carried over when our data grows. For example, to find the simple average of ‘n’ numbers, we add them and divide the sum by ‘n’. The same approach is followed whether ‘n’ is 100 or 100 billion.

However, if all the numbers are large and positive, the sum of 100 billion such numbers might exceed the largest value a computer can hold in a single number. We then need to adjust the algorithm appropriately to find the average. That's the extra bit of cosmetic surgery needed for handling Big Data.
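One common adjustment, offered here only as an illustrative sketch and not as the authors' own method, is to update the average one number at a time so that the full sum is never formed. The function name `running_mean` is a hypothetical choice, and the example assumes ordinary floating-point numbers.

```python
from typing import Iterable


def running_mean(values: Iterable[float]) -> float:
    """Average the values without ever holding their full sum.

    After k numbers, `mean` is the average of those k numbers, so the
    intermediate result never grows beyond the size of the inputs.
    """
    mean = 0.0
    count = 0
    for x in values:
        count += 1
        # Move the current mean a fraction 1/count of the way towards x.
        mean += (x - mean) / count
    if count == 0:
        raise ValueError("mean of an empty sequence is undefined")
    return mean


if __name__ == "__main__":
    huge = [1e308] * 4  # each value is close to the largest possible float
    print(sum(huge) / len(huge))  # the plain sum overflows to infinity
    print(running_mean(huge))     # the running update stays finite
```

The trade-off is one division per number instead of a single division at the end, in exchange for intermediate values that stay in range.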

Decode complex stats

Data analytics mostly comprises statistical methodologies such as regression analysis, classification and clustering techniques, and standard estimation and testing procedures. While most of these theories are neatly developed in the statistical literature and easily applied to small and moderate-sized data, one might need to manipulate the data intelligently and devise novel techniques for unusual data formats. But the real challenge, even for standard ready-to-use techniques, lies in the limitations of working with data that has a huge number of variables.

One reason is the presence of ‘spurious’ or nonsense correlations among different variables. The more variables we handle, the more such correlations we encounter. And unless we can sift out the unimportant variables, we cannot have a meaningful analysis of the data.
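A small simulation can make this concrete. The sketch below is an illustration under stated assumptions, not part of the original argument: it generates variables that are independent by construction, then counts how many pairs still show a sample correlation above an arbitrary cut-off of 0.5. The function names, the cut-off and the sample sizes are all hypothetical choices.

```python
import math
import random
from itertools import combinations


def independent_columns(n_vars, n_obs, seed=0):
    """Generate n_vars unrelated variables, each with n_obs random values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]


def pearson(x, y):
    """Sample Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_strong_pairs(columns, threshold=0.5):
    """Count pairs of variables whose correlation exceeds the threshold."""
    return sum(
        1
        for x, y in combinations(columns, 2)
        if abs(pearson(x, y)) > threshold
    )


if __name__ == "__main__":
    columns = independent_columns(n_vars=40, n_obs=10)
    for k in (5, 10, 20, 40):
        # Every "strong" pair found here is spurious by construction.
        print(k, "variables:", count_strong_pairs(columns[:k]), "strong pairs")
```

Because the number of pairs grows roughly with the square of the number of variables, the count of chance correlations climbs quickly even though no variable is actually related to any other.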

It’s theoretically challenging too. In addition, even a standard regression analysis with loads of data and, say, 10,000 variables needs additional computational techniques.

Information management  

So, how do we handle the ocean of data? Now, with virtually everything connected through the Internet of Things, a gigantic amount of data is generated continuously. The ever-expanding horizon of data is growing faster than ever. An IBM report of mid-2017 stated that 2.5 quintillion bytes of data are created per day and, according to a 2015 Forbes article, about 1.7 megabytes of new information is expected to be created every second for every human being on the planet by 2020.

How to store data

Big Data is a boon and a curse at the same time. Are we really capable of leveraging it? With the present expertise, the answer is ‘no’. We need to devise statistical techniques to accommodate such data. Only the top statisticians and computational experts working together might produce such techniques, and even then only on a case-by-case basis.

Understand the power of data

Consider the example of multiplication. We need some additional techniques for multiplying two big numbers, say with hundreds of digits. We use our memory and multiply one number by every digit of the other, one by one, starting from the units place. Finally, we add all the rows. This algorithm for multiplication builds on our knowledge of multiplication tables, combined with some special techniques. This can be interpreted as a Big Data problem, and special techniques are needed for solving it.
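The schoolbook procedure described above can be sketched in code. This is a minimal illustration, assuming both numbers are given as strings of decimal digits with no sign; the name `long_multiply` is hypothetical. For brevity it adds each row into the result as soon as it is produced, not at the very end.

```python
def long_multiply(a: str, b: str) -> str:
    """Multiply two non-negative integers given as decimal digit strings."""
    # Store digits with the units place first, as in the pencil-and-paper method.
    x = [int(ch) for ch in reversed(a)]
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))

    for i, digit_y in enumerate(y):
        carry = 0
        for j, digit_x in enumerate(x):
            # Single-digit product (the "tables"), plus what is already
            # in this column, plus the carry from the previous column.
            total = result[i + j] + digit_x * digit_y + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(x)] += carry

    # Drop leading zeros, keeping at least one digit.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return "".join(str(d) for d in reversed(result))


if __name__ == "__main__":
    a = "9" * 200  # a 200-digit number
    b = "7" * 150  # a 150-digit number
    # Python's built-in integers have no size limit, so they serve as a check.
    print(long_multiply(a, b) == str(int(a) * int(b)))
```

The only facts the routine relies on are single-digit products and carrying, which is exactly the combination of tables and special techniques the paragraph describes.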

Consider another simple mathematical problem: sorting. Suppose we are to sort five numbers in increasing order. In our elementary classes, we could easily sort them by looking at the numbers; certainly some algorithm runs within our brain to sort them. But we cannot sort 100 numbers, or say 100,000 numbers, just by looking at them. We need an explicit algorithm to reach the answer. In that sense, we have been tackling the Big Data problem for years now.
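Merge sort is one well-known example of such an algorithm. It is chosen here purely for illustration, since the article does not name a particular method. The sketch splits the list in half, sorts each half, and merges the two sorted halves.

```python
import random


def merge_sort(values):
    """Return a new list with the values in increasing order."""
    if len(values) <= 1:
        return list(values)

    middle = len(values) // 2
    left = merge_sort(values[:middle])
    right = merge_sort(values[middle:])

    # Merge: repeatedly take the smaller front element of the two halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    rng = random.Random(0)
    numbers = [rng.randint(0, 10**6) for _ in range(100_000)]
    print(merge_sort(numbers) == sorted(numbers))
```

The same few lines handle five numbers or 100,000, which is the point of having an explicit algorithm when inspection by eye no longer works.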

—The writers are professors at the Indian Statistical Institute, Kolkata
