Big data, and how to manage it all : The Tribune India


Atanu Biswas & Bimal Roy

In the era of Big Data, we are like the blind men of the parable standing in front of an elephant, trying to guess what the giant animal looks like. However, unless we understand what Big Data is, we will never get it right. 

It is important to know why the fancy analytics tools that we use on small data sets cannot simply be carried over when our data grows. For example, to find the simple average of ‘n’ numbers, we add them and divide the sum by ‘n’. In principle, the same approach applies whether ‘n’ is 100 or 100 billion.

However, if all the numbers are large and positive, the sum of 100 billion such numbers might be too large for the computer to represent. We then need to adjust the algorithm appropriately to find the average. That is the extra bit of cosmetic surgery needed for handling Big Data.
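One common way to make that adjustment, offered here only as a minimal sketch and not as the authors' own method, is to update the average one number at a time so that the full sum is never formed. The function name `running_mean` is hypothetical, and the example assumes the numbers are ordinary floating-point values.

```python
import math

def running_mean(values):
    """Average the values without ever forming their full sum.

    Illustrative sketch: after k numbers, `mean` holds the average of
    those k numbers, so it stays within the range of the data.
    """
    mean = 0.0
    for k, x in enumerate(values, start=1):
        # Move the current average a fraction 1/k of the way towards x.
        mean += (x - mean) / k
    return mean

# Three very large numbers: the naive sum overflows to infinity,
# while the running average stays finite.
big = [1e308, 1e308, 1e308]
print(math.isinf(sum(big)))   # the naive sum is no longer a usable number
print(running_mean(big))
```

The trade-off is one division per number instead of a single division at the end, which is a small price for never holding a value larger than the data itself.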

Decode complex stats

Data analytics mostly comprises statistical methodologies such as regression analysis, classification and clustering techniques, and standard estimation and testing procedures. While most of this theory is neatly developed in the statistical literature and easily applied to small and moderate-sized data, one might need to adapt it intelligently and devise novel techniques for data in unusual formats. But the real challenge, even for standard ready-to-use techniques, lies in the limitations of working with data that has a huge number of variables.

One reason is the presence of ‘spurious’ or nonsense correlations among different variables. The more variables we handle, the more such correlations we encounter. And unless we can sift out the unimportant variables, we cannot have a meaningful analysis of the data.
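A small simulation can make the point concrete. The sketch below is an illustration under assumed settings (20 observations, up to 50 variables, a fixed random seed, all chosen arbitrarily): every variable is generated independently, so any correlation found between two of them is spurious by construction. It then reports the strongest correlation found as more variables are included.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def max_abs_correlation(columns):
    """Largest absolute correlation over all pairs of columns."""
    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            best = max(best, abs(pearson(columns[i], columns[j])))
    return best

rng = random.Random(0)   # fixed seed so the run is repeatable (arbitrary choice)
N_OBS = 20               # assumed number of observations per variable
N_VARS = 50              # assumed total number of independent variables

# Each column is pure noise, unrelated to every other column.
columns = [[rng.gauss(0.0, 1.0) for _ in range(N_OBS)] for _ in range(N_VARS)]

for k in (5, 10, 25, 50):
    print(k, "variables -> strongest correlation:",
          round(max_abs_correlation(columns[:k]), 3))
```

Because each larger set contains the smaller ones, the strongest correlation can only stay the same or grow as variables are added, even though none of the variables is truly related to any other.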

Sifting out the unimportant variables is theoretically challenging too. In addition, even a standard regression analysis with loads of data and, say, 10,000 variables needs additional computational techniques.

Information management  

So, how do we handle the ocean of data? With virtually everything now connected through the Internet of Things, a gigantic amount of data is generated continuously, and the volume is growing faster than ever. An IBM report from mid-2017 stated that 2.5 quintillion bytes of data are created per day and, according to a Forbes article (2015), about 1.7 megabytes of new information was expected to be created every second for every human being on the planet by 2020.

How to store data

Big Data is a boon and a curse at the same time. Are we really capable of leveraging it? With the present expertise, the answer is ‘no’. We need to devise statistical techniques that can accommodate such data. Only top statisticians and computational experts working together might produce such techniques, and even then only on a case-by-case basis.

Understand the power of data

Consider the example of multiplication. We need some additional techniques for multiplying two big numbers, say with hundreds of digits. We rely on the multiplication tables we have memorised, multiply one number by every digit of the other, one by one, starting from the units place, and shift each new row one place to the left. Finally, we add all the rows. This algorithm builds on the knowledge of tables, combined with some special techniques. It can be interpreted as a Big Data problem, and special techniques are needed to solve it.
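The schoolbook procedure described above can be written out as a short program. This is a minimal sketch, with the hypothetical function name `long_multiply`, that works on numbers given as strings of digits. Instead of writing out every row and adding them at the end, it adds each row into a running total as it goes, which gives the same result.

```python
def long_multiply(a: str, b: str) -> str:
    """Multiply two non-negative integers given as digit strings,
    using only single-digit products (the 'tables') and carries."""
    # Store digits with the units place first, as in pen-and-paper work.
    x = [int(ch) for ch in reversed(a)]
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))

    for i, dy in enumerate(y):          # one row per digit of b
        carry = 0
        for j, dx in enumerate(x):
            # Position i + j shifts the row i places to the left.
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(x)] += carry     # leftover carry of this row

    # Drop leading zeros but keep a single "0" for a zero product.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return "".join(str(d) for d in reversed(result))

print(long_multiply("123", "456"))  # 56088
```

The work grows with the product of the two lengths: two numbers of a few hundred digits each need on the order of a hundred thousand single-digit multiplications, which is tedious by hand but trivial for a machine.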

Consider another simple mathematical problem: sorting. Suppose we are to sort five numbers in increasing order. In our elementary classes, we could easily sort them just by looking at the numbers; certainly some algorithm runs within our brain to do so. But we cannot sort 100 numbers, or say 100,000 numbers, just by looking at them. We need an explicit algorithm to reach the answer. In that sense, we have been tackling the Big Data problem for years now.
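The article does not name a particular sorting algorithm. As one possible illustration, the sketch below uses merge sort, a standard method that splits the list in half, sorts each half, and then merges the two sorted halves. The function name `merge_sort` is chosen here for the example.

```python
def merge_sort(numbers):
    """Return a new list with the numbers in increasing order."""
    if len(numbers) <= 1:
        return list(numbers)            # nothing to sort

    middle = len(numbers) // 2
    left = merge_sort(numbers[:middle])
    right = merge_sort(numbers[middle:])

    # Merge: repeatedly take the smaller front element of the two halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])             # at most one of these is non-empty
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```

The same few lines handle five numbers or 100,000, which is the point: once the procedure is explicit, the size of the list stops being an obstacle.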

—The writers are professors at the Indian Statistical Institute, Kolkata
