The Three Challenges of “Big Data”: Performance, Intelligence and Communication
End Point Protection agents on 140,000 computers generate 60 gigabytes of log files daily. Transferring this data to centralized storage for analysis takes several hours and already has a small impact on enterprise bandwidth. The 2.7 million users who access the provided services generate 15 gigabytes of log files per hour, or 360 gigabytes every 24 hours. The perimeter defense cycles through thousands of rules and, on average, 250 signatures for every single user session. This is only a fraction of Big Data, but it already comes with its challenges.
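A quick back-of-the-envelope calculation makes these figures tangible; the totals are taken from the text above, while the per-endpoint breakdown is a simple derivation:

```python
# Back-of-the-envelope check of the log volumes quoted above.
endpoints = 140_000
endpoint_logs_gb_per_day = 60                      # 60 GB of endpoint logs per day
per_endpoint_kb = endpoint_logs_gb_per_day * 1024**2 / endpoints  # KB per machine per day

users = 2_700_000
user_logs_gb_per_hour = 15
user_logs_gb_per_day = user_logs_gb_per_hour * 24  # 360 GB every 24 hours

print(f"{per_endpoint_kb:.0f} KB of logs per endpoint per day")
print(f"{user_logs_gb_per_day} GB of user logs per day")
```

So each endpoint contributes only a few hundred kilobytes a day; it is the fleet size, not the individual machine, that creates the problem.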
The necessity to store huge amounts of data stems mainly from three different needs:
- To meet governance needs and comply with legal requirements.
- Statistical data analysis, which might lead to product or process improvements.
- Data as a new commodity for sale (in raw or refined form) to the highest bidder.
The Performance Challenge
Approximately 50,000 call logs per second is merely the average volume of data created by a national telco. Present hardware is able to record and store this data. However, analyzing and enriching every single call log with additional information reveals one of many challenges.
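To make that scale concrete, the quoted rate can be extrapolated to a daily volume; a minimal sketch, where the average record size of 200 bytes is an assumption for illustration only:

```python
# Rough daily volume at the quoted rate of 50,000 call logs per second.
logs_per_second = 50_000
seconds_per_day = 24 * 60 * 60          # 86,400
logs_per_day = logs_per_second * seconds_per_day

assumed_record_bytes = 200              # hypothetical average record size
daily_gb = logs_per_day * assumed_record_bytes / 1024**3

print(f"{logs_per_day:,} call logs per day")
print(f"roughly {daily_gb:.0f} GB per day at {assumed_record_bytes} B/record")
```

Over four billion records a day: storing them is routine, but enriching each one in real time is where the performance challenge begins.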
MIPS (millions of instructions per second) was once the benchmark of a fast CPU, and even lent its name to a processor line back in the day. Nowadays MTPS (millions of transactions per second) is very common and merely indicates how much data is processed every second as background noise.
In the finance sector, where not only throughput but also low latency is crucial, there are challenges even with present hardware, intelligence (software) and standards. A security mechanism that could deep-inspect every transaction from its origin to its destination is simply not feasible yet. A trade-off between security and performance must be made, and that in itself is a problem.
Performance will remain an issue as long as the current architecture is not aligned in a sincere effort to meet these challenges. Some well-established technology standards must be broken in order to advance. Despite some small attempts, it can be assumed that efficiency is not really in the leading suppliers' best interest.
Incorporating human-like aspects into the algorithms, such as filtering and forgetting information, and utilizing distributed parallel processing are key elements in overcoming the performance challenge.
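A minimal sketch of these two ideas together: drop low-value noise before it is stored (the "forgetting" step), then fan the remaining work out across workers. The log format and the severity threshold are invented for illustration, and a thread pool stands in for genuinely distributed processing:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw log lines; in practice these would stream in from agents.
raw_logs = [f"sev={i % 5} event-{i}" for i in range(10_000)]

def keep(line: str) -> bool:
    """'Forgetting' step: discard low-severity noise before storage."""
    return int(line.split("=")[1].split()[0]) >= 3

def enrich(line: str) -> str:
    """Per-record work that can be fanned out across workers."""
    return line.upper()

filtered = [line for line in raw_logs if keep(line)]   # filter first ...
with ThreadPoolExecutor(max_workers=4) as pool:        # ... then process in parallel
    enriched = list(pool.map(enrich, filtered))

print(len(raw_logs), "->", len(enriched))   # 10000 -> 4000
```

Filtering before processing is the point: the expensive enrichment step only ever sees the fraction of records worth keeping.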
The Intelligence Challenge
Intelligence encompasses data modeling, algorithms, coding, software, architecture, the people who deal with them, and the communication between them.
Storing data already requires resources, energy and intelligence (compression and data mapping) to keep it confidential, maintain its integrity and keep it available (the CIA triad). Extracting a certain amount of needed data from a huge data graveyard is another time-consuming process, and time equals money.
Data modeling and the choice of algorithms is one of the touchiest challenges in this field. As in any statistical work, the wrong question always leads to a wrong answer. The very same data can prove or disprove a case depending on the applied model and algorithm. Most of the time, models and algorithms are aligned to prove what they are supposed to show rather than what the data actually shows. Pattern matching is the most essential part of these algorithms, yet narrowing the patterns down to what is already known and familiar yields no impartial advantage or insight.
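A toy illustration of that limitation: a signature list catches only what it already describes, so a genuinely novel pattern passes straight through. The signatures and events below are invented for the sketch:

```python
import re

# Hypothetical signature list: only previously seen attack patterns.
signatures = [re.compile(p) for p in (r"sql.*injection", r"port.?scan")]

events = [
    "sql injection attempt on /login",
    "portscan from 10.0.0.7",
    "never-seen-before beaconing pattern",   # novel: no signature matches it
]

flagged = [e for e in events if any(s.search(e) for s in signatures)]
print(flagged)   # the novel third event is never flagged
```

The matcher is not wrong; it is simply blind to anything outside the patterns it was given, which is exactly the impartiality problem described above.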
Disjointed data silos are another challenge for intelligence. Imagine every department of a company keeping the collected data to itself to strengthen its own position. Connectivity is the key here; without it, any research result would reflect only a part of the whole.
Adding the right dynamic value (attributes) to the right data cannot be achieved based on geocentric or cultural assumptions. The intelligence challenge can only be overcome by incorporating both horizontal and vertical diversity.
The Communication Challenge
Assuming the performance and intelligence challenges are solved, this last one is not to be underestimated. Who is the man with the master plan? Who will deliver the message? Who is going to interpret it? Who will visualize it, and who will tell the story? Facts do not speak for themselves; they must be told in an impartial way.
It is well known that there are scientists who can create equations, formulas and algorithms we cannot even dream of; yet when it comes to delivering the message clearly, the recipient may struggle to digest the information.
As long as the collected data does not tell a story in real time, it is of little use. At present, therefore, most data stores are nothing but data graveyards.
Communication and clear expression among people in an enterprise might even be considered the smaller part of the challenge. The far greater challenge is the communication between human intelligence and artificial intelligence (human to computer). If information cannot be attributed and expressed in a simple, clear and understandable way, we cannot expect the A.I. (which makes increasingly more decisions for us) to be capable of making the right ones.
Big data brings big challenges, so there is a long way to go; but even the longest journey starts with the first step.