Big data has been a much-used buzz-phrase for several years. However, it is only more recently that big data analytics has entered the corporate mainstream. More and more companies now say that they are using, or looking to use, big data analytics in their business. But the concept of big data raises a number of issues for data protection and data security, and while there has been no major breach of a big data dataset yet, it is only a matter of time.
There is no single, generally accepted definition of big data, but one of the most common is that given by Gartner: "high volume, high velocity and high variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making." Increases in processing power and declining storage costs mean that we are creating more data than ever before, with IBM estimating in September 2013 that 90% of the world's data was created in the previous two years, and Google processing 24 petabytes of data every single day. Businesses and governments are looking at ways to harness this volume of data, and the ability to process unstructured data, to find correlations that they would otherwise have been unable to detect.
However, the very nature of big data means that careful consideration needs to be given to how data is handled. Big data often means combining datasets to create as large a dataset as possible, and holding and analyzing as much data as possible. The term ‘N=all’ is often used, meaning that the dataset to use is ‘all available data’. This goes against traditional data protection principles, which hold data minimization as one of the key requirements. It also means that, even where data is theoretically anonymized, big datasets will attract cyber-criminals simply because so much data is aggregated in one place. On the subject of anonymization, the jury is still out, but it is clear from studies already published that big data does afford some ability to re-identify individuals from theoretically anonymized datasets. At the moment, it looks as though this is, in fact, an issue arising from poor anonymization techniques.
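One way to see how a theoretically anonymized dataset can still single out individuals is to count how many records share each combination of indirect identifiers. The sketch below is illustrative only: the records, the field names and the choice of `postcode`, `birth_year` and `sex` as quasi-identifiers are all assumptions made for the example, not data or a method taken from any study.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, indirect identifiers kept.
RECORDS = [
    {"postcode": "SW1A", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "SW1A", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "EC4A", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"postcode": "N1", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
]

# Assumed quasi-identifiers: fields an attacker might find in another dataset.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def group_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Return the size of the smallest group (0 for an empty dataset)."""
    return min(group_sizes(rows, keys).values(), default=0)


def unique_rows(rows, keys):
    """Return rows whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


if __name__ == "__main__":
    print("k =", k_anonymity(RECORDS, QUASI_IDENTIFIERS))
    for row in unique_rows(RECORDS, QUASI_IDENTIFIERS):
        print("unique combination:", row)
```

A smallest group of size one means at least one record could be matched to a person by anyone holding a second dataset with the same three fields. Combining datasets tends to add such fields, which is why poorly anonymized data becomes easier to re-identify as it grows.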
Another issue with big data processing is that it is often unclear what use data will be put to in the future, so organizations seek to retain datasets for as long as possible. Big data analytics is about finding unexpected correlations and taking advantage of them; while to some extent one can predict which datasets may be used in future, this ability is limited in scope with big data. This means that some of the protection afforded by the requirements to (a) use data only for the purpose for which it was collected and (b) hold it only for the minimum time needed to achieve that purpose may be lost with big data processing.
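The two requirements described above can be made concrete as a simple check that runs before a dataset is fed into an analytics job. This is a minimal sketch under stated assumptions: the `Dataset` record, its fields and the `may_use` helper are hypothetical names invented for illustration, and real compliance decisions depend on legal analysis rather than a boolean test.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Dataset:
    """Hypothetical metadata recorded when a dataset is collected."""
    name: str
    collected_on: date
    purposes: frozenset      # purposes declared at the time of collection
    retention: timedelta     # how long the data may be held


def may_use(dataset, purpose, today):
    """Allow use only for a declared purpose and within the retention period."""
    within_retention = today <= dataset.collected_on + dataset.retention
    return purpose in dataset.purposes and within_retention


if __name__ == "__main__":
    orders = Dataset(
        name="orders",
        collected_on=date(2013, 1, 1),
        purposes=frozenset({"billing"}),
        retention=timedelta(days=365),
    )
    print(may_use(orders, "billing", date(2013, 6, 1)))
    print(may_use(orders, "marketing_analytics", date(2013, 6, 1)))
    print(may_use(orders, "billing", date(2015, 1, 1)))
```

The tension with big data is visible in the second and third calls: an unexpected analytics purpose, or a dataset kept "just in case", fails exactly the checks that purpose limitation and retention limits are meant to enforce.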
Allied to this is the risk that organizations take when transferring datasets to third parties for analysis. As organizations do not necessarily hold or have access to all of the data that may be necessary for an analytics exercise, they may transfer their own data to a third party to combine with its data for analytics. Alternatively, an organization may transfer data internally to a business analyst team. In each case, the organization needs to ensure that it is not moving data outside its secure perimeters and therefore putting that data at risk. This is particularly the case where businesses are moving to cloud storage solutions, and where big data analytics tools are specifically designed to pull data from multiple sources in a network for analysis. Organizations need to be alert to the risk that such transfers inadvertently render their carefully designed security protections irrelevant.
It is not, however, all bad news. Big data is also potentially a powerful tool in detecting security breaches. A Verizon paper estimated that information relating to around 80% of breaches was available in logs, but was not identified and acted upon. A major recent example of this was the Target security breach, where security tools flagged the intrusion some time before the breach was identified, but the information security team did not spot the warnings.
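The point about evidence sitting unnoticed in logs can be illustrated with a very small aggregation over log lines. The sketch below is an assumption-laden toy: the log format, the `LOGIN_FAILED` event name, the sample lines and the threshold of three are all invented for the example and do not reflect any particular security product.

```python
from collections import Counter

# Hypothetical log format: "<timestamp> <source_ip> <event>"
LOG_LINES = [
    "2014-01-01T10:00:00 10.0.0.5 LOGIN_FAILED",
    "2014-01-01T10:00:02 10.0.0.5 LOGIN_FAILED",
    "2014-01-01T10:00:03 10.0.0.9 LOGIN_FAILED",
    "2014-01-01T10:00:04 10.0.0.5 LOGIN_FAILED",
    "2014-01-01T10:00:09 10.0.0.9 LOGIN_OK",
]


def failed_logins_by_source(lines):
    """Count failed login events per source address, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[2] == "LOGIN_FAILED":
            counts[parts[1]] += 1
    return counts


def suspicious_sources(lines, threshold=3):
    """Return sources with at least `threshold` failed logins, sorted."""
    counts = failed_logins_by_source(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    print(suspicious_sources(LOG_LINES))
```

A single failed login is noise; the same source failing repeatedly is a pattern. Aggregating across large volumes of log data is what turns the first into the second, although someone still has to act on the result.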
Big data analytics can assist organizations in sorting through false positives and analyzing the mass of data produced by security tools. Also helpful are larger scale projects such as SOLTRA and the FCAS, where financial institutions, critical infrastructure and regulatory bodies are sharing data and information to try to better protect against and respond to cyber-threats. The issue remains that, while big data analytics should provide security teams with a tool to identify intrusions more effectively, the human element is still the weakest link. While big data presents new risks, and new tools to combat them, organizations will still need to have adequate systems in place to ensure compliance with legal and regulatory requirements, and to review those systems continually to keep up to date with the latest developments.
About the Author
Paul Glass is a senior associate in the Disputes and Investigations Group at Taylor Wessing. Paul's practice includes advising on a range of general commercial litigation and arbitration (under LCIA, ICC and AAA rules), and advising in specialist areas such as financial and IT disputes, as well as cyber security and data protection. Paul graduated from Oxford University with a BA in jurisprudence