Grassroots Data Security: Leveraging User Knowledge to Set Policy

When was the last time your job dealt you a good kick in the teeth? If you’re an IT security professional, you probably remember. When it’s your job to define the boundary between users and the perils of the Internet, you’re going to have the occasional bad day. Dealing with the security and compliance fallout of a data spill is no fun. Neither is dealing with an executive who can’t do their work because your security rules stopped them from sharing a file.

Policy is the stage where these dramas play out. You have to strike the right balance between protection and productivity or face data loss and angry users. Just to be clear, we’re not talking about security strategies or grand philosophies.

The policies we’re talking about are the rules that govern acceptable information use, storage and access at the file level. Creating and maintaining them is high-stakes IT grunt work. Getting it right is crucial – but it’s not easy.

Large organizations employ teams of people to write and maintain policies that turn firewalls, intrusion detection systems, data loss prevention (DLP) tools, access controls, and other solutions into policy enforcement machines.

Even with the right staffing levels, keeping these policies up to snuff remains elusive, for a few reasons:

  • Policy evaluation happens in isolation - DLP rules, for example, block or pass based only on what they can see in a specific file. Stop a file that should pass and you block the file and the business. Soon you’re writing a rule for that one guy in marketing who needs access to Engineering. Exceptions pile up. Complexity takes over.
  • Failures of imagination - Even when you perfectly understand a policy goal, it’s almost impossible to write the perfect security rule. A file could come from somewhere you didn’t expect or lack the metadata you thought you could rely on. Policy creation becomes policy forensics as you try to understand why a policy didn’t work as planned.
  • The “tight/loose” question - Should you create a “tight” rule that erroneously blocks legitimate traffic or a “loose” rule that might let sensitive information escape? It’s a tough tradeoff between user productivity and data security.
  • IT staff are not content experts - Information security professionals can’t be expected to know what makes one legal contract or sales document more business-critical than another. In many cases, the documents they’re charged with protecting are too sensitive for them even to view. Yet to protect data effectively, content expertise is exactly what they need.

Amid all these policy challenges, there’s some good news: recent developments in deep learning and natural language processing make autonomous, accurate IT security policy more accessible than ever. However, the technology, interesting as it may be, isn’t what changes the game. What’s exciting is the way it returns the policy process to the grassroots.

Today, the IT team owns the entire problem. They write rules to discover and characterize content (What is this file? Do we care about it?). They write more rules to evaluate that content (Is it stored in the right place? Is it marked correctly?). Then they write still more rules to enforce a policy (block, quarantine, encrypt, log). Unsurprisingly, complexity, maintenance overhead, false positives and security lapses are inevitable.
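To make that rule-upon-rule pipeline concrete, here is a minimal sketch of it in Python. Every name, path and label below is hypothetical, invented for illustration; real DLP products express these stages in their own rule languages, but the discover-evaluate-enforce structure is the same.

```python
# Hypothetical three-stage rule pipeline: characterize a file, evaluate it
# against policy, then pick an enforcement action. All values are invented.

def characterize(file):
    """Discover and characterize: what is this file? Do we care about it?"""
    if "contract" in file["name"].lower():
        return "legal-contract"
    if file["path"].startswith("/finance/"):
        return "financial"
    return "unclassified"

def evaluate(file, category):
    """Evaluate: is it stored in the right place? Is it marked correctly?"""
    if category == "legal-contract" and not file["path"].startswith("/legal/"):
        return "misplaced"
    if category == "financial" and "confidential" not in file["labels"]:
        return "unmarked"
    return "ok"

def enforce(finding):
    """Enforce: block, quarantine, encrypt or log."""
    return {"misplaced": "quarantine", "unmarked": "encrypt", "ok": "log"}[finding]

f = {"name": "Q3_contract.docx", "path": "/shared/", "labels": []}
action = enforce(evaluate(f, characterize(f)))  # a contract outside /legal/ is quarantined
```

Even this toy version shows where the pain comes from: every new document type, location or exception means another hand-written branch in one of the three stages.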

It turns out data security policies are already defined. They’re hiding in plain sight. That’s because content creators are also the content experts and they’re demonstrating policy as they go. A sales team, for example, manages hundreds of quotes, contracts and other sensitive documents. The way they mark, store, share and use them defines an implicit data security policy. Every group of similar documents has an implicit policy defined by the expert content creators themselves.

The problem, of course, is how to extract that grassroots wisdom. Deep learning gives us two tools to do it: representation learning and anomaly detection.

Representation learning is the ability to process large amounts of information about a group of “things” (files in our case) and categorize those things. For data security, advances in natural language processing (now driven largely by deep learning) give us insights into a document’s meaning that are far richer and more accurate than simple keyword matches.

When we combine document meaning with its storage location, ownership, sharing patterns and more, deep learning can expertly identify groups of similar files. It also uncovers the implicit policies content creators follow when working with those files.
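The grouping idea can be sketched in a few lines of Python. This toy uses a bag-of-words representation and cosine similarity in place of a learned embedding, and a simple single-link grouping rule with an arbitrary threshold; the filenames, contents and the 0.5 cutoff are all illustrative assumptions, not anyone’s real pipeline.

```python
# Toy stand-in for representation learning: represent each file as a
# word-count vector, then group files whose vectors are similar enough.
from collections import Counter
import math

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

files = {  # invented example documents
    "quote_acme.txt": "price quote for acme widgets net 30 terms",
    "quote_globex.txt": "price quote for globex widgets net 60 terms",
    "lunch_menu.txt": "tuesday lunch menu soup salad sandwich",
}
vecs = {name: vectorize(text) for name, text in files.items()}

# Greedy single-link grouping: a file joins a group if it is similar
# enough to any existing member (0.5 is an arbitrary threshold).
groups = []
for name, v in vecs.items():
    for g in groups:
        if any(cosine(v, vecs[m]) > 0.5 for m in g):
            g.append(name)
            break
    else:
        groups.append([name])
# The two sales quotes end up together; the lunch menu stands alone.
```

A real system would replace the word counts with deep-learned document embeddings and fold in storage location, ownership and sharing patterns as additional features, but the shape of the computation is the same: similar representations, same group.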

Some files in a group will stand apart as anomalies. When those anomalies are security-relevant, we have everything we need to take action, and that’s when the IT team comes back into the picture. Now empowered with categorized data and insight into specific at-risk files, the team can protect those files as appropriate for the organization.
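Anomaly detection within a group can also be illustrated simply. In this sketch, the “implicit policy” is just the majority behavior on a couple of hand-coded, security-relevant signals; the filenames, the signals and the majority-vote rule are all assumptions made for illustration, where a real system would learn the relevant signals itself.

```python
# Treat the majority behavior in a group of similar files as the implicit
# policy, then flag files that deviate from it. All data is invented.

group = [
    {"name": "q1.docx", "external_share": False, "label": "confidential"},
    {"name": "q2.docx", "external_share": False, "label": "confidential"},
    {"name": "q3.docx", "external_share": False, "label": "confidential"},
    {"name": "q4.docx", "external_share": True,  "label": None},
]

def majority(values):
    """Most common value in a list."""
    return max(set(values), key=values.count)

# The implicit policy is whatever most files in the group do.
policy = {
    "external_share": majority([f["external_share"] for f in group]),
    "label": majority([f["label"] for f in group]),
}

# Files that deviate on a security-relevant signal are the anomalies
# handed to the IT team for action.
anomalies = [
    f["name"] for f in group
    if f["external_share"] != policy["external_share"]
    or f["label"] != policy["label"]
]
```

Here the one quote that is shared externally and unlabeled stands apart from its peers, which is exactly the kind of at-risk file the IT team would want surfaced.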

By enlisting the wisdom of those closest to the content, deep learning takes data security closer to its roots. IT security professionals no longer need to be content experts. They can go back to doing what they do best – protecting the organization and its stakeholders.
