By Jon-Louis Heimerl
Data [dey-tuh] noun: individual facts or statistics
Information [in-fer-mey-shuhn] noun: knowledge concerning a particular fact or circumstance
When does data become consumable information? When we correctly manage security, we integrate security devices into our infrastructure in a manner designed to support our privacy, security, and regulatory requirements. The problem is that good security can generate a lot of data. This is exacerbated by the desire to ensure that the data is actually consumable information – stuff we can use.
42
African or European?
Data is just “stuff,” while information is what that stuff means. Is “42” simply 6 X 7, or is it really the answer to life, the universe and everything? Are “African or European” just words to you, or do they have something to do with the airspeed of an unladen swallow? To make sense of these, you need the context of Douglas Adams and Monty Python. If you do not have that context, that is not your fault. It just is.
Your management of security data follows the same rules. Data is more valuable if viewed in context. If you have an IDS reporting a port scan on IP 192.161.0.12, that is simply a piece of data. You still have to figure out what that data means to you. Is it important or is it noise?
Getting Context
Your organization uses data, and the security parts of your organization use security-relevant data.
For a non-security example, let’s use a 3000-piece puzzle. You have to put it together without looking at the picture on the box. You can look at a piece, and add context to that piece. Is it a corner piece, a side piece, or a middle piece? Does the piece have a part sticking out or does it have a hole? Is that something red and round on the piece? Is that something shiny? All of these observations add context to the pieces, as well as the puzzle as a whole.
When you add context to security information, it helps tell you how to build your entire security program. You go from supporting “data” to supporting “PCI data,” along with all that it means to be PCI compliant. You know that the environment that supports PCI data at BigBlueBank is going to receive more advanced security controls than the inventory control system at Joe’s Hat, Boot and Shoe Company. While the two data sets are both important to their respective companies, the specific regulatory requirements placed on the PCI data should result in enhanced controls at BigBlueBank. Even staff at Joe’s would agree that the number of size 10 boots in stock is not as sensitive as credit card data.
PCI has elevated requirements for a variety of technical controls, including data segregation and encryption, as well as incident response, policy, procedure, and training. If you add St. Mary’s Hospital to the mix, you can imagine that their trauma center has stronger availability/resiliency requirements than they do at Joe’s Hat, Boot and Shoe Company. The context within which the data works shapes the entire environment.
The supporting information adds context to the raw security data. Your IDS alert that was previously just “data” gets a whole new meaning if you have the context to know whether 192.161.0.12 is the system that holds your credit card database, or is an internal website that has limited value. Without security context, you might know that you have an alert, and that you are being attacked. But with good context, you can tell that the server being attacked is named “Mordor,” and is a Windows Server 2008 R2 SP1 system, running Oracle 11g Enterprise, that sits in the Princeton, NJ, data center in row 3, rack A12, and it holds all of your clinical patient records, so it falls under HIPAA and HITECH. That information, and context, should make a huge difference in how you manage and protect the information, as well as threats to it.
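As a rough illustration of adding context to an alert, the sketch below looks up the target of an IDS alert in an asset inventory. The inventory structure, the field names, and the priority rule are all assumptions made for the example, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    hostname: str
    role: str
    regulations: tuple  # regulatory regimes that apply to the system


# Hypothetical asset inventory keyed by IP address (illustrative only).
ASSETS = {
    "192.161.0.12": Asset(
        hostname="Mordor",
        role="clinical patient records database",
        regulations=("HIPAA", "HITECH"),
    ),
}


def enrich(alert: dict) -> dict:
    """Return a copy of the alert with asset context attached, if known."""
    enriched = dict(alert)
    asset = ASSETS.get(alert["target_ip"])
    enriched["context"] = asset
    if asset is None:
        # No context: the alert is still just data.
        enriched["priority"] = "unknown"
    else:
        # Assumed rule: regulated systems get high priority.
        enriched["priority"] = "high" if asset.regulations else "low"
    return enriched


alert = {"type": "port_scan", "target_ip": "192.161.0.12"}
enriched = enrich(alert)
print(enriched["priority"], enriched["context"].hostname)
```

The same port scan against an address that is missing from the inventory comes back with a priority of "unknown", which is the "data without context" case.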
Advanced Analytics
Adding context to data gives you information. Analytics adds even more information by evaluating relationships between the various pieces.
You started sorting the puzzle pieces, adding context where you could. You might group pieces that have red on them, as well as pieces that are shiny, to see if you can find anything in common or see a pattern. You start assembling the frame of the puzzle by looking at the sides and corners.
When you look at how the pieces fit together, you are looking at the relationships between those pieces. That is analytics. Next, you look at the red pieces, and see how they fit together. After you assemble three or four pieces, you recognize that the red is a clown nose. Analytics gives you even more information since now you know that the puzzle has a clown in it. That piece of information improves the context that you had previously assigned to every other puzzle piece. Then you assemble some shiny pieces and realize it is a shiny hubcap on a wheel. Analytics.
Better yet, you can match those larger pieces of information together and realize that the puzzle probably includes a clown car, which automatically adds new context to all of the other pieces in the puzzle. Analytics helps you to recognize the giant daisy that squirts water, and the huge green shoe sticking out of the trunk. You utilize analytics to assemble multiple clowns, and the car. Contextual information enabled you to start building, but it was analytics that actually let you make progress and eventually finish the puzzle.
Of course, the same rules apply with information security. The context is invaluable, and lets you understand what your event and alert information means. But the analytics applied to those events forms a bigger picture of what is happening in your environment, and is even more important.
Context and Analytics in Practice
How does this work in real life?
Joe’s Hat, Boot and Shoe Company has a relatively immature security management practice. They generally ignore an external port scan. When they get a series of login failures on an internal system, they probably ignore that also, unless a systems/security admin happens to realize that those failures came from a known “important” system. They effectively ignore a privileged database login since they probably lack the context to see how important the system is, and their level of security paranoia is relatively low. If Joe’s sees the elevated traffic levels, it may be cause for concern, but for the most part it is simply one more in a flood of other events. Keep in mind that Joe’s did not get just these five events. Joe’s got these five events along with another 3,000 or so events that evening. Chances are that the IT staff at Joe’s is not alerted to anything.
Bob’s Big Box store (BBB) probably could not care less about the 17th port scan they saw that week. BBB may also not be terribly worried about a series of external login failures, but when those failures are immediately followed by a success, analytics kicks into action. Was this a user mistyping a username and/or password, or was this a successfully guessed password? At the very least, good analytics has this marked as “curious.” This is probably marked even “curiouser and curiouser” when analytics checks back in time and sees BBB had been port scanned 10 minutes earlier. Suddenly, the port scan is not “just another port scan.” Can good analytics be applied to anything else interesting about the events? For example, did the port scan and login attempts come from the same IP address? This could lend additional context to the events.
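A minimal sketch of that kind of rule might look like the following. The event format, the threshold of three failures, and the "curious"/"curiouser" labels are assumptions chosen for illustration; the 10-minute lookback mirrors the example in the text, and the IP addresses come from documentation ranges.

```python
from datetime import datetime, timedelta


def find_suspicious_logins(events, lookback=timedelta(minutes=10), min_failures=3):
    """Flag a successful login that follows repeated failures from the same
    source, and note whether that source port scanned us shortly before."""
    findings = []
    failures = {}   # source IP -> failures since the last success
    last_scan = {}  # source IP -> time of the most recent port scan
    for ev in sorted(events, key=lambda e: e["time"]):
        src = ev["src"]
        if ev["type"] == "port_scan":
            last_scan[src] = ev["time"]
        elif ev["type"] == "login_failure":
            failures[src] = failures.get(src, 0) + 1
        elif ev["type"] == "login_success":
            if failures.get(src, 0) >= min_failures:
                scan_time = last_scan.get(src)
                scanned = scan_time is not None and ev["time"] - scan_time <= lookback
                findings.append({
                    "src": src,
                    "time": ev["time"],
                    "level": "curiouser" if scanned else "curious",
                })
            failures[src] = 0
    return findings


t0 = datetime(2024, 1, 1, 21, 0)
events = [
    {"time": t0, "type": "port_scan", "src": "203.0.113.5"},
    {"time": t0 + timedelta(minutes=7), "type": "login_failure", "src": "203.0.113.5"},
    {"time": t0 + timedelta(minutes=8), "type": "login_failure", "src": "203.0.113.5"},
    {"time": t0 + timedelta(minutes=9), "type": "login_failure", "src": "203.0.113.5"},
    {"time": t0 + timedelta(minutes=10), "type": "login_success", "src": "203.0.113.5"},
    # A user who mistypes a password once and then gets in is not flagged.
    {"time": t0 + timedelta(minutes=3), "type": "login_failure", "src": "198.51.100.7"},
    {"time": t0 + timedelta(minutes=4), "type": "login_success", "src": "198.51.100.7"},
]
findings = find_suspicious_logins(events)
```

Keying both lookups on the source address is what answers the "same IP address?" question: a scan from a different source would leave the login marked only as "curious."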
BBB sees a series of internal login failures. Given that this followed shortly after the suspicious external logins, this is now marked with an elevated concern more like “interesting.” Their internal systems report the privileged account logon as a matter of course, and it is only really interesting if it falls in a reasonable time sequence in the series of events that are undergoing analytics. Elevated outbound traffic volume would be the last straw. Analytics considered 3,000 events, and picked out a series of five that it decided were related – that they fit together like the corner of a puzzle.
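One simple way to picture "picking five out of 3,000" is an ordered pattern match over a time window. The sketch below is only that, a sketch: the event type names, the two-hour window, and the synthetic "routine" noise events are assumptions made up for the example, and real correlation engines are considerably more sophisticated.

```python
from datetime import datetime, timedelta

# Hypothetical event type names for the sequence described in the scenario.
CHAIN = [
    "port_scan",
    "external_login_success_after_failures",
    "internal_login_failures",
    "privileged_login",
    "elevated_outbound_traffic",
]


def match_chain(events, chain=CHAIN, window=timedelta(hours=2)):
    """Return the first run of events matching `chain` in order, with every
    event inside `window` of the first one, or None if there is no match."""
    ordered = sorted(events, key=lambda e: e["time"])
    for i, start in enumerate(ordered):
        if start["type"] != chain[0]:
            continue
        matched = [start]
        for ev in ordered[i + 1:]:
            if len(matched) == len(chain) or ev["time"] - start["time"] > window:
                break
            if ev["type"] == chain[len(matched)]:
                matched.append(ev)
        if len(matched) == len(chain):
            return matched
    return None


t0 = datetime(2024, 1, 1, 18, 0)
# 2,995 unrelated events spread over the evening, plus the five that matter.
noise = [{"time": t0 + timedelta(seconds=7 * i), "type": "routine"} for i in range(2995)]
attack_start = t0 + timedelta(hours=3)
offsets = [0, 10, 20, 30, 45]  # minutes after the port scan
attack = [
    {"time": attack_start + timedelta(minutes=m), "type": kind}
    for m, kind in zip(offsets, CHAIN)
]
events = noise + attack
related = match_chain(events)
```

Here `related` holds the five chained events and none of the noise. If the last step is removed from the input, `match_chain` returns `None`, which reflects the point that each event is only interesting when it falls in sequence with the others.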
What happens next depends on how BBB has defined their security profile. At the very least, an internal alert is issued, and if they are prepared, they would probably terminate outbound traffic at the firewall when the extra traffic was detected.
The five events are a dramatic oversimplification. So is the “five out of 3,000.” In reality, this could be thousands, and potentially millions, of events, depending on your environment. If your environment consists of six systems, and one IT guy knows them all, he may be able to accomplish all of your analytics. But if yours is an organization of any size, doing meaningful analytics in a manual manner is going to be more a matter of luck than skill.
Jon-Louis Heimerl is Director of Strategic Security for Omaha-based Solutionary, Inc., a provider of managed security solutions, compliance and security measurement, and security consulting services. Heimerl has over 25 years of experience in security and security programs, and his background includes everything from writing device drivers in assembler to running a world-wide network operation center for the US Government. He has also performed commercial consulting for a variety of industries, including many Fortune 500 clients. Heimerl's consulting experience includes security assessments, security awareness training, policy development, physical intrusion tests and social engineering exercises.