Too bad there isn’t a “CSI: Network” show to learn from because in the digital world, dusting for fingerprints just won’t do. Cyberattacks don’t have a traditional, physical crime scene where evidence can be collected for investigation. Instead, we are facing a crime scene built from a complex structure of servers, networks and applications, scattered across many different geographical locations.
Just as real-world forensic detectives don’t always have the evidence they need, networks and applications provide only a partial and reduced set of evidence for cybersecurity forensics. For example, a log file from a server can show the health of the server and its applications at any given time, but it cannot tell us exactly what information was exchanged with other servers, networks or applications. Similar arguments can be made for log files originating from networks and applications.
In any kind of investigation, more evidence is better. To increase the amount of evidence, we need to shift our focus away from these devices and onto the actual information traversing our networks. By deploying full packet capture capabilities at strategic points across the network infrastructure, we can collect this information and reconstruct a complete picture of what occurred.
Quick Retrieval of Packet Data is Critical
Of course, it’s not possible to manually sort through the entire packet capture database. Instead, the packet data must be indexed as it is written to the database to enable fast searching. The most common ways of indexing packet data are by reception time, addresses, protocol number and port numbers. Indexing by reception time enables us to quickly find all packet data captured within a certain timeframe; indexing by addresses, protocol number and port numbers enables us to quickly find all packet data exchanged by one user or between two users. The various types of indexes can also be combined, allowing us to search quickly for all data exchanged between two parties within a given time window.
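To make the idea of combined indexes concrete, here is a minimal in-memory sketch. The `PacketRecord` layout and the `PacketIndex` class are hypothetical names invented for illustration, not the design of any particular packet capture product. The sketch indexes by reception time and by address only; protocol and port numbers could be indexed following the same pattern. It also assumes packets are indexed in arrival order.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketRecord:
    """Per-packet metadata (hypothetical layout)."""
    timestamp: float   # reception time, seconds since epoch
    src: str
    dst: str
    protocol: int      # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int
    offset: int        # where the raw packet lives in the capture store


class PacketIndex:
    def __init__(self):
        self._times = []                   # reception times, ascending
        self._records = []                 # records, same order as _times
        self._by_addr = defaultdict(list)  # address -> positions in _records

    def add(self, rec):
        # Assumes records arrive in time order, so _times stays sorted.
        pos = len(self._records)
        self._times.append(rec.timestamp)
        self._records.append(rec)
        self._by_addr[rec.src].append(pos)
        self._by_addr[rec.dst].append(pos)

    def _window(self, start, end):
        # Binary search for the slice of positions inside [start, end].
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return lo, hi

    def in_window(self, start, end):
        """All packets received within [start, end]."""
        lo, hi = self._window(start, end)
        return self._records[lo:hi]

    def between(self, a, b, start, end):
        """Packets exchanged between addresses a and b within [start, end]."""
        lo, hi = self._window(start, end)
        both = set(self._by_addr.get(a, [])) & set(self._by_addr.get(b, []))
        return [self._records[p] for p in sorted(both) if lo <= p < hi]
```

The time lookup is a binary search and the address lookup is a dictionary access, so neither requires scanning the full capture. A production system would keep these indexes on disk rather than in memory.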
Indexing packet data on reception time, addresses, protocol number and port numbers is an efficient way to find it quickly during a forensic investigation. Efficiency can be improved further by associating every packet originating from a given communication session with a unique session ID and indexing all packet data by that session ID. This makes it possible to quickly find all data packets belonging to a given session between two entities, such as a specific YouTube video playback.
Fast searching overcomes the second significant issue, but it is also important to get the relevant packet data out of the packet capture solution and into the hands of the forensic network security team for analysis as fast as possible.
Fast retrieval is important for a couple of reasons. Imagine investigating a possible security breach and quickly identifying some suspicious packet data in the packet capture database, only to then spend several hours retrieving it. Firstly, the forensic network security team cannot make progress until the retrieval process is complete. Secondly, there is a chance that the suspicious packet data will be overwritten by newly captured packets before the retrieval process is done.
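One possible way to derive such a session ID is to hash the addresses, ports and protocol number after putting the two endpoints in a fixed order, so that both directions of a conversation map to the same ID. The function names and the tuple layout below are assumptions made for this sketch. It also simplifies one point: the same address and port combination can be reused over time, so a real system would additionally need to track when a session starts and ends.

```python
import hashlib


def session_id(src, src_port, dst, dst_port, protocol):
    """Return the same ID for both directions of a conversation."""
    # Sort the endpoints so A->B and B->A normalize to the same key.
    a, b = sorted([(src, src_port), (dst, dst_port)])
    key = f"{protocol}|{a[0]}:{a[1]}|{b[0]}:{b[1]}"
    return hashlib.sha1(key.encode()).hexdigest()[:16]


def index_by_session(packets):
    """Group packets by session ID.

    Each packet is a tuple (hypothetical layout):
    (src, src_port, dst, dst_port, protocol, offset_in_capture_store)
    """
    sessions = {}
    for pkt in packets:
        sid = session_id(*pkt[:5])
        sessions.setdefault(sid, []).append(pkt)
    return sessions


packets = [
    ("192.0.2.10", 50000, "198.51.100.5", 443, 6, 0),
    ("198.51.100.5", 443, "192.0.2.10", 50000, 6, 1),   # reply, same session
    ("192.0.2.10", 50001, "198.51.100.5", 443, 6, 2),   # different session
]
by_session = index_by_session(packets)
```

With this grouping in place, retrieving everything that belongs to one session is a single dictionary lookup instead of a search across several separate indexes.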
Sorting out Storage
Being able to quickly find and retrieve packet data is like being able to travel through time. We now have a complete picture of what happened ten minutes, one hour, one day, one week, one month or one year ago on the network. The big question is: How far back in time must we be able to travel?
The Target breach showed us that the attackers were present in the company’s IT infrastructure for more than 200 days before the data breach was discovered and a detailed investigation was initiated.
To make sure that all the data packets are available for review, an organization needs to determine how much data storage capacity it needs. It does this by deciding how much packet capture history it wants to keep and estimating how much packet data it generates. Let us assume a small/medium enterprise generates an average network load of 750 Mbps across a 24-hour window, and a large enterprise generates 5 Gbps under the same conditions. Armed with this information, an organization can calculate the minimum required data storage capacity for one day, one week, one month and one year of packet capture storage.
Many of the packet capture solutions currently on the market can scale to 1000 TB of data storage. That equals roughly four months of packet data history for a small/medium enterprise, and less than a month for a large enterprise.
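The arithmetic behind these figures can be checked with a short script. It is a back-of-the-envelope sketch under stated assumptions: the load is constant at the stated average, terabytes are decimal (10^12 bytes), a month is 30 days, and no space is reserved for indexes or other overhead. The function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def storage_tb(load_mbps, days):
    """Decimal terabytes needed to store `days` of traffic at `load_mbps`."""
    bytes_per_day = load_mbps * 1e6 / 8 * SECONDS_PER_DAY
    return bytes_per_day * days / 1e12


def retention_days(capacity_tb, load_mbps):
    """Days of history that fit into `capacity_tb` at `load_mbps`."""
    return capacity_tb / storage_tb(load_mbps, 1)


for name, mbps in [("small/medium", 750), ("large", 5000)]:
    for label, days in [("day", 1), ("week", 7), ("month", 30), ("year", 365)]:
        print(f"{name:>12}  one {label:<5} {storage_tb(mbps, days):9.1f} TB")
    print(f"{name:>12}  1000 TB holds {retention_days(1000, mbps):.1f} days")
```

Under these assumptions, 750 Mbps works out to about 8.1 TB per day and 5 Gbps to about 54 TB per day, so 1000 TB holds roughly 123 days of history for the small/medium enterprise and between 18 and 19 days for the large one.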
For organizations that find themselves in need of more storage, it is possible to extend the packet data history even further by compressing the packet data before it is written to the packet capture database. The effect of compression depends on the selected compression algorithm and on the content of the data packets, as certain data is better suited to compression than others. Standard network packet data has a typical compression ratio of three; hence, compression can triple the packet capture history.
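The dependence on content is easy to observe with a small experiment. The sketch below uses Python's standard `zlib` module purely as a stand-in; which algorithm a given packet capture product uses is not stated here. The sample payloads and function names are likewise invented for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data))


def extended_history_days(base_days: float, ratio: float) -> float:
    """History that fits in the same storage once data is compressed."""
    return base_days * ratio


# Repetitive, text-like traffic compresses well.
text_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 100

# Random bytes stand in for encrypted or already-compressed payloads,
# which barely compress at all.
random_like = os.urandom(4096)

print(f"text-like ratio:   {compression_ratio(text_like):.1f}")
print(f"random-like ratio: {compression_ratio(random_like):.2f}")

# With the article's typical ratio of three, 120 days becomes 360 days.
print(extended_history_days(120, 3))
```

The more encrypted or pre-compressed traffic a network carries, the further the achievable ratio is likely to fall below the typical figure, so it is worth measuring on a sample of real traffic before sizing storage around it.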
Seizing the Opportunity
As cybersecurity efforts escalate in response to increasingly sophisticated attacks, deploying packet capture capabilities is an extremely effective weapon. Yet, a few challenges must be addressed in order to have a successful packet capture solution.
Budget is an important factor here. The two biggest cost drivers are the number of packet capture systems and the size of packet data history. A single packet capture system can cost up to $250K, depending on the exact system configuration, and a storage solution in the petabyte range could cost more than $1M.
These costs may seem like a hard pill to swallow, but they are not nearly as hard to swallow as losing your job and your organization’s reputation because you weren’t willing to invest in rapid forensic investigation. All advantages in the cyberwar should be seriously considered, and high-speed, full packet capture and retrieval are among them.