Should I Threat Hunt My Systems or the Network?
I made a comment in an earlier blog post that too much data without context can negatively impact the threat hunting process. This is why I’m a firm believer that you should hunt for adversaries using network data, and save host level logs for forensic analysis. In this blog post, I’ll dive into the reasons why I feel threat hunting should be a network based activity.
Network and Host Based Threat Hunting Defined
Just to ensure we are on the same page, I wanted to define a few terms. Cyber threat hunting is the process of searching the network infrastructure for compromised systems. The goal is to validate the integrity of every system connected to the network. Forensics is the process we leverage when we feel a compromise has been identified. While I will threat hunt every system and device on the network, I will only perform forensics on the systems I suspect of being compromised.
Network based threat hunting analyzes packets on the network. You typically use a network tap or port mirroring to monitor all traffic flowing between networks of different trust levels. Most often this is an internal network and the Internet, but the technique can also be applied to business partner connections. The goal is to look for signs of an internal system being compromised.
With host based threat hunting, I use log entries that are generated by the hosts I wish to protect. Typically, these logs are forwarded to a centralized server where they can be correlated and analyzed. I then perform my threat hunt on the centralized server. Again, my goal is to find data in the logs that is indicative of a system being compromised.
Threat Hunting System Visibility
One of the core realities of threat hunting is that we need access to data to verify integrity. I cannot validate whether a system is still in a trusted state unless I have access to data which identifies that system’s current operational state. If I’m hunting on the network, visibility is a non-issue. Any compromised system calling home is going to use the local Internet connection. I just need tooling and process to distinguish that traffic from normal communication patterns.
With host based threat hunting, data access can be hit or miss. It can be extremely difficult to identify if you are collecting logs from all of your devices. In fact, you can be pretty certain you are not, given the number of BYOD, IoT and forgotten hardware devices (like printers) that end up on the typical network. Clearly, if I’m not getting logs from a device, I have no way of evaluating that device’s integrity.
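One rough way to see this coverage gap is to compare the addresses observed on the wire against the addresses that actually forward logs. The sketch below assumes you can export both lists; the function name and the sample addresses are hypothetical and only illustrate the idea.

```python
def logging_gaps(seen_on_network, sending_logs):
    """Return addresses seen on the network that never forwarded a log entry."""
    return sorted(set(seen_on_network) - set(sending_logs))


# Hypothetical sample data: internal addresses seen in network captures
seen_on_network = ["10.0.0.5", "10.0.0.17", "10.0.0.23", "10.0.0.42"]
# Hypothetical sample data: addresses the central log server has heard from
sending_logs = ["10.0.0.5", "10.0.0.23"]

gaps = logging_gaps(seen_on_network, sending_logs)
print(gaps)
```

Every address in the result is a device whose integrity cannot be evaluated from host logs alone.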
Further, modern malware tries really hard to cover its tracks. So even if I am collecting log entries from a host, the malware that now owns that system may be altering logs in an effort to go undetected. This puts us in a catch-22 where we are relying on an untrusted source to provide accurate information in order to verify its own integrity. As an analogy, imagine if the full extent of a police investigation was to simply ask various individuals if they committed the crime, and then trust their word completely.
So think of network based threat hunting as the great equalizer. While an attacker may be able to hide the processes they are running on a system, they cannot hide their packets on the wire. The best they can hope for is to get lost in the rest of the traffic on the network.
Threat Hunt Data Consistency
One of the nice things about analyzing network traffic is that communications are fairly consistent. The IP protocols we use for communications are governed by RFC standards. This means that Secure Shell traffic originating from a Windows system is going to look the same as traffic originating from Linux, Mac OS, BSD or various IoT devices. This permits me to have one consistent process for analyzing these traffic patterns, regardless of operating system. Yes, each operating system can have its own nuances in the way it communicates, but these minor differences can actually be leveraged as part of the threat hunting process.
Reviewing system log data is anything but consistent. Most logging revolves around Syslog which is little more than a general framework. The result is a complete lack of consistency between operating systems and even applications running on the same OS. Windows and Linux record very different runtime messages. Two different Secure Shell applications are going to log different information, in slightly different formats, even if they are both running on the same OS.
The root cause is that Syslog leaves much of what it does define completely open to interpretation. Should a log entry for a failed login be assigned a severity of error, warning, notice or information? I’ve seen applications use all of these. The result is that I need intimate knowledge of every OS and application running on my network before I can even hope to detect suspect behavior. This also means I’m customizing my data processing techniques for every unique application. Add a new app and there is no guarantee your existing processes will validate it properly.
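To make the severity problem concrete, here is a minimal sketch that decodes the Syslog PRI field, which encodes facility and severity as facility * 8 + severity. The two log lines are hypothetical, invented purely for illustration; they show the same event (a failed login) arriving with two different severities and two different message layouts.

```python
import re

# Syslog severity names, indexed by numeric severity 0 through 7
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def decode_pri(line):
    """Return (facility, severity_name) from the leading <PRI> of a syslog line."""
    match = PRI_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not start with a PRI field")
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, SEVERITIES[severity]


# Hypothetical log lines from two different applications
lines = [
    "<35>sshd[812]: Failed password for root from 203.0.113.9",   # 4 * 8 + 3
    "<38>otherd[977]: login failure user=root src=203.0.113.9",   # 4 * 8 + 6
]

for line in lines:
    print(decode_pri(line))
```

A filter that only looks at "error" and above would catch the first entry and silently skip the second, even though both describe the same kind of event.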
Sorting Through Data Noise
This brings me back to my original point that there is such a thing as too much data in threat hunting. I want to make a distinction here; there is a difference between actionable data which can be leveraged to make informed decisions and unrelated data that simply raises the noise level.
The first challenge is determining which data is actionable and which is unrelated. Is that Apache error a common runtime error or did someone just perform a buffer overflow? Since an application is going to log both actionable and unrelated information, you need to start by sorting through which is which before you can even begin threat hunting. This process can very quickly lead to decision paralysis.
Further, once you have determined which data is actionable and which is not, it can be a PITA to whitelist unrelated information from future threat hunts so you are not duplicating efforts. For example, you don’t want to ignore all Apache errors, just the ones that have nothing to do with an adversary compromising the endpoint. How exactly do you do that? You can’t, which means you end up ignoring a majority of log entries during every threat hunt. This can lead to anchoring bias, a common cause of ignoring actionable data even when it does appear in the logs.
With network based threat hunting, it’s far easier to sort through actionable versus unrelated data. All communications originating on the internal network and headed to the Internet are a potential threat. As a threat hunter, I need to understand and evaluate the business case for every one of those sessions. When a session is deemed safe, it’s a straightforward process to ignore those sessions from future threat hunts (ignore all HTTPS to www.google.com, ignore all NTP to my designated time servers, etc.). This allows me to iterate the threat hunting process over time. As I whitelist more and more sessions I understand as safe, the data set I need to review for potential threats gets smaller and smaller. This helps to speed up the threat hunting process over time.
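As a rough sketch of that iteration, the snippet below filters a list of outbound sessions against a set of already-vetted destinations. The Session fields, the allowlist entries (including the time.example.net server) and the sample sessions are all assumptions made up for illustration, not the output of any particular tool.

```python
from typing import NamedTuple


class Session(NamedTuple):
    src: str       # internal source address
    dst: str       # external destination host
    proto: str     # transport protocol
    dst_port: int  # destination port


# Hypothetical allowlist of (destination, protocol, port) tuples already vetted
ALLOWLIST = {
    ("www.google.com", "tcp", 443),    # HTTPS to www.google.com
    ("time.example.net", "udp", 123),  # NTP to a designated time server
}


def needs_review(sessions, allowlist=ALLOWLIST):
    """Return only the sessions that have not been vetted yet."""
    return [s for s in sessions if (s.dst, s.proto, s.dst_port) not in allowlist]


# Hypothetical sample sessions
sessions = [
    Session("10.0.0.5", "www.google.com", "tcp", 443),
    Session("10.0.0.5", "time.example.net", "udp", 123),
    Session("10.0.0.17", "198.51.100.7", "tcp", 443),
]

remaining = needs_review(sessions)
print(remaining)
```

Each time a session is understood and deemed safe, its tuple goes into the allowlist and the next hunt starts with a smaller set.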
When to Use Host Logs
This does not mean host logs have no value. When we threat hunt the network, we are trying to determine if each session is trustworthy or a potential threat. Sometimes the network does not have all the answers. For example, I may see that one of my internal systems is holding open a TLS protected session with an external host 24 hours a day. For me personally, I would be 75% certain that this is due to a compromise or an internal employee breaching policy. With this evidence in hand, I would now turn to host logs to identify what process is responsible for that session, and why it is running on the system. At this point, however, I’ve switched into more of a forensic analysis mode than threat hunting.
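A session like that can be surfaced from connection records with very little code. The sketch below assumes each record is a (source, destination, duration in seconds) tuple covering a single 24 hour window, and it uses a 20 hour threshold; the record layout, the threshold and the sample data are all assumptions for illustration.

```python
from collections import defaultdict


def long_lived_pairs(connections, min_seconds=20 * 3600):
    """Sum connection time per (src, dst) pair and keep pairs at or above min_seconds."""
    totals = defaultdict(float)
    for src, dst, duration in connections:
        totals[(src, dst)] += duration
    return {pair: total for pair, total in totals.items() if total >= min_seconds}


# Hypothetical connection records for one 24 hour window
connections = [
    ("10.0.0.17", "198.51.100.7", 43200.0),  # 12 hours
    ("10.0.0.17", "198.51.100.7", 43000.0),  # reconnected, almost 12 more hours
    ("10.0.0.5", "203.0.113.20", 95.0),      # short-lived web request
]

suspects = long_lived_pairs(connections)
print(suspects)
```

Summing durations per pair catches a session that drops and immediately reconnects, although it can overcount when several connections to the same destination overlap. Anything flagged here is a candidate for the host log pivot described above, not proof of compromise.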
Many times we operate under the belief that if a little is good, too much must be just enough. That is absolutely not the case with data points when performing a threat hunt. Too many unrelated data points can needlessly extend the time it takes to complete a hunt and can cause actionable data to get missed. When searching for threats within your organization, start with the network and only pivot to host logs when you think you’ve found something suspicious.
Chris has been a leader in the IT and security industry for over 20 years. He’s a published author of multiple security books and the primary author of the Cloud Security Alliance’s online training material. As a Fellow Instructor, Chris developed and delivered multiple courses for the SANS Institute. As an alumnus of Y-Combinator, Chris has assisted multiple startups, helping them to improve their product security through continuous development and identifying their product market fit.