Threat Hunting as a Process
We keep hearing the same question: “How do I get started with threat hunting, and what process should be used?” From a high-level perspective, your threat hunting process should move through six steps: education, preparation, detection, forensic analysis, recovery, and a blameless postmortem.
If you’ve worked in incident handling or forensics, the process and many of its steps should look familiar. Here’s a bit of color around each of the steps.
The process starts with education. Threat hunting is arguably the most difficult security discipline to master. It requires a working knowledge of multiple security subjects, including packet decoding, intrusion detection, forensics, malware analysis, and incident handling. Further, attackers update their techniques constantly, so a good threat hunting program needs to include ongoing education.
Next comes preparation. Once you understand how attackers maintain connectivity with their compromised hosts, you need to decide how you will detect that activity. Will you monitor the network, system logs, or both? What tools will you use, and how? Who will be responsible for each step in the process? Will it be a single team or multiple teams? Will network hardware and Internet of Things (IoT) devices be monitored as well? It may help to run a simulated event from start to finish to confirm that all of the right pieces are in place.
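To make the network-monitoring question more concrete, here is a minimal Python sketch of one common way to hunt for command-and-control connectivity: looking for a host that contacts the same destination at suspiciously regular intervals (often called “beaconing”). It assumes connection records have already been exported as hypothetical (timestamp, source, destination) tuples, and the `find_beacons` helper and its thresholds are illustrative assumptions rather than tuned values. Real malware often adds random jitter to its check-ins, so treat this as a starting point for a hunt, not a finished detector.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_beacons(events, min_connections=5, max_jitter_ratio=0.1):
    """Flag (source, destination) pairs whose connections recur at near-constant intervals.

    events: iterable of (timestamp_seconds, source, destination) tuples.
    The record format and both thresholds are illustrative assumptions.
    """
    times_by_pair = defaultdict(list)
    for ts, src, dst in events:
        times_by_pair[(src, dst)].append(ts)

    suspects = []
    for pair, times in times_by_pair.items():
        # Need at least three connections to get two intervals to compare.
        if len(times) < max(min_connections, 3):
            continue
        times.sort()
        intervals = [later - earlier for earlier, later in zip(times, times[1:])]
        avg = mean(intervals)
        if avg <= 0:
            continue
        # Coefficient of variation: low values mean a very regular "heartbeat".
        jitter = pstdev(intervals) / avg
        if jitter <= max_jitter_ratio:
            suspects.append((pair, avg, jitter))

    # Most regular (most beacon-like) pairs first.
    return sorted(suspects, key=lambda s: s[2])


if __name__ == "__main__":
    # Simulated event: one host checks in every 60 seconds; another browses irregularly.
    beacon = [(i * 60, "10.0.0.5", "203.0.113.10") for i in range(10)]
    browsing = [(t, "10.0.0.7", "198.51.100.20") for t in (5, 17, 300, 310, 900, 2000)]
    for (src, dst), avg, jitter in find_beacons(beacon + browsing):
        print(f"{src} -> {dst}: every ~{avg:.0f}s (jitter {jitter:.2f})")
```

Feeding a script like this the output of a simulated event is also a cheap way to check that your logging actually captures the fields a hunt depends on.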
While “Preparation” pulls all of the needed tools and processes together, “Detection” is their day-to-day execution. Your run books should describe how your tools and processes will be used to detect suspect activity. They should also spell out how potential detections are escalated, including which other resources can be drawn on when needed.
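One way to keep escalation rules unambiguous is to record them as data that both analysts and tooling can read. The Python sketch below is purely illustrative: the detection names, team names, and the `RunBookEntry` and `escalation_path` helpers are hypothetical, and it only captures who owns a detection, who it escalates to, and which extra resources can be pulled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunBookEntry:
    """One hypothetical run book rule: who triages a detection and who it escalates to."""
    owner: str
    escalate_to: str
    resources: tuple = ()


# Hypothetical detections, teams, and resources; substitute your own.
RUN_BOOK = {
    "beaconing": RunBookEntry(
        owner="SOC analyst on shift",
        escalate_to="threat hunting lead",
        resources=("network forensics", "firewall administrators"),
    ),
    "new_admin_account": RunBookEntry(
        owner="SOC analyst on shift",
        escalate_to="incident response team",
        resources=("identity and access team",),
    ),
}

# Detections the run book does not cover yet still need an owner.
DEFAULT_ENTRY = RunBookEntry(owner="threat hunting lead", escalate_to="incident response team")


def escalation_path(detection, confirmed):
    """Return, in order, who should be engaged for a detection."""
    entry = RUN_BOOK.get(detection, DEFAULT_ENTRY)
    if not confirmed:
        # Unconfirmed: the owner triages before anyone else is pulled in.
        return [entry.owner]
    return [entry.owner, entry.escalate_to, *entry.resources]


if __name__ == "__main__":
    print(escalation_path("beaconing", confirmed=False))
    print(escalation_path("beaconing", confirmed=True))
    print(escalation_path("unusual_dns_volume", confirmed=True))
```

The default entry reflects a design choice worth making explicit in any run book: a detection nobody anticipated should still land with a named owner rather than falling through the cracks.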
When a detection is confirmed, forensic analysis takes over. If you have a documented forensic analysis process, you can simply leverage it here. This covers everything from isolating the threat to a detailed analysis of the compromise. A good forensic analysis should identify both when and how the attacker gained a foothold on your network, as well as whether any additional systems have been impacted.
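As a small illustration of the “when” and “what else” questions, the Python sketch below takes a single indicator (for example, a file hash or domain) and reports its earliest sighting plus every host it appears on. The (timestamp, host, artifact) record format, the sample host names, and the `scope_compromise` helper are assumptions for the example. Note that the earliest sighting is only the earliest *observed* evidence; with incomplete logs, the real foothold may predate it.

```python
def scope_compromise(events, indicator):
    """Find the earliest sighting of an indicator and every host it appears on.

    events: iterable of (timestamp, host, artifact) tuples, e.g. file hashes or
    domains pulled from endpoint or network logs (the format is an assumption).
    Returns ((first_timestamp, first_host), affected_hosts), or (None, set())
    if the indicator never appears.
    """
    hits = [(ts, host) for ts, host, artifact in events if artifact == indicator]
    if not hits:
        return None, set()
    first_sighting = min(hits)  # earliest timestamp; ties broken by host name
    affected_hosts = {host for _, host in hits}
    return first_sighting, affected_hosts


if __name__ == "__main__":
    events = [
        (100, "ws-12", "suspicious.dll"),
        (50, "ws-07", "suspicious.dll"),
        (70, "srv-01", "notepad.exe"),
        (200, "srv-01", "suspicious.dll"),
    ]
    first, hosts = scope_compromise(events, "suspicious.dll")
    print("first seen:", first)
    print("affected hosts:", sorted(hosts))
```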
Recovery identifies how we return to a normal operating state once the threat has been identified. Typically, recovery includes both short-term and long-term responses. For example, a short-term response may be to swap out the impacted system with another one capable of providing the same services. The long-term response may be to identify how the attacker gained access and close that gap (missing patches, loose firewall rules, etc.) so it does not happen again. It’s not uncommon for the long-term recovery plan to come out of the blameless postmortem process.
The final step is the blameless postmortem. While some refer to this step as “lessons learned”, I greatly prefer the more descriptive “blameless postmortem” because it points the analysis in the right direction. It is far too easy, and common, to blame a specific individual for an event: “Bob clicked an email he shouldn’t have” or “Betty missed installing a patch.” By blaming a person instead of a process, we can go back to business as usual and then act surprised when it happens again. Humans are fallible, and our processes need to take this into account.
The goal of a blameless postmortem, then, is to identify how our processes can be improved, whether that means keeping attackers off our network or speeding up detection once they have gained access. What we learn should also be fed back into the education step so that the whole process keeps improving.
We will dig deeper into each of the steps in this process in future blog entries.
Chris has been a leader in the IT and security industry for over 20 years. He’s a published author of multiple security books and the primary author of the Cloud Security Alliance’s online training material. As a Fellow Instructor, Chris developed and delivered multiple courses for the SANS Institute. As an alumnus of Y Combinator, Chris has assisted multiple startups, helping them improve their product security through continuous development and identify their product-market fit.