Identifying Beacons Through Session Size Analysis
A lot of focus is placed on identifying beacons based on analyzing the connection timing between systems. However, session size can also be used for beacon analysis, and can provide insights that are not achievable through a strict time-based analysis. In this blog post I’ll walk through how to identify beacons based on session size, and how an analysis of session size can be leveraged to identify command and control (C&C) activation as well as weed out false positives.
Beacon Time Analysis – A Quick Review
The most common method of identifying beacons is by analyzing the timing gap between sessions. For example, if I see an internal system creating a connection to an external system once every 60 seconds, 24 hours a day, that is most certainly a beacon.
Time analysis has become more complex due to features like Cobalt Strike’s bsleep, which makes it dead simple for attackers to introduce jitter into the beacon’s timing. Introduce a jitter factor of 25% into our consistent “every 60 seconds” beacon, and now the time gap between sessions can vary anywhere from 45 to 75 seconds. This makes it far more difficult to statistically spot beacons based on timing.
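The arithmetic behind that 45 to 75 second window can be sketched in a few lines. This is only an illustration: the function name `jitter_window` is made up for this post, and it assumes jitter is applied symmetrically around the base interval, as described above. A given C&C framework may apply jitter differently.

```python
def jitter_window(interval_s: float, jitter_pct: float) -> tuple[float, float]:
    """Return the (min, max) gap in seconds between sessions.

    Assumes jitter is applied symmetrically around the base interval.
    """
    delta = interval_s * jitter_pct / 100.0
    return interval_s - delta, interval_s + delta


low, high = jitter_window(60, 25)
print(low, high)  # 45.0 75.0
```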
Beacon Size Analysis – The Basics
Beacon size analysis is very similar to time analysis. First, I identify a number of sessions between two systems over a specific time period, and then compare the session sizes to see if there is a pattern. The longer the time period, the more accurate the results. For example, if I see 30,000 connections between two IP addresses over a 24 hour period of time, and every session is 200 bytes in size, I can be pretty certain I’m looking at a beacon.
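As a rough sketch of this grouping step, the snippet below buckets sessions by IP pair and counts how often each session size occurs. The record layout (source, destination, bytes), the IP addresses, and the helper names `size_histograms` and `dominant_size` are all assumptions made for illustration. In practice the records would come from whatever connection logs you collect.

```python
from collections import Counter, defaultdict


def size_histograms(sessions):
    """Map each (src, dst) pair to a Counter of session sizes in bytes."""
    hist = defaultdict(Counter)
    for src, dst, size in sessions:
        hist[(src, dst)][size] += 1
    return hist


def dominant_size(counter):
    """Return (size, share) for the most common session size in a Counter."""
    size, count = counter.most_common(1)[0]
    return size, count / sum(counter.values())


# Hypothetical records: (source IP, destination IP, bytes transferred)
sessions = [("10.0.0.5", "203.0.113.7", 200)] * 30000
sessions += [("10.0.0.9", "198.51.100.2", s) for s in (512, 1840, 77, 9031)]

for pair, counter in size_histograms(sessions).items():
    size, share = dominant_size(counter)
    print(pair, "distinct sizes:", len(counter), "dominant:", size, "share:", round(share, 2))
```

A pair where one size accounts for nearly every session, like the first pair here, is the pattern worth a closer look. The second pair has no dominant size at all.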
Unlike timing, where there is usually a consistent pattern (even the introduction of jitter can be considered a consistent pattern, but more on that in another blog post), session sizes are expected to change, and your analysis needs to take that into account. Beacon session size is made up of two components: the heartbeat and command orders. The heartbeat is that constant signal of the compromised system checking in to the C&C server and asking “Do you have anything for me to do?”, and hearing back “No, go back to sleep”. The command orders are what happens when the answer to the question “Do you have anything for me to do?” is “yes”, and additional data is transferred.
Detecting Beacon Activation Via Session Size
Let’s look at an example of how beacon size can be used to both detect beacon activity as well as identify when a backdoor has been activated. Have a look at Figure 1. In this AI-Hunter graph we are grouping the sessions based on the amount of data transferred during each session. In the course of this 24 hour analysis, the internal system initiated about 109,000 sessions to a specific IP address on the Internet.
Figure 1: Beacon signals being grouped by the amount of data transferred per session
Check out the consistency in session size. Out of the 109,000 sessions, nearly all of them were approximately 90 bytes in size. This consistency tells me that I’m looking at a beacon signal. Specifically, this 90 bytes is the heartbeat of the compromised system checking in to see if there are any C&C orders, and being told to go back to sleep.
Note that our graph shows two additional data points, one at approximately 110 bytes and another around 300 bytes. This tells me that over the 24 hour period, this backdoor was activated twice. This analysis is just looking at size, so we don’t know for sure what was different in these sessions. We would need to look at packet captures for that. However, the payload could be in clear text, obfuscated or encrypted and this size analysis would still be effective.
Even though we can’t see the payload, we can make some educated guesses about what happened in these two additional sessions. Our heartbeat signal uses 90 bytes just to run the C&C channel. Our first data point at 110 bytes is not much larger than this, so we can tell that very little data was transferred. This could simply be a status check or a verification of the current directory. The 300 bytes is larger but still pretty small. It’s doubtful any additional rootkits were installed or files were exfiltrated from the system. The most likely candidate is another simple check like verifying what processes are running in memory. These data points will be invaluable once we initiate our incident response.
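The same reasoning can be sketched in code: treat the most common session size as the heartbeat, then report how far each remaining session sits above it. This is a minimal sketch and not how AI-Hunter works internally. The function name `split_heartbeat`, the `tolerance` value, and the synthetic data are assumptions chosen to mirror the 90, 110, and 300 byte values discussed above.

```python
from collections import Counter


def split_heartbeat(sizes, tolerance=5):
    """Treat the most common size as the heartbeat; return it plus the outliers.

    `tolerance` (in bytes) is an arbitrary allowance for small variations
    around the heartbeat size.
    """
    heartbeat = Counter(sizes).most_common(1)[0][0]
    outliers = [s for s in sizes if abs(s - heartbeat) > tolerance]
    return heartbeat, outliers


# Synthetic data shaped like the example: mostly 90 bytes, two larger sessions
sizes = [90] * 1000 + [110] + [90] * 1000 + [300]

heartbeat, outliers = split_heartbeat(sizes)
for s in outliers:
    print(f"{s} bytes: {s - heartbeat} bytes above the heartbeat")
```

The difference from the heartbeat is the useful number here, since it approximates how much extra data moved during that session.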
Using Session Size to Identify Email based C&C
One of the more popular C&C channels is email. A great example was the attacks against the Ukrainian power grid. Typically, spearphishing is used to get employees to run software on the local system. What gets installed is an email client that reaches out to a specific account on GMail, Office 365 or Yahoo. This channel can then be used to relay C&C orders, log the user’s keystrokes, or even grab screenshots. The attacker simply emails the marching orders to the account and waits for the compromised system to check in and retrieve the message. Conversely, the malware can use the email channel to exfiltrate data off of the local network.
C&C over email is especially challenging because it is difficult to distinguish between user activity and C&C activity. From a timing perspective, both are going to check in with the mail server on a regular basis to see if there are any new emails that need to be retrieved. Have a look at the time graph in Figure 2. We can see that the timing between connections is very tightly grouped together. However, this could be a user checking email or a C&C channel. We really can’t tell the difference, as both use similar timing intervals.
Figure 2: Time analysis of an internal system checking email. This could be a regular user account or it could be C&C traffic. Both use the same timing to retrieve messages.
This difficulty in distinguishing between user and C&C traffic over email is why many network based threat hunting tools ignore all traffic to mail servers. Since they are just analyzing timing, they can’t distinguish between user traffic and C&C traffic, so they ignore the traffic in an attempt to reduce false positive alerts. This is not optimal, as it means they have no way of detecting C&C channels over email when they actually do occur.
Identify Expected Behaviour
Let’s develop a hypothesis of what we expect user emails to look like from a session size perspective, and then analyze some traffic. The average employee sends and receives about 130 emails per day. This means that most of the time when the email client checks for email, there is none to retrieve. When messages are transmitted, sometimes they are a single sentence, sometimes a paragraph, and sometimes they have attachments. Since it would be rare for messages to be exactly the same size, I would expect the session size to vary widely for each message. Pulling this all together, I would expect to see one small session size most of the time, as no messages are available. I would then expect to see a larger number of distinct session sizes (maybe as high as 130), one for each message transmission, as just about all of them will be a unique size. Note that this hypothesis may skew a bit for users who only power up their system for a few hours a day. In that case there will be one large transfer as the majority of the messages are delivered at once.
Performing The Session Size Analysis
Now that we know what we expect to see from regular user traffic, have a look at Figure 3. Does this match our hypothesis?
Figure 3: This internal system is checking email. Every session size is identical, which can be a clear indication of a C&C channel.
The figure obviously does not match our hypothesis, as all sessions are exactly the same size. While this could be a user that never receives any email, a more likely explanation is that this is the heartbeat from a C&C channel that has not been activated over this 24 hour period. We should perform a deeper analysis on the system to verify whether it has been compromised. If we observed a few additional session sizes, we would still want to do a deeper investigation as the most likely explanation would be a C&C channel that was activated by the attacker.
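To make the comparison against the hypothesis concrete, here is a small sketch that labels a system's mail-server sessions by how many distinct session sizes it produced. The function name `describe_mail_sessions`, the labels, and the `few` cutoff are assumptions for illustration only. A sensible cutoff would have to be tuned against real traffic, and a label is a prompt for deeper investigation, not a verdict.

```python
def describe_mail_sessions(sizes, few=5):
    """Label a mail-server size profile by its number of distinct session sizes.

    `few` is an arbitrary illustrative cutoff, not a tested threshold.
    """
    distinct = len(set(sizes))
    if distinct == 1:
        return "single size: possible idle C&C heartbeat, or a user with no mail"
    if distinct <= few:
        return "few sizes: possible activated C&C channel"
    return "many sizes: consistent with normal user email"


# Synthetic profiles; the byte values are made up
idle = [1200] * 500
activated = [1200] * 500 + [1350, 2100]
normal_user = [1200] * 500 + list(range(2000, 2130))  # 130 unique message sizes

for name, sizes in [("idle", idle), ("activated", activated), ("user", normal_user)]:
    print(name, "->", describe_mail_sessions(sizes))
```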
Lessons Learned
Time-based session analysis is an important process for identifying beacons between IP pairs. However, for some protocols, user and C&C activity can use similar timing. When this occurs, analyzing sessions based on a size comparison can be a quick solution for distinguishing between malicious and normal activity. Further, session size analysis can reveal details about a beacon that cannot be determined through a strict time-based analysis, such as whether a beacon has been activated and the likelihood that data has been stolen.
Chris has been a leader in the IT and security industry for over 20 years. He’s a published author of multiple security books and the primary author of the Cloud Security Alliance’s online training material. As a Fellow Instructor, Chris developed and delivered multiple courses for the SANS Institute. As an alumnus of Y-Combinator, Chris has assisted multiple startups, helping them to improve their product security through continuous development and identifying their product market fit.