zcutter – More Flexible Zeek Log Processing
The Zeek network monitoring package has been a wonderful resource for the Internet Security community. It’s open source, so people can inspect it. It’s free, so anyone can afford it. It has a plugin architecture, so we can contribute analysis modules.
By default Zeek writes everything it has learned about your packets to log files. Because these logs come in two standard formats (TSV and JSON), they can be passed to other analysis tools (including one we’re proud of! 🙂). But there will always be times when we need to work with these logs directly, and that gets tricky.
Zeek Log Files
Zeek’s log files can have from hundreds to hundreds of thousands of lines, each with anywhere from around 5 to 40 columns. This makes them far too large to simply load into an editor to manually search for things. We need to fall back on command line tools to be able to make any progress with them.
In order to follow along with this discussion, we encourage you to log in to your Zeek sensor and make a copy of a day’s worth of logs so you can analyze the copy.
Reading Compressed Files
If you look in the directories that contain completed hours of logs (as opposed to the “current” directory that holds the logs from right now) you’ll see that the files are all compressed with a tool called gzip:
cd /opt/zeek/logs/2023-12-05
ls -al conn.*
-rw-r--r-- 1 root root 15860 Dec 5 01:00 conn.00:00:00-01:00:00.log.gz
-rw-r--r-- 1 root root 11935 Dec 5 02:00 conn.01:00:00-02:00:00.log.gz
-rw-r--r-- 1 root root 10972 Dec 5 03:00 conn.02:00:00-03:00:00.log.gz
...
(The output you’ll get for these commands will be different, of course.)
Instead of uncompressing them, viewing them with “less”, and re-compressing them afterwards, we’ll use a tool that handles this for us:
zless conn.00\:00\:00-01\:00\:00.log.gz
This decompresses the file on the fly and runs “less” so you can view the uncompressed contents. When you exit by pressing “q”, no uncompressed copy is left on disk and the original .gz file is unchanged.
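The same idea, reading a compressed log without leaving an uncompressed copy on disk, can be sketched in Python with the standard gzip module. This is only an illustration: the file name sample_conn.log.gz and its two lines of content are made up, and the file is written to a temporary directory so the example runs on its own.

```python
import gzip
import os
import tempfile

# Hypothetical sample data standing in for a real Zeek log.
SAMPLE = "#fields\tts\tuid\n1701734400.000000\tCabc123\n"


def read_gz_lines(path):
    """Return the decompressed lines of a gzip file.

    The data is decompressed in memory as it is read; no uncompressed
    copy of the file is written to disk.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Build a small compressed file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "sample_conn.log.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
    handle.write(SAMPLE)

lines = read_gz_lines(gz_path)
for line in lines:
    print(line)
```

On a real sensor you would pass the path of an existing log, such as one of the conn.*.log.gz files listed above, instead of creating a sample file.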
Looking For Lines That Contain Specific Text
Tools like “grep” allow us to find lines that contain a specific block of text. Since we’re dealing with compressed files, it’s handy to use “zgrep” instead: it decompresses the files on the fly and leaves no uncompressed copy behind (just like “zless” above).
Let’s say we want to find systems sending UPnP multicast packets on our network. I can look for traffic destined for IP address 239.255.255.250 and UDP port 1900 with the following (the next two lines are a single command):
I’ve highlighted the text that was matched in that line.
Where grep Starts To Fall Flat
The zgrep approach works fine, but it doesn’t care about where that text is found on the line. If I search for traffic from a DHCP client to a DHCP server (UDP with a source port of 68 and a destination port of 67), I might try looking for this with:
zgrep '68.*67.*udp' conn.00\:00\:00-01\:00\:00.log.gz
Here’s the first line of output I got from that command:
So why did we get this line when its source and destination ports are both 5353? It’s because “68” and “67” show up in places other than the port columns. Here’s that line again with the matches highlighted:
This is why I really want to view specific columns, not the entire line.
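The false match can be reproduced in a few lines of Python. The log line below is hypothetical (a shortened, conn.log-style record invented for this sketch, with both ports set to 5353), but it shows how the pattern 68.*67.*udp matches text in the timestamp while an exact comparison of the port columns does not.

```python
import re

# Hypothetical tab-separated line: ts, uid, id.orig_h, id.orig_p,
# id.resp_h, id.resp_p, proto. Both ports are 5353.
line = "\t".join([
    "1701734468.123467", "CHhAvVGS1DHFjwGM9",
    "192.168.1.20", "5353", "224.0.0.251", "5353", "udp",
])

pattern = re.compile(r"68.*67.*udp")

# Whole-line search: succeeds because "68" and "67" both occur inside
# the timestamp, well before the port columns.
whole_line_hit = pattern.search(line) is not None

# Column-aware check: compare the port and protocol columns exactly.
cols = line.split("\t")
column_hit = cols[3] == "68" and cols[5] == "67" and cols[6] == "udp"

print("whole-line regex matched:", whole_line_hit)
print("column comparison matched:", column_hit)
```

The regular expression has no notion of columns, so any "68" followed somewhere by "67" and "udp" counts as a hit.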
Looking At Specific Fields in Those Lines
Zeek logs include labels at the top that identify each column. If you look at any Zeek log file with a command like:
zcat conn.00\:00\:00-01\:00\:00.log.gz | head -n 8
you’ll see the first 8 lines of the file which include two lines that start with “#fields” and “#types”. Since they start with a comment character they won’t be considered log data, but Zeek-aware tools can use these labels to identify each column:
As you read across the #fields line, you can match each entry up with the corresponding type in #types. For example, the _node_name field is of type string.
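Pairing the two header lines is straightforward to do in code. Here is a minimal Python sketch, assuming a shortened, hypothetical header (a real conn.log has more header lines and many more columns); the function name field_types is made up for this example.

```python
# Hypothetical header lines in the layout Zeek's TSV logs use.
HEADER_LINES = [
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tproto",
    "#types\ttime\tstring\taddr\tport\tenum",
]


def field_types(header_lines):
    """Map each name on the #fields line to its entry on the #types line."""
    fields, types = [], []
    for line in header_lines:
        parts = line.split("\t")
        if parts[0] == "#fields":
            fields = parts[1:]
        elif parts[0] == "#types":
            types = parts[1:]
    # zip pairs the first field with the first type, and so on.
    return dict(zip(fields, types))


mapping = field_types(HEADER_LINES)
for name, kind in mapping.items():
    print(name, "->", kind)
```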
Field Parsing Tools
Zeek’s authors realized it would be handy to extract specific columns from a Zeek log to make them easier to view. Let’s say I only wanted to see the source IP, source port, destination IP, destination port, and protocol. Here’s how I’d do that:
zcat conn.00\:00\:00-01\:00\:00.log.gz | zeek-cut id.orig_h id.orig_p id.resp_h id.resp_p proto
The zeek-cut program reads the raw log lines and extracts just the requested columns, making them much easier to read.
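So where did I get the column names? These come from those header lines we saw above. Here are the field names I used: id.orig_h (source IP), id.orig_p (source port), id.resp_h (destination IP), id.resp_p (destination port), and proto (protocol).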
Let’s get back to our original problem of wanting to see traffic going from udp port 68 to udp port 67. I’ll take the above output and feed it into grep to look for that pattern:
Since we give grep only the columns of interest, we get the output we wanted in the first place.
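To make the column-based approach concrete, here is a rough Python sketch of the same idea: read the #fields line, keep only the requested columns, and then compare the port columns exactly. It is not how zeek-cut or zcutter is implemented; the sample log lines and the function name cut_fields are made up for illustration. Comparing whole column values also avoids matching a "68" that happens to sit inside an IP address such as 192.168.1.68.

```python
# Hypothetical, shortened conn.log content (tab separated).
LOG_LINES = [
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto",
    "#types\ttime\tstring\taddr\tport\taddr\tport\tenum",
    "1701734468.123467\tCaaa\t192.168.1.20\t5353\t224.0.0.251\t5353\tudp",
    "1701734470.000001\tCbbb\t0.0.0.0\t68\t255.255.255.255\t67\tudp",
    "1701734471.000002\tCccc\t192.168.1.68\t40000\t10.0.0.67\t443\ttcp",
]

WANTED = ["id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p", "proto"]


def cut_fields(lines, wanted):
    """Yield one dict per data line, holding only the wanted columns."""
    names = []
    for line in lines:
        if line.startswith("#fields"):
            names = line.split("\t")[1:]
        elif line.startswith("#") or not line:
            continue  # other header lines and blank lines carry no data
        else:
            row = dict(zip(names, line.split("\t")))
            yield {name: row[name] for name in wanted}


# Keep only DHCP client-to-server traffic: UDP from port 68 to port 67.
dhcp = [
    row for row in cut_fields(LOG_LINES, WANTED)
    if row["id.orig_p"] == "68"
    and row["id.resp_p"] == "67"
    and row["proto"] == "udp"
]

for row in dhcp:
    print("\t".join(row[name] for name in WANTED))
```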
Run man zeek-cut to see a reference for all of its options.
zeek-cut is part of the zeek-aux package. On very old Linux systems the program was called bro-cut and was part of the bro-aux package. Unfortunately, not every Linux distribution packages zeek-cut or bro-cut, so you may have to compile it yourself.
As an alternative solution, we offer a zeek-cut companion tool called zcutter. In most cases zcutter is a drop-in replacement for zeek-cut; anywhere you could use the zeek-cut program you can use zcutter with the same output. zcutter offers a few more features, though:
- It will read one or more TSV or JSON files as input. These files can be uncompressed, gzip-compressed, bzip2-compressed, or any mix of these.
- It will output either TSV or JSON, allowing it to convert between formats. If an output directory is specified, each input file will be sent to an equivalent output file in that directory.
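To give a feel for what converting between the two formats involves, here is a simplified Python sketch that turns TSV log lines into one JSON object per line. It is an illustration only and not zcutter's actual code: the sample lines and the function name tsv_to_json_lines are hypothetical, only a few Zeek types are converted to numbers, and fields holding "-" (Zeek's default marker for an unset field) are simply left out.

```python
import json

# Hypothetical, shortened TSV log content.
LOG_LINES = [
    "#fields\tts\tid.orig_h\tid.orig_p\tproto\tservice",
    "#types\ttime\taddr\tport\tenum\tstring",
    "1701734470.000001\t0.0.0.0\t68\tudp\t-",
]

INTEGER_TYPES = ("port", "count", "int")
FLOAT_TYPES = ("time", "double", "interval")


def tsv_to_json_lines(lines):
    """Return a list of JSON strings, one per TSV data line."""
    names, types, out = [], [], []
    for line in lines:
        if line.startswith("#fields"):
            names = line.split("\t")[1:]
        elif line.startswith("#types"):
            types = line.split("\t")[1:]
        elif line.startswith("#") or not line:
            continue  # remaining header lines and blank lines
        else:
            record = {}
            for name, kind, value in zip(names, types, line.split("\t")):
                if value == "-":
                    continue  # simplification: omit unset fields
                if kind in INTEGER_TYPES:
                    record[name] = int(value)
                elif kind in FLOAT_TYPES:
                    record[name] = float(value)
                else:
                    record[name] = value
            out.append(json.dumps(record))
    return out


json_lines = tsv_to_json_lines(LOG_LINES)
for text in json_lines:
    print(text)
```

For real work, use zcutter itself with the -j option shown in the examples; this sketch leaves out set and vector types, compressed input, and error handling.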
Since zcutter is a Python script, it can be downloaded to any system that has python3 installed (most Linux distributions and macOS), and it needs no compilation step. The tradeoff is speed: as an interpreted Python script it runs more slowly than the compiled zeek-cut.
It needs to be downloaded, made executable, and placed in your path. If you’re installing this for your own use only you can run the following commands:
mkdir -p ~/bin/
cd ~/bin/
wget https://raw.githubusercontent.com/activecm/zcutter/main/zcutter.py -O zcutter.py
chmod 755 zcutter.py
if ! type zeek-cut >/dev/null 2>&1 ; then ln -s zcutter.py zeek-cut ; fi
The last line creates a symbolic link named zeek-cut that points to zcutter.py, but only when no zeek-cut command is already available; this allows zcutter to step in for zeek-cut on systems that don’t already have it. Make sure ~/bin/ is in your PATH.
- Look at the source IP, method, host, and URI fields from an http log
zcat http.00\:00\:00-01\:00\:00.log.gz | nice zcutter.py id.orig_h method host uri -C | less -S -x 20
- Same as above, but automatically decompress input logs
nice zcutter.py id.orig_h method host uri -C -r http.00\:00\:00-01\:00\:00.log.gz | less -S -x20
- Convert all gzip compressed logs (except conn-summary logs) in this directory to json and save the uncompressed json logs in ~/json-out/
zcutter.py -j -o ~/json-out/ -r *.log.gz
- Like above, but compress the output logs at the end if successful:
zcutter.py -j -o ~/json-out/ -r *.log.gz && gzip -9 ~/json-out/*.log
- Like above, but preserve the paths under /V/source in /V/dest/ and compress with bzip2. The file glob after -r needs to match the number of levels down where the .log files are found
cd /V/source/
zcutter.py -o /V/dest/ -j -r */*/*.log.gz
find /V/dest/ -mmin +1 -iname '*.log' -print0 | xargs -r -n 50 -0 nice -n 19 bzip2 -9
To see more examples of how it can be installed and used, please see https://github.com/activecm/zcutter/ .
- Different types of log files: https://docs.zeek.org/en/master/script-reference/log-files.html
- Processing Zeek json-format logs with jq: https://www.sans.org/blog/parsing-zeek-json-logs-with-jq/
- Additional article on tools to parse and process Zeek logs: https://docs.zeek.org/en/master/log-formats.html
- jq: https://jqlang.github.io/jq/ . If this is not installed by default you may be able to install it with “sudo apt install jq” or “sudo yum install jq”.
- zeek-cut is part of the zeek-aux package: https://github.com/zeek/zeek-aux . If your Linux operating system includes this as a package, you may be able to install it with “sudo apt install zeek-aux” or “sudo yum install zeek-aux”.
- zcutter intro: https://www.activecountermeasures.com/free-tools/zcutter/
- zcutter source and examples: https://github.com/activecm/zcutter
Bill has authored numerous articles and tools for client use. He also serves as a content author and faculty member at the SANS Institute, teaching the Linux System Administration, Perimeter Protection, Securing Linux and Unix, and Intrusion Detection tracks. Bill’s background is in network and operating system security; he was the chief architect of one commercial and two open source firewalls and is an active contributor to multiple projects in the Linux development effort. Bill’s articles and tools can be found in online journals and at http://github.com/activecm/ and http://www.stearns.org.