I believe that any "big enough" backend system should have a packet capture tool in place. Even if you don't want to build an intrusion detection system as such, it will still be an extremely valuable resource for investigating issues.

Whilst there are a number of dedicated hardware appliances capable of doing this, I think most organizations cannot afford them, and neither can mine 😊. So I'd like to share my experience of finding the right open-source packet capture tool, plus a brief guide to the tool we ended up with (Stenographer).

What do we want from the tool?

Here are the requirements of the tool I was looking for:

Capture packets in real time and write them to disk, with minimal performance impact.
Do automatic housekeeping on the captured packets, such as rolling over old data when the allocated disks are full.
Segregate the packet captures from different network interfaces onto different disks. This lets us keep a longer history for the interfaces known to have less activity.
Read back the captured packets using filters.

Why Stenographer?

I have found three candidates:

Moloch (now Arkime)
PcapDB
Stenographer

All of them meet our basic requirements. Their key distinguishing features are:

Moloch has its own web interface and uses Elasticsearch for index storage. It indexes up to the application layer (whereas the other two index only up to the transport layer), which gives it the best search/query capability. However, this obviously comes with additional storage requirements and a whole Elasticsearch setup to configure.
PcapDB has its own web interface and uses Postgres for index storage. It indexes up to the transport layer, so we can only search on fields such as addresses and ports (but it needs less storage than Moloch). Its strength is capturing traffic on nodes across geographically disparate networks. It seems to be the only one of the three that can capture at up to 100 Gbps.
Stenographer does not have a web interface. It uses LevelDB SSTables for index storage. Like PcapDB, it indexes only up to the transport layer. Out of the box, queries are made from the command line, using a subset of BPF syntax. Its design heavily favors the write path (writing the captured data) and expects each query to return just a small subset of the captured data. Results are returned as pcap and passed through tcpdump, so we can do anything with them that tcpdump allows (e.g. additional filtering, writing to disk). It can capture at up to ~10 Gbps.

We have selected Stenographer because:

It's by far the most lightweight tool with the simplest settings.
The 10 Gbps limit is not an issue for us.
Filtering queries on fields up to the transport layer (e.g. addresses, ports, protocols) is good enough.
Although no web interface is provided by default, it supports remote queries via gRPC. If really needed, we can build our own user interface on top of the gRPC API.

Introduction to Stenographer

We can simply think of it as a tcpdump wrapper which offers additional capabilities:

Indexing (for queries)
File housekeeping
gRPC support

Stenographer consists of a few separate components:

Stenotype - reads packet data off the wire, indexes it, and writes it to disk
Stenographer - a long-running server which manages Stenotype as a child process, watches disk usage and cleans up old files, and serves data to analysts based on their queries. It controls the query access using TLS with client certificates.
Stenocurl - a simple shell script that wraps the curl utility, adding the flags needed to use the correct client certificate and to verify against the correct server certificate when querying packets
Stenoread - a simple addition to Stenocurl, which takes in a query string, passes it to Stenocurl as a POST request, then passes the resulting PCAP file through tcpdump to allow additional filtering, writing to disk, printing in a human-readable format, etc.

It expects separate disks for the raw packet data and the index data. We can set a minimum percentage of free disk space as the housekeeping condition: once free space drops below it, rollover starts and the oldest files are cleaned up.
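To make the housekeeping condition concrete, here is a small shell sketch of the check itself: compute the free-space percentage of the disk holding a capture directory and compare it with a threshold. Stenographer performs this check internally; the sketch only illustrates the idea using `df -P`, and the function names `free_percent` and `needs_rollover` are hypothetical.

```shell
#!/bin/sh
# free_percent DIR: prints the approximate percentage of free space on the
# filesystem holding DIR, computed as 100 minus df's "Capacity" (used %) column.
free_percent() {
  used=$(df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  echo $((100 - used))
}

# needs_rollover DIR THRESHOLD: exits 0 when free space on DIR is below
# THRESHOLD percent, i.e. when the oldest capture files would start being removed.
needs_rollover() {
  [ "$(free_percent "$1")" -lt "$2" ]
}

# Example (hypothetical path):
#   needs_rollover /data/packets 10 && echo "rollover would start"
```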

Queries are made by calling the stenoread command with filter conditions.

Installation

The instructions on its website are simple to follow. Here is a summary:

Install Go and set all necessary paths
Run go get github.com/google/stenographer
Go to ~/go/src/github.com/google/stenographer/configs and edit steno.conf, the configuration file. Supply the values as required (see INSTALL.md for parameter definitions).
In ~/go/src/github.com/google/stenographer, there is a script called install_el7.sh. The script will do everything from downloading the required packages to starting the services, so we can simply execute this script.

Note: on the machine I tried, the script somehow failed to install LevelDB, so I had to do the following beforehand:

Add the EPEL repository
Install leveldb-devel
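The installation steps and the LevelDB workaround can be strung together as a shell sketch for CentOS/RHEL 7. It only prints the commands unless `DRY_RUN=0` is set. Assumptions: `sudo`, `yum`, `git` and Go are available, and since recent Go versions default to module mode (where `go get` no longer places source under ~/go/src), the sketch swaps in a `git clone` to the same path. The helper names `run` and `install_stenographer` are hypothetical.

```shell
#!/bin/sh
# run CMD...: print the command in dry-run mode (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# install_stenographer: EPEL + leveldb-devel workaround, fetch the source,
# then run the bundled installer. Stops at the first failing step.
# Edit configs/steno.conf in the source tree before running with DRY_RUN=0.
install_stenographer() {
  src="$HOME/go/src/github.com/google/stenographer"
  run sudo yum install -y epel-release &&
    run sudo yum install -y leveldb-devel &&
    run mkdir -p "$(dirname "$src")" &&
    run git clone https://github.com/google/stenographer.git "$src" &&
    run cd "$src" &&
    run ./install_el7.sh
}
```

Running `install_stenographer` first shows the plan; `DRY_RUN=0 install_stenographer` executes it.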

After installing:

A stenographer user and group are created; the Stenographer processes run as these for reading and writing.
The configuration file and generated certificates are placed in /etc/stenographer. To change the configuration later, we edit /etc/stenographer/config.
stenographer, stenotype, stenocurl and stenoread are installed in /usr/bin.
Stenographer is started as a normal service (i.e. sudo service stenographer start).

And now it starts capturing packets coming in/going out from the network interfaces you configured!

Reading the Data

Run stenoread with arguments. The first argument is the Stenographer query; all other arguments are passed to tcpdump. For example:

# Request packets for any IPs in the range 1.1.1.0-1.1.1.255, writing them
# out to a local pcap file so they can be opened in Wireshark.
$ stenoread 'net 1.1.1.0/24' -w /tmp/output_for_wireshark.pcap
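Queries can combine primitives such as `host`, `port` and the Stenographer-specific relative time `after 3h ago` with `and`, as described in the query-language section of the Stenographer README. A small helper for the common "this host and port, recently" case might look like this; the function name `recent_traffic_query` is hypothetical.

```shell
#!/bin/sh
# recent_traffic_query HOST PORT HOURS
# Prints a Stenographer query for packets to/from HOST on PORT
# captured within the last HOURS hours.
recent_traffic_query() {
  printf 'host %s and port %s and after %sh ago\n' "$1" "$2" "$3"
}

# Usage (everything after the query is passed to tcpdump):
#   stenoread "$(recent_traffic_query 10.0.0.5 443 3)" -nn
#   stenoread 'udp and port 53 and after 30m ago' -w /tmp/dns.pcap
```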

Reading the Data Remotely

With gRPC support, we can write a client program that issues a query and fetches the resulting pcap remotely. The gRPC channel only supports TLS with client-certificate authentication, and its certificates are managed separately from those generated for normal reads.

gRPC support is optional and can be enabled by adding an Rpc section of settings to the config file, like this:

"Rpc": {
    "CaCert": "/data2/steno_files/certs/CertAuth.crt"
    , "ServerKey": "/data2/steno_files/certs/Stenographer.key"
    , "ServerCert": "/data2/steno_files/certs/Stenographer.crt"
    , "ServerPort": 18443
    , "ServerPcapPath": "/data2/steno_files"
    , "ServerPcapMaxSize": 1000000000
    , "ClientPcapChunkSize": 1000
    , "ClientPcapMaxSize": 5000000
}

The protobuf that defines Stenographer's gRPC service can be found in protobuf/steno.proto.

Capturing Packets from Multiple Network Interfaces

There's no official documentation on how to set up Stenographer to capture packets from more than one network interface, though I found https://github.com/google/stenographer/issues/122, which suggests a workaround. Basically, we create a systemd instance template and rename the config files to match (a typical *nix approach). For example:

Rename /etc/systemd/system/stenographer.service to /etc/systemd/system/stenographer@.service. In this file, where it points to the config file, use %i to refer to the instance number (e.g. config.%i).
Rename the config file from config to config.1, config.2 and so on (up to the number of instances we want).

Configuring it like this means we will be able to run as many Stenographer instances as we want; each of them will point to a different config file (in which we will configure different network interfaces).
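Writing one config file per instance by hand gets repetitive, so here is a shell sketch that generates config.1, config.2, ... from a list of interfaces. Field names follow the sample steno.conf in the Stenographer repository (check them against your copy); the /disks/... mount points, base port 1234, DiskFreePercentage value, CertPath and the function name `write_instance_configs` are hypothetical. Ports are made distinct because each instance runs its own server and two servers cannot bind the same host and port.

```shell
#!/bin/sh
# write_instance_configs OUTDIR IFACE...
# Writes OUTDIR/config.1, OUTDIR/config.2, ... (one per interface) for a
# stenographer@.service template that refers to config.%i.
write_instance_configs() {
  outdir=$1
  shift
  n=1
  for iface in "$@"; do
    # Distinct port per instance: 1234, 1235, ...
    port=$((1233 + n))
    # Packets and index go to separate (hypothetical) disk mount points.
    cat > "$outdir/config.$n" <<EOF
{
  "Threads": [
    { "PacketsDirectory": "/disks/packets-$iface"
    , "IndexDirectory": "/disks/index-$iface"
    , "MaxDirectoryFiles": 30000
    , "DiskFreePercentage": 10
    }
  ]
  , "StenotypePath": "/usr/bin/stenotype"
  , "Interface": "$iface"
  , "Port": $port
  , "Host": "127.0.0.1"
  , "Flags": []
  , "CertPath": "/etc/stenographer/certs"
}
EOF
    n=$((n + 1))
  done
}
```

For example, `write_instance_configs /etc/stenographer eth0 eth1` followed by `sudo systemctl start stenographer@1 stenographer@2` would start one capture instance per interface.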

As for reading, stenoread can only point to one config file (hence one packet/index location) at a time. To search all locations and merge the results, we have to modify the code; that issue also shows an example (it requires another tool, mergecap, which ships with Wireshark). The maintainers confirmed there is currently no plan to support automatic merging in the main branch.
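The query-then-merge flow can be sketched in shell: run the same query against each instance, write one pcap per instance, then combine them with `mergecap -w OUT IN...` (which merges packets chronologically by default). How each instance is queried depends on your modified code, so `stenoread-instance N QUERY ARGS...` below is a hypothetical stand-in, as are the helper names. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# run CMD...: print the command in dry-run mode (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# query_all_instances QUERY OUTFILE COUNT
# Queries instances 1..COUNT and merges the per-instance pcaps into OUTFILE.
query_all_instances() {
  query=$1
  outfile=$2
  count=$3
  tmpdir=$(mktemp -d)
  parts=""
  i=1
  while [ "$i" -le "$count" ]; do
    # HYPOTHETICAL per-instance query command; replace with your own.
    run stenoread-instance "$i" "$query" -w "$tmpdir/part.$i.pcap"
    parts="$parts $tmpdir/part.$i.pcap"
    i=$((i + 1))
  done
  # $parts is left unquoted on purpose so each file becomes its own argument
  # (mktemp paths contain no spaces).
  run mergecap -w "$outfile" $parts
  rm -rf "$tmpdir"
}
```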

Miscellaneous

Stenographer writes its logs to syslog (/var/log/user.log). The -v flag can be added for more verbose output.
It is possible to specify a filter for packet capture (only packets matching this filter will be written by Stenotype). This is done via the --filter=HEX flag.
I ran into an issue where the VM running Stenographer kept crashing. It turned out that the VM had too little memory (2 GB). In that case, we need to set the --blocks=NUM flag lower than its default of 2048, because Stenotype needs blocks * threads * 1 MB of memory, i.e. 2 GB with the default and a single thread.
When something unexpected happens, such as a process that keeps crashing, we can try running that process (e.g. Stenotype) manually to see what goes wrong (see https://github.com/google/stenographer/issues/205).
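The memory rule of thumb above (blocks * threads * 1 MB) is easy to turn into arithmetic when choosing a --blocks value for a small VM. In this sketch the helper names and the "budget" percentage are hypothetical; how much memory to leave for the OS and the other Stenographer processes is a judgment call.

```shell
#!/bin/sh
# stenotype_mem_mb BLOCKS THREADS: memory Stenotype needs, in MB,
# per the rule of thumb blocks * threads * 1 MB.
stenotype_mem_mb() {
  echo $(($1 * $2))
}

# max_blocks MEM_MB THREADS BUDGET_PCT: largest --blocks value that keeps
# Stenotype within BUDGET_PCT percent of MEM_MB (integer division rounds down).
max_blocks() {
  echo $(($1 * $3 / 100 / $2))
}
```

For example, on a 2 GB VM with one thread and a 50% budget, `max_blocks 2048 1 50` gives 1024. The sample steno.conf has a "Flags" array that appears to hold extra flags for stenotype, so something like `"Flags": ["--blocks=1024"]` is one place to set it (an assumption; verify against INSTALL.md).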

Thanks a lot for reading this far! I hope my experience is useful to you in some way.


Intro to Stenographer - A Packet Capture Tool

Thanawat Mahatribhop