Storing server logs on CrateDB for fast search and aggregations

hlcianfagna

Hernán Lionel Cianfagna

Posted on August 10, 2023

Storing server logs on CrateDB for fast search and aggregations

Did you know that CrateDB can be a great store for your server logs?

If you have been using log aggregation tools or even some of the most advanced commercial SIEM systems, you have probably experienced the same frustrations I have:

  • timeouts when searching logs over long periods of time
  • a complex and proprietary query syntax
  • difficulties integrating queries on logs data into application monitoring dashboards

Storing server logs on CrateDB solves these problems, it allows to query the logs with standard SQL and from any tool supporting the PostgreSQL protocol; its unique indexing also makes full-text queries and aggregations super fast.
Let me show you an example.

First, we will need an instance of CrateDB, it may be best to have a dedicated cluster for this purpose, to separate the monitoring system from the systems being monitored, but for the purpose of this demo we can just have a single node cluster on a docker container:

sudo docker run -d --name cratedb --publish 4200:4200 --publish 5432:5432 crate -Cdiscovery.type=single-node
Enter fullscreen mode Exit fullscreen mode

Next, we need a table to store the logs, let's connect to http://localhost:4200/#!/console and run:

CREATE TABLE doc.systemevents (
    message TEXT
    ,INDEX message_ft USING FULLTEXT(message)
    ,facility INTEGER
    ,fromhost TEXT
    ,priority INTEGER
    ,DeviceReportedTime TIMESTAMP
    ,ReceivedAt TIMESTAMP
    ,InfoUnitID INTEGER
    ,SysLogTag TEXT 
    );
Enter fullscreen mode Exit fullscreen mode

Tip: if you are on a headless system you can also run queries with command-line tools.

Then we need an account for the logging system:

CREATE USER rsyslog WITH (PASSWORD='pwd123');
Enter fullscreen mode Exit fullscreen mode

and we need to grant permissions on the table above:

GRANT DML ON TABLE doc.systemevents TO rsyslog;
Enter fullscreen mode Exit fullscreen mode

We will use rsyslog to send the logs to CrateDB, for this setup we need rsyslog v8.2202 or higher and the ompgsql module:

sudo add-apt-repository ppa:adiscon/v8-stable
sudo apt-get update
sudo apt-get install rsyslog
sudo debconf-set-selections <<< 'rsyslog-pgsql rsyslog-pgsql/dbconfig-install string false'
sudo apt-get install rsyslog-pgsql
Enter fullscreen mode Exit fullscreen mode

Let's now configure it to use the account we created earlier:

echo 'module(load="ompgsql")' | sudo tee /etc/rsyslog.d/pgsql.conf
echo '*.* action(type="ompgsql" conninfo="postgresql://rsyslog:pwd123@localhost/doc")' | sudo tee -a /etc/rsyslog.d/pgsql.conf
sudo systemctl restart rsyslog
Enter fullscreen mode Exit fullscreen mode

If you are interested in more advanced setups involving queuing for additional reliability in production scenarios, you can read more about available settings in the rsyslog documentation.

Now let's imagine that we want to run a container with MediaWiki to host an intranet and we want all logs to go to CrateDB, we can just deploy this with:

sudo docker run --name mediawiki -p 80:80 -d --log-driver syslog --log-opt syslog-address=unixgram:///dev/log mediawiki
Enter fullscreen mode Exit fullscreen mode

If we now point a web browser to port 80 http://localhost/ we will see a new MediaWiki page.
Let's play around a bit to generate log entries, just click on "set up the wiki" and then once on Continue.
This will have generated entries in the doc.systemevents table with syslogtag matching the container id of the container running the site.

We can now use the MATCH predicate to find the error messages we are interested in:

SELECT devicereportedtime,message
FROM doc.systemevents
WHERE MATCH(message_ft, 'Could not reliably determine') USING PHRASE
ORDER BY 1 DESC;
Enter fullscreen mode Exit fullscreen mode
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| devicereportedtime | message                                                                                                                                                                     |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|      1691510710000 | AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message |
|      1691510710000 | AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode

Let's now see which log sources created the most entries:

SELECT syslogtag,count(*)
FROM doc.systemevents
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
Enter fullscreen mode Exit fullscreen mode
+----------------------+----------+
| syslogtag            | count(*) |
+----------------------+----------+
| kernel:              |       23 |
| 083053ae8ea3[52134]: |       20 |
| systemd[1]:          |       15 |
| sudo:                |       10 |
| rsyslogd:            |        5 |
+----------------------+----------+
Enter fullscreen mode Exit fullscreen mode

I hope you found this interesting. Please do not hesitate to let us know your thoughts in the CrateDB Community.

💖 💪 🙅 🚩
hlcianfagna
Hernán Lionel Cianfagna

Posted on August 10, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related