Detecting PII leakage in logs
Wesley Skeen
Posted on March 13, 2023
First, I want to mention that I collaborated on this project and article with @mereta.
Before we begin, I want to direct you to the post I published on setting up Grafana locally using Docker. There you will find simple steps to get your environment set up to experiment.
Once you have this running, open the promtail.yml file. This is the file we are going to change so that Promtail applies our PII detection logic.
Pipeline Stages
We are going to add pipeline_stages to this file.
Simply put, each log line that passes through Promtail goes through these stages. Stages can perform a number of actions, which are described in detail in the Grafana docs, but I will go through the stages needed to:
- Detect PII
- Validate the result of the detection
- Create a label to hold the result
Detect PII
As part of the stages section, we added the regex stage:
- regex:
    expression: '(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))'
Here we add an expression. It is built from two parts:
(?P<{0}>({1}))
- {0} - This is the name of the variable that holds the result of the regex match
- {1} - This is the actual regex applied to the log content
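To get a feel for what the named group captures, here is a minimal sketch in Python. It is only an approximation: Promtail evaluates the expression with Go's regex engine, and Python's re module is used here as a stand-in because it accepts the same (?P<name>...) syntax. The extract helper is a hypothetical name, not part of Promtail.

```python
import re

# Same expression as in the regex stage above.
EMAIL_EXPRESSION = (
    r"(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))"
)


def extract(expression: str, log_line: str) -> dict:
    """Return the named groups of the first match, or an empty dict if none."""
    match = re.search(expression, log_line)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    # A line containing an email populates the sensitive_email variable.
    print(extract(EMAIL_EXPRESSION, "My email is JP@mail.com"))
    # A line without an email leaves the variable unset.
    print(extract(EMAIL_EXPRESSION, "My email is ***"))
```

The name inside (?P<...>) becomes the key that later stages refer to through source.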
Validate the result of the detection
Next we have the template stage:
- template:
    source: sensitive_email
    template: '{{ not (empty .Value) }}'
This stage takes the value held in the variable that was set in the regex stage and applies some logic to it. The output of the template then replaces the value of the variable.
| Log | Value in sensitive_email | {{ not (empty .Value) }} | New value of sensitive_email |
|---|---|---|---|
| My email is JP@mail.com | JP@mail.com | true | true |
| My email is *** | (empty) | false | false |
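The table above can be reproduced with a small Python model of the template stage. This is a sketch under assumptions, not Promtail's implementation: not_empty and apply_template are hypothetical helpers that imitate '{{ not (empty .Value) }}' by rendering the text true or false.

```python
def not_empty(value) -> str:
    """Imitate '{{ not (empty .Value) }}': render "true" or "false"."""
    return "true" if value else "false"


def apply_template(extracted: dict, source: str) -> dict:
    """Overwrite extracted[source] with the rendered template result."""
    updated = dict(extracted)
    updated[source] = not_empty(extracted.get(source))
    return updated


if __name__ == "__main__":
    # Row 1: the regex stage captured an email address.
    print(apply_template({"sensitive_email": "JP@mail.com"}, "sensitive_email"))
    # Row 2: the regex stage captured nothing.
    print(apply_template({}, "sensitive_email"))
```

The point is that the raw match is replaced, so the email address itself never ends up in a label.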
Create a label to hold the result
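For this, all we have to do is add the following:
- labels:
    sensitive_email:
This adds a label to the log and sets its value to what is held in sensitive_email. Because no value is given for the label, Promtail uses the extracted variable with the same name.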
Example of it working
I added a log in my API
_logger.LogInformation($"my data is JP@mail.com");
Here is the result in Loki. As you can see, the log line is ingested and the value of the sensitive_email label is true.
New content of promtail.yml
With the above addition of pipeline_stages, this file should look like the following. I have also added another example, which detects credit card numbers.
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    pipeline_stages:
      - match:
          pipeline_name: "security"
          selector: '{app="api"}'
          stages:
            - regex:
                expression: '(?P<sensitive_creditcard>(?:\d[ -]*?){13,16})'
            - regex:
                expression: '(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))'
            - template:
                source: sensitive_creditcard
                template: '{{ not (empty .Value) }}'
            - template:
                source: sensitive_email
                template: '{{ not (empty .Value) }}'
            - labels:
                sensitive_creditcard:
                sensitive_email:
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*local.log
          app: 'api'
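The credit card expression matches any run of 13 to 16 digits, optionally separated by spaces or hyphens, so it is worth checking what it picks up before relying on it. The sketch below uses Python's re module as a stand-in for Promtail's Go regex engine. The sample log lines and the find_card helper are made up for illustration.

```python
import re

# Same expression as in the config above.
CARD_EXPRESSION = r"(?P<sensitive_creditcard>(?:\d[ -]*?){13,16})"


def find_card(log_line: str):
    """Return the text captured as sensitive_creditcard, or None."""
    match = re.search(CARD_EXPRESSION, log_line)
    return match.group("sensitive_creditcard") if match else None


if __name__ == "__main__":
    samples = [
        "card 4111 1111 1111 1111 declined",  # 16 digits in groups of four
        "timestamp=1678700000000",            # 13-digit epoch milliseconds
        "order 12345 shipped",                # too short to match
    ]
    for line in samples:
        print(repr(line), "->", repr(find_card(line)))
```

Note that the second sample is also flagged even though it is not a card number. If that kind of false positive matters in your logs, a stricter expression or an extra validation step may be needed.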
Using the results of these stages
There are several things you can do with these new log labels. Among others, you could:
- Create an alert to detect if PII has leaked into your logs.
- Create dashboards to monitor based on the new labels.
- Do some interesting things in Grafana, such as routing these logs to a different tenant. This tenant would have special privileges to view logs that contain PII.
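As a starting point for the alerting idea, here is a sketch that builds a LogQL metric query counting recent log lines flagged by one of the new labels, plus the matching URL for Loki's instant query endpoint (/loki/api/v1/query). The label names come from the config above. The 5 minute window, the function names, and the base URL (which assumes you can reach Loki at the same host as in promtail.yml) are assumptions to adapt.

```python
from urllib.parse import urlencode


def leak_count_query(label: str, window: str = "5m") -> str:
    """LogQL: number of lines from the api app flagged by `label` in `window`."""
    return f'sum(count_over_time({{app="api", {label}="true"}}[{window}]))'


def build_query_url(base_url: str, logql: str) -> str:
    """URL for an instant query against Loki's HTTP API."""
    return f"{base_url}/loki/api/v1/query?{urlencode({'query': logql})}"


if __name__ == "__main__":
    logql = leak_count_query("sensitive_email")
    print(logql)
    # Assumed base URL; adjust to wherever your Loki instance is reachable.
    print(build_query_url("http://loki:3100", logql))
```

The same LogQL expression could be pasted into a Grafana alert rule with a threshold of greater than zero.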
Improvements
Merge the results of the regex matches into a single label.
First, we need to update the template for each source to:
- template:
    source: sensitive_email
    template: '{{ if not (empty .Value) }} true {{ end }}'
- template:
    source: sensitive_creditcard
    template: '{{ if not (empty .Value) }} true {{ end }}'
Then we add a new template stage to merge the results:
- template:
    source: sensitive
    template: '{{ or .sensitive_email .sensitive_creditcard false }}'
- labels:
    sensitive:
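The merge relies on how or behaves in Go templates: it returns the first non-empty argument, or the last argument if all of them are empty. Here is a rough Python model of that logic. It assumes each per-field template leaves a non-empty value for a match and nothing otherwise; go_or and merge_sensitive are hypothetical helpers, not Promtail functions.

```python
def go_or(*args):
    """Model Go template `or`: first non-empty argument, else the last one."""
    for arg in args:
        if arg:
            return arg
    return args[-1]


def merge_sensitive(extracted: dict) -> str:
    """Imitate '{{ or .sensitive_email .sensitive_creditcard false }}'."""
    result = go_or(
        extracted.get("sensitive_email"),
        extracted.get("sensitive_creditcard"),
        False,
    )
    # A boolean false is rendered as the text "false".
    return "false" if result is False else str(result)


if __name__ == "__main__":
    print(merge_sensitive({"sensitive_email": "true"}))       # email only
    print(merge_sensitive({"sensitive_creditcard": "true"}))  # card only
    print(merge_sensitive({}))                                # nothing matched
```

With this in place, a single sensitive label is enough to filter or alert on, whichever detector fired.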