Optimizing PostgreSQL Buffer Cache: Automating Analysis with a Bash Script
Dmitry Romanoff
Posted on September 7, 2024
Efficient buffer cache management is a crucial aspect of optimizing PostgreSQL database performance. The buffer cache plays a vital role in holding data pages in memory, thereby reducing disk I/O operations and improving query execution times. Understanding how buffer cache is allocated across different databases can provide valuable insights into optimizing performance and ensuring effective resource utilization.
In this article, we’ll walk through a practical Bash script that automates the process of checking buffer cache allocation for all databases on a PostgreSQL instance. The script simplifies the analysis and helps database administrators identify which tables are consuming the most buffer cache. This can be particularly useful for performance tuning and resource optimization.
Script Overview
The provided Bash script performs the following tasks:
- Retrieves Database List: It connects to the PostgreSQL instance and retrieves the list of databases currently in use.
- Collects Buffer Cache Information: For each database, the script queries buffer cache information to determine how much cache is allocated to different tables.
- Formats and Reports: The script formats the output for readability and displays a report of buffer cache usage.
How the Script Works
Let’s break down the script into its key components:
1. Setup and Initialization:
#!/bin/bash
# Script to check buffer cache allocation for all the databases on some db instance
nPort=$1
current_date_time="$(date +'%Y%m%d_%H%M%S')"
fileOne="./temp_output/check_buffer_cache_${current_date_time}.tmp"
fileTwo="./temp_output/check_buffer_cache_${current_date_time}_formatted.tmp"
echo " "
echo "Started..."
echo " "
The script begins by setting up the necessary variables, including the port number (nPort) and file paths for storing temporary output files. It also captures the current date and time for timestamping the output files.
2. Iterating Over Databases:
for i in `psql -h localhost -p ${nPort} -U postgres -t -c "select datname from pg_database where datname in (select datname from pg_stat_activity) order by 1"`; do
echo "Collecting buffercache info about database: $i"
The script queries the list of databases that are currently active and iterates over each database. For each database, it collects buffer cache information.
3. Collecting Buffer Cache Information:
psql -h localhost -p ${nPort} -U $i -d $i -qt <<SQL >> ${fileOne}
\\c ${i};
SELECT '${i}' as db_name, n.nspname, c.relname, count(*) AS buffers
FROM pg_buffercache b JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid) AND
b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE datname = current_database()))
JOIN pg_namespace n ON n.oid = c.relnamespace and n.nspname ='public'
GROUP BY n.nspname, c.relname
ORDER BY 4 DESC
LIMIT 10;
SQL
For each database, the script executes a SQL query to retrieve buffer cache usage. It joins several PostgreSQL system catalog tables to get details about buffer allocation for each table, filtering to include only those in the ‘public’ schema.
4. Formatting and Displaying the Report:
cat ${fileOne} | awk -F"|" ' { if (NF==4) { printf("%-40s %-20s %-50s %s\n",$1, $2, $3, $4); } } ' | sort -nrk4 > ${fileTwo}
echo " "
echo "Report:"
echo " "
cat ${fileTwo}
echo " "
echo "Ended..."
echo " "
After collecting the data, the script formats it using awk to ensure readability. It sorts the output by the number of buffers in descending order and saves the result to another temporary file. Finally, it displays the formatted report.
Running the Script
To execute this script, you need to pass the port number of the PostgreSQL instance as an argument:
./check_buffer_cache.sh 5432
Replace 5432 with the port number of your PostgreSQL instance. The script will generate and display a report of buffer cache usage for all active databases.
Conclusion
This script is a handy tool for PostgreSQL administrators looking to gain insights into buffer cache allocation. By automating the collection and formatting of buffer cache data, the script helps in identifying performance bottlenecks and optimizing resource usage across multiple databases. For best results, consider running this script during different periods to monitor changes in cache usage and make informed decisions about database tuning.
Here’s the full script for your reference:
#!/bin/bash
# Script to check buffer cache allocation for all the databases on some db instance
nPort=$1
current_date_time="$(date +'%Y%m%d_%H%M%S')"
fileOne="./temp_output/check_buffer_cache_${current_date_time}.tmp"
fileTwo="./temp_output/check_buffer_cache_${current_date_time}_formatted.tmp"
echo " "
echo "Started..."
echo " "
for i in `psql -h localhost -p ${nPort} -U postgres -t -c "select datname from pg_database where datname in (select datname from pg_stat_activity) order by 1"`; do
echo "Collecting buffercache info about database: $i"
#echo " "
psql -h localhost -p ${nPort} -U $i -d $i -qt <<SQL >> ${fileOne}
\\c ${i};
SELECT '${i}' as db_name, n.nspname, c.relname, count(*) AS buffers
FROM pg_buffercache b JOIN pg_class c
ON b.relfilenode = pg_relation_filenode(c.oid) AND
b.reldatabase IN (0, (SELECT oid FROM pg_database
WHERE datname = current_database()))
JOIN pg_namespace n ON n.oid = c.relnamespace and n.nspname ='public'
GROUP BY n.nspname, c.relname
ORDER BY 4 DESC
LIMIT 10;
SQL
done
cat ${fileOne} | awk -F"|" ' { if (NF==4) { printf("%-40s %-20s %-50s %s\n",$1, $2, $3, $4); } } ' | sort -nrk4 > ${fileTwo}
echo " "
echo "Report:"
echo " "
cat ${fileTwo}
echo " "
echo "Ended..."
echo " "
Posted on September 7, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
September 7, 2024