zchtodd
Posted on September 28, 2019
In this article I will go through the process of developing and installing a simple foreign data wrapper. We’ll use the PostgreSQL Multicorn extension to develop our data wrapper in Python. The wrapper we’ll develop is a toy example that uses steganography to hide data inside of images. In my case, I chose an image of the Internet-famous grumpy cat.
Foreign data wrappers are an advanced feature of PostgreSQL that can simplify your information architecture. Complex applications can often involve multiple data stores that each serve a specific purpose, but foreign data wrappers expand on the functionality of PostgreSQL, making it possible to centralize your data.
The foreign data wrapper concept was first introduced in PostgreSQL 9.1 as a means of reading data outside of the database. The feature was expanded in version 9.3 to include the ability to write to external data sources. Best of all, this external data appears as tables that can be used in SQL queries.
As of this writing, there are over sixty data wrappers available for use. These wrappers can expose everything from CSV files, to other databases such as MySQL, or even Google Sheets.
Getting Started
We’ll use the Multicorn extension to get a big head start on writing our data wrapper. Data wrappers are normally written in C and must implement an extensive interface. The blackhole data wrapper, which discards any data written to it, is a minimal example that implements the C interface.
Fortunately, Multicorn will implement the C interface for us, while presenting a simpler interface that we can implement in Python.
We’ll need the following for this project:
- PostgreSQL 10
- PostgreSQL server development files (headers)
- Python3 (I'm using 3.6.4)
- Python3 development files (headers)
- Pip for Python3
- Multicorn package
- Stegano package
PostgreSQL Installation
On an Ubuntu or other Debian based system, installation will be done through the package manager like so:
sudo apt-get install postgresql-10
To install the development files:
sudo apt-get install postgresql-server-dev-10
For other platforms, please follow the instructions on the PostgreSQL download page.
Python Installation
On Ubuntu 18 or greater both Python 2 and Python 3 will be installed already. We will, however, need to install the Python package manager, the setuptools package, and the core development files.
sudo apt-get install python3-pip
python3 -m pip install --upgrade pip setuptools wheel
sudo apt-get install python3-dev
Multicorn Installation
Finally, we’ll need to install Multicorn. To do that, we’ll clone the Multicorn GitHub repository and install the package from source. Note that your system will need a functioning C compiler toolchain. On Ubuntu these dependencies can be obtained via the build-essential package.
sudo apt-get install build-essential
Once your system is ready to compile Multicorn:
git clone git@github.com:Kozea/Multicorn.git
Compiling and installing Multicorn is fairly simple, but there is one catch that we’ll need to pay attention to. By default, the library will link against Python 2, but it checks an environment variable called PYTHON_OVERRIDE that we can use to point to Python 3.
sudo PYTHON_OVERRIDE=python3 make install
Setting up the Data Wrapper Package
We should be ready to write our first data wrapper now that all of the dependencies are installed. To do that, Multicorn provides a Python base class that you’ll inherit from. The base class has quite a few methods, but creating a functioning wrapper can be as simple as overriding just one of them.
In our case, we’ll be overriding two methods so that we can both read and write to our table.
Your data wrapper will need to be installed as a Python package in order for Multicorn to discover it. Luckily if you’ve never created a Python package before it is fairly simple.
Your Python package structure should look similar to this:
grumpy
├── grumpy
│ └── __init__.py
└── setup.py
The init.py file will hold the actual wrapper implementation, while setup.py contains metadata about the package we’ve just created. Let’s take a look at the setup.py file.
from setuptools import setup
setup(
name='grumpy',
version='0.0.1',
author='Zach',
license='Postgresql',
packages=['grumpy']
)
Once the code is ready to go, installing it as simple as executing the following:
sudo python3 setup.py install
Implementing the Multicorn Interface
The code to support select and insert statements is fairly simple. The constructor of our SteganoForeignDataWrapper class requires the options and columns that a table using the wrapper has been declared with. The options argument, as we’ll see later, comes from the table declaration. The options will be represented as a dictionary in Python, and in our case, will store the path to the image.
The execute method is called when a select statement is received and is expected to act as a generator that yields rows. In the execute method we’ll use the stegano package to recover any secret messages that have been embedded in our picture of grumpy cat.
The insert method receives a dictionary with the new row. We’ll use the columns list to iterate over the row so that values are always stored in the same order. Here we’ll do the inverse of the execute method and use the stegano package to encode the values into the image. When implementing the insert method, you will also need to have a rowid_column property that identifies the column to be used as a primary key.
import csv
import io
from multicorn import ForeignDataWrapper
from stegano import lsbset
from stegano.lsbset import generators
class SteganoForeignDataWrapper(ForeignDataWrapper):
def __init__(self, options, columns):
super(SteganoForeignDataWrapper, self).__init__(options, columns)
self.options = options
self.columns = columns
def execute(self, quals, columns):
data = io.StringIO(
lsbset.reveal(self.options["path"], generators.eratosthenes())
)
reader = csv.reader(data, delimiter=",")
for line in reader:
row = {}
for i, column_name in enumerate(self.columns):
row[column_name] = line[i]
yield row
@property
def rowid_column(self):
return "id"
def insert(self, new_values):
out = io.StringIO()
writer = csv.writer(out, delimiter=",")
writer.writerow([str(new_values[column]) for column in self.columns])
updated_image = lsbset.hide(
self.options["path"], out.getvalue(), generators.eratosthenes()
)
updated_image.save(self.options["path"])
Activating the Data Wrapper
Using a foreign data wrapper in PostgreSQL is a three step process. First, you declare the extension that implements the data wrapper. Once the extension is declared, you create what PostgreSQL calls a server. The server declaration is an opportunity to pass options that will apply to all tables created using the extension.
Finally, you’ll create tables using a syntax that is mostly identical to any normal table.
Let’s take a look at an example of that process.
CREATE EXTENSION multicorn
CREATE SERVER multicorn_srv foreign data wrapper multicorn options (
wrapper 'grumpy.SteganoForeignDataWrapper'
);
CREATE FOREIGN TABLE grumpy (
test1 character varying,
test2 character varying
) server multicorn_srv options (path ‘/home/zach/grumpy.png’);
Because Multicorn itself actually implements the PostgreSQL C interface, it will act as the extension in this case. When we create the server we pass a “wrapper” option that Multicorn will use to locate the Python class that we’ve written.
In the table declaration I pass along the “path” option to indicate the image that will store the data.
Once everything is set up, using the table in a query is straightforward:
postgres=# insert into grumpy values ('baz', 'bar');
INSERT 0 1
postgres=# select * from grumpy;
test1 | test2
-------+-------
baz | bar
Wrapping up
This is a silly example here, but I hope I’ve shown not only how powerful the concept of data wrappers can be, but also how simple it can be to implement one of them. You can visit the PostgreSQL wiki page if you’d like to see a list of foreign data wrappers.
Posted on September 28, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.