Reading and Writing Files in SQLite

nalgeon

Anton Zhiyanov

Posted on February 27, 2023

Reading and Writing Files in SQLite

Sometimes it's useful to load a dataset from an external file or export query results to a file.

SQLite does not support file I/O operations by default. However, you can easily enable them using the sqlean-fileio extension.

Note. Unlike other DBMS, adding extensions to SQLite is a breeze. Download a file, run one database command — and you are good to go.

sqlean-fileio solves common import/export tasks such as:

  • Loading a JSON document from a file.
  • Reading a text file line by line.
  • Streaming query results to a file.
  • Importing all files in a directory.

Let's look at some examples.

Loading a JSON Document From a File

Suppose we have a JSON file containing employee data:

{
    "employees": [
        { "id": 11, "name": "Diane", "salary": 70 },
        { "id": 12, "name": "Bob", "salary": 78 },
        { "id": 21, "name": "Emma", "salary": 84 },
        { "id": 22, "name": "Grace", "salary": 90 },
        { "id": 23, "name": "Henry", "salary": 104 },
        { "id": 24, "name": "Irene", "salary": 104 },
        { "id": 25, "name": "Frank", "salary": 120 },
        { "id": 31, "name": "Cindy", "salary": 96 },
        { "id": 32, "name": "Dave", "salary": 96 },
        { "id": 33, "name": "Alice", "salary": 100 }
    ]
}
Enter fullscreen mode Exit fullscreen mode

And an employees table:

create table employees (
  id integer primary key,
  name text,
  salary integer
);
Enter fullscreen mode Exit fullscreen mode

To import the JSON data into the table, we combine fileo_read() with json_tree():

insert into employees(id, name, salary)
select
  json_extract(value, '$.id'),
  json_extract(value, '$.name'),
  json_extract(value, '$.salary')
from json_tree(
  fileio_read('employees.json')
)
where type = 'object' and fullkey like '$.employees%';
Enter fullscreen mode Exit fullscreen mode

fileio_read() loads the file as a blob, while json_tree() iterates over it. When the query completes, the data is imported into the employees table:

select * from employees;
Enter fullscreen mode Exit fullscreen mode
┌────┬───────┬────────┐
│ id │ name  │ salary │
├────┼───────┼────────┤
│ 11 │ Diane │ 70     │
│ 12 │ Bob   │ 78     │
│ 21 │ Emma  │ 84     │
│ 22 │ Grace │ 90     │
│ 23 │ Henry │ 104    │
│ 24 │ Irene │ 104    │
│ 25 │ Frank │ 120    │
│ 31 │ Cindy │ 96     │
│ 32 │ Dave  │ 96     │
│ 33 │ Alice │ 100    │
└────┴───────┴────────┘
Enter fullscreen mode Exit fullscreen mode

Reading a Text File Line by Line

Reading the whole file into memory, as we did with employees.json, may not be a good idea for very large files (e.g., logs with millions of lines). In this case, it is better to read the file line by line.

Suppose we have an app.log file with 1M lines:

ts=2023-02-26 13:00:00,level=INFO,message=begin processing
ts=2023-02-26 13:01:00,level=INFO,message=processed 1000 records
ts=2023-02-26 13:02:00,level=INFO,message=processed 2000 records
ts=2023-02-26 13:03:00,level=INFO,message=processed 3000 records
ts=2023-02-26 13:03:25,level=ERROR,message=invalid record data
ts=2023-02-26 13:03:25,level=INFO,message=processing failed
...
Enter fullscreen mode Exit fullscreen mode

And an app_log table:

create table app_log (
  line text
);
Enter fullscreen mode Exit fullscreen mode

Let's iterate over the log file with fileio_scan(), loading lines one by one, and inserting them into the table:

insert into app_log(line)
select value from fileio_scan('app.log');
Enter fullscreen mode Exit fullscreen mode
select count(*) from app_log;
-- 1000000
Enter fullscreen mode Exit fullscreen mode

Now we can extract the individual fields using the regexp_substr function from the sqlean-regexp extension:

alter table app_log add column ts text;
alter table app_log add column level text;
alter table app_log add column message text;

update app_log set ts = substr(regexp_substr(line, 'ts=[^,]+'), 4);
update app_log set level = substr(regexp_substr(line, 'level=[^,]+'), 7);
update app_log set message = substr(regexp_substr(line, 'message=[^,]+'), 9);
Enter fullscreen mode Exit fullscreen mode

Now each log field is stored in a separate column:

select ts, level, message from app_log limit 5;
Enter fullscreen mode Exit fullscreen mode
┌─────────────────────┬───────┬────────────────────────┐
│         ts          │ level │        message         │
├─────────────────────┼───────┼────────────────────────┤
│ 2023-02-26 13:00:00 │ INFO  │ begin processing       │
│ 2023-02-26 13:01:00 │ INFO  │ processed 1000 records │
│ 2023-02-26 13:02:00 │ INFO  │ processed 2000 records │
│ 2023-02-26 13:03:00 │ INFO  │ processed 3000 records │
│ 2023-02-26 13:03:25 │ ERROR │ invalid record data    │
└─────────────────────┴───────┴────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Neat!

Streaming Query Results to a File

Suppose we want to export the ERROR log lines into a separate file. Let's use fileio_append for that:

select sum(
  fileio_append('error.log', printf('%s: %s', ts, message) || char(10))
) from app_log
where level = 'ERROR';
Enter fullscreen mode Exit fullscreen mode

This is error.log after the export:

2023-02-26 13:03:25: invalid record data
Enter fullscreen mode Exit fullscreen mode

Importing All Files in a Directory

Suppose we have multiple log files:

app.log.1
app.log.2
app.log.3
...
Enter fullscreen mode Exit fullscreen mode

Let's import them all at once using the filio_ls() function.

First, we'll look at the files to make sure we're loading the correct data:

select * from fileio_ls('logs')
where name like 'logs/app.log%';
Enter fullscreen mode Exit fullscreen mode
┌────────────────┬───────┬────────────┬──────┐
│      name      │ mode  │   mtime    │ size │
├────────────────┼───────┼────────────┼──────┤
│ logs/app.log.2 │ 33188 │ 1677425479 │ 316  │
│ logs/app.log.3 │ 33188 │ 1677425496 │ 377  │
│ logs/app.log.1 │ 33188 │ 1677425467 │ 316  │
└────────────────┴───────┴────────────┴──────┘
Enter fullscreen mode Exit fullscreen mode

Seems fine. Now let's import them into the logs table:

create table logs(fname text, line text);
Enter fullscreen mode Exit fullscreen mode
with files as (
  select name from fileio_ls('logs')
  where name like 'logs/app.log%'
)
insert into logs(fname, line)
select files.name, value from fileio_scan(files.name), files;
Enter fullscreen mode Exit fullscreen mode

Let's double-check that all logs are imported:

select fname, count(*)
from logs
group by fname;
Enter fullscreen mode Exit fullscreen mode
┌────────────────┬──────────┐
│     fname      │ count(*) │
├────────────────┼──────────┤
│ logs/app.log.1 │ 5        │
│ logs/app.log.2 │ 5        │
│ logs/app.log.3 │ 6        │
└────────────────┴──────────┘
Enter fullscreen mode Exit fullscreen mode

Looks fine!

Installation and Usage

  1. Download the latest release

  2. Use with SQLite command-line interface:

sqlite> .load ./fileio
sqlite> select * from fileio_read('data.txt');
Enter fullscreen mode Exit fullscreen mode

See How to Install an Extension for usage with IDE, Python, etc.

See Extension Documentation for reference.

Follow @ohmypy on Twitter to keep up with new posts

💖 💪 🙅 🚩
nalgeon
Anton Zhiyanov

Posted on February 27, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related