Todd Birchard
Posted on February 28, 2019
Now that we've gotten the fundamentals of creating databases and tables out of the way, we can start getting into the meat and potatoes of SQL interactions: selecting , updating , and deleting data.
We'll start with the basic structure of these queries and then break into the powerful operations with enough detail to make you dangerous.
Selecting Data From a Table
As mentioned previously, SQL operations have a rather strict order of operations which clauses have to respect in order to make a valid query. We'll begin by dissecting a common SELECT statement:
SELECT
column_name_1,
column_name_2
FROM
schema_name.table_name
WHERE
column_name_1 = "Value";
This is perhaps the most common structure of SELECT queries. First, we list the names of the columns we'd like to select separated by commas. To receive all columns, we can simply say SELECT *
.
These columns need to come from somewhere, so we specify the table we're referring to next. This either takes a form of FROM table_name
(non-PostgreSQL), or FROM schema_name.table_name
(PostgreSQL). In theory, a semicolon here would result in a valid query, but we usually want to select rows that meet certain criteria.
This is where the WHERE
clause comes in: only rows which return "true" for our WHERE
conditional will be returned. In the above example, we're validating that a string matches exactly "Value"
.
Offsetting and Limiting Results in our Queries
When selecting data, the combination of OFFSET
and LIMIT
are critical at times. If we're selecting from a database with hundreds of thousands of rows, we would be wasting an obscene amount of system resources to fetch all rows at once; instead, we can have our application or API paginate the results.
LIMIT
is followed by an integer, which in essence says "return no more than X results."
OFFSET
is also followed by an integer, which denotes a numerical starting point for returned results, aka: "return all results which occur after the Xth result:"
SELECT
*
FROM
table_name
LIMIT 50 OFFSET 0;
The above returns the first 50 results. If we wanted to build paginated results on the application side, we could construct our query like this:
from SQLAlchemy import engine, session
# Set up a SQLAlchemy session
Session = sessionmaker()
engine = create_engine('sqlite:///example.db')
Session.configure(bind=engine)
sess = Session()
# Appication variables
page_number = 3
page_size = 50
results_subset = page_number * results limit
# Query
session.query(TableName).limit(page_size).offset(results_subset)
Such an application could increment page_number
by 1 each time the user clicks on to the next page, which would then appropriately modify our query to return the next page of results.
Another use for OFFSET
could be to pick up where a failed script left off. If we were to write an entire database to a CSV and experience a failure. We could pick up where the script left off by setting OFFSET
equal to the number of rows in the CSV, to avoid running the entire script all over again.
Sorting Results
Last to consider for now is sorting our results by using the ORDER BY
clause. We can sort our results by any specified column, and state whether we'd like the results to be ascending (ASC
) or descending (DESC
):
SELECT
*
FROM
schema_name.table_name
WHERE
column_name_1 = "Value"
ORDER BY
updated_date DESC
LIMIT 50 OFFSET 10;
Sophisticated SELECT Statements
Of course, we can select rows with WHERE
logic that goes much deeper than an exact match. One of the most versatile of these operations is LIKE
.
Using Regex with LIKE
LIKE
is perhaps the most powerful way to select columns with string values. With LIKE
, we can leverage regular expressions to build highly complex logic. Let's start with some of my favorites:
SELECT
*
FROM
people
WHERE
name LIKE "%Wade%";
Passing a string to LIKE
with percentage signs on both sides is essentially a " contains" statement. %
is equivalent to a wildcard, thus placing %
on either side of our string will return true whether the person's first name, middle name, or last name is Wade. Check out other useful combinations for %
:
- a%: Finds any values that start with "a".
- %a: Finds any values that end with "a".
- %or%: Finds any values that have "or" in any position.
- _r%: Finds any values that have "r" in the second position.
- a_%_%: Finds any values that start with "a" and are at least 3 characters in length.
- a%o: Finds any values that start with "a" and end with "o".
Finding Values that are NOT LIKE
The opposite of LIKE
is of course NOT LIKE
, which runs the same conditional, but returns the opposite true/false value of LIKE
:
SELECT
*
FROM
people
WHERE
name NOT LIKE "%Wade%";
Conditionals With DateTime Columns
DateTime columns are extremely useful for selecting data. Unlike plain strings, we can easily extract numerical values for month, day, and year from a DateTime by using MONTH(column_name)
, DAY(column_name)
, and YEAR(column_name)
respectively. For example, using MONTH()
on a column that contains a DateTime of 2019-01-26 05:42:34
would return 1
, aka January. Because the values come back as integers, it is then trivial to find results within a date range:
SELECT
*
FROM
posts
WHERE YEAR(created_at) < 2018;
Finding Rows with NULL Values
NULL
is a special data type that essentially denotes the "absence of something," therefore no conditional will never equal NULL
. Instead, we find rows where a value IS NULL
:
SELECT
*
FROM
posts
WHERE author IS NULL;
This should not come as a surprise to anybody familiar with validating data types.
The reverse of this, of course, is NOT NULL
:
SELECT
*
FROM
posts
WHERE author IS NOT NULL;
Inserting Data
An INSERT
query creates a new row, and is rather straightforward: we state the columns we'd like to insert data into, followed by the values to insert into said columns:
INSERT INTO table_name (column_1, column_2, column_3)
VALUES ("value1", "value2", "value3");
Many things could result in a failed insert. For one, the number of values must match the number of columns we specify; if we don't we've either provided too few or too many values.
Second, vales must respect a column's data type. If we try to insert an integer into a DateTime column, we'll receive an error.
Finally, we must consider the keys and constraints of the table. If keys exist that specify certain columns must not be empty, or must be unique, those keys must too be respected.
As a shorthand trick, if we're inserting values into all of a table's columns, we can skip the part where we explicitly list the column names:
INSERT INTO table_name
VALUES ("value1", "value2", "value3");
Here's a quick example of an insert query with real data:
INSERT INTO friends (id, name, birthday)
VALUES (1, 'Jane Doe', '1990-05-30');
UPDATE Records: The Basics
Updating rows is where things get interesting. There's so much we can do here, so let's work our way up:
UPDATE table_name
SET column_name_1 = 'value'
WHERE column_name_2 = 'value';
That's as simple as it gets: the value of a column, in a row that matches our conditional. Note that SET
always comes before WHERE
. Here's the same query with real data:
UPDATE celebs
SET twitter_handle = '@taylorswift13'
WHERE id = 4;
UPDATE Records: Useful Logic
Joining Strings Using CONCAT
You will find that it's common practice to update rows based on data that already exists in said rows: in other words, sanitizing or modifying data. A great string operator is CONCAT()
. CONCAT("string_1", "string_2")
will join all the strings passed to a single string.
Below is a real-world example of using CONCAT()
in conjunction with NOT LIKE
to determine which post excerpts don't end in punctuation. If the excerpt does not end with a punctuation mark, we add a period to the end:
UPDATE
posts
SET
custom_excerpt = CONCAT(custom_excerpt, '.')
WHERE
custom_excerpt NOT LIKE '%.'
AND custom_excerpt NOT LIKE '%!'
AND custom_excerpt NOT LIKE '%?';
Using REPLACE
REPLACE()
works in SQL as it does in nearly every programming language. We pass REPLACE()
three values:
- The string to be modified.
- The substring within the string which will be replaced.
- The value of the replacement.
We can do plenty of clever things with REPLACE()
. This is an example that changes the featured image of blog posts to contain the “retina image” suffix:
UPDATE
posts
SET
feature_image = REPLACE(feature_image, '.jpg', '@2x.jpg');
Scenario: Folder Structure Based on Date
I across a fun exercise the other day when dealing with a nightmare situation involving changing CDNs. It touches on everything we’ve reviewed thus far and serves a great illustration of what can be achieved in SQL alone.
The challenge in moving hundreds of images for hundreds of posts came in the form of a file structure. Ghost likes to save images in a dated folder structure, like 2019/02/image.jpg. Our previous CDN did not abide by this at all, so had a dump of all images in a single folder. Not ideal.
Thankfully, we can leverage the metadata of our posts to discern this file structure. Because images are added to posts when posts are created, we can use the created_at column from our posts table to figure out the right dated folder:
UPDATE
posts
SET
feature_image = CONCAT("https://cdn.example.com/posts/",
YEAR(created_at),
"/",
LPAD(MONTH(created_at), 2, '0'),
"/",
SUBSTRING_INDEX(feature_image, '/', - 1)
);
Let's break down the contents in our CONCAT
:
-
https://cdn.example.com/posts/
: The base URL of our new CDN. -
YEAR(created_at)
: Extracting the year from our post creation date (corresponds to a folder). -
LPAD(MONTH(created_at), 2, '0')
: Using MONTH(created_at) returns a single digit for early months, but our folder structure wants to always have months a double-digits (ie: 2018/01/ as opposed to 2018/1/ ). We can useLPAD()
here to 'pad' our dates so that months are always two digits long, and shorter dates will be padded with the number 0. -
SUBSTRING_INDEX(feature_image, '/', - 1)
: We're getting the filename of each post's image by finding everything that comes after the last slash in our existing image URL.
The result for every image will now look like this:
https://cdn.example.com/posts/2018/02/image.jpg
DELETE Records
Let's wrap up for today with our last type of query, deleting rows:
DELETE FROM celebs
WHERE twitter_handle IS NULL;
Posted on February 28, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 13, 2024