Essential SQL Commands for Data Science

jadieljade

Jadieljade

Posted on March 14, 2023

Essential SQL Commands for Data Science

SQL is the main tool used by data scientists, database administrators, and database engineers to extract and manipulate data from relational databases. Understanding the structure of a SQL statement and the key commands involved makes it easy to read and use. These commands can assist in typical tasks such as creating and removing databases, adding and deleting tables, and inserting and retrieving data. In this article, we will cover the various parts of a relational database, specific sections of the SQL language, the basic format of a SQL statement, and examples of crucial SQL statements used in managing a database of your own.

First what is a relational database?
A relational database organizes data into structured tables for finding shared A relational database is a type of database that stores and organizes data into tables, each consisting of rows and columns. It is a collection of related data organized into one or more tables with a unique key to identify each row, and the tables are related to each other using foreign keys.

In a relational database, each table represents an entity or concept in the real world, such as customers, orders, or products. Each row in a table represents a specific instance of that entity, and each column represents an attribute or characteristic of that entity. For example, a customer table might have columns for the customer's name, address, and email address.

The relationship between tables in a relational database is established using foreign keys, which are used to link rows in one table to corresponding rows in another table. This allows for complex queries and analysis to be performed across multiple tables.

Relational databases are widely used in data-driven applications and are favored for their ability to efficiently manage large amounts of structured data. Some of the popular relational database management systems include MySQL, Oracle, SQL Server, and PostgreSQL.

Subsets of SQL
This article will concentrate on the SQL commands that are frequently used in database management, including those used for creating, altering, and dropping databases and tables.
These commands will be divided into four categories:
• Data manipulation language (DML) commands
• Data definition language (DDL) commands
• Data control language (DCL) commands
• Transaction control statements (TCS)

DML commands enable the manipulation and operation of data within a database. Common examples of DML commands are SELECT, INSERT, and UPDATE.

DDL commands, on the other hand, allow the definition of the structure of a database. This includes creating new tables and objects, and altering their attributes such as table name, data type, etc. Examples of DDL commands are CREATE and ALTER.

DCL commands are responsible for regulating user permissions and access to a database. These commands are used to control and manage user access to data, and examples of DCL commands include GRANT and REVOKE.

TCS commands are utilized to handle transactions within a database, which are units of work that can either be committed or rolled back. Examples of TCS commands are COMMIT and ROLLBACK.

SQL queries follow a specific syntax and order, usually composed of several commands or clauses. These commands are typically capitalized, even though SQL is not case sensitive. It is best practice to write them in all uppercase to improve readability and consistency.

DML statements are the most common type of SQL query, and they follow a basic syntax as follows:

SELECT column_name AS alias_name
   FROM table_name
   WHERE condition
   GROUP BY column_name
   HAVING condition
   ORDER BY column_name DESC;

Enter fullscreen mode Exit fullscreen mode

The syntax of a DML statement can be broken down into several parts:

  • SELECT: The SQL command that specifies the type of query you want to execute. For DML queries, this can be either SELECT or UPDATE.
  • column_name: The name of the column(s) you want to query, or modify in the case of an UPDATE statement. You can give the column a temporary alias using the AS keyword and providing an alias name.
  • FROM: This clause specifies the table(s) from which you want to query data.
  • WHERE: This clause is used to filter the query results to those that meet a specific condition. It can be used in conjunction with operators like AND, OR, BETWEEN, IN, and LIKE to create more complex queries.
  • GROUP BY: This clause groups rows with the same values into summary rows.
  • HAVING: This clause filters the results of the query, similar to the WHERE clause, but it can be used with aggregate functions.
  • ORDER BY: An optional clause that is used to sort the query results in ascending or descending order.
  • DESC: By default, the ORDER BY clause sorts results in ascending order. DESC can be used to sort results in descending order.

It is important to note that not all SQL queries follow this exact syntax, but understanding this basic structure is helpful for managing databases and performing analysis.
The top SQL commands to learn

  1. CREATE DATABASE and ALTER DATABASE The CREATE DATABASE command creates a new database. A database must be created to store any tables or data. Syntax: CREATE DATABASE database_name; Example:
CREATE DATABASE marvel_database;
Enter fullscreen mode Exit fullscreen mode

The ALTER DATABASE command modifies an existing database. For example, the ALTER DATABASE command can add or remove files from a database.
Syntax:
ALTER DATABASE database_name action;
Example:

ALTER DATABASE marvel_database ADD FILE 'ir.txt';
Enter fullscreen mode Exit fullscreen mode
  1. USE USE is selects a database. This command is frequently used to begin working with a newly created database. Syntax: USE database_name; Example:
USE marvel_database;
Enter fullscreen mode Exit fullscreen mode

Once a database has been selected, all subsequent SQL commands will be executed on that database.
Keep in mind that the USE command can only select databases that have already been created.
If a database with the specified name does not exist, then an error will be returned.

  1. CREATE TABLE, ALTER TABLE, and DROP TABLE The CREATE TABLE command creates a new table in a database. A table must be created before any data can be inserted into it. Syntax: CREATE TABLE table_name ( column_name data_type, column_name data_type, ... ); Example:
CREATE TABLE heroes_table (
    id INTEGER,
    name VARCHAR(255),
    age INTEGER
);
Enter fullscreen mode Exit fullscreen mode

In this instance, we are generating a heroes_table with three columns named id, name, and age. It's necessary to specify the data type for each column, and there are several commonly used data types such as INTEGER, VARCHAR, and DATE.

The ALTER TABLE instruction alters an already existing table. One can use the ALTER TABLE command to add or remove columns from a table, for instance.
Syntax:
ALTER TABLE table_name action;
Example:

ALTER TABLE heroes_table 
ADD email VARCHAR(255);
Enter fullscreen mode Exit fullscreen mode

In this example, we are adding a new column called email to the people_table table. The data type for the new column must be specified.
It's also possible to use the ALTER TABLE command to modify an existing column's data type.
Syntax:
ALTER TABLE table_name
MODIFY COLUMN column_name data_type;
Example:

ALTER TABLE heroes_table 
MODIFY COLUMN last_name 
VARCHAR(128);
Enter fullscreen mode Exit fullscreen mode

To change the data type of a column, you must first delete all the data from that column.
Syntax:
ALTER TABLE table_name
DROP COLUMN column_name;
Example:

ALTER TABLE heroes_table 
DROP COLUMN email; 
Enter fullscreen mode Exit fullscreen mode

The DROP TABLE command deletes an entire table from a database. This command will permanently delete all data stored in the table.
Syntax:
DROP TABLE table_name;
Example:

DROP TABLE heroes_table;
Enter fullscreen mode Exit fullscreen mode

It's important to be careful when using the DROP TABLE command, as it cannot be undone! Once a table is deleted, all data stored in that table is permanently lost.

An alternative to DROP TABLE is to use TRUNCATE TABLE instead. This command will delete all data from a table, but it will not delete the table itself.
Syntax:
TRUNCATE TABLE table_name;
Example:

TRUNCATE TABLE heroes_table;
Enter fullscreen mode Exit fullscreen mode

In this example, we are deleting all data from the heroes_table table. The table itself is not deleted, so any column information is retained.

  1. INSERT INTO The INSERT INTO command inserts data into a table. Syntax: INSERT INTO table_name (column_name, column_name, ...) VALUES (value, value, ...); Example:
INSERT INTO heroes_table (id, name, age)
VALUES (NULL, 'Bucky Barnes', 100);
Enter fullscreen mode Exit fullscreen mode

In this example, we are inserting a new row into heroes_table. The first column in the table is id. We have specified that this column should be set to NULL, which means that the database will automatically generate a unique id for this row.
The second and third columns in the table are name and age, respectively. We have specified that these columns should be set to 'Bucky Barnes' and 100 for this row.

  1. UPDATE The UPDATE command modifies data already stored in a table. Syntax: UPDATE table_name SET column_name = value, column_name = value, ... WHERE condition; Example:
UPDATE people_table
SET name = 'Winter soldier', age = 102
WHERE id = 100;
Enter fullscreen mode Exit fullscreen mode

Important: The WHERE clause is required when using the UPDATE command. Without a WHERE clause, all rows in the table would be updated!

  1. DELETE The DELETE command deletes data from a table. Syntax: DELETE FROM table_name WHERE condition; Example:
DELETE FROM heroes_table
WHERE id = 100;
Enter fullscreen mode Exit fullscreen mode

In this example, we are deleting the row with id=100 from the heroes_table table.
As with the UPDATE command, it's important to note that the WHERE clause is required when using the DELETE command. As you may have already guessed, all rows in the table would be deleted without a WHERE clause.

  1. SELECT and FROM The SELECT command queries data FROM a table. Syntax: SELECT column_name, column_name, ... FROM table_name WHERE condition; Example:
SELECT name, age
FROM heroes_table
WHERE id = 100;
Enter fullscreen mode Exit fullscreen mode

The SELECT and FROM commands are two of the most important SQL commands, as they allow you to specify and retrieve data from your database.

  1. ORDER BY The ORDER BY command sorts the results of a query. Syntax: SELECT column_name, column_name, ... FROM table_name WHERE condition ORDER BY column_name [ASC | DESC]; Example:
SELECT name, age
   FROM heroes_table
   WHERE id = 100
   ORDER BY age DESC;
Enter fullscreen mode Exit fullscreen mode

In this example, we are querying people_table for the name and age of the row with id=100. We are then sorting the results by age, in descending order.
The ORDER BY command is often used in conjunction with the SELECT command to retrieve data from a table in a specific order.
It's important to note that the ORDER BY command doesn't just work with numeric data – it can also be used to sort text data alphabetically!
ASC: By default, the order is ascending (A, B, C, . . . Z)
DESC: Descending order (Z, Y, X, . . . A)

  1. GROUP BY The GROUP BY command groups the results of a query by one or more columns. Syntax: SELECT column_name, aggregate_function(column_name) FROM table_name WHERE condition GROUP BY column_name; Example:
SELECT name, count(*)
   FROM heroes_table
   WHERE country='US'
   GROUP BY names;
Enter fullscreen mode Exit fullscreen mode

In this example, we are querying heroes_table for all of the unique names in the table. We are then using the COUNT() function to count how many times each name occurs.
The GROUP BY command is often used with aggregate functions (such as COUNT(), MIN(), MAX(), SUM(), etc.), to group data together and calculate a summary value.
The columns specified by the GROUP BY clause must also be included in the SELECT clause.

  1. HAVING The HAVING command filters the results of a query based on one or more aggregate functions. Syntax: SELECT column_name, aggregate_function(column_name) FROM table_name WHERE condition GROUP BY column_name HAVING condition; Example:
SELECT name, count(*)
   FROM heroes_table
   WHERE country='US'
   GROUP BY names
   HAVING count(*) > 0;
Enter fullscreen mode Exit fullscreen mode

In this example, we are querying the heroes_table for all of the unique names in the table. We then use the COUNT() function to count how many times each name occurs.
Finally, we use the HAVING clause to filter out any names that don't occur at least once in the table.
Similar to the GROUP BY clause, we can also use the HAVING clause alongside aggregate functions to filter query results.
Aggregate functions:
• COUNT(): counts the number of rows in a table
• MIN(): finds the minimum value in a column
• MAX(): finds the maximum value in a column
• SUM(): calculates the sum of values in a column
• AVG(): calculates the average of values in a column
Columns specified in the GROUP BY clause must also be included in the SELECT clause.
HAVING is very similar to WHERE, but there are some important differences:
• WHERE is used to filter data before the aggregation takes place, while HAVING is used to filter data after the aggregation takes place.
• WHERE can be used with aggregate functions, but HAVING can only be used with columns included in the GROUP BY clause.
• WHERE is applied to individual rows, while HAVING is applied to groups of rows.

  1. UNION and UNION ALL The UNION command combines the results of two or more queries into a single dataset. It is often used to combine data from multiple tables into a single dataset. Syntax: SELECT column_name FROM table_name1 UNION SELECT column_name FROM table_name2; Example:
SELECT names FROM sidekicks_table
UNION
SELECT email FROM heroes_table;
Enter fullscreen mode Exit fullscreen mode

In this example, we use SELECT and UNION to query names from the sidekicks_table and then combine them with emails from the people_table into a single result set.
The number and order of columns must be the same in all of the SELECT statements being combined with UNION. Also, all the columns need to be the same data type.
To combine data from multiple tables where the number and order of columns are not the same into a single dataset, use UNION ALL instead of UNION.
Syntax:
SELECT column_name FROM table_name_one
UNION ALL
SELECT column_name FROM table_name_two;
Example:

SELECT names FROM heroes_table
UNION ALL ALL
SELECT email FROM heroes_table;
Enter fullscreen mode Exit fullscreen mode

In this example, we are querying people_table for all of the unique names in the table. We are then using the UNION ALL command to combine this dataset with another dataset containing all the unique email addresses in the table.

  1. JOIN A JOIN is a way to combine data from two or more tables into a single, new table. The tables being joined are called the left table and the right table. The most common type of join is an INNER JOIN. An inner join will combine only the rows from the left table that have a match in the right table. Syntax: SELECT column_name FROM left_table INNER JOIN right_table ON left_table.column_name = right_table.column_name; Example:
SELECT name, email FROM heroes_table
INNER JOIN sidekick_table 
ON people_table.id = yourtable.id;
Enter fullscreen mode Exit fullscreen mode

In this example, we are using INNER JOIN to combine data from the heroes_table and sidekick_table. We are joining the tables using the id column.
Although inner joins are the most common type of join, there are other types of joins that you should be aware of.
LEFT OUTER JOIN: A left join will combine all of the rows from the left table, even if there is no match in the right table.
Syntax:
SELECT column_name(s) FROM left_table
LEFT OUTER JOIN right_table
ON left_table.column_name = right_table.column_name;
RIGHT OUTER JOIN: A right join will combine all of the rows from the right table, even if there is no match in the left table.
Syntax:
SELECT column_name(s) FROM left_table
RIGHT OUTER JOIN right_table ON left_table.column_name = right_table.column_name;
FULL OUTER JOIN: A full outer join will combine all of the rows from both tables, even if there is no match in either table.
Syntax:
SELECT column_name(s) FROM left_table
FULL OUTER JOIN right_table ON left_table.column_name = right_table.column_name;
Joins can be very useful when combining data from multiple tables into a single result set. However, it's important to note that joins can limit performance and should be used sparingly.

  1. CREATE INDEX and DROP INDEX An index is a data structure that can be used to improve the performance of SQL queries. Indexes can speed up the data retrieval from a table by allowing the database to quickly find the desired data without having to scan the entire table. Creating an index on a column is a relatively simple process. Syntax: CREATE INDEX index_name ON table_name (column_name); Example:
CREATE INDEX people ON sidekick_table (names);
Enter fullscreen mode Exit fullscreen mode

Once an index is created, the database can use it to speed up the execution of SQL queries. Indexes are an important tool for database administrators to know about, and they can be handy for improving the performance of SQL queries.
Syntax:
DROP INDEX index_name ON table_name;
Example:

DROP INDEX people ON sidekick_table;
Enter fullscreen mode Exit fullscreen mode

Once an index is dropped, it can no longer be used by the database to speed up SQL query execution.

  1. GRANT and REVOKE The GRANT and REVOKE commands manage permissions in a database. The GRANT command gives a user permission to perform an action, such as creating a table or inserting data into a table. Syntax: GRANT permission_type ON object_name TO user; Example: GRANT CREATE TABLE ON important_database TO bob; The REVOKE command removes a user's permission to perform actions. Syntax: REVOKE permission_type ON object_name FROM user; Example: REVOKE CREATE TABLE ON important_database FROM bob; Managing permissions in a database is an important task for database administrators. The GRANT and REVOKE commands are two of the most important commands for managing permissions.
  2. LIKE The LIKE operator is used to search for data that matches a specific value. Syntax: SELECT column_name(s) FROM table_name WHERE column_name LIKE pattern; Example:
SELECT first_name 
FROM team_roster 
WHERE first_name LIKE '%a';
Enter fullscreen mode Exit fullscreen mode

In the example above, the query would return all of the records from the team_roster table where the first_name column contains a value that ends with the letter a.
Placing the modulo % after the letter a would return all of the records where the first_name column contains a value that starts with the letter a.
Putting a modulo % before and after the letter "a" would return all of the records where the first_name column contains a value that contains the letter "a".
Thank you for the read I hope you learnt something. Why did the SQL developer refuse to go skydiving? They didn't want to jump without a WHERE clause!

💖 💪 🙅 🚩
jadieljade
Jadieljade

Posted on March 14, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related