Mohamed El Laithy
Posted on October 25, 2024
Since its creation 50 years ago, SQL has become the primary language used in the database industry. Its clear syntax and strong community support have made it easier for people to access data. However, SQL does have some shortcomings:
- Fixed Structure: A query must follow a specific order (such as SELECT … FROM … WHERE … GROUP BY…). Deviating from this order can complicate things and may require subqueries or intricate patterns.
- Inverted Data Flow: Queries often begin in the middle, forcing users to build their logic outward from the FROM clauses found in subqueries or common table expressions (CTEs).
- Long Syntax: It can be frustrating to repeatedly list the same columns in SELECT, GROUP BY, and ORDER BY clauses, as well as in multiple subqueries.
These challenges can make SQL difficult for beginners. Even experienced users may find that reading or writing SQL takes more effort than necessary. A more streamlined syntax could benefit everyone.
Hence the question Is it possible to add something to SQL to give it the same power and flexibility, while still keeping everything that is familiar and works well in SQL?The answer is yes!
Introducing SQL pipe syntax
1. What is pipe syntax:
In simple terms, pipe syntax is an enhancement to standard SQL that makes it easier, more concise, and more flexible. It uses the same basic operators as standard SQL, maintaining the same meanings and mostly the same structure, but allows for operators to be applied in any order and as many times as needed.
2. How it works:
- Queries can start with FROM.
- Operators are written sequentially, using the |> pipe symbol.
- Each operator consumes its input table and produces an output table.
- Most pipe operators use the same syntax as standard SQL:
- SELECT, WHERE, JOIN, ORDER BY, LIMIT, etc.
- Standard and pipe syntax can be combined freely within the same query.
Example: Suppose you want to know the average number of taxi trips per year in Chicago by payment_type using a BigQuery public dataset. Here's how you might write that in standard and pipe syntax.
Using the standard syntax, you typically need to write a subquery:
-- Standard Syntax
SELECT AVG(num_trips) AS avg_trips_per_year, payment_type
FROM
(
SELECT EXTRACT(YEAR FROM trip_start_timestamp) as year, payment_type, COUNT(*) AS num_trips
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
GROUP BY year, payment_type
)
GROUP BY payment_type
ORDER BY payment_type;
Here’s that same query using pipe syntax — no subquery needed!
-- Pipe Syntax
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
|> EXTEND EXTRACT(YEAR FROM trip_start_timestamp) AS year
|> AGGREGATE COUNT(*) AS num_trips
GROUP BY year, payment_type
|> AGGREGATE AVG(num_trips) AS avg_trips_per_year GROUP BY payment_type ASC;
With pipe syntax, the query is simpler and more straightforward!
How it works:
Aligning Syntactic and Semantic Evaluation in SQL with Pipe Syntax
In traditional SQL, the syntactic clause order often diverges from the semantic evaluation order, leading to confusion for users. This misalignment complicates the process of understanding and debugging queries. In contrast, pipe syntax ensures that the order of operators matches the logical flow of data processing.
Figure 1: SQL syntactic clause order doesn’t match semantic
evaluation order.
Figure 2: In pipe syntax, operator order matches semantic
evaluation order. (Execution order will still be optimized.)
By adopting pipe syntax, users benefit from clearer query structures that enhance readability and ease of understanding, ultimately simplifying complex data transformations.
Advantages of adding pipe syntax inside SQL
The introduction of pipe syntax in SQL offers several benefits for developers. Here are some key advantages:
1. Easy to Learn
Learning new programming languages can be tough, especially in organizations with many users. Pipe syntax is not a separate language but a feature within SQL. For those who already know SQL, picking up pipe syntax is straightforward since it uses the same operators and similar structure. New users can also find it easier to start with pipe syntax, as it allows them to write queries more directly without getting bogged down by some of the complexities of standard SQL.
2. Incremental Adoption Without Migration
Migrating to a new language or system can be costly and time-consuming. Since pipe syntax is part of GoogleSQL, there's no need for a full migration to start using it. Developers can adopt the new syntax gradually, applying it wherever it’s useful, while existing queries remain unaffected. Pipe syntax can seamlessly interact with standard SQL code, making it easy to mix the two.
3. No Performance or Cost Issues
Pipe syntax operates on existing platforms like BigQuery without adding extra layers that could slow things down or increase costs. There are no hidden fees, and queries using pipe syntax maintain SQL's efficient performance. The SQL query optimizer treats both standard and pipe syntax queries similarly, ensuring they typically run at the same speed.
In summary, pipe syntax makes SQL easier to learn, allows gradual adoption, and doesn’t compromise performance or cost.
What can you use pipe syntax for?
Pipe syntax in SQL helps you write clearer, more efficient, and maintainable queries, whether you're exploring data, creating dashboards, building data pipelines, or analyzing logs. Here are some ways you can use it:
1. Ad-hoc Analysis and Query Debugging
When exploring data, you usually start by looking at the rows of a table. With pipe syntax, you can easily build your query step by step. You can start with a FROM clause, then add filters or aggregations and see the results at each step. This makes it simple to check what’s working and what isn’t. If you need to debug, you can run any part of the query up to a pipe symbol to see the intermediate results.
2. Data Engineering Lifecycle
As data volumes grow, processing and transforming data can get tricky. Pipe syntax simplifies this process. Instead of dealing with complex nested queries, you can use a more straightforward, linear structure. This makes it easier to create and maintain data pipelines. For example, you can speed up log data processing in BigQuery and Cloud Logging with pipe syntax.
3. Using LLMs and Natural Language with SQL
Research shows that SQL can be hard for large language models (LLMs) to understand. Pipe syntax breaks queries into clear steps, making it easier for LLMs to generate and read queries. This results in simpler, more readable queries that are easier for humans to validate.
Additionally, pipe syntax allows for better code suggestions and auto-completion because it’s easier to see what’s happening with each part of the query. This means you can get smarter suggestions for changes, helping you work more efficiently.
Overall, pipe syntax makes SQL more user-friendly and improves productivity for everyone.
Try to write pipe syntax at SQL examples
- Sign-up form to try out pipe syntax in BigQuery
Resources
- Pipe syntax documentation and detailed reference guide
- Demo video
- Blog post on using pipe syntax for log data analysis
- Research publication (VLDB 2024) with more background and language design details
- Open-source GoogleSQL implementation, ZetaSQL
Posted on October 25, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.