How sql engine handle join internally-part 1

amitiwary999

Amit Tiwary

Posted on December 3, 2022

How sql engine handle join internally-part 1

There are different types of logical join in SQL. Inner join and outer join are mainly used join. When we need to get the data from tables we use the join. But how do these joins work? Physical join is implemented inside RDMS. The user uses a logical join to write the query and a physical join is used by RDMS to do join operations. Part 2 of this blog post is here.

There are three different physical joins Nested loop join, Hash Join and merge join. Depending on the data in the joined tables, RDMS use any one physical join.

Nested loop joins: In nested loop join, out of the two tables, the table with a smaller number of records is selected, and it will loop through the second table until matches are found. This is available in MYSQL, Postgres and even in SQL servers too. This is not a scalable option for a large table. It is used mostly when the join operator doesn’t use equality. MySQL has only this option in versions lower than MYSQL 8.
For example lets say we need to get the students that height is smaller then the teacher.



SELECT 
    s.student_id, 
    t.teacher_id, 
    s.student_name, 
    t.teacher_name 
FROM 
    student s, teacher t 
WHERE 
    s.height < t.height


Enter fullscreen mode Exit fullscreen mode

Hash Join: Hash join is a way of executing a join where the hash table is used to find the match record. A Hash table creates in the memory. If the data is very large and memory is not sufficient to hold the data then it writes to disk. It is efficient compared to the Nested loop join. During the execution, RDMS build the in-memory hash table where rows from one of the join tables is stored using the join attributes as the key. Once it is done then the server starts reading rows from another table and finds the matching row from the hash table. This is used mostly when the join operators use equality.
Lets say we created two table. First table is student table and second table is class table. Each student belongs to one of the class. Now when we join these two table using the class id to get the student and details of class that this student belongs to, then sql server may use the hash join.



SELECT 
    s.student_Id ,
    s.student_name ,
    c.fund ,
    c.strength 
FROM 
    student AS s 
    JOIN
    class AS c ON 
        s.class_id = c.id;


Enter fullscreen mode Exit fullscreen mode

hash-table
When we do the join sql server create the hash table using the class table. 'class_id' is the key here. Now when sql go through student table, it get the class data from hash table using the class_id.

Merge Join: Merge join is used if the join condition uses an equality operator and both sides of the join are large. Merge join uses the sorted data inputs. So if there is an index on the expressions used in the join column then it is used to get the sorted data. But if the server is doing any operation to sort the data then please look at the indexes and better try to modify the indexes in order to achieve better results.

Read part 2 here.

💖 💪 🙅 🚩
amitiwary999
Amit Tiwary

Posted on December 3, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related