Internet Explorer
Posted on March 7, 2020
First of all before we dive in... What are trees? Obviously in computer science. A tree is just a data structure which is composed of nodes. Every tree has a root node. The root node has zero or more children nodes. Each child node has zero or more child nodes, and so on.
Okay, so what does this have to do with databases anyway?? Well i saw this in PgAdmin after running migrations some time ago.. Something similar to this:
CREATE INDEX nameIndex ON table USING btree (name);
Wait what?... What's btree?.. Well a quick google should tell you that, B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.. Well again, another word here.. Logarithmic time..
lets visit the picture every programmer must have..
Its in the green so i think its good news right?..
A stackoverflow answer I liked for this was:
"O(log N) basically means time goes up linearly while the n goes up exponentially. So if it takes 1 second to compute 10 elements, it will take 2 seconds to compute 100 elements, 3 seconds to compute 1000 elements, and so on."
it is also known that logarithmic function is the opposite of an exponential function. When they say something grows exponentially, it is being multiplied. When something grows logarithmically, it is being divided. :)
Feel like Einstein yet?
Okay lets go on dear Einsteins.
So let's say you need to store a list of some numbers in some array and later you will want to search through it.. Damn me. I am going to use a list / array for that.. yeah so here we go..
var searchMeBoo = new List<int>() { 11,6,7,10,15, 17, 19 };
The list above contains seven items. So now i need to search for a particular number.. which sadly is at the end of the list which is 19. I will need to check the whole list till the very end before i get the match.We can denote this worst case as O(n) in asymptotic notation. This simply means if your list size is "n" at most, you need to do "n" number of searches to find a given value in an array.
Now lets be smart.. Let us improve the time needed to search this thing.
Lets make sure the data is going to be sorted. Whenever we put something in the List we have to make sure it keeps it order. If not call some .Sort() or sorted() on it in your language so we can get going. So now searching on such a sorted thing should begin in the middle. Then compare the selected value to the value being searched,
If the selected value is greater than search value, ignore the left side of the array and search the value on the right side and vice versa.
Now lets see it:
11,6,7,10,15, 17, 19 -> Sort -> 6,7,10,11,15, 17, 19
Here, we try to search number 19 from the List 6,7,10,11,15, 17,19 . If you do a normal search, then it will take seven units of time to search since the element is in the seventh position. But in the binary search, it will take only three searches, Because you will start your search in the middle which is 11..and compare with 19..which does not match ..so (404 Not found lol) Then since its greater than 11 you will then focus on the other half of the List, Then move to 17. Which 19 is still greater. Then FINALLY.. move to the last element which is 19. which is found lol..
We moved three times instead of seven times!!!
Well that was exciting.. Such an algorithm has a time complexity of O(Log n). Great lets move on.
Okay for some simple clarity, A B-tree is a balanced tree and not a binary tree. Now lets dive in. Take this image for example:
Remember nodes in the first picture in the post above? Well you could take a look again. Now over here, the branch node contains the biggest value in the leaf nodes, Lets take something like 57, which corresponds to the biggest value in 55,57,57 in the leaf node. Same applies to 83, 46 and 53. So in the algorithm a branch "layer" is made up of such values until all leaf nodes are covered by a branch node.
The next layer is built similarly, but on top of the first branch node level. The procedure repeats until all keys fit into a single node, the root node. The structure is a balanced search tree because the tree depth is equal at every position; the distance between root node and leaf nodes is the same everywhere.
Okay so now lets say we were gonna look for 57
Now lets traverse the tree to find that number. We are going to start at the root node on the left-hand. Now each value in there is processed in ascending order until a value is greater than or equal to the search term which is 57 in our case is found. In the figure it is the entry 83. The database follows the reference to the corresponding branch node which contains 57 which makes the (>=) comparison "True", it then moves down to get the data - simply.
Through this I have learnt that the B-tree enables the database to find a leaf node quickly :).
Anyway for deeper understanding please visit https://winand.at/ and grab the book “SQL Performance Explained” The author is the guy with all the knowledge. Markus Winand @MarkusWinand on twitter.
Posted on March 7, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.