Data Structures for Complete Beginners

What are data structures

Data structure are used to store, organize, manage, retrieve data and operations or methods that can be applied to data. And there are many other data structures which you can check in here.

What you are going to learn

Linear data structures

Array, Singly Linked List, Doubly Linked List, Stacks, Queues,

Non-linear data structures

Binary Search Trees, Binary Heaps, Priority Queues, Hash Tables

And Traversing tree with

Breadth First Search
Depth First Search

Array

What are Arrays:

Arrays are linear data structure that contains same data type and it has fixed length of something like (4.29 billion elements)

Array is used to implement many other data structures such as:

Hash Tables, Binary Heaps, Priority Queues

We can do followings with Array

Delete/insert data, update data, search data, sort data

Arrays are also used by sorting and searching algorithms such as:

Sorting algorithms: Bubble sort, Selection, sort, Insertion sort, Merge sort, Quick sort and more
Searching algorithms: Linear search, Binary Search and more

Singly Linked List

What are Singly Linked Lists

Singly Linked List is linear data structure that contains same data types in which data is inside so-called nodes and each node in Singly Linked List connected with the node next to it via pointers.
List contains also head and tail node to know where is the beginning of the list and where is the ending of the list

What are the advantages of SLL over Arrays

If you working in huge data sets like millions of items and if you delete and add a lot in the beginning, the middle or before last item all items are reindexed which takes a lot of time . So if this is the case then you should use SLL.
And also as SLL uses pointers to connect nodes, it uses any available space in memory unlike an array which uses fixed space in memory
So in short use it when you need insertion and deletion before last item

What are the disadvantages of SLL over an Array

SLL loses constant random access, as they ain’t index based
SLL use twice more memory. You might consider using an array over SLL if memory is the problem. So you sacrifice speed over memory

Big O of SLL

Adding and removing from the beginning is: O(1)
Adding to the end is: O(1), Removing from the end is: O(n)
Searching: Best Case O(1), Worst Case O(n)

Double Singly Linked List

What is Doubly Linked List

Doubly Linked List is very similar to Singly Linked List but it has extra pointer which points to a previous node

What are the advantages of DLL over SLL

You can loop both forwards and backwards because of extra pointer which points to a previous node
You can remove a tail node on constant time without setting up a loop because of a previous pointer

What are the disadvantages of DLL over SLL

DLL uses even more memory than SLL because of using extra pointer

Big O of DLL

Adding and removing from the beginning is: O(1)
Adding and removing from the end is : O(1)
Searching: Best Case O(1), Worst Case O(n)

Stack

What is Stack

Stack is a linear data structure who follows a single rule LIFO(Last-In-First-Out).

Analogy

You can imagine collections of plates that stacked up on top of one anohter and when you need to remove the plate you definitely remove from very top (pop) and the first plate that's put as is at the very bottom you remove you it last. So we have many examples of stack in our real life in which we also follow LIFO rule

Use Cases

We know stack as call-stack in programming languages where it stores invoked functions. And it has two main operations
- Push - puts function on the top of the call-stack
- Pop - removes function from the top of the call-stack

How to implement stack

We can implement stack with both arrays and SLL/DLL

Big O of stack

Adding and removing O(1)

Queue

What is Queue

Queue is a linear data structure who follows a single rule FIFO(Fist-In-First-Out). And it has two main operations
- Enqueue - stores element in the list
- Dequeue - removes element from the list

Analogy

Queues are the same as real life queues (lines). The first person on the line is served first and the last person in the line is served last

How to implement stack

We can implement stack with array but it’s not good candidate as we have to push from the end and remove from the beginning
Better option is either SLL/DLL depending on the condition.

Big O of queue

Adding and removing O(1)

Binary Search Trees

What is Binary Search Tree

BSTS is non-linear data structure which consists of parent/child relationships and it is family of tree data structure but in which there can be only two child nodes for each parent node

Analogy

You can visualize binary search tree as normal tree where it has main branch (root node) and from each branch only two branches (child nodes) can develop

Terminologies to BSTS

Root - the starting point of the tree and there can be only single root node on each BSTS
Parent - is the node which has at maximum two child nodes
Child - is the node which is of course child node of its parent node
Leaf - is the child node which does not have its own child nodes
Edge - is the pointer, connection between nodes

Rules to BSTS

Nodes bigger than the parent will be inserted on the right side and nodes smaller than the parent node will be inserted on the left side
Each parent can have maximum 2 child nodes
Child nodes can't point to its parent node
Sibling nodes can’t point to each other
No duplicate nodes are allowed

Big O of BST

Insertion - Average Case is O(log n); Worst Case O(n) when linear structure is created
Searching- Average Case os O(log n); Worst Case O(n) when linear structure is created

Tree Traversal

What is Tree Traversal

Tree Traversal is the process of visiting(updating/reading) each node exactly once in a tree data structure and It has two algorithms to traverse tree
1. BFS(Breadth First Search) visits nodes on the same level(horizontally). And it uses queue to store visited nodes
2. DFS(Depth First Search) visits nodes on the vertical level and and it uses stack to store visited nodes and it has its own 3 algorithms to traversing tree
3. Pre-Order - visits root node first then its left subtree then finally its right subtree. It’s used to copy existing tree. Order of visits: Root -> Left subtree -> Right subtree
4. Post-Order - visits left subtree then right subtree and in the end root It’s used to delete tree as it visits root lastly. Order of visits: Left subtree -> Right subtree -> Root
5. In-Order - visits left subtree then root and then right subtree. If used in bst, and store visited nodes it’ll result in sorted data. Order of visits: Left subtree -> Root -> Right subtree

Binary Heaps

What is Binary Heap

BH is a data structure which is family of tree data structures and has few similarities with BSTS like in BH each parent can have maximum two child nodes
BH is very compact and does not create linear structure tree like BSTS because in BH always left child is inserted first then its right child

How to implement BH

It can be implemented with sll/dll but it’s very easy and most often implemented with built-in data structure which is arrays and we use math formula to find parent/child nodes in array when inserting and removing nodes and they are as following
Finding left and right child nodes of current node
- For any n index in the array it’s left child is stored at the index of Math.floor(2 * current index + 1) and its right child is stored at the index of Math.floor(2 * current index + 2)
- Finding parent node of current node
- For any n index in the array its parent node is stored ath the index of Math.floor(current index - 1 / 2)

Max and Binary Heaps

In MaxBinaryHeap each parent node’s value must be greater than the value of its child nodes and the root node has the biggest value in the heap
In MinBinaryHeap each parent node’s value must be smaller than the value of its child nodes and the root node has the smallest value in the heap

Binary Heap use cases

BH is used to implement another data structure which is Priority Queue
And it also used with Traversing Graph

Big O of Binary Heaps

Insertion is O(log n)
Removal is O(log n)
Searching is O(n)

### Priority Queue

What is Priority Queue

Priority Queue is a data structure in which each node has priority and nodes with higher priority is executed(dequeued, served) first then nodes with lower priority

How to implement Priority Queue

As we mentioned in BH section that we use BH to implement Priority Queue. It can also created using arrays, sll/dll but it would be really hard to implement PQ with them. So practical way is to use BH
But does not confuse it with queue which follows FIFO but in PQ it does not matter if node is enqueued first but if that node has lower priority than last added which has higher priority, node with higher priority is dequeued first

Hash Tables

What is Hash Table

Hash Table is a data structure which consists of key-value pairs and it’s built-in in many programming languages in JavaScript objects and maps, in Python dictionaries and more.

How to implement Hash Tables

We implement hash tables with array
As arrays are index based, to find value of the key based on key, we have to convert the key into valid index and it’s done with hash function
Hash function just accepts the key and converts it into the same valid index all the time. And this process of generating index from the key is known as hashing or hash code

Hash Functions

Hash functions are used in cryptography, bitcoin, hash tables
Hash functions accepts key and generates the same index all the time given the same key
Prime numbers are used to help hash function to generate different indexes as much as possible which result in key-value pairs being distributed evenly in the array
Hash functions are very fast
Optimally length of the array also should be prime number

Hash collisions
Hash collision is when the same index is generated by hash function from two or more keys. And this can be solved in two ways

Linear probing - in which we do check if there is key-value pair in current index, if so move to next position. So in short in Linear probing there will be the same amount of key-value pairs as the length of the array
Separate chaining - in it we use another data structure like array/sll/dll to house key-value pairs that have the same index

Big O of Hash Tables
Average Case: Insertion O(1); Removing O(1); Accessing O(1)
When a lot of key-value pairs is inserted in the same index O(n)

SUMMARY:

You have learned most used data structures which you can implement in your next project

You can practise your new learned knowledge in coding challenge websites like Codewars(recommended for beginners), Hacker Rank and Leetcode