Data Structures for Complete Beginners

johongirr

Jaxongir

Posted on December 20, 2020

Data Structures for Complete Beginners

What are data structures

Data structure are used to store, organize, manage, retrieve data and operations or methods that can be applied to data. And there are many other data structures which you can check in here.

What you are going to learn

Linear data structures

  • Array, Singly Linked List, Doubly Linked List, Stacks, Queues,

Non-linear data structures

  • Binary Search Trees, Binary Heaps, Priority Queues, Hash Tables

And Traversing tree with

  • Breadth First Search
  • Depth First Search

Array

Alt text

What are Arrays:

  • Arrays are linear data structure that contains same data type and it has fixed length of something like (4.29 billion elements)

Array is used to implement many other data structures such as:

  • Hash Tables, Binary Heaps, Priority Queues

We can do followings with Array

  • Delete/insert data, update data, search data, sort data

Arrays are also used by sorting and searching algorithms such as:

  • Sorting algorithms: Bubble sort, Selection, sort, Insertion sort, Merge sort, Quick sort and more
  • Searching algorithms: Linear search, Binary Search and more

Singly Linked List

Alt text

What are Singly Linked Lists

  • Singly Linked List is linear data structure that contains same data types in which data is inside so-called nodes and each node in Singly Linked List connected with the node next to it via pointers.
  • List contains also head and tail node to know where is the beginning of the list and where is the ending of the list

What are the advantages of SLL over Arrays

  • If you working in huge data sets like millions of items and if you delete and add a lot in the beginning, the middle or before last item all items are reindexed which takes a lot of time . So if this is the case then you should use SLL.
  • And also as SLL uses pointers to connect nodes, it uses any available space in memory unlike an array which uses fixed space in memory
  • So in short use it when you need insertion and deletion before last item

What are the disadvantages of SLL over an Array

  • SLL loses constant random access, as they ain’t index based
  • SLL use twice more memory. You might consider using an array over SLL if memory is the problem. So you sacrifice speed over memory

Big O of SLL

  • Adding and removing from the beginning is: O(1)
  • Adding to the end is: O(1), Removing from the end is: O(n)
  • Searching: Best Case O(1), Worst Case O(n)

Double Singly Linked List

Alt text

What is Doubly Linked List

  • Doubly Linked List is very similar to Singly Linked List but it has extra pointer which points to a previous node

What are the advantages of DLL over SLL

  • You can loop both forwards and backwards because of extra pointer which points to a previous node
  • You can remove a tail node on constant time without setting up a loop because of a previous pointer

What are the disadvantages of DLL over SLL

  • DLL uses even more memory than SLL because of using extra pointer

Big O of DLL

  • Adding and removing from the beginning is: O(1)
  • Adding and removing from the end is : O(1)
  • Searching: Best Case O(1), Worst Case O(n)

Stack

Alt text

What is Stack

  • Stack is a linear data structure who follows a single rule LIFO(Last-In-First-Out).

Analogy

  • You can imagine collections of plates that stacked up on top of one anohter and when you need to remove the plate you definitely remove from very top (pop) and the first plate that's put as is at the very bottom you remove you it last. So we have many examples of stack in our real life in which we also follow LIFO rule

Use Cases

  • We know stack as call-stack in programming languages where it stores invoked functions. And it has two main operations
    • Push - puts function on the top of the call-stack
    • Pop - removes function from the top of the call-stack

How to implement stack

  • We can implement stack with both arrays and SLL/DLL

Big O of stack

  • Adding and removing O(1)

Queue

Alt text

What is Queue

  • Queue is a linear data structure who follows a single rule FIFO(Fist-In-First-Out). And it has two main operations
    • Enqueue - stores element in the list
    • Dequeue - removes element from the list

Analogy

  • Queues are the same as real life queues (lines). The first person on the line is served first and the last person in the line is served last

How to implement stack

  • We can implement stack with array but it’s not good candidate as we have to push from the end and remove from the beginning
  • Better option is either SLL/DLL depending on the condition.

Big O of queue

  • Adding and removing O(1)

Binary Search Trees

Alt text

What is Binary Search Tree

  • BSTS is non-linear data structure which consists of parent/child relationships and it is family of tree data structure but in which there can be only two child nodes for each parent node

Analogy

  • You can visualize binary search tree as normal tree where it has main branch (root node) and from each branch only two branches (child nodes) can develop

Terminologies to BSTS

  • Root - the starting point of the tree and there can be only single root node on each BSTS
  • Parent - is the node which has at maximum two child nodes
  • Child - is the node which is of course child node of its parent node
  • Leaf - is the child node which does not have its own child nodes
  • Edge - is the pointer, connection between nodes

Rules to BSTS

  • Nodes bigger than the parent will be inserted on the right side and nodes smaller than the parent node will be inserted on the left side
  • Each parent can have maximum 2 child nodes
  • Child nodes can't point to its parent node
  • Sibling nodes can’t point to each other
  • No duplicate nodes are allowed

Big O of BST

  • Insertion - Average Case is O(log n); Worst Case O(n) when linear structure is created
  • Searching- Average Case os O(log n); Worst Case O(n) when linear structure is created

Tree Traversal

What is Tree Traversal

  • Tree Traversal is the process of visiting(updating/reading) each node exactly once in a tree data structure and It has two algorithms to traverse tree
    1. BFS(Breadth First Search) visits nodes on the same level(horizontally). And it uses queue to store visited nodes
    2. DFS(Depth First Search) visits nodes on the vertical level and and it uses stack to store visited nodes and it has its own 3 algorithms to traversing tree
    3. Pre-Order - visits root node first then its left subtree then finally its right subtree. It’s used to copy existing tree. Order of visits: Root -> Left subtree -> Right subtree
    4. Post-Order - visits left subtree then right subtree and in the end root It’s used to delete tree as it visits root lastly. Order of visits: Left subtree -> Right subtree -> Root
    5. In-Order - visits left subtree then root and then right subtree. If used in bst, and store visited nodes it’ll result in sorted data. Order of visits: Left subtree -> Root -> Right subtree

Binary Heaps

Alt text

What is Binary Heap

  • BH is a data structure which is family of tree data structures and has few similarities with BSTS like in BH each parent can have maximum two child nodes
  • BH is very compact and does not create linear structure tree like BSTS because in BH always left child is inserted first then its right child

How to implement BH

  • It can be implemented with sll/dll but it’s very easy and most often implemented with built-in data structure which is arrays and we use math formula to find parent/child nodes in array when inserting and removing nodes and they are as following
  • Finding left and right child nodes of current node
    • For any n index in the array it’s left child is stored at the index of Math.floor(2 * current index + 1) and its right child is stored at the index of Math.floor(2 * current index + 2)
    • Finding parent node of current node
    • For any n index in the array its parent node is stored ath the index of Math.floor(current index - 1 / 2)

Max and Binary Heaps

  • In MaxBinaryHeap each parent node’s value must be greater than the value of its child nodes and the root node has the biggest value in the heap
  • In MinBinaryHeap each parent node’s value must be smaller than the value of its child nodes and the root node has the smallest value in the heap

Binary Heap use cases

  • BH is used to implement another data structure which is Priority Queue
  • And it also used with Traversing Graph

Big O of Binary Heaps

  • Insertion is O(log n)
  • Removal is O(log n)
  • Searching is O(n)

### Priority Queue

Alt text

What is Priority Queue

  • Priority Queue is a data structure in which each node has priority and nodes with higher priority is executed(dequeued, served) first then nodes with lower priority

How to implement Priority Queue

  • As we mentioned in BH section that we use BH to implement Priority Queue. It can also created using arrays, sll/dll but it would be really hard to implement PQ with them. So practical way is to use BH
  • But does not confuse it with queue which follows FIFO but in PQ it does not matter if node is enqueued first but if that node has lower priority than last added which has higher priority, node with higher priority is dequeued first

Hash Tables

Alt text

What is Hash Table

  • Hash Table is a data structure which consists of key-value pairs and it’s built-in in many programming languages in JavaScript objects and maps, in Python dictionaries and more.

How to implement Hash Tables

  • We implement hash tables with array
  • As arrays are index based, to find value of the key based on key, we have to convert the key into valid index and it’s done with hash function
  • Hash function just accepts the key and converts it into the same valid index all the time. And this process of generating index from the key is known as hashing or hash code

Hash Functions

  • Hash functions are used in cryptography, bitcoin, hash tables
  • Hash functions accepts key and generates the same index all the time given the same key
  • Prime numbers are used to help hash function to generate different indexes as much as possible which result in key-value pairs being distributed evenly in the array
  • Hash functions are very fast
  • Optimally length of the array also should be prime number

Hash collisions
Hash collision is when the same index is generated by hash function from two or more keys. And this can be solved in two ways

  1. Linear probing - in which we do check if there is key-value pair in current index, if so move to next position. So in short in Linear probing there will be the same amount of key-value pairs as the length of the array
  2. Separate chaining - in it we use another data structure like array/sll/dll to house key-value pairs that have the same index

Big O of Hash Tables
Average Case: Insertion O(1); Removing O(1); Accessing O(1)
When a lot of key-value pairs is inserted in the same index O(n)

SUMMARY:

You have learned most used data structures which you can implement in your next project

You can practise your new learned knowledge in coding challenge websites like Codewars(recommended for beginners), Hacker Rank and Leetcode

💖 💪 🙅 🚩
johongirr
Jaxongir

Posted on December 20, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related