Introduction to Data Structures and Algorithms with Python
MMK2020
Posted on February 20, 2022
What are data structures?
A data structure is a method for storing and organizing data in a virtual system. It is used to denote a particular way of organizing data for particular types of operation such as modification, navigation, and access of information.
There are numerous data structures ranging from arrays and lists to more complex structures such as trees, heaps and graphs.
Data structures help to:
- Manage and utilize large datasets
- Quickly search for particular data from a database
- Build clear hierarchical or relational connections between data points
- Simplify and speed up data processing
Each data structure has a task or situation it is most suited to solve.
*Widely used basic data structures: *
- Arrays
- Linked lists
- Stacks
- Queues
- Hash tables
- Trees
- Graphs
What is an algorithm?
An algorithm for a particular task is a finite sequence of instructions, each of which has a clear meaning and can be performed with a finite amount of effort in a finite length of time.
Some common categories of algorithms are:
- Searching
- Sorting
- Graph/tree traversing
- Dynamic programming
- Hashing and regex (string pattern matching)
Why data structures and algorithms
When solving real-world coding problems, it is required that runtime and resource efficiency are achieved. Knowing the data structure and algorithm which best fits the current solution will increase program performance and reduce time required to make it.
Proper understanding of data structures and algorithms ensures well-optimized and efficient code.
The efficiency or performance (complexity) of an algorithm relates to the resources required by it, such as how quickly it will run (time complexity), or how much computer memory (space complexity) it will use.
The ability to formulate an efficient algorithm depends on being able to organize the data in an appropriate manner. It is for this reason coding some coding interviews ask for demonstration of comprehension of data structures and algorithms.
Python Data Structures
There are four main types of data structures (Lists, Tuples, Sets, and Dictionaries) that come by default in Python and don’t need to be imported from user libraries.
There are four collection data types in the Python programming language:
- List is a collection which is ordered and changeable. Allows duplicate members.
- Tuple is a collection which is ordered and unchangeable. Allows duplicate members.
- Set is a collection which is unordered and unindexed. No duplicate members.
- Dictionary is a collection which is ordered* and changeable. No duplicate members. (*for python 3.7 and above)
Arrays in Python
An array is a collection of similar items which are indexed.
Python does not have a built in array type, but an array can be implemented using the inbuilt Python List.
An array is a collection of values of the same type saved under the same name. Each value in the array is called an “element” and indexing that represents its position. You can access specific elements by calling the array name with the desired element’s index. You can also get the length of an array using the len()
method.
Queues in Python
Queues are a linear data structure that store data in a “first in, first out” (FIFO) order. Unlike arrays, you cannot access elements by index and instead can only pull the next oldest element. This makes it great for order-sensitive tasks like online order processing or voicemail storage.
In programming terms, putting items in the queue is called enqueue, and removing items from the queue is called dequeue.
The append()
and pop()
methods can be used to implement a queue. However, this is inefficient because lists must shift all elements by one index whenever you add a new element to the beginning.
Instead, use the deque
class from Python’s collections module. Deques are optimized for the append and pop operations. The deque implementation also allows you to create double-ended queues, which can access both sides of the queue through the popleft() and popright() methods.
from collections import deque
# Initializing a queue
q = deque()
# Adding elements to a queue
q.append('a')
q.append('b')
q.append('c')
print("Initial queue")
print(q)
# Removing elements from a queue
print("\nElements dequeued from the queue")
print(q.popleft())
print(q.popleft())
print(q.popleft())
print("\nQueue after removing elements")
print(q)
# Uncommenting q.popleft()
# will raise an IndexError
# as queue is now empty
Stacks in Python
Stacks are a sequential data structure that act as the Last-in, First-out (LIFO) version of queues. The last element inserted in a stack is considered at the top of the stack and is the only accessible element. To access a middle element, you must first remove enough elements to make the desired element the top of the stack.
Adding elements is known as a push, and removing elements is known as a pop. You can implement stacks in Python using the built-in list structure. With list implementation, push operations use the append()
method, and pop operations use pop()
.
stack = []
# append() function to push
# element in the stack
stack.append('a')
stack.append('b')
stack.append('c')
print('Initial stack')
print(stack)
# pop() function to pop
# element from stack in
# LIFO order
print('\nElements popped from stack:')
print(stack.pop())
print(stack.pop())
print(stack.pop())
print('\nStack after elements are popped:')
print(stack)
# uncommenting print(stack.pop())
# will cause an IndexError
# as the stack is now empty
Linked lists in Python
Linked lists are a sequential collection of data that uses relational pointers on each data node to link to the next node in the list.
Unlike arrays, linked lists do not have objective positions in the list. Instead, they have relational positions based on their surrounding nodes.
The first node in a linked list is called the head node, and the final is called the tail node, which has a null pointer.
Linked lists can be singly or doubly linked depending if each node has just a single pointer to the next node or if it also has a second pointer to the previous node.
You can think of linked lists like a chain; individual links only have a connection to their immediate neighbours but all the links together form a larger structure.
Python does not have a built-in implementation of linked lists and therefore requires that you implement a Node class to hold a data value and one or more pointers.
class Node:
def __init__(self, dataval=None):
self.dataval = dataval
self.nextval = None
class SLinkedList:
def __init__(self):
self.headval = None
list1 = SLinkedList()
list1.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")
# Link first Node to second node
list1.headval.nextval = e2
# Link second Node to third node
e2.nextval = e3
Linked lists are primarily used to create advanced data structures like graphs and trees or for tasks that require frequent addition/deletion of elements across the structure.
Circular linked lists in Python
The primary downside of the standard linked list is that you always have to start at the Head node.
The circular linked list fixes this problem by replacing the Tail node’s null pointer with a pointer back to the Head node. When traversing, the program will follow pointers until it reaches the node it started on.
The advantage of this setup is that you can start at any node and traverse the whole list. It also allows you to use linked lists as a loopable structure by setting a desired number of cycles through the structure.
Circular linked lists are great for processes that loop for a long time like CPU allocation in operating systems.
Trees in Python
Trees are another relation-based data structure, which specialize in representing hierarchical structures. Used to implement top searching and sorting algorithms like binary search trees and binary heap.
Like a linked list, they’re populated with Node objects that contain a data value and one or more pointers to define its relation to immediate nodes.
Each tree has a root node that all other nodes branch off from. The root contains pointers to all elements directly below it, which are known as its child nodes. These child nodes can then have child nodes of their own. Binary trees cannot have nodes with more than two child nodes.
Any nodes on the same level are called sibling nodes. Nodes with no connected child nodes are known as leaf nodes.
The most common application of the binary tree is a binary search tree. Binary search trees excel at searching large collections of data, as the time complexity depends on the depth of the tree rather than the number of nodes.
Binary search trees have four strict rules:
- The left subtree contains only nodes with elements lesser than the root.
- The right subtree contains only nodes with elements greater than the root.
- Left and right subtrees must also be a binary search tree. They must follow the above rules with the “root” of their tree.
- There can be no duplicate nodes, i.e. no two nodes can have the same value.
Graphs in Python
Graphs are a data structure used to represent a visual of relationships between data vertices (the Nodes of a graph). The links that connect vertices together are called edges.
Edges define which vertices are connected but does not indicate a direction flow between them. Each vertex has connections to other vertices which are saved on the vertex as a comma-separated list.
Graphs are excellent for modelling networks or web-like structures. Can also be used to model social network sites like Facebook
There are also special graphs called directed graphs that define a direction of the relationship, similar to a linked list. Directed graphs are helpful when modelling one-way relationships or a flowchart-like structure.
They’re primarily used to convey visual web-structure networks in code form. These structures can model many different types of relationships like hierarchies, branching structures, or simply be an unordered relational web. The versatility and intuitiveness of graphs makes them a favourite for data science.
When written in plain text, graphs have a list of vertices (nodes) and edges:
V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}
In Python, graphs are best implemented using a dictionary with the name of each vertex as a key and the edges list as the values.
Create the dictionary with graph elements
graph = { "a" : ["b","c"],
"b" : ["a", "d"],
"c" : ["a", "d"],
"d" : ["e"],
"e" : ["d"]
}
# Print the graph
print(graph)
Hash tables in Python
Hash tables are a complex data structure capable of storing large amounts of information and retrieving specific elements efficiently.
This data structure uses key/value pairs, where the key is the name of the desired element and the value is the data stored under that name.
Each input key goes through a hash function that converts it from its starting form into an integer value, called a hash. Hash functions must always produce the same hash from the same input, must compute quickly, and produce fixed-length values. Python includes a built-in hash() function that speeds up implementation.
The table then uses the hash to find the general location of the desired value, called a storage bucket. The program then only has to search this subgroup for the desired value rather than the entire data pool.
Beyond this general framework, hash tables can be very different depending on the application. Some may allow keys from different data types, while some may have differently setup buckets or different hash functions.
Here is an example of a hash table in Python code:
import pprint
class Hashtable:
def __init__(self, elements):
self.bucket_size = len(elements)
self.buckets = [[] for i in range(self.bucket_size)]
self._assign_buckets(elements)
def _assign_buckets(self, elements):
for key, value in elements: #calculates the hash of each key
hashed_value = hash(key)
index = hashed_value % self.bucket_size # positions the element in the bucket using hash
self.buckets[index].append((key, value)) #adds a tuple in the bucket
def get_value(self, input_key):
hashed_value = hash(input_key)
index = hashed_value % self.bucket_size
bucket = self.buckets[index]
for key, value in bucket:
if key == input_key:
return(value)
return None
def __str__(self):
return pprint.pformat(self.buckets) # pformat returns a printable representation of the object
if __name__ == "__main__":
capitals = [
('France', 'Paris'),
('United States', 'Washington D.C.'),
('Italy', 'Rome'),
('Canada', 'Ottawa')
]
hashtable = Hashtable(capitals)
print(hashtable)
print(f"The capital of Italy is {hashtable.get_value('Italy')}")
References:
- https://www.springboard.com/library/software-engineering/data-structures-and-algorithms-in-python/
- https://www.springboard.com/library/software-engineering/data-structures-and-algorithms/#:~:text=A%20data%20structure%20is%20a,it%20into%20a%20target%20output.
- https://www.educative.io/blog/8-python-data-structures
Posted on February 20, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.