Honeybadger Staff
Posted on April 5, 2024
This article was originally written by Michael Barasa on the Honeybadger Developer Blog.
Many developers focus on developing core application functionalities and pay little or no attention to memory management until they run out of memory, and their apps start crashing, freezing, or experiencing random performance downgrades.
Computers have limited RAM, and it’s always best to make effective use of allocated resources. Trying to run a high memory-consuming app on a low-spec computer could cause it to crash, negatively impacting the user experience. Furthermore, a high memory footprint may also affect the performance of other apps and background services. When running a high memory-consuming app on the cloud, where resources are measured and charged for their use, you will likely end up with an expensive bill.
A high memory footprint can lead to undesirable consequences. Keep reading to learn what memory management entails and discover tips on lowering your Python app's memory footprint.
What is memory management, and why is it important?
Memory management is a complex process involving freeing and allocating computer memory to different programs, ensuring that the system operates efficiently. For example, when you launch a program, the computer has to allocate enough memory, and when the application is closed, the system frees memory and allocates it to another program.
Memory management has numerous benefits. First, it ensures that applications have the required resources to operate. The computer allocates memory to active processes and releases memory from inactive programs, which indicates effective memory utilization.
Second, proper memory management contributes to system stability. Since the computer handles memory allocation automatically, applications will always have access to the required memory, which reduces issues, such as random crashes and shutdowns. Memory management techniques, such as garbage collection, can assist in preventing memory leaks.
Third, memory management leads to better performance optimization. By continuously releasing and allocating memory, applications always have access to resources, which means they can quickly launch and execute.
Each application has a memory footprint, which refers to the amount of memory it consumes. A high memory footprint indicates an app is using a lot of memory, while a low footprint means it has low consumption.
Although computer systems can manage memory automatically, as a developer, you still have to keep your app's memory footprint in check. Using memory-intensive functions and inefficient data structures could cause your software to run out of memory, freeze, and even crash.
In the following section, we’ll explain how to measure your app's memory consumption. Later, we’ll discuss tips for lowering your memory footprint.
How to measure memory usage in Python
You can use any of the following methods to measure the amount of memory your application is using.
The psutil library
psutil is a Python library for fetching useful information about system utilization and active processes. Among other uses, the psutil library allows you to monitor memory, CPU, disk, and network usage.
To demonstrate how the psutil library works, we will use the following Python program that checks whether an integer is a prime number.
number = int(input("Please enter a number: "))
if number == 1:
    print(number, "is not a prime number")
elif number > 1:
    # check for factors
    for i in range(2, number):
        if (number % i) == 0:
            print(number, "is not a prime number")
            print(i, "times", number // i, "is", number)
            break
    else:
        # the loop finished without finding a factor
        print(number, "is a prime number")
# if the input number is less than or equal to 1, it is not prime
else:
    print(number, "is not a prime number")
We can check the above program's memory footprint by importing the psutil module and adding the following function to the code.
import psutil  # import the psutil module
def memory_usage():
    process = psutil.Process()  # handle for the current process
    # memory_info().rss is the resident set size of the process
    usage = process.memory_info().rss
    return usage  # Returning the memory in bytes
When you include and call the above function, it returns the program's resident memory in bytes (divide by 1024 for kilobytes). In the original test, the program used 25886KB of memory.
Resource module
We can also use the resource module, specifically the getrusage() function, to check the amount of memory a program is using. We use it as follows:
import resource
def memory_usage():
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return usage
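One detail to keep in mind when reading ru_maxrss is that its unit depends on the platform: it is reported in kilobytes on Linux and in bytes on macOS, and the resource module is only available on Unix-like systems. The following sketch normalizes the value to MiB. The helper name peak_memory_mib is hypothetical and not part of the original article.

```python
import resource
import sys


def peak_memory_mib():
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


before = peak_memory_mib()
data = [0] * 5_000_000  # allocate a large list so the peak has a chance to move
after = peak_memory_mib()
print(f"Peak RSS before: {before:.1f} MiB, after: {after:.1f} MiB")
```

Because ru_maxrss is a high-water mark, the value never goes down during the life of the process, even after the memory is released.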
The sys module
The sys module also has the getsizeof() and getallocatedblocks() functions, which let you check the size of an individual object in bytes and the number of memory blocks currently allocated by the interpreter, respectively. Note that getsizeof() measures only the object you pass in, not the whole program. The sys module can provide valuable insight for debugging purposes.
Here is how you can use the sys module in your code.
import sys
def memory_usage():
    usage = sys.getsizeof([])  # size of an empty list object, in bytes
    return usage
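It helps to remember that sys.getsizeof() is shallow: for a container, it counts the container itself but not the objects it refers to. The sketch below illustrates the difference with a hypothetical helper, deep_getsizeof, that walks a few built-in container types. It is an approximation for illustration only, not a complete measurement tool.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Roughly sum the size of obj and the objects it contains (hypothetical helper)."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # avoid counting a shared object twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


names = [f"employee-{i}" for i in range(1000)]
shallow = sys.getsizeof(names)  # the list object only
deep = deep_getsizeof(names)    # the list plus the strings it references
print(f"shallow: {shallow} bytes, deep: {deep} bytes")
print(f"allocated blocks: {sys.getallocatedblocks()}")
```

The shallow figure is useful for comparing container overhead, while the deep figure is closer to what the data actually costs.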
Third-party libraries
Apart from the options above, you can also use other third-party libraries, such as memory_profiler, pympler, or objgraph, to measure an app's memory footprint.
Common causes of high memory usage
A large memory footprint can lead to undesirable consequences, including random freezes, crashes, and, ultimately, a bad user experience. We’ll cover the common causes of high memory usage in the following sections.
Memory leaks
The term memory leak refers to a situation where memory is allocated to a particular task but is not released upon completion of the process. This means that your application is not running efficiently. The amount of available memory is also reduced significantly.
Memory leaks can lead to performance downgrades. Apart from your application freezing or crashing, other background services may become inoperable. Furthermore, as more apps demand memory, the computer system may be forced to close down certain processes.
External dependencies
Although third-party libraries allow us to add numerous functionalities to our applications without creating everything from scratch, they may cause high memory consumption in an app.
For example, some libraries do not free up memory spaces when a task is completed or continuously run unnecessary background processes, which strains the available resources.
Large datasets
Python is a popular programming language for data analysis, machine learning, and artificial intelligence. Training AI models requires a considerable amount of data and memory. If you train a model on a machine with too little memory, the process may crash or cause your computer to freeze.
Unoptimized code
Not using the garbage collector effectively, defining and storing too many objects in memory, and using the wrong datatypes could increase your app's memory footprint.
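As one illustration of how the choice of datatype affects memory, the sketch below stores the same integers in a regular list and in an array from the standard library's array module. The comparison is approximate, since it relies on sys.getsizeof() and the exact numbers depend on the platform and Python version.

```python
import sys
from array import array

count = 100_000
as_list = list(range(count))         # a list of references to separate int objects
as_array = array("i", range(count))  # packed C ints, typically 4 bytes each

# For the list, add the size of the int objects it points to (approximate).
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(n) for n in as_list)
array_bytes = sys.getsizeof(as_array)  # the values live inside the array itself

print(f"list:  {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```

The trade-off is flexibility: an array can only hold values of one numeric type, while a list can hold anything.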
Tips to lower your app’s memory footprint
Now that we know the common causes of high memory usage and how to measure memory consumption, let's look at how to lower your app's memory footprint.
Use generators instead of lists
Although extremely useful, lists can consume a lot of memory, especially when they store many values, because every value is held in memory for as long as the list exists. Generators can be iterated over like lists, but with one distinction: they are lazy. A generator produces each value only when it is needed rather than storing all of them up front.
Let's compare the memory consumption of lists and generators.
Here is a list that stores values between 0 and 999:
import sys
numbers = [i for i in range(1000)]  # Stores values from 0 to 999
print(numbers)  # We print the values in the list
print(sum(numbers))  # We calculate the sum of the values in the list
print(f"The list consumes {sys.getsizeof(numbers)} bytes")  # We check the amount of memory the list object has taken.
When you run the above code on a 64-bit CPython build, it shows that the list consumes roughly 9,000 bytes of memory (8 bytes per element reference plus overhead). The exact figure varies by Python version.
In the following code sample, we use a generator instead of a list.
import sys
generatorlist = (i for i in range(1000))  # Produces values from 0 to 999 on demand
print(generatorlist)  # Prints the generator object itself, not its values
print(sum(generatorlist))  # We calculate the sum of the generated values
print(f"The generator consumes {sys.getsizeof(generatorlist)} bytes")  # We check the size of the generator object.
When the above code is executed, the generator object consumes only 104 bytes of memory in the original test (the exact value depends on the Python version), regardless of how many values it yields. Thus, generators can be significantly more memory efficient than lists.
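Since sys.getsizeof() only measures the list or generator object itself, another way to compare the two is to look at peak memory while the values are being consumed. The sketch below uses the standard library's tracemalloc module for that. The helper name peak_bytes is hypothetical, and the absolute numbers will differ between machines and Python versions.

```python
import tracemalloc


def peak_bytes(func):
    """Run func and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears the collected traces
    return result, peak


n = 200_000
# The list comprehension builds all n values before sum() starts.
list_total, list_peak = peak_bytes(lambda: sum([i * 2 for i in range(n)]))
# The generator expression hands values to sum() one at a time.
gen_total, gen_peak = peak_bytes(lambda: sum(i * 2 for i in range(n)))

print(f"list peak:      {list_peak} bytes")
print(f"generator peak: {gen_peak} bytes")
```

Both versions compute the same total. The difference is that the list version keeps every value alive at once, while the generator version only ever holds the current one.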
Read data in smaller chunks
As discussed, dealing with large datasets can be memory intensive. The computer has to allocate enough resources to process and store all file contents, meaning there is a chance of your application slowing down, freezing, or even crashing completely.
You can lower an app's memory footprint by reading data in smaller chunks instead of loading entire datasets into memory. This technique lets you analyze data that might not otherwise fit in memory without experiencing major performance issues.
For example, the following code is not memory efficient since we are loading our entire dataset ('employees.csv') into memory.
import pandas as pd
def readEmployeeData():
    df = pd.read_csv('employees.csv')['FIRST_NAME']
    print(df.value_counts())
We can save memory by defining a chunksize, or the number of rows our program should read from the dataset in one go, as demonstrated below.
import pandas as pd
def readDataInChunks():
    result = None
    for chunk in pd.read_csv("employees.csv", chunksize=200):  # Setting the chunksize to 200 rows
        employees = chunk["FIRST_NAME"]
        chunk_result = employees.value_counts()
        if result is None:
            result = chunk_result
        else:
            result = result.add(chunk_result, fill_value=0)
    result.sort_values(ascending=False, inplace=True)
    print(result)
readDataInChunks()
In the above code, we read and compute information from a smaller dataframe or chunk, which is more memory efficient. We add each chunk's counts to a running total (a pandas Series) and then proceed to the next chunk of data, until we've analyzed the entire dataset.
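The same idea can be sketched without pandas by streaming rows with the standard library's csv module and accumulating counts in a Counter. This is only an illustration under assumptions: the function name count_first_names and the sample rows are hypothetical, and the temporary file exists so that the example runs on its own.

```python
import csv
import os
import tempfile
from collections import Counter


def count_first_names(path, column="FIRST_NAME"):
    """Count the values in one CSV column while reading one row at a time."""
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
    return counts


# Build a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["FIRST_NAME", "LAST_NAME"])
    writer.writerows([["Ann", "Lee"], ["Bob", "Ray"], ["Ann", "Kim"]])

name_counts = count_first_names(tmp.name)
print(name_counts.most_common())
os.remove(tmp.name)
```

Only the running counts stay in memory here, so the footprint depends on the number of distinct names rather than on the number of rows in the file.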
Use memory-efficient dependencies
Before importing and using a third-party library in your project, research its key features and reliability. Ask questions, such as how much memory the library uses, and determine whether there are possible memory leaks. Being involved in online tech communities, such as Stack Overflow, can help you access valuable information much faster.
Use memory-profiling tools
It's a good idea to use memory profiling tools, such as memory_profiler, valgrind, and pympler, to measure an app’s memory footprint before pushing your application to production. This step ensures you're not caught off guard and helps you avoid negatively impacting the user experience.
For example, let's see how we can use memory_profiler to analyze memory consumption.
We can install memory_profiler with the following command.
$ pip install -U memory_profiler
Once the dependency is installed, add the @profile decorator above the function you wish to analyze.
import pandas as pd
@profile  # Adding the @profile decorator
def readDataInChunks():
    result = None
    for chunk in pd.read_csv("employees.csv", chunksize=20):
        employees = chunk["FIRST_NAME"]
        chunk_result = employees.value_counts()
        if result is None:
            result = chunk_result
        else:
            result = result.add(chunk_result, fill_value=0)
    result.sort_values(ascending=False, inplace=True)
    print(result)
readDataInChunks()
We can then execute the program with the following command.
python -m memory_profiler example.py
Alongside the program's log results, you should see the following output. You can use this information to optimize certain portions of your code.
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     9   56.156 MiB   56.156 MiB            1   @profile
    10                                          def readDataInChunks():
    11   56.160 MiB    0.004 MiB            1       result = None
    12   57.566 MiB    1.133 MiB            4       for chunk in pd.read_csv("employees.csv", chunksize=20):
    13   57.555 MiB    0.062 MiB            3           employees = chunk["FIRST_NAME"]
    14   57.555 MiB    0.090 MiB            3           chunk_result = employees.value_counts()
    15   57.555 MiB    0.000 MiB            3           if result is None:
    16   57.152 MiB    0.000 MiB            1               result = chunk_result
    17                                                  else:
    18   57.562 MiB    0.121 MiB            2               result = result.add(chunk_result, fill_value=0)
    19
    20   57.570 MiB    0.004 MiB            1       result.sort_values(ascending=False, inplace=True)
    21   57.621 MiB    0.051 MiB            1       print(result)
Conclusion
As you work on a software project, keeping the memory footprint low should be near the top of your list and not just an afterthought. Applications with low memory consumption tend to experience fewer crashes and freezes and, thus, offer a better overall user experience.
Using generators instead of lists, avoiding memory-intensive libraries, and reading data in smaller chunks are some helpful tips for lowering your app's memory footprint.