Justin Watson
Posted on July 18, 2024
- What is NumPy anyways?
- The Rules of NumPy Arrays
- Use Cases for NumPy
- Installing NumPy
- NumPy Array Fundamentals
- Beyond the Basics
- Conclusion
What is NumPy anyways?
Simply put, NumPy (Numerical Python) is a Python library built to support multi-dimensional arrays and matrices, along with a large collection of mathematical functions that operate on them. That said, a NumPy array is not your average array, as we're about to explore.
The Rules of NumPy Arrays
The heart of the NumPy library is the ndarray object. This is an object that "represents a multidimensional, homogeneous array of fixed-size items" as the NumPy Docs eloquently put it. In English, this essentially means that we can represent a 1-dimensional or higher-dimensional array, where each element is the same data type. Additionally, NumPy arrays have a fixed size at creation, and changing an ndarray's size actually creates a new array rather than resizing the existing one. These restrictions seem less than ideal at first glance, but it is in this limitation that NumPy is able to shine and focus on its use case.
Use Cases for NumPy
NumPy, as its abbreviated name implies, is an absolute powerhouse when it comes to numerical computation. Much of NumPy is written in C and ships pre-compiled, which serves as the secret sauce to its speed and efficiency. This is what makes "vectorization" possible, or plainly put, the absence of explicit looping and indexing in your own code, in exchange for these operations happening in optimized C in the background. Let's take a look at an example of this, in comparison to non-vectorized code.
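Here is a minimal sketch of what the vectorized version could look like. The variable names match the line discussed further down in this section; the sample temperatures are made up purely for illustration.

```python
import numpy as np

# Sample Celsius readings (illustrative values only)
celsius_temps = np.array([0.0, 25.0, 37.0, 100.0])

# Multiplier used in the Celsius -> Fahrenheit formula
conversion_factor = 9 / 5

# One expression converts every element; no explicit Python loop needed
fahrenheit_temps = celsius_temps * conversion_factor + 32

print(fahrenheit_temps)
```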
Pretty neat yeah? Let's see how this would work without vectorization.
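For comparison, a plain-Python sketch of the same conversion might look like this. It uses a regular list and an explicit loop, with the same made-up sample values.

```python
# Same illustrative readings, stored in an ordinary Python list
celsius_temps = [0.0, 25.0, 37.0, 100.0]
conversion_factor = 9 / 5

# Without vectorization we have to visit each element ourselves
fahrenheit_temps = []
for temp in celsius_temps:
    fahrenheit_temps.append(temp * conversion_factor + 32)

print(fahrenheit_temps)
```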
The magic of NumPy's vectorization is beautifully demonstrated in the simple line fahrenheit_temps = celsius_temps * conversion_factor + 32. Instead of manually looping through each temperature, this single line applies the conversion formula to the entire array at once. This is possible because NumPy cleverly "broadcasts" the single conversion factor (and the 32) to match the size of the temperature array, so every element gets transformed without a Python-level loop. This is once again thanks to the C code under the hood taking care of our heavy lifting, so to speak. This might not seem very important in smaller scale work, but if you see yourself working on larger scale datasets down the line, you want to keep this efficiency in mind.
In summary, NumPy calculates at a level of efficiency that Python's built-in sequences can't keep up with, at least when it comes to high volume data. As a result, it is widely used for mathematical and scientific computing in the following areas:
- Data Analysis and Manipulation: NumPy provides the foundation for libraries like pandas, enabling efficient handling of large datasets, cleaning, filtering, transforming, and aggregating data for analysis and visualization. If you have even a passing interest in using Python for data analysis and aggregation, NumPy needs to be in your toolbox.
- Linear Algebra: NumPy offers extensive tools for linear algebra operations, including matrix multiplication and solving systems of linear equations.
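To make the linear algebra bullet a little more concrete, here is a small sketch using the @ operator and np.linalg.solve. The matrix and vector are invented for the example.

```python
import numpy as np

# A small 2x2 system: 2x + y = 3 and x + 3y = 5 (made-up numbers)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Matrix multiplication with the @ operator
product = A @ A

# Solve A @ x = b for x
x = np.linalg.solve(A, b)

print(product)
print(x)
```

Multiplying A by the solution x should give back b (up to floating-point rounding), which is an easy way to sanity check the result.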
Installing NumPy
If you already have Python 3.4 or later set up, you already have pip by default and can just run pip install numpy in your terminal and move on with the blog. If you have not actually set up Python on your local machine yet, or just want to know about a new tool that may be helpful to you, I'd highly recommend the Anaconda Distribution (no actual snakes involved in installation, don't worry). This distribution comes with easy-to-use searching and installation for packages like NumPy, pandas, Matplotlib for visualization, and many other commonly used packages for scientific computing and data science, alongside an attractive GUI to manage all of this. You can download and install it from Anaconda's website. The NumPy package can be easily installed from the Anaconda Navigator GUI, but for those inclined to use the command line, the NumPy Docs recommend creating and activating a fresh conda environment and then running conda install numpy inside it.
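Whichever route you take, a quick way to check that the install worked is to import NumPy and print its version. This is just a sanity-check sketch; the np alias is a widely used convention, not a requirement.

```python
import numpy as np

# If this import succeeds, NumPy is installed in the active environment
version = np.__version__

print(version)
```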
NumPy Array Fundamentals
With all of the introduction and setup out of the way, let's establish some of the fundamentals of a NumPy array. This will focus more on NumPy-specific aspects, rather than teaching you the basics of arrays in general. Let's start with some ways of creating an array in NumPy.
numpy.zeros(shape, dtype)
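There is a similar function called numpy.ones() that works the same way as numpy.zeros(), but fills the array with ones. numpy.empty() also takes the same arguments, but it does not initialize the array at all, so it contains whatever arbitrary values happened to be in that memory (these are not proper random numbers). Skipping the initialization makes it marginally faster to create an array, which is useful if you are going to overwrite every element anyway and don't need zeros or ones.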
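A minimal sketch of the three creation functions side by side. The shapes and the fill value are arbitrary choices for illustration.

```python
import numpy as np

# 2 rows x 3 columns of zeros; dtype defaults to float64
zeros = np.zeros((2, 3))

# Same shape filled with ones, this time as integers
ones = np.ones((2, 3), dtype=int)

# Uninitialized array: contents are arbitrary until we assign them
scratch = np.empty((2, 3))
scratch[:] = 7.0  # overwrite every element before reading it

print(zeros)
print(ones)
print(scratch)
```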
numpy.arange(start, stop, step, dtype)
numpy.linspace(start, stop, num)
Just as .zeros() has its siblings, .arange() has a close relative called .linspace() that takes the start, the stop, and the number of elements instead of a step size. This is more useful when you want precise control over the number of elements and need evenly spaced floating-point values, and especially when you want the final value to land exactly on a specific number: .linspace() includes the stop value by default, while .arange() always excludes it.
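Here is a short sketch contrasting the two. The ranges are arbitrary, picked to make the difference in how the stop value is handled easy to see.

```python
import numpy as np

# arange: start, stop, step. The stop value (10) is NOT included.
stepped = np.arange(0, 10, 2)

# linspace: start, stop, number of elements. The stop value (1) IS included.
spaced = np.linspace(0, 1, 5)

print(stepped)  # [0 2 4 6 8]
print(spaced)   # five evenly spaced values from 0 to 1
```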
Beyond the Basics
Now it's time to get a bit fancy. A common task in NumPy is indexing and slicing across the rows and columns of matrices. This is a bit more complex than your usual array traversal when NumPy arrays come into the equation. Take some time to really digest this.
Indexing and slicing
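As a rough illustration, here is how indexing and slicing could look on a small 2D array. The matrix is a made-up example; the pattern to notice is matrix[row, column], where either position can be a single index or a slice.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Single element: row 1, column 2
element = matrix[1, 2]          # 6

# Entire first row
first_row = matrix[0]           # [1 2 3]

# Entire last column (":" means "every row")
last_column = matrix[:, -1]     # [3 6 9]

# Sub-matrix: first two rows and first two columns
top_left = matrix[:2, :2]       # [[1 2], [4 5]]

print(element, first_row, last_column)
print(top_left)
```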
Key attributes
In NumPy, an array's dimensions describe its structure: a one-dimensional array is a list, a two-dimensional array is a table, and higher dimensions represent increasingly complex structures that take a bit more explanation than this blog aims to provide today.
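A quick sketch of the attributes you will probably reach for most often when inspecting an array's structure: ndim, shape, size, and dtype. The table values are invented.

```python
import numpy as np

# A two-dimensional array: 2 rows, 3 columns
table = np.array([[1.5, 2.5, 3.5],
                  [4.5, 5.5, 6.5]])

dimensions = table.ndim    # number of dimensions: 2
shape = table.shape        # length along each dimension: (2, 3)
total = table.size         # total number of elements: 6
kind = table.dtype         # data type shared by every element: float64

print(dimensions, shape, total, kind)
```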
Deleting elements
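One way to remove elements is np.delete, sketched below with made-up data. Since NumPy arrays have a fixed size, np.delete returns a new array and leaves the original untouched.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

# Remove the element at index 2; returns a new array
without_third = np.delete(values, 2)

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Remove column 0 (axis=1 means "along columns")
without_first_col = np.delete(grid, 0, axis=1)

print(without_third)       # [10 20 40 50]
print(without_first_col)   # [[2 3], [5 6]]
print(values)              # original is unchanged
```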
Sorting NumPy elements
- [::-1] is a handy trick to reverse an array.
- np.argsort() doesn't directly sort the values; it returns an array of indices that indicate how the array would be sorted. Sometimes you don't need to actually rearrange the data, only to know the order. This is a niche situation, but it comes in handy when you want to reorder a second, related array in the same way.
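The sorting tools above can be sketched together like this. The scores and names are hypothetical sample data, chosen to show argsort reordering a related array.

```python
import numpy as np

scores = np.array([72, 95, 88, 61])
names = np.array(["Ana", "Ben", "Cy", "Dee"])

# np.sort returns a sorted copy (ascending)
ascending = np.sort(scores)        # [61 72 88 95]

# [::-1] reverses it to get descending order
descending = ascending[::-1]       # [95 88 72 61]

# np.argsort returns the indices that would sort the array
order = np.argsort(scores)         # [3 0 2 1]

# Use those indices to reorder a related array the same way
names_by_score = names[order]      # ['Dee' 'Ana' 'Cy' 'Ben']

print(ascending, descending, order, names_by_score)
```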
Conclusion
There's a lot more to NumPy, but rest assured this is plenty of information to start playing around with larger arrays. I'd highly recommend the NumPy docs. I personally used them alongside the Intro to NumPy course on DataCamp. Thank you for reading!