Numpy arrays at lightspeed ⚡ Part 3
Ali Sherief
Posted on March 23, 2020
In this post I will cover some parts of numpy that are good to have handy when you are using it. It's a miscellaneous toolbox of concepts that I haven't sorted into a single theme, but which nevertheless shine when you need them.
A little more on indexing
You can also reshape an array by setting its shape property to a new size tuple, like x.shape = (2,5).
Say you have this large array: z = np.arange(81).reshape(3,3,3,3). Indexing an array using lists of indices is different from using tuples of indices.
>>> z[[1,1,1,1]] # Grabs z[1] (the second subarray along the first axis) and stacks four copies of it into a new array
array([[[[27, 28, 29],
[30, 31, 32], ...
>>> z[(1,1,1,1)] # Gets the element in the corresponding array coordinate/position
40
Ellipsis can be used as a shorthand for entire slices of intermediate dimensions. Slice objects work too:
>>> z[1, Ellipsis, 1] # same as [1,...,1], itself same as [1, :, :, 1]
array([[28, 31, 34],
[37, 40, 43],
[46, 49, 52]])
>>> z[1,1,1,slice(0,2)] # same as [1,1,1,0:2]
array([39, 40])
Enclosing tuples don't seem to affect indexing at all, so using z[(1,1)] is the same as using z[1,1]. You can even create an indexing variable without a tuple, courtesy of Python syntax.
>>> indices = 1, Ellipsis, 1 # Same as (1, Ellipsis, 1)
>>> z[indices]
array([[28, 31, 34],
[37, 40, 43],
[46, 49, 52]])
Again, it's worth repeating from Part 1, since this is a very common error: fancy indexing with a repeated index does not assign to that index multiple times:
>>> x = np.arange(0, 50, 10)
>>> x
array([ 0, 10, 20, 30, 40])
>>> x[np.array([1, 1, 3, 1])] += 1 # Doesn't work properly
>>> x
array([ 0, 11, 20, 31, 40])
Do something like this instead:
>>> x[np.array([3])] += 1
>>> x[np.array([1])] += 3
>>> x
array([ 0, 13, 20, 31, 40]) # You probably meant to do this
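If you actually want repeated indices to accumulate, numpy ufuncs have an unbuffered in-place variant. This sketch uses np.add.at, which applies every increment even for duplicate indices:

```python
import numpy as np

x = np.arange(0, 50, 10)
# np.add.at performs unbuffered in-place addition, so repeated
# indices each apply their own increment instead of overwriting.
np.add.at(x, [1, 1, 3, 1], 1)
print(x)  # array([ 0, 13, 20, 31, 40]) -- index 1 incremented 3 times, index 3 once
```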
Data types
Numpy has several data types that correspond to C types. There are platform-defined types which correspond to the types you're used to in procedural languages like C and C++, as well as types that always hold a specific number of bits. The following are scalar types which are not instances of dtype objects but can be used in place of one, that is, in any function where the dtype
keyword argument is accepted.
Numpy scalar type | C type | Description |
---|---|---|
np.bool_ | bool | Boolean (True or False) stored as a byte |
np.byte | signed char | Platform-defined |
np.ubyte | unsigned char | Platform-defined |
np.short | short | Platform-defined |
np.ushort | unsigned short | Platform-defined |
np.intc | int | Platform-defined |
np.uintc | unsigned int | Platform-defined |
np.int_ | long | Platform-defined |
np.uint | unsigned long | Platform-defined |
np.longlong | long long | Platform-defined |
np.ulonglong | unsigned long long | Platform-defined |
np.half / np.float16 | | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
np.single | float | Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa |
np.float / np.double | double | Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa. |
np.longdouble | long double | Platform-defined extended-precision float |
np.csingle | float _Complex | Complex number, represented by two single-precision floats (real and imaginary components) |
np.cfloat / np.cdouble | double _Complex | Complex number, represented by two double-precision floats (real and imaginary components). |
np.clongdouble | long double _Complex | Complex number, represented by two extended-precision floats (real and imaginary components). |
np.int8 | int8_t | Byte (-128 to 127) |
np.int16 | int16_t | Integer (-32768 to 32767) |
np.int32 | int32_t | Integer (-2147483648 to 2147483647) |
np.int64 | int64_t | Integer (-9223372036854775808 to 9223372036854775807) |
np.uint8 | uint8_t | Unsigned integer (0 to 255) |
np.uint16 | uint16_t | Unsigned integer (0 to 65535) |
np.uint32 | uint32_t | Unsigned integer (0 to 4294967295) |
np.uint64 | uint64_t | Unsigned integer (0 to 18446744073709551615) |
np.intp | intptr_t | Integer used for indexing, typically the same as ssize_t |
np.uintp | uintptr_t | Integer large enough to hold a pointer |
np.float32 | float | |
np.float64 / np.float_ | double | Note that this matches the precision of the builtin python float. |
np.complex64 | float _Complex | Complex number, represented by two 32-bit floats (real and imaginary components) |
np.complex128 / np.complex_ | double _Complex | Note that this matches the precision of the builtin python complex. |
Look at the table carefully. Some numpy types don't have the same size as their similarly named C counterparts. In particular:
- np.uint is unsigned long, not unsigned int.
- np.float and np.cfloat are the same size as C's double and double _Complex, respectively.
(Reading through tables like this is good, as I didn't know C and C++ had a float/double complex type until now.)
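Since several of these aliases are platform-defined, it can be worth checking their actual widths at runtime. A quick sketch (the exact numbers printed depend on your platform, e.g. np.int_ is 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows):

```python
import numpy as np

# Print the byte width of some platform-defined aliases.
for t in (np.byte, np.short, np.intc, np.int_, np.longlong,
          np.single, np.double, np.longdouble):
    print(t.__name__, np.dtype(t).itemsize)
```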
Calling np.<whatever-type>() as a function with whatever numbers, lists or ndarrays you want to cast will cast them into an ndarray of that type. So these functions behave like C's (int) or C++'s static_cast<int>. An example:
In [1]: np.uint8([1,2,3])
Out[1]: array([1, 2, 3], dtype=uint8)
# This also works:
In [2]: np.array([1,2,3]).astype(np.int8)
Out[2]: array([1, 2, 3], dtype=int8)
Numpy also has some "meta-types": generic types that describe whole classes of types, such as integers and floating point numbers. One of their uses is testing what class an element type belongs to:
>>> d = np.dtype(int)
>>> d
dtype('int32')
>>> np.issubdtype(d, np.integer)
True
>>> np.issubdtype(d, np.floating)
False
The generic types are: number, inexact, floating, complexfloating, integer, signedinteger, unsignedinteger, character, generic, flexible, and void. I have created a script which classifies each dtype: it prints whether each numpy dtype is a subtype of each generic type. I thought about annotating the table above with these results but then it would get too cluttered.
In [1]: np_types = ["bool_", "byte", "ubyte", "short", "ushort", "intc", "uintc",
   ...:     "int_", "uint", "longlong", "ulonglong", "half", "single", "float16",
   ...:     "float", "double", "longdouble", "csingle", "cfloat", "cdouble",
   ...:     "clongdouble", "int8", "int16", "int32", "int64", "uint8", "uint16",
   ...:     "uint32", "uint64", "intp", "uintp", "float32", "float64", "float_",
   ...:     "complex64", "complex128", "complex_"]
In [2]: np_generics = [np.number, np.inexact, np.floating, np.complexfloating,
   ...:     np.integer, np.signedinteger, np.unsignedinteger, np.character,
   ...:     np.generic, np.flexible]
In [3]: for dt in np_types:
...: for gn in np_generics:
...: print("{0} is {1}: {2}".format(dt, gn.__name__, np.issubdtype(dt, gn)))
bool_ is number: False
bool_ is inexact: False
bool_ is floating: False
bool_ is complexfloating: False
bool_ is integer: False
bool_ is signedinteger: False
bool_ is unsignedinteger: False
bool_ is character: False
bool_ is generic: True
bool_ is flexible: False
byte is number: True
byte is inexact: False
byte is floating: False
...
More data types
There are more combinations of objects that a dtype can specify than just integers and floating point numbers. A dtype can also represent byte order; the field names and types inside the type, if it's a structure; the shape and datatype of elements, if it's an array; and even a void type (for raw data).
This allows dtypes to emulate the full gamut of C and C++ types without enforcing the strict typing present in those languages.
A dtype string is composed of the following characters, in order:
- > for big endian, < for little endian, and = for hardware default
- a character signifying the type. Could be ? for boolean, b for signed byte, B for unsigned byte, i for signed integer, u for unsigned integer, f for floating point, c for complex floating point, m for a timedelta python object, M for a datetime object, O for some other python object, U for unicode string, and V for void, or raw data (void is rarely needed)
- the number of bytes in the type (for Unicode strings it's the number of characters instead)
>>> dt = np.dtype('i4') # 32-bit signed integer
>>> dt = np.dtype('f8') # 64-bit floating-point number
>>> dt = np.dtype('c16') # 128-bit complex floating-point number
>>> dt = np.dtype('a25') # 25-length zero-terminated bytes
>>> dt = np.dtype('U25') # 25-character string
At the beginning of a sequence of dtype characters you can insert dimensions like (3,2), which makes it a 3x2 array of that type.
You can put multiple dtypes in the same string separated by commas to represent a struct. Structs can be indexed by strings 'f0', 'f1', etc. unless you give the struct different field names. Here's what all this would look like:
# Array dtype
In [1]: dt = np.dtype(('(2,3)U4'))
In [2]: np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)
Out[2]:
array([[['fffe', '4', '5'],
['b', 'b', 'b']],
[['a', 'a', 'a'],
['s', 's', 's']],
[['d', 'd', 'd'],
['w', 'w', 'w']]], dtype='<U4')
# This fails because arrays aren't structs
In [3]: np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)['f0']
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-143-9d57896acaf5> in <module>
---------> 1 np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)['f0']
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
# Structure dtype
In [4]: dt = np.dtype(('i4, U4, f4')) # fields default to 'f0', 'f1', 'f2'
In [5]: np.array([(1, 'a', 1.23), (2, 'b', 2.4), (3, 'c', 2.3)], dt)
Out[5]:
array([(1, 'a', 1.23), (2, 'b', 2.4 ), (3, 'c', 2.3 )],
dtype=[('f0', '<i4'), ('f1', '<U4'), ('f2', '<f4')])
In [6]: np.array([(1, 'a', 1.23), (2, 'b', 2.4), (3, 'c', 2.3)], dt)['f0']
Out[6]: array([1, 2, 3], dtype=int32)
Now that we got those out of the way, there are other parameter formats np.dtype() can take:
- dtype(type) where type is any string in np.sctypeDict.keys(). It's worth elaborating that np.sctypeDict is the dictionary mapping all numpy type names to fixed-width scalar types.
- dtype((flexible_dtype, itemsize)), passed as a single tuple, where flexible_dtype is a flexible numpy type (unicode string, bytes or void) and itemsize is the number of bytes (or, for unicode strings, characters) it will take. This only makes sense for these variable width types; fixed-width types won't honor itemsize.
In [1]: np.dtype((np.void, 10))
Out[1]: dtype('V10')
In [2]: np.dtype(('U', 10))
Out[2]: dtype('<U10')
In [3]: np.array('1234567890', np.dtype(('U', 10)))
Out[3]: array('1234567890', dtype='<U10')
In [4]: np.array('12345678901', np.dtype(('U', 10)))
Out[4]: array('1234567890', dtype='<U10')
- dtype((fixed_dtype, shape)), same as the form above, but with a shape, which can be an int or a tuple of array dimensions; this makes a subarray dtype.
- dtype([(field_name, field_dtype, shape), ...]) makes a structured type. Each entry gives the name of a field, its dtype string as formulated above, and an optional shape. Note: A dtype string can also make arrays.
In [1]: dt = np.dtype([('R','u1'), ('G','u1'), ('B','u1'), ('A','u1')])
In [2]: np.array([(1,1,1,1), (2,2,22,2)], dt)
Out[2]:
array([(1, 1, 1, 1), (2, 2, 22, 2)],
dtype=[('R', 'u1'), ('G', 'u1'), ('B', 'u1'), ('A', 'u1')])
In [3]: np.array([(1,1,1,1), (2,2,22,2)], dt)['G']
Out[3]: array([1, 2], dtype=uint8)
- dtype({'names': <list>, 'formats': <list>, 'offsets': <list>, 'titles': <list>, 'itemsize': <int>}), this makes a struct where names is a list of field names, formats is a list of dtype strings or numpy scalar types, offsets is a list of byte offsets of each field inside the structure (like C struct layout, so you can insert padding), titles is a list of textual descriptions of each field, and itemsize is the number of bytes the entire dtype should be padded to, which you can make larger than the actual size of the struct if you want to e.g. align it to a power of 2. Note: A dtype string can also make structs.
In [1]: dt = np.dtype({'names': ['r','g','b','a'],
...: 'formats': [np.uint8, np.uint8, np.uint8, np.uint8]})
In [2]: np.array([(1,2,3,4),(5,256,265,255)],dt)
Out[2]:
array([(1, 2, 3, 4), (5, 0, 9, 255)],
dtype=[('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')])
In [3]: np.array([(1,2,3,4),(5,256,265,255)],dt)['r']
Out[3]: array([1, 5], dtype=uint8)
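The offsets and itemsize keys weren't exercised above; here's a small sketch (field names 'x' and 'y' are made up for illustration) of how they lay out a padded struct, mirroring a C struct with alignment holes:

```python
import numpy as np

# 'x' at byte 0, 'y' at byte 4 (3 padding bytes in between),
# and the whole struct padded out to 8 bytes.
dt = np.dtype({'names': ['x', 'y'],
               'formats': [np.uint8, np.float32],
               'offsets': [0, 4],
               'itemsize': 8})
print(dt.itemsize)  # 8

a = np.array([(1, 2.0), (3, 4.0)], dtype=dt)
print(a['y'])  # array([2., 4.], dtype=float32)
```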
- dtype((base_dtype, new_dtype)) stores the data as base_dtype but lets you reinterpret it through the fields of new_dtype, a bit like a C union.
In [1]: dt = np.dtype((np.int32,{'real':(np.int16, 0),'imag':(np.int16, 2)}))
In [2]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)
Out[2]:
array([[6291462, 5242885],
[4194308, 3145731]],
dtype=(numpy.int32, [('real', '<i2'), ('imag', '<i2')]))
In [3]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)['real']
Out[3]:
array([[6, 5],
[4, 3]], dtype=int16)
In [4]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)['imag']
Out[4]:
array([[96, 80],
[64, 48]], dtype=int16)
This is a rough hack that uses the bottom 16 bits of an int32 as the real part and the top 16 bits as the imaginary part. (Since the values were shifted by 20 bits rather than 16, the imaginary parts come out multiplied by 16.)
In [1]: dt = np.dtype(('u4', [('r','u1'),('g','u1'),('b','u1'),('a','u1')]))
In [2]: np.array([-1,-2,-3,-4], dt)
Out[2]:
array([4294967295, 4294967294, 4294967293, 4294967292],
dtype=(numpy.uint32, [('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')]))
In [3]: np.array([-1,-2,-3,-4], dt)['r']
Out[3]: array([255, 254, 253, 252], dtype=uint8)
In [4]: np.array([-1,-2,-3,-4], dt)['g']
Out[4]: array([255, 255, 255, 255], dtype=uint8)
This method of dtype formulation would also allow you to access the red, green, blue and alpha parts of a color by indexing, without having to make an array of structures or do bitwise operations. Instead, it wraps all of these fields around a uint32.
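For instance, pinning the base type to little endian (an assumption, so the field-to-byte mapping is deterministic), you can pull channels straight out of packed integers:

```python
import numpy as np

# '<u4' forces little endian, so 'r' maps to the least significant
# byte and 'a' to the most significant byte of each uint32.
rgba = np.dtype(('<u4', [('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')]))
pixels = np.array([0x11223344, 0xFF00FF00], dtype=rgba)
print(pixels['r'])  # low bytes:  [0x44, 0x00]
print(pixels['a'])  # high bytes: [0x11, 0xFF]
```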
Useful dtype members
The following members mostly cover things I've gone over in the above sections, but they are convenient to access programmatically:
- <dtype-obj>.type : Underlying scalar type used by numpy for this dtype
- <dtype-obj>.kind : One of the character codes above that identifies the type
- <dtype-obj>.char and <dtype-obj>.num : More identifiers of the type
- <dtype-obj>.str : The dtype string, more formally known as the array-protocol typestring
- <dtype-obj>.name : Name of the dtype
- <dtype-obj>.itemsize : Number of bytes this dtype takes
- <dtype-obj>.byteorder : A character describing the endian order of this dtype
For structs you also have access to:
- <dtype-obj>.fields : Dictionary mapping field names to their dtypes and byte offsets. This is None if there are no fields.
- <dtype-obj>.names : Tuple of field names. This is None if there are no fields.
For arrays you also have access to:
- <dtype-obj>.subdtype : Tuple of the element dtype and the array shape. This is None if not an array dtype.
- <dtype-obj>.shape : The array dtype's shape tuple. This is () if not an array dtype.
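A quick sketch of these members in action (the exact name/itemsize outputs assume a typical little-endian 64-bit platform):

```python
import numpy as np

dt = np.dtype('f8')
print(dt.kind, dt.char, dt.name, dt.itemsize)  # f d float64 8

# Struct members: .names and .fields
rec = np.dtype([('x', 'u1'), ('y', 'f4')])
print(rec.names)         # ('x', 'y')
print(rec.fields['y'])   # (dtype('float32'), 1) -- field dtype and byte offset

# Array dtype members: .subdtype and .shape
arr = np.dtype(('f4', (2, 3)))
print(arr.subdtype)      # (dtype('float32'), (2, 3))
print(arr.shape)         # (2, 3)
```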
Array scalars
A single element from a numpy array is also a numpy type. The consistency of types between elements and arrays gives elements most of the members that are available for arrays. For example, you have dtype, size, reshape() and transpose() among many others.
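A brief sketch of an array scalar carrying array-like members:

```python
import numpy as np

a = np.linspace(0.0, 1.0, 5)
elem = a[2]                   # an array scalar of type np.float64
print(type(elem))             # <class 'numpy.float64'>
print(elem.dtype, elem.size)  # float64 1
print(elem.reshape(1, 1))     # array methods work on scalars too: array([[0.5]])
```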
Numeric limits
Being written in C, numpy data types are not overflow-resistant like Python's numeric types. You can check the minimum and maximum values of each type with np.iinfo() for integer types, and np.finfo() for floating point types.
In [1]: np.iinfo(np.int64)
Out[1]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
In [2]: np.finfo(np.float64)
Out[2]: finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)
Now, a word about floating point precision, since integer precision is simply limited by the number of bits available:
Python uses 64-bit floats (double in other words) to represent floats. Numpy has its own float types. In particular, it has the long double-like extended-precision type you saw above (np.longdouble), typically an 80-bit float on x86, unless you are using Windows, where long double is only a 64-bit float because of the Windows compiler, MSVC. np.longdouble is the most precise floating point type available in numpy; if you really need more precision, or unlimited precision, there is support for symbolic variables and rational numbers in the sympy package. That will not be covered in this post.
Outside of numpy, the extra precision offered by np.longdouble, if any, will be lost when converting it to Python's 64-bit float type, for example if you format it to a string, or call any python (not numpy) function that expects a python float.
np.finfo(np.longdouble).eps is the machine epsilon: the gap between 1 and the next representable long double. If correct values with np.longdouble are important to you, check that passing 1 and 1 + np.finfo(np.longdouble).eps to your functions doesn't squish them into the same number. This is particularly relevant if long doubles are 80 bits on your system, since then eps is smaller than a python float can distinguish, so converting to a python float makes both values equivalent (to 1).
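That check could be sketched like this (the second result depends on whether your platform's long double is wider than a double, so no fixed output is shown for it):

```python
import numpy as np

eps = np.finfo(np.longdouble).eps
one = np.longdouble(1)

# By definition of eps, these must differ while we stay in longdouble:
print(one + eps != one)         # True

# Converting to a python float may collapse them back together when
# longdouble is wider than 64 bits (e.g. 80-bit x86 extended precision):
print(float(one + eps) == 1.0)  # platform-dependent
```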
And we're done
Much material was sourced from the numpy manual.
If you see anything incorrect here please let me know so I can fix it.
Image by Arek Socha from Pixabay