Numpy arrays at lightspeed ⚡ Part 3

zenulabidin

Ali Sherief

Posted on March 23, 2020

Numpy arrays at lightspeed ⚡ Part 3

In this post I will cover some parts of numpy that are good to have handy when you are using it. This is a miscellaneous toolbox of concepts which I haven't sorted out but nevertheless still shine out when their value is needed.

A little more indexing thing

You can also reshape an array by setting its shape property to a new size tuple, like x.shape = (2,5).

Say you have this large array: z = np.arange(81).reshape(3,3,3,3). Indexing an array using lists of indices is different from using tuples of indices.

>>> z[[1,1,1,1]] # Grabs the entire second dimension, and makes four copies of it on another array
array([[[[27, 28, 29],
         [30, 31, 32], ...
>>> z[(1,1,1,1)] # Gets the element in the corresponding array coordinate/position
40
Enter fullscreen mode Exit fullscreen mode

Ellipsis can be used as a shorthand for entire slices of intermediate dimensions. Slices also work too:

>>> z[1, Ellipsis, 1]       # same as [1,...,1], itself same as [1, :, :, 1]
array([[28, 31, 34],
       [37, 40, 43],
       [46, 49, 52]])
>>> z[1,1,1,slice(0,2)]     # same as [1,1,1,0:2]
array([39, 40])
Enter fullscreen mode Exit fullscreen mode

Enclosing tuples don't seem to affect indexing at all, so using z[(1,1)] is the same as using z[1,1]. You can even create an indexing variable without a tuple, courtesy of Python syntax.

>>> indices = 1, Ellipsis, 1    # Same as (1, Ellipsis, 1)
>>> z[indices]
array([[28, 31, 34],
       [37, 40, 43],
       [46, 49, 52]])
Enter fullscreen mode Exit fullscreen mode

Again, it's worth repeating from Part 1, since this is a very common error people make, instead of assigning to same index in an array multiple times:

>>> x = np.arange(0, 50, 10)
>>> x
array([ 0, 10, 20, 30, 40])
>>> x[np.array([1, 1, 3, 1])] += 1    # Doesn't work properly
>>> x
array([ 0, 11, 20, 31, 40])
Enter fullscreen mode Exit fullscreen mode

Do somehting like this instead:

>>> x[np.array([3])] += 1
>>> x[np.array([1])] += 3
>>> x
array([ 0, 13, 20, 31, 40])         # You probably meant to to do this
Enter fullscreen mode Exit fullscreen mode

Data types

Numpy has several data types that correspond to C types. They can be used where a dtype keyword argument is allowed.

There are platform-defined types which correspond to the types you're used to in procedural languages like C and C++, and also types that always hold a specific number of bits. The following are scalar types which are not instances of dtype objects but can be used in place of a dtype object, that is, any function where the dtype keyword argument is accepted.

Numpy scalar type C type Description
np.bool_ bool Boolean (True or False) stored as a byte
np.byte signed char Platform-defined
np.ubyte unsigned char Platform-defined
np.short short Platform-defined
np.ushort unsigned short Platform-defined
np.intc int Platform-defined
np.uintc unsigned int Platform-defined
np.int_ long Platform-defined
np.uint unsigned long Platform-defined
np.longlong long long Platform-defined
np.ulonglong unsigned long long Platform-defined
np.half / np.float16 Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
np.single float Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa
np.float / np.double double Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa.
np.longdouble long double Platform-defined extended-precision float
np.csingle float _Complex Complex number, represented by two single-precision floats (real and imaginary components)
np.cfloat / np.cdouble double _Complex Complex number, represented by two double-precision floats (real and imaginary components).
np.clongdouble long double _Complex Complex number, represented by two extended-precision floats (real and imaginary components).
np.int8 int8_t Byte (-128 to 127)
np.int16 int16_t Integer (-32768 to 32767)
np.int32 int32_t Integer (-2147483648 to 2147483647)
np.int64 int64_t Integer (-9223372036854775808 to 9223372036854775807)
np.uint8 uint8_t Unsigned integer (0 to 255)
np.uint16 uint16_t Unsigned integer (0 to 65535)
np.uint32 uint32_t Unsigned integer (0 to 4294967295)
np.uint64 uint64_t Unsigned integer (0 to 18446744073709551615)
np.intp intptr_t Integer used for indexing, typically the same as ssize_t
np.uintp uintptr_t Integer large enough to hold a pointer
np.float32 float
np.float64 / np.float_ double Note that this matches the precision of the builtin python float.
np.complex64 float _Complex Complex number, represented by two 32-bit floats (real and imaginary components)
np.complex128 / np.complex_ double _Complex Note that this matches the precision of the builtin python complex.

Look at the table carefully. Some numpy types don't have the same size as similarly named C counterparts. In particular:

  • np.uint is unsigned long, not unsigned int.
  • np.float and np.cfloat are the same size as C's double and double _Complex, respectively.

(Reading through tables like this is good, as I didn't know C and C++ had a float/double complex type until now.)

Calling np.<whatever-type>() as a function with whatever numbers, lists or ndarrays you want to cast will cast them into an ndarray of that type. So these functions behave like C's (int) or C++'s static_cast<int>. An example:

In [1]: np.uint8([1,2,3])                                                       
Out[1]: array([1, 2, 3], dtype=uint8)
# This also works:
In [2]: np.array([1,2,3]).astype(np.int8)                                       
Out[2]: array([1, 2, 3], dtype=int8)
Enter fullscreen mode Exit fullscreen mode

There are some "meta-types", specifically, generic types, existing in numpy that describe classes of types, such as integers and floating point numbers. One of their uses is figuring out an element type:

>>> d = np.dtype(int)
>>> d
dtype('int32')

>>> np.issubdtype(d, np.integer)
True

>>> np.issubdtype(d, np.floating)
False
Enter fullscreen mode Exit fullscreen mode

The generic types are: number, inexact, floating, complexfloating, integer, signedinteger, unsignedinteger, character, generic, flexible, and void. I have created a script which will classify each dtype according to their subtype. If prints whether each numpy dtype is a subtype of a generic type. I thought about annotating the table above with these results but then it would get too cluttered.

In [1]: np_types = ["bool_", "byte", "ubyte", "short", "ushort", "intc", "uintc
   ...: ", "int_", "uint", "longlong", "ulonglong", "half", "single", "float16"
   ...: , "float", "double", "longdouble", "csingle", "cfloat", "cdouble", "clo
   ...: ngdouble", "int8", "int16", "int32", "int64", "uint8", "uint16", "uint3
   ...: 2", "uint64", "intp", "uintp", "float32", "float64", "float_", "complex
   ...: 64", "complex128", "complex_"]
In [2]: np_generics = [np.number, np.inexact, np.floating, np.complexfloating, 
   ...: np.integer, np.signedinteger, np.unsignedinteger, np.character, np.gene
   ...: ric, np.flexible] 
In [3]: for dt in np_types: 
   ...:     for gn in np_generics: 
   ...:         print("{0} is {1}: {2}".format(dt, gn.__name__, np.issubdtype(dt, gn))) 
bool_ is number: False
bool_ is inexact: False
bool_ is floating: False
bool_ is complexfloating: False
bool_ is integer: False
bool_ is signedinteger: False
bool_ is unsignedinteger: False
bool_ is character: False
bool_ is generic: True
bool_ is flexible: False
byte is number: True
byte is inexact: False
byte is floating: False
...
Enter fullscreen mode Exit fullscreen mode

More data types

There are more combinations of objects that a dtype can specify than just integers and floating point numbers. They can also represent byte order, the field names and types inside the type if it's a structure, and the shape and datatype of elements if dtype is an array, and even a void type (for raw data).

This allows dtypes to emulate the full gamut of C and C++ types without enforcing the strict typing present in those languages.

A dtype string is composed of the following characters, in order:

  • > for big endian, < for little endian, and = for hardware default
  • a character signifying the type. Could be ? for boolean, b for signed byte, B for unsigned byte, i for signed integer, u for unsigned integer, f for floating point, c for complex floating point, m for a timedelta python object, M for a datetime object, O for some other python object, U for unicode string, and V for void, or raw data, void is rarely needed
  • the number of bytes in the type. (for Unicode string it's number of characters instead)
>>> dt = np.dtype('i4')   # 32-bit signed integer
>>> dt = np.dtype('f8')   # 64-bit floating-point number
>>> dt = np.dtype('c16')  # 128-bit complex floating-point number
>>> dt = np.dtype('a25')  # 25-length zero-terminated bytes
>>> dt = np.dtype('U25')  # 25-character string
Enter fullscreen mode Exit fullscreen mode

At the beginning of a sequence of dtype characters you can insert dimensions like (3,2), which makes it a 3x2 array of that type.

You can put multiple dtypes in the same string separated by commas to represent a struct. Structs can be indexed by strings 'f0', 'f1', etc. unless you give the struct different field names. Here's what all this would look like:

# Array dtype
In [1]: dt = np.dtype(('(2,3)U4'))                                            

In [2]: np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)               
Out[2]: 
array([[['fffe', '4', '5'],
        ['b', 'b', 'b']],

       [['a', 'a', 'a'],
        ['s', 's', 's']],

       [['d', 'd', 'd'],
        ['w', 'w', 'w']]], dtype='<U4')

# This fails because arrays aren't structs
In [3]: np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)['f0']         
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-143-9d57896acaf5> in <module>
---------> 1 np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)['f0']

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices


# Structure dtype
In [4]: dt = np.dtype(('i4, U4, f4'))     # fields default to 'f0', 'f1', 'f2'

In [5]: np.array([(1, 'a', 1.23), (2, 'b', 2.4), (3, 'c', 2.3)], dt)           
Out[5]: 
array([(1, 'a', 1.23), (2, 'b', 2.4 ), (3, 'c', 2.3 )],
      dtype=[('f0', '<i4'), ('f1', '<U4'), ('f2', '<f4')])

In [6]: np.array([(1, 'a', 1.23), (2, 'b', 2.4), (3, 'c', 2.3)], dt)['f0']    
Out[5]: array([1, 2, 3], dtype=int32)
Enter fullscreen mode Exit fullscreen mode

Now that we got those out of the way, there are other parameter formats np.dtype() can take:

  • dtype(type) where type is any string in np.sctypeDict.keys(). It's worth elaborating that np.sctypeDict() contains the mappings between all numpy dtypes to fixed-width scalar types.
  • dtype(flexible_dtype, itemsize) where flexible_dtype is a numpy type, and itemsize is the number of bytes it will take. This only makes sense for variable width types like unicode string and void. Other types won't honor itemsize.
In [1]: np.dtype(np.int, 10)                                                  
Out[1]: dtype('int64')

In [2]: np.dtype('U', 10)                                                     
Out[2]: dtype('<U')

In [3]: np.array('1234567890', np.dtype('U', 10))                             
Out[3]: array('1234567890', dtype='<U10')

In [4]: np.array('12345678901', np.dtype('U', 10))                            
Out[4]: array('12345678901', dtype='<U11')
Enter fullscreen mode Exit fullscreen mode
  • dtype(flexible_dtype, shape), same as above, but shape can be a tuple of array dimensions.
  • dtype([(field_name, field_dtype, shape), ...]) makes an array type. It has the name of a field followed by its shape and type string as formulated above. Note: A dtype string can also make arrays.
In [1]: dt = np.dtype([('R','u1'), ('G','u1'), ('B','u1'), ('A','u1')])

In [2]: np.array([(1,1,1,1), (2,2,22,2)], dt)                                 
Out[2]: 
array([(1, 1,  1, 1), (2, 2, 22, 2)],
      dtype=[('R', 'u1'), ('G', 'u1'), ('B', 'u1'), ('A', 'u1')])

In [3]: np.array([(1,1,1,1), (2,2,22,2)], dt)['G']                            
Out[3]: array([1, 2], dtype=uint8)
Enter fullscreen mode Exit fullscreen mode
  • dtype({'names': <list>, 'formats': <list>, 'offsets': <list>, 'titles': <list>, 'itemsize': <list>}), this makes a struct where the names is a list of field names, formats is a list of dtype strings or numpy scalar types, offset can only be used for padding integer dtypes (it's a list of offsets in the structure each field is, like C structs), titles is a list of textual descriptions of each field, and itemsize is the size the entire dtype should be padded to, which you can make larger than the actual size of the struct if you want to e.g. align it to a power of 2. Note: A dtype string can also make structs.
In [1]: dt = np.dtype({'names': ['r','g','b','a'], 
   ...:                'formats': [np.uint8, np.uint8, np.uint8, np.uint8]})

In [2]: np.array([(1,2,3,4),(5,256,265,255)],dt)                              
Out[2]: 
array([(1, 2, 3,   4), (5, 0, 9, 255)],
      dtype=[('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')])

In [3]: np.array([(1,2,3,4),(5,256,265,255)],dt)['r']                         
Out[3]: array([1, 5], dtype=uint8)

Enter fullscreen mode Exit fullscreen mode
  • dtype((base_dtype, new_dtype)), creates an object of new_dtype but masquerades it as base_dtype.

In [1]: dt = np.dtype((np.int32,{'real':(np.int16, 0),'imag':(np.int16, 2)}))

In [2]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)           
Out[2]: 
array([[6291462, 5242885],
       [4194308, 3145731]],
      dtype=(numpy.int32, [('real', '<i2'), ('imag', '<i2')]))


In [3]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)['real']   
Out[3]: 
array([[6, 5],
       [4, 3]], dtype=int16)

In [4]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)['imag']   
Out[4]: 
array([[96, 80],
       [64, 48]], dtype=int16)
Enter fullscreen mode Exit fullscreen mode

This is a rough hack to use the bottom bits of an int32 as a real number and the top bits as an imaginary number.

In [1]: dt = np.dtype(('u4', [('r','u1'),('g','u1'),('b','u1'),('a','u1')]))

In [2]: np.array([-1,-2,-3,-4], dt)                                           
Out[2]: 
array([4294967295, 4294967294, 4294967293, 4294967292],
      dtype=(numpy.uint32, [('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')]))

In [3]: np.array([-1,-2,-3,-4], dt)['r']                                      
Out[3]: array([255, 254, 253, 252], dtype=uint8)

In [4]: np.array([-1,-2,-3,-4], dt)['g']                                      
Out[4]: array([255, 255, 255, 255], dtype=uint8)
Enter fullscreen mode Exit fullscreen mode

This method of dtype formulation would also allow you to access red, green, blue and alpha parts of a color by indexing, without having to make an array of structures or doing bitwise operations. Instead, it wraps all of these types around a uint32.

Useful dtype members

The following members are mostly things that I've gone over in the above sections, but they are convenient to have handy to access programmatically:

  • <dtype-obj>.type: Underlying type used by numpy for this dtype
  • <dtype-obj>.kind: One of the character codes above that identifies the type
  • <dtype-obj>.char and <dtype-obj>.num: More identifiers of the type
  • <dtype-obj>.str: The dtype string, more formally known as the array-protocol typestring
  • <dtype-obj>.name: Name of the dtype
  • <dtype-obj>.size: Number of bytes this dtype takes
  • <dtype-obj>.byteorder: A character describing the endian order of this dtype

For structs you also have access to:

  • <dtype-obj>.fields: Dictionary of names of fields and their dtypes. This is None if there are no fields.
  • <dtype-obj>.name: List of field names. This is None if there are no fields.

For arrays you also have access to:

  • <dtype-obj>.subdtype: Tuple of its dtype and shape. This is None if not an array dtype.
  • <dtype-obj>.subdtype: Array dtype's shape tuple. This is () if not an array dtype.

Array scalars

A single element from a numpy array is also a numpy type. The consistency of types between elements and arrays gives elements most of the members that are available for arrays. For example, you have dtype, size, reshape() and transpose() among many others.

Numeric limits

Being written in C, numpy data types are not overflow-resistant like Python's numeric types. So you can check the minimum and maximum values of each type with np.linfo() for integer types, and np.finfo() for floating point types.

In [1]: np.iinfo(np.int64)                                                     
Out[1]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

In [2]: np.finfo(np.float64)                                                   
Out[2]: finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)
Enter fullscreen mode Exit fullscreen mode

Now, a word about floating point precision, since integer precision is well known to be limited to the number of bits it has:

Python uses 64-bit floats (double in other words) to represent floats. Numpy has its own float types. In particular, it also has a long double-like 80-bit floating point type that you saw above (np.longdouble), unless you are using Windows, in which case long double is only a 64-bit float because of the Windows compiler, MSVC. Now np.longdouble is the most precise floating point type that is possible in numpy, but there are symbolic variables and rational numbers support in the sympy package if you really need more precision, or unlimited precision. That will not be covered in this post.

Outside of numpy, extra precision offered by np.longdouble, if any, will be lost when converting it to Python's 64-bit float type, like if you want to format it to a string, or call any python (not numpy) function that expects a python float.

np.finfo(np.longdouble).eps is the smallest representable long double number. If correct values with np.longdouble is important to you, check that passing 1 and 1 + np.finfo(np.longdouble).eps to your functions don't get squished into the same number, particularly if long doubles are 80 bits on your system as eps will be lost when converting to a python float causing both values to be equivalent (to 1).

And we're done

Much material was sourced from the numpy manual.

If you see anything incorrect here please let me know so I can fix it.

Image by Arek Socha from Pixabay

💖 💪 🙅 🚩
zenulabidin
Ali Sherief

Posted on March 23, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related