Tensor performance benchmarks for Python and C++
Andrew Garcia
Posted on June 25, 2022
What are tensors in computing?
Perhaps this requires a longer discussion, but in computing, tensors are a mathematical abstraction. The typical definition of a tensor is a "multidimensional array", which may be interpreted as a collection of arrays within an array (I will refer to these manifestations as nested vectors in the table below). At the lowest level of code, however, tensors are stored as vectorized ("flattened") arrays, and they are typically fastest in this primitive form.
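To make the "flattened" idea concrete, here is a minimal sketch of how a coordinate (i, j, k) maps to a position in the flat vector. It assumes a row-major layout (last index varies fastest); the helper names `flat_index` and `unflatten` are hypothetical and are not taken from the benchmark code.

```python
def flat_index(i, j, k, L):
    """Row-major position of element (i, j, k) in a flattened L x L x L tensor."""
    return (i * L + j) * L + k


def unflatten(n, L):
    """Inverse mapping: flat position n back to the coordinate (i, j, k)."""
    i, rest = divmod(n, L * L)
    j, k = divmod(rest, L)
    return i, j, k


L = 4  # side length of the small 64-element tensor

# Same 1..64 values stored two ways: nested vectors and one flat vector.
nested = [[[flat_index(i, j, k, L) + 1 for k in range(L)]
           for j in range(L)]
          for i in range(L)]
flat = list(range(1, L ** 3 + 1))

# Both representations hold the same element at the same coordinate.
assert nested[1][2][3] == flat[flat_index(1, 2, 3, L)]
```

The nested form needs three list lookups per element, while the flat form needs one lookup plus a little integer arithmetic, which is one plausible reason the flat form tends to be faster.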
Tensor performance: axes swaps
The image on the cover is a graphical representation of a tensor with 64 elements from 1 to 64. A tensor of any side length L can easily be built as a vector in Python and C++:
# Python
vector = list(range(1, L*L*L + 1))  # elements 1..L^3, matching the C++ loop below
# C++
std::vector<int> vector;
for (int i=1; i<= L*L*L; i++) vector.push_back(i);
For the benchmarks, the tensor is 100 x 100 x 100, with 1 million elements numbered from 1 to 1 million. This is a huge tensor; making a dictionary with all its coordinates results in a 12 MB dictionary file!
Here's a graphical zoomed-in representation of this tensor to compare with the significantly smaller one in the cover photo:
The operation chosen for the benchmark is swapping this huge tensor's 0th axis with its 2nd axis, with reversal of all indices (that is, the first element becomes the last, the second the penultimate, and so on). This operation was chosen because transposition has a high time complexity of O(L^3) for a tensor of side length L, since every element has to be moved.
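The operation described above can be sketched in pure Python for both representations. This is only my reading of it, not the benchmarked code: I assume "swap axes 0 and 2 with all indices reversed" means out[i][j][k] = in[L-1-k][L-1-j][L-1-i], and the function names are hypothetical.

```python
def swap_reverse_nested(t):
    """Swap axes 0 and 2 of a nested L x L x L tensor, reversing every index."""
    L = len(t)
    return [[[t[L - 1 - k][L - 1 - j][L - 1 - i] for k in range(L)]
             for j in range(L)]
            for i in range(L)]


def swap_reverse_flat(v, L):
    """Same operation on a flat, row-major vector of length L**3."""
    out = [0] * (L ** 3)
    for i in range(L):
        for j in range(L):
            for k in range(L):
                # Source coordinate is (L-1-k, L-1-j, L-1-i) in row-major order.
                src = ((L - 1 - k) * L + (L - 1 - j)) * L + (L - 1 - i)
                out[(i * L + j) * L + k] = v[src]
    return out


L = 3
flat = list(range(1, L ** 3 + 1))
nested = [[[flat[(i * L + j) * L + k] for k in range(L)]
           for j in range(L)]
          for i in range(L)]

flat_out = swap_reverse_flat(flat, L)
nested_out = swap_reverse_nested(nested)

# Both representations must agree element by element.
assert all(nested_out[i][j][k] == flat_out[(i * L + j) * L + k]
           for i in range(L) for j in range(L) for k in range(L))
```

Under this reading the operation is its own inverse, so applying it twice gives back the original tensor, which makes a handy sanity check. Either way, the triple loop visits each of the L^3 elements exactly once.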
The graphical representation of this transposed tensor is presented below. It is also zoomed in because the data set is large and hard to move around (even with WebGL, which uses the GPU):
The benchmarked operation was run with different computing methodologies and tensor representations. As expected, vectorized tensors outperform nested vectors. Less obviously, this trend reverses in Python when a JIT compiler is used.
Language | Tensor initialization; container | JIT compiler | Operation time | Timer |
---|---|---|---|---|
Python | numpy.zeros((N,N,N)); Nested Vectors | - | 2,520 ms ± 229 ns | inline %timeit |
Python | numpy.zeros(N*N*N); Vectorized | - | 613 ms ± 68.4 ms | inline %timeit |
Python | numpy.zeros((N,N,N)); Nested Vectors | Numba | 1.38 ms ± 63,600 ms | inline %timeit |
Python | numpy.zeros(N*N*N); Vectorized | Numba | 5.33 ms ± 186,000 ms | inline %timeit |
C++ | vector<vector<vector<int>>>; Nested Vectors | - | 10.5501 ms | chrono::high_resolution_clock |
C++ | vector<int>; Vectorized | - | 0.015105 ms | chrono::high_resolution_clock |
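The Python rows above were timed with the notebook magic %timeit. Outside a notebook, a comparable measurement can be sketched with the standard library's timeit module. This is only an illustration: the side length, the workload (filling the tensor), and the helper `best_time_ms` are my own assumptions, and the helper reports the best run rather than the mean ± standard deviation that %timeit prints.

```python
import timeit

L = 20  # hypothetical small side length so the sketch runs quickly

nested = [[[0] * L for _ in range(L)] for _ in range(L)]
flat = [0] * (L ** 3)


def fill_nested():
    """Write every element through three nested list lookups."""
    for i in range(L):
        for j in range(L):
            for k in range(L):
                nested[i][j][k] = i + j + k


def fill_flat():
    """Write every element through one lookup at a row-major flat index."""
    for i in range(L):
        for j in range(L):
            for k in range(L):
                flat[(i * L + j) * L + k] = i + j + k


def best_time_ms(fn, repeat=5, number=3):
    """Best time per call in milliseconds over `repeat` batches of `number` calls."""
    runs = timeit.repeat(fn, repeat=repeat, number=number)
    return min(runs) / number * 1e3


print(f"nested: {best_time_ms(fill_nested):.3f} ms")
print(f"flat:   {best_time_ms(fill_flat):.3f} ms")
```

The absolute numbers depend on the machine and the interpreter, so they should not be compared directly with the table.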
The operations run under a JIT compiler show a large spread in operation time because the compiler initially takes some time to initialize and compile the scripts into machine code. Interestingly enough, the Numba timings suggest there may be a more succinct way to generate machine code for multidimensional arrays than for vectorized tensors when programming in Python.
Finally, vectorized C++ beats them all. It beats JIT-compiled Python code by about two orders of magnitude (i.e. roughly 100x faster) and conventional Python executions by more than four orders of magnitude.
-Andrew