Advent of Code 2020: Day 03 using numpy and vectorized calculations
Yuan Gao
Posted on December 5, 2020
In which we use more of that tasty numpy vectorized calculations to make quick work of the math.
Things mentioned in this post: numpy, vectorized calculations
The Challenge Part 1
Link to challenge on Advent of Code 2020 website
The challenge involves traversing a grid that is either empty, or contains a tree, using predefined movement rules through this grid. There is some information about how the grid of trees wraps as well.
Ultimately what this comes down to is figuring out what grid squares will be traversed, and counting the trees in it.
The naïve approach
The naive approach would be something along these lines:
- Read the data in
- Recreate a grid of the data to mimic the structure that it was in
- Iteratively move a virtual tobogganist through each of its position in the grid, generating a list of 2D coordinates
- Checking this list of 2D coordinates in the grid of data to see how many of them contain trees.
Or in other words: load the data, figure out what data to check, check the data.
I won't actually implement this one, but jump into the meaty stuff.
Optimizing the math
The thing is, we don't actually need to generate the 2D coordinates. If we loaded all the data into a 1D array rather than a 2D one, we would simply calculate the correct 1D offset in the data and check if the tree was in there. Think of it is taking our Christmas woollen sweater, and pulling on the string and unravelling the formerly 2D (or 3D I guess) sweater into a single 1D string. The interesting thing is that the design of that sweater is still on that 1D string. And if we knew the dimensions of all the parts of the sweater accurately enough, we could figure out exactly where each part of the pattern on the original sweater would end up on the 1D string. What a great way to spend Christmas!
The second thing is that we don't need to iteratively move the tobogganist. We know the simple geometric rules by which the movement works, so we could generate this list of (now 1D) coordinates directly using math!
So let's get started. First, let's import the data in as binary, using numpy:
data = np.fromfile("input.txt", dtype='uint8')
Output
array([46, 46, 35, 46, 35, 46, 46, 46, 35, 46, 35, 46, 35, 46, 35, 35, 46, ...
As we can see, this is the ascii representation of the trees. the number 46 is the "." character, while the number 35 is the "#" character. For convenience, we can get 35 by doing ord("#")
.
Next, let's generate all the vertical index numbers for each movement of the toboggan (323 is the height of my data)
index = np.arange(323)
Output
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...
Imagine these numbers are the vertical line that the tobogannist is on. When the index is 0, the tobogannist is in the top of the course. When the index is 1, the tobogannist has moved down one line
To calculate where the tobogannist is in the horizontal direction, since we know the tobogannist has moved by 3 across for every row, we can simply multiply our index by 3 to get the x position:
x_pos = index * 3
Output
array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, ...
However, the width of the data is only 31 wide, how can we be at a position of 36? The challenge rules say that the trees actually repeat ad infinitum to the right. While we could double up the data, we can achieve the same effect by just wrapping the tobogannist around to the other side. We can achieve this using the modulo (%) operator (31 is the width of my data)
x_pos = (index * 3) % 31
Output
array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 2, 5, 8, 11, 14, 17, ...
It can now be seen that the numbers have wrapped around, saving us from having to duplicate the actual data sideways.
Now, we actually now have two arrays, one for the x position in the grid, and one for the y position in the grid. Had our data been 2D, we could now simply look into each of those positions to see if there is a tree. Since our data is not 2D, we need to first map our x/y position into our 1D data space. This is simply
offsets = index * (width+2) + x_pos
Output
array([ 0, 36, 72, 108, 144, 180, 216, 252, 288, ...
Note the width+2
the extra +2 is to account for the newline characters that are in my data. In some cases you may find only +1 i sneeded.
These numbers are now the indexes of data in our data file that we want to check for trees. We can actually do that all in one go using numpy:
data[idx]
Output
array([46, 46, 46, 46, 46, 46, 35, 46, 35, 35, 35, 35, 46, 35, 35, 35, 46,
46, 46, 35, 35, 35, 46, 46, 46, 35, 35, 46, 46, 35, 35, 35, 35, 46,
46, 35, 35, 46, 46, 35, 46, 35, 35, 35, 35, 46, 46, 35, 46, 46, 35,
46, 46, 35, 35, 46, 46, 46, 35, 35, 46, 46, 46, 46, 35, 35, 35, 35,
35, 46, 35, 46, 46, 46, 35, 46, 46, 46, 35, 35, 35, 35, 35, 35, 46,
46, 46, 46, 46, 35, 35, 46, 35, 35, 35, 35, 35, 35, 35, 46, 35, 46,
46, 35, 35, 35, 35, 46, 35, 46, 35, 35, 46, 35, 35, 35, 46, 46, 35,
35, 46, 46, 35, 35, 35, 46, 35, 35, 35, 35, 35, 35, 35, 35, 46, 35,
46, 35, 35, 46, 35, 46, 46, 35, 46, 46, 46, 35, 46, 46, 46, 35, 46,
46, 46, 35, 35, 46, 46, 46, 46, 35, 35, 46, 35, 35, 46, 46, 46, 46,
46, 35, 35, 46, 35, 35, 46, 46, 35, 46, 35, 46, 46, 35, 35, 35, 35,
35, 35, 35, 46, 46, 46, 35, 35, 46, 35, 35, 35, 35, 46, 35, 35, 35,
35, 35, 46, 35, 46, 35, 46, 35, 35, 35, 35, 35, 35, 46, 46, 35, 35,
35, 35, 35, 35, 46, 46, 46, 46, 35, 35, 35, 35, 46, 46, 35, 46, 46,
35, 46, 35, 46, 46, 46, 46, 46, 46, 46, 35, 35, 35, 35, 46, 35, 46,
46, 46, 35, 46, 46, 35, 46, 35, 35, 35, 35, 35, 46, 46, 46, 35, 35,
46, 46, 46, 35, 35, 35, 35, 35, 46, 35, 35, 35, 35, 35, 46, 46, 46,
35, 46, 35, 46, 46, 35, 46, 35, 35, 46, 35, 46, 35, 35, 35, 35, 35,
35, 46, 46, 46, 46, 35, 35, 35, 46, 35, 35, 46, 46, 35, 46, 35, 35],
dtype=uint8)
That's it, this is every grid square that the tobogganist encounters. Every number "35" (ascii for the # symbol) is a tree. We just need to count these.
Again, numpy lets us check this with ease (remember, earlier tree was set to ord("#")
or 32)
data[idx] == tree
Output
array([False, False, False, False, False, False, True, False, True,
True, True, True, False, True, True, True, False, False,
False, True, True, True, False, False, False, True, True,
False, False, True, True, True, True, False, False, True,
True, False, False, True, False, True, True, True, True,
False, False, True, False, False, True, False, False, True,
True, False, False, False, True, True, False, False, False,
False, True, True, True, True, True, False, True, False,
False, False, True, False, False, False, True, True, True,
True, True, True, False, False, False, False, False, True,
True, False, True, True, True, True, True, True, True,
False, True, False, False, True, True, True, True, False,
True, False, True, True, False, True, True, True, False,
False, True, True, False, False, True, True, True, False,
True, True, True, True, True, True, True, True, False,
True, False, True, True, False, True, False, False, True,
False, False, False, True, False, False, False, True, False,
False, False, True, True, False, False, False, False, True,
True, False, True, True, False, False, False, False, False,
True, True, False, True, True, False, False, True, False,
True, False, False, True, True, True, True, True, True,
True, False, False, False, True, True, False, True, True,
True, True, False, True, True, True, True, True, False,
True, False, True, False, True, True, True, True, True,
True, False, False, True, True, True, True, True, True,
False, False, False, False, True, True, True, True, False,
False, True, False, False, True, False, True, False, False,
False, False, False, False, False, True, True, True, True,
False, True, False, False, False, True, False, False, True,
False, True, True, True, True, True, False, False, False,
True, True, False, False, False, True, True, True, True,
True, False, True, True, True, True, True, False, False,
False, True, False, True, False, False, True, False, True,
True, False, True, False, True, True, True, True, True,
True, False, False, False, False, True, True, True, False,
True, True, False, False, True, False, True, True])
The output of this is another array but this time containing either True or False depending on whether that grid had a tree or not. We can simply sum these up, again using numpy: np.sum(data[idx] == tree)
So, all together now, the whole code looks like:
import numpy as np
data = np.fromfile("input.txt", dtype='uint8')
across = 3
width = 31
height = 323
tree = ord("#")
index = np.arange(height)
x_pos = (index * across) % width
offsets = index * (width+2)
idx = x_pos + offsets
print("trees", np.sum(data[idx] == tree))
That was it! Again, if we were code golfing, we can compact this even more:
import numpy as np
print("trees", np.sum(np.fromfile("input.txt", dtype='uint8')[(np.arange(323)*3)%31+np.arange(323)*33] == 35))
It's unreadable, but it's compact.
The Challenge Part 2
In part 2, simply instead of having a fixed rule of "1 down, 3 across", there are several different cases with different rules. Because one of the rules goes "2 down", we have to generalize our code a bit more to account for this. Namely, we now need to use a separate value for index
and y_pos
:
index = np.arange(height/down).astype('uint32')
y_pos = index*down
Here, we divide the number of indices by how many down we go. For example if down=2, then we have half as many indexes to cover. The rest of the code is pretty much the same
import numpy as np
data = np.fromfile("input.txt", dtype='uint8')
rules = [[1, 1], [3, 1], [5, 1], [7, 1], [1, 2]]
tree = ord("#")
width = 31
height = 323
multiply = 1
for across, down in rules:
index = np.arange(height/down).astype('uint32')
y_pos = index * down
x_pos = (index * across) % width
idx = x_pos + y_pos * (width+2)
trees = np.sum(data[idx] == tree)
multiply *= trees
print("trees", multiply)
Pretty nifty eh?
Onward!
Posted on December 5, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.