Advent of Code 2021: Day 03 with Python, and more numpy

Loading the data

The data is a big list of binary values, but we actually need to load the data as a big matrix of 1 or 0 to be able to efficiently slice the matrix while doing calculations.

To do this, we can have numpy fix the data for us. A regular import of lines using "U" format code for strings, imports the data as an array of strings. We also want to know how many bits are in each value, which we get from the first entry.

lines = np.loadtxt("day_3.txt", "U")
bits = int(len(lines[0]))

Resulting in:

array(['011010010110', '101110100110', '011011011110', '100011010001',
       '011011100111', '011000100101', '101101011001', '010001001001',
       '100010110111', '001110110101', '100111011001',

Then, we can make a view of the data, which will give us a single large array containing one character per entry, and we also cast it to int type:

lines.view('U1').astype(int)

Resulting in:

array([0, 1, 1, ..., 1, 1, 0])

Finally, we need to put this back into the right shape that it was in before, so we reshape it. The final import code is:

lines = np.loadtxt("day_3.txt", "U")
bits = int(len(lines[0]))
data = lines.view('U1').astype(int).reshape(lines.shape[0], bits)

Binary array to int conversion

For both parts of the question, we need to convert an array of bits, like [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0] into a number like 1602. There are several ways of doing it, the way I've chosen is to multiply the array by another array that is made up of powers of 2: [..., 32, 16, 8, 4, 2, 1]. So I will pre-calculate pow2 early on in the code:

pow2 = 1<<np.arange(bits)[::-1]

Other methods involve converting the array into a string, and using python's built-in int() function to convert it. int has base-conversion abilities:

bin2dec = lambda x: int("".join(map(str,x.astype(int))),2)

Finally, numpy actually has a function to pack bits back into bytes, however it's a little awkward to use as you need to pad it out to the correct number of values for your eventual type.

bin2dec = lambda x: np.packbits(np.pad(x,[32-bits, 0])).view('>u4')

I'll be using the pow2 method, precalculating it, and doing pow2.dot(x) any time I want to convert x to a number.

Part 1

Part 1 requires counting the number of 0 or 1 in each column and comparing them to see which one is more. Since numpy can directly tell us all the elements that have a given value, and it can also do this per axis.

So the number of 1 in every bit position is:

ones = (data == 1).sum(axis=0)

The result is a count for each column:

array([483, 487, 485, 467, 513, 509, 492, 532, 508, 497, 490, 506])

Doing the same for zeros, and then comparing the two produces an array of true/false, and this can be turned back into a number my doing a dot product against pow2 which was precalculated earlier.

ones = (data == 1).sum(axis=0)
zeros = (data == 0).sum(axis=0)
result = pow2.dot(ones > zeros) * pow2.dot(ones < zeros)

Part 2

Part 2 is...long. I don't even know if I can paraphrase the problem exactly. The idea is to take each column of bits successively, and test to see if each row has a bit value that is the same as the majority of other bits, and discard the value if not. Repeating for the minority case.

To achieve this, we loop through the columns using a for loop:

for idx in range(bits):
  column = data[:, idx]

And then for this columm of pixels, we figure out whether there are more 1 than 0 by taking the sum of the column, and seeing if that number is greater than half the size of the column:

column.sum()*2 >= column.size

This will return True if the 1s are more common than the 0s. Since we need to then collect all the entries that match this value, we can do:

column == (column.sum()*2 >= column.size)

This will return an array of True or False depending on whether the value in column is the same as the majority value or not.

Then, we can select just these entries out of our data matrix, and discard the rest, by using this array as the subscript when accessing the data matrix:

data[column == (column.sum()*2 >= column.size)]

We have to apply this algorithm iteratively over the columns of the array, and stop when data gets to 1 row

for idx in range(bits):
  column = data[:, idx]
  data = data[column == (column.sum()*2 >= column.size)]
  if len(data) == 1:
    break

However, we have to repeat this for the minority case as well. So the final code looks like:

a = b = data
for idx in range(bits):
    acol = a[:, idx]
    bcol = b[:, idx]
    a = a[acol == (acol.sum()*2 >= acol.size)] if len(a) > 1 else a
    b = b[bcol == (bcol.sum()*2 < bcol.size)] if len(b) > 1 else b
result = pow2.dot(a[0] == 1) * pow2.dot(b[0] == 1)

Full Code

import numpy as np

lines = np.loadtxt("day_3.txt", "U")
bits = int(len(lines[0]))
data = lines.view('U1').astype(int).reshape(lines.shape[0], bits)
pow2 = 1 << np.arange(bits)[::-1]

ones = (data == 1).sum(axis=0)
zeros = (data == 0).sum(axis=0)
result = pow2.dot(ones > zeros) * pow2.dot(ones < zeros)
print("Part 1 result:", result)

a = b = data
for idx in range(bits):
    acol = a[:, idx]
    bcol = b[:, idx]
    a = a[acol == (acol.sum()*2 >= acol.size)] if len(a) > 1 else a
    b = b[bcol == (bcol.sum()*2 < bcol.size)] if len(b) > 1 else b
result = pow2.dot(a[0]) * pow2.dot(b[0])
print("Part 2 result:", result)