Advent of Code - Day 3

yorek

Davide Mauri

Posted on December 4, 2022

Advent of Code - Day 3

Today's Advent of Code challenge is really interesting. Somehow easy, but with a couple of interesting discussion points.

Day 3: Rucksack Reorganization is about helping elves to organize and prioritize their supplies.

I imported the input data as usual, by coping it from the website to my Azure Data Studio query and then using STRING_SPLIT to import each single line - that represents the rucksack content - in its own row:

declare @input nvarchar(max) = 'QLFdFCdlLcVqdvFLnFLSSShZwptfHHhfZZZpSwfmHp
rTJRjjbJTgzDJjdsRsfwtfNwtfmZpZNhmmzt
...
jGrGqjJfqccrfqGcGplrJpFvzggqmCtMzmsMnvMvvCgm';

drop table if exists dbo.ch03_input;
create table ch03_input 
(
    id int identity not null primary key,
    items varchar(100) collate Latin1_General_BIN2
);

insert into 
    dbo.ch03_input (items)
select 
    trim(replace([value], char(13), ''))
from
    string_split(@input, char(10))
go
Enter fullscreen mode Exit fullscreen mode

The items in the rucksacks are identified by a letter and the identifiers are case sensitive. For this reason, I used the Latin1_General_BIN2 collation that will make sure I adhere to the requirements and get the best performance possible with strings, as explained in Day 2 challenge solution post.

Part 1

In the first part of the challenge, the rucksack list is split in two, and you must find the item type - represented by its letter - that is in both lists. It is a string comparison problem: which letter of the first list is also in the second list?

The first step is to split the list into two list of the same size:

select 
    *,
    len(items) as itemcount,
    left(items, len(items)/2) as comp1,
    right(items, len(items)/2) as comp2
into    
    #step1
from 
    ch03_input;
Enter fullscreen mode Exit fullscreen mode

I also calculate the length of string as it will come useful in the next step, where I'll split the rucksack string into its letters and store each letter in its own row:

select top(100) row_number() over (order by a.object_id) as n into #n from sys.columns a cross join sys.columns b

select 
    *,
    substring(items, n, 1) as item 
into
    #step2 
from 
    #step1 s  
cross join
    #n n
where 
    n.n <= s.itemcount
Enter fullscreen mode Exit fullscreen mode

Splitting a string in its letters is easy if you have a table with numbers, which is what I'm creating as first thing in the query above. Then I use that numbers table to generate a row for each letter in the string, via the CROSS JOIN and for each row generate extract the Nth letter of the string. The WHERE clause uses itemcount to make sure that I generate exactly one row for each letter in each string, and no more than that.

The string split in its composing letters

Then I need to find which item is in both compartments. This means checking if a letter is in a string and that can be done using CHARINDEX:

select distinct
    id, items, comp1, comp2, item,
    charindex(item, comp1, 1) as p1, 
    charindex(item, comp2, 1) as p2
into 
    #step3
from 
    #step2 
where 
    charindex(item, comp1, 1) != 0 
and 
    charindex(item, comp2, 1) != 0
order by id
Enter fullscreen mode Exit fullscreen mode

The query needs a DISTINCT as there can be more than one item of the same type in the string, I need just only one per type. The fact that I need to use a DISTINCT rings some bells (or bring some smell): I'm fairly sure I can refactor my code to do this operation earlier and avoid checking for a letter that has been found already. I'll do this later if I have enough free time. For now, I want to see if my solution works; after that I can optimize it.

Now that the items present in both compartments have been found, I have to assign to each item type the priority value as described in the challenge. Priorities values are based on alphabetical order, so I can use ASCII to get the letter value and transform it to the corresponding priority value. Priority values are ordered differently than the ASCII order, so a CASE statement is needed to apply the right transformations:

select
    item,
    case 
        when item like '[a-z]' then ASCII(item) - ASCII('a') + 1
        when item like '[A-Z]' then ASCII(item) - ASCII('A') + 27
    end as priority,
    id,
    comp1,
    comp2
into
    #priorities
from
    #step3
order by
    item
Enter fullscreen mode Exit fullscreen mode

And now just summing all the priorities will give the answer:

select sum(priority) from #priorities
Enter fullscreen mode Exit fullscreen mode

Answer is correct, so let's move to the next part of the challenge.

Part 2

Elves gather in groups of three, and the goal is to find which item type is carried by everyone in the group.

The first step is to create a way to easily group the elves together. By using the existing id columns and the modulo operator I can find when a new group begins. When the result of the modulo operation is equal to one:

Using modulo to identify groups

I just need to know when a group starts, so I can set everything not equal to 1 to 0:

First element in group is set to 1, others to 0

Now I can then use a simple and fast running total to generate a group_id that will allow quick identification of all items in a single group. Amazing, isn't it?

All elves in the same group have the same group id

Funny enough my SQL Guru friend Itzik mentioned this technique with the running total when we met yesterday evening. Funny that I would have needed it right the next day. Thanks Itzik!

The final query is the following:

select 
    *,
    len(items) as itemcount,
    sum(case when (id % 3) != 1 then 0 else 1 end) over (order by id) as group_id
into    
    #step1
from 
    ch03_input;
Enter fullscreen mode Exit fullscreen mode

easy, fast, and elegant!

Once that a way to identify each group is there, the challenge is almost solved. It is just a matter of splitting the strings into their letters, as I did for part one too.

select 
    *,
    substring(items, n, 1) as item 
into
    #step2 
from 
    #step1 s  
cross join
    #n n
where 
    n.n <= s.itemcount
Enter fullscreen mode Exit fullscreen mode

Now I have everything I need to see which letter is present in all three rucksacks. As simple GROUP BY filtering only those letters that appears exactly three times by using the HAVING clause will give the answer:

with cte as
(    
    select distinct id, group_id, item from #step2 
)
select
    group_id,
    item
into
    #step3
from
    cte
group by 
    group_id, item
having
    count(*) = 3
order by
    group_id
Enter fullscreen mode Exit fullscreen mode

The tricky part here is the DISTINCT operator in the Common Table Expression. That DISTINCT makes sure I can differentiate between an item appearing three times in the same rucksack vs an item appearing one item in all three rucksacks. We're interested only in the latter, and not in the first.

Now I just have to apply the same logic to get the priority value used before, calculate the overall total and I'll get the solution to Part 2. Challenge done.

Try it yourself

The full solution is available here: yorek/aoc-2022

The technique to deal with islands of data is useful in so many practical uses case that I really recommend you to deep dive into it. Take advantage of the free book chapter on the subject available here: Gaps and islands

Alternative solution to Part 2

An alternative solution could have been a simple JOIN between the three rucksacks in the same group:

select distinct
    a.id as group_id,
    a.item as item
into
    #step3
from 
    #step2 a 
inner join
    #step2 b on a.id + 1 = b.id and a.item = b.item
inner join
    #step2 c on b.id + 1 = c.id and b.item = c.item
where
    a.id % 3 = 1
order by a.id
go
Enter fullscreen mode Exit fullscreen mode

That would work just fine, but it will only work for a group of exactly three elves. While perfectly fitting the requirement, I find the chosen solution gives more flexibility and is way more elegant and future proof. Or agile if you wish. It requires a bit of lateral thinking, which is always a good ability to exercise, so great to have a chance to use it. Given that the resulting query touches the table only once instead of, I also suspect it will also be faster. It is worth digging into it a bit more if you have time.

Have fun!

💖 💪 🙅 🚩
yorek
Davide Mauri

Posted on December 4, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Advent of Code - Day 3
sql Advent of Code - Day 3

December 4, 2022

Advent of Code - Day 10
sql Advent of Code - Day 10

December 11, 2022

Advent of Code - Day 2
sql Advent of Code - Day 2

December 2, 2022

Advent of Code - Day 1
sql Advent of Code - Day 1

December 1, 2022