Advent of Code F# — Day 14

ps. look out for all my oth­er solu­tions for Advent of Code chal­lenges here.

 

Day 14

See details of the chal­lenge here.

Today’s chal­lenge is very sim­i­lar to that of Day 5, but the require­ments for a valid hash val­ue is dif­fer­ent this time.

As before, we’ll start by defin­ing a hash func­tion that will accept a string and return its MD5 hash val­ue rep­re­sent­ed as hexdec­i­mal val­ues. After com­par­ing the per­for­mance of using String.format + String.Concat (what I used in Day 5) and BitConverter.ToString it was a no brain­er to go with the lat­ter as it proved to be over 10 times faster even with the addi­tion­al String.Replace and String.ToLower calls in this case.

Since part 2 requires us to add key stretch­ing to the hash gen­er­a­tion process, our solve func­tion will take a hash func­tion as argu­ment.

A brute force approach would be to use Seq.windowed 1001 to pro­duce slid­ing win­dows of 1001 MD5 hash­es so we can look for pat­terns where:

  1. the first hash has a sequence of 3, eg, “777”
  2. one of the next 1000 hash val­ues has a sequence of 5 of that char­ac­ter, ie, “77777”

This naive approach took 10s to com­pute part 1, and over 9 mins for part 2. Most of the time were spent look­ing for sequences of 5, so an opti­miza­tion here would be to find all sequences of 5 and stor­ing their indices in a cache. This way, when­ev­er you find a hash with a sequence of 3 — say “777” at index 18 — you can look up the cache for indices asso­ci­at­ed with “7” and see if there’s any index in the range of 19 — 1018.

This sim­ple opti­miza­tion took the run time for part 1 to around 300ms.

 

Part 2

Of course, in order to make this process even more secure, you’ve also
imple­ment­ed key stretch­ing.

Key stretch­ing forces attack­ers to spend more time gen­er­at­ing hash­es.
Unfor­tu­nate­ly, it forces every­one else to spend more time, too.

To imple­ment key stretch­ing, when­ev­er you gen­er­ate a hash, before you use it,
you first find the MD5 hash of that hash, then the MD5 hash of that hash, and so
on, a total of 2016 addi­tion­al hash­ings. Always use low­er­case hexa­dec­i­mal
rep­re­sen­ta­tions of hash­es.

For exam­ple, to find the stretched hash for index 0 and salt abc:

- Find the MD5 hash of abc0: 577571be4de9dcce85a041ba0410f29f.
- Then, find the MD5 hash of that hash: eec80a0c92dc8a0777c619d9bb51e910.
- Then, find the MD5 hash of that hash: 16062ce768787384c81fe17a7a60c7e3.
- …repeat many times…
- Then, find the MD5 hash of that hash: a107ff634856bb300138cac6568c0f24.

So, the stretched hash for index 0 in this sit­u­a­tion is a107ff.… In the end,
you find the orig­i­nal hash (one use of MD5), then find the hash-of-the-pre­vi­ous-hash
2016 times, for a total of 2017 uses of MD5.

The rest of the process remains the same, but now the keys are entire­ly dif­fer­ent.
Again for salt abc:

- The first triple (222, at index 5) has no match­ing 22222 in the next thou­sand
hash­es.
- The sec­ond triple (eee, at index 10) hash a match­ing eeeee at index 89, and so
it is the first key.
- Even­tu­al­ly, index 22551 pro­duces the 64th key (triple fff with match­ing fffff
at index 22859.

Giv­en the actu­al salt in your puz­zle input and using 2016 extra MD5 calls of key
stretch­ing, what index now pro­duces your 64th one-time pad key?

This imple­men­ta­tion took around 2 mins to run, and most of the time are spent in gen­er­at­ing the hash val­ues (since they are 2016 times more expen­sive than pre­vi­ous­ly).

I was able to shave 10s off the run­time by hand rolling the bytes-to-hexdec­i­mal part of the hash func­tion, but it’s still far from good enough real­ly. So, please let me know in the com­ments any ideas you have on what we could try next.

 

Links