Project Euler – Problem 14 Solution

Problem

The following iterative sequence is defined for the set of positive integers:

n –> n/2 (n is even)

n –> 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 –> 40 –> 20 –> 10 –> 5 –> 16 –> 8 –> 4 –> 2 –> 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.

Solution

let nextNumber n = if n%2L = 0L then n/2L else 3L*n+1L

let findSequenceLength n =
    let mutable count = 1L
    let mutable current = n

    while current > 1L do
        current <- nextNumber current
        count <- count + 1L
    count

let longestSeq = &#91;1L..999999L&#93; |> Seq.maxBy findSequenceLength

Having played around with several other approaches, I finally settled down on this solution though I had to use mutable variables in a while loop which is somewhat dissatisfying but it performs quite a bit better than the more functional approach.

As you probably know already, in F# everything is immutable by default and to make a mutable variable you have to mark the variable with the mutable keyword. To assign a value to a mutable variable you need to use the <- operator.

Anyways, the above code is fairly simple, with the findSequenceLength function doing the bulk of the work and returns the number of elements in a sequence using a while loop. It can be equally be written using a more functional (but slower) approach but building the sequence with Seq.unfold and counting the length of the sequence

// the sequence returned by Seq.unfold does not include 1, so for completeness sake, add 1 to the length
let findSequenceLength n = Seq.length(Seq.unfold (fun x -> if x = 1L then None else Some(x, nextNumber x)) n) + 1
  • Omu

    let p14 =
    let collatz n =
    let rec loop n i =
    match n with
    |n when n = 1I -> i
    |n when n%2I = 0I -> loop (n/2I) (i+1I)
    |n -> loop (3I*n+1I) (i+1I)
    loop n 1I
    [1I..1000000I]|> Seq.maxBy collatz

  • Anonymous

    i think max.By is a great funciton

  • Bryan Slatner

    I gave up my functional approach for yours that uses mutable.

    I’m still vexed that it takes almost 9 seconds to run :)

    So I tried 3 different approaches:

    1) Basically the same as you have above
    2) Caching. Once I compute the length of one sequence, why compute it again? It’s bound to show up again.
    3) Parallel. I have 8 cores, let’s use them all!

    Approach 1 takes about 9 seconds.

    Approach 2 is actually slower at 10.5 seconds. Grumble grumble.

    Approach 3 reduces the time by 75%. Which is about what you’d expect.

    https://gist.github.com/bslatner/c77e18135552ce4d8969

    Incidentally, when I was using my “pure functional” length computation algorithm (similar to Omu’s comment), even with Approach 3 I still had time to go inside, make a sandwich, eat it, and come back before it spat out an answer :)

  • Bryan Slatner

    Actually, I realized I made an error. My map-based cache strategy totally failed. I forgot that maps are immutable. When I replaced the map with a Dictionary, the caching solution is fastest at about 1.5 seconds.

  • Yan Cui

    I think it’s still ‘functional’ so long you’re encapsulating all the mutability – Eg the use of mutable state inside your ‘collatzLength’ function – so that you still end up with a nice immutable API such as collazLength : int -> int.

    Another common technique in FP is called memoization, and you can do something like this to apply caching without explicitly exposing the cache and can stay functional in your approach.
    https://gist.github.com/theburningmonk/1d90b466a530698a3127
    this version actually ran faster yours because the result of every step in the sequence is cached whereas your version caches the result at the end of the sequence so you’ll still have computed some numbers multiple times (although that’s easy to address).

    Regarding perf between Map and Dictionary, I did some benchmarks before http://theburningmonk.com/benchmarks/ and the difference is HUGE when you have to do lots of updates.

  • Bryan Slatner

    If I’m reading your benchmark right, when would you ever use a map?

  • Yan Cui

    I would argue that most of the time the difference (despite being huge) is irrelevant until you’ve identified the performance critical 3%.

    e.g. if your access pattern is such that updates are really infrequent but looks up happen a lot.

    It’s important to take the context of the work into account. If you’re working on a line-of-business app (where faster code that does the wrong thing probably means losing money faster) then correctness and scalability might be more important than raw performance.
    Whereas when you’re working on coding challenges you’re generally trying to find the fastest solution to a problem that is probably designed to stretch your data structures’ performance to begin with.

    You might also find changes easier to reason with immutable maps, and in cases where you need to take an existing map and generate a number of variants (like in some of the Advent of Code challenges) the clone-and-update semantics is exactly what you’re after.

    You can have the same argument about most data structures – e.g. in pure performance terms nothing is as fast to iterate as an array of structs in .Net, the hardware architecture is optimized for iterating over an array of these guys very quickly. But arrays are mutable, fixed in size, and they often don’t match how we’d like to work with it to solve our problem domain effectively.

    Whereas F# lists have terrible cache locality since they’re linked lists, and are much slower to iterate over, but they’re really fast to append to the front using cons (::), whereas appending an array = allocating a bigger array and copy over all the old values.

    So, what data structure to use always depend on what you’re doing and what’s important in your current context.

    Phil Trelford has a good talk on some of the pros and cons of a number of different data structures from a performance point-of-view here https://www.youtube.com/watch?v=hx2vOwbB-X0