Deep Dive on Project Euler's Problem 1

ryanmcg86
Mar 4, 2022
13 min read

Updated: Dec 20, 2022

Sometimes we look at the easiest problem on a practice forum like Project Euler, quickly solve it, and move on to more challenging problems. But sometimes, we get a little obsessed with the problem because there are way more layers to it than initially meets the eye, and decide to write a blog detailing the whole thing. This is one of those latter times.

The problem given is worded as follows:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6, and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.

Using python to answer this, the quickest simple answer would likely look something like this:

def ThreeFive(lim):
    multiples = set()
    for i in range(1, lim):
        if i % 3 == 0 or i % 5 == 0:
            multiples.add(i)
    return sum(multiples)
    
ThreeFive(1000)
#returns 233168

This solution takes into account the trick of this problem, which is the fact that numbers which are divisible by both 3 and 5, like 15 or 30, should only be summed once. It does so by utilizing a set rather than a list. With that, most people would have been satisfied and moved on to Problem 2.

But I am not most people. Immediately, I realized that if I wanted an optimized solution, this solution would not be satisfactory as the limit grew unreasonably large. In algorithmic speak, the runtime on this is O(N), where N is the given limit.

My first thought was that there must be a way to count multiples, in a similar way that we can count from 1 to N with the simple equation n(n + 1) / 2. After a bit of digging, I found that the equation does already exist. Written in Python, it looks like this:

#Sum of Multiples function
def SoM(lim, mul):
    n = (lim - 1) // mul
    return (n * (n + 1) * mul) // 2

With this in mind, a potential solution might look like this:

def ThreeFive(lim):
    return SoM(lim, 3) + SoM(lim, 5)
#returns 266333

Hey! That's not the right answer. Oh right, this doesn't account for the issue we discussed earlier where we should only be counting the numbers that are a multiple both of 3 and 5 once. With that in mind, we should be subtracting the sum of multiples of 15, as that's the lowest common denominator of 3 and 5, and these are the numbers that have been counted too many times. So let's try this instead:

def ThreeFive(lim):
    return SoM(lim, 3) + SoM(lim, 5) - SoM(lim, 15)
#returns 233168

Okay, now our problem runs in constant time, or O(1). No matter how large the limit is, this function only has to perform the one calculation to output an answer, we can't possibly get any better than this.

But then I got to thinking some more about this problem, and I wanted to see if I can generalize this solution a bit more. Here's a first pass:

def solution(lim, a, b):
    return SoM(lim, a) + SoM(lim, b) - SoM(lim, a * b)

Alright, this let's us find the sum of all multiples for any 2 given factors a and b, but what if there were 3 factors instead of 2? Before we write this thing, let's figure out what we have to actually do here. Let's use the example where our factors are 3, 5, and 2. First we need to sum up the sum of multiples for 2, 3, and 5 where the limit is 1000. Next we would need to subtract the sum of multiples of 2 * 3, but then we also need to subtract the sums of multiples of the other combinations of factors, 2 * 5 and 3 * 5, as those would also have been counted 1 time too many. Let's try that:

def solution(lim, a, b, c):
    return SoM(lim, a) + SoM(lim, b) + SoM(lim, c) - SoM(lim, a * b) - SoM(lim, a * c) - SoM(lim, b * c)

solution(1000, 2, 3, 5)
#returns 350002

We actually don't yet know if that's correct. Let's try this with a smaller limit so we can manually count the sum ourselves. Let our limit be 50. Here is the list of all the natural numbers below 50 that are multiples of 2, 3 or 5: 2, 3, 4, 5, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 38, 39, 40, 42, 44, 45, 46, 48. The sum of that list is 857. Let's try running our solution function with the new limit:

solution(50, 2, 3, 5)
#returns 827

Huh? It seems like we're missing some multiples. Reviewing the list in the paragraph above confirms that we have every multiple, so lets check our logic. We're taking the sum of multiples for 2, 3, and 5, and then subtracting the sum of multiples for 6, 10, and 15. Let's take a deeper look at what that's actually doing:

multiples = []

#Sum of Multiples for 2 up to 50 = 600
for i in range(1, 50):
    if i % 2 == 0:
        multiples.append(i)
 
'''The following values just got added to the multiples list:
2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48

The current value of multiples is [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, #22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]'''

#Sum of Multiples for 3 up to 50 = 408
for i in range(1, 50):
    if i % 3 == 0:
        multiples.append(i)
        
'''The following values just got added to the multiples list:
3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48

The current value of multiples (after being sorted) is [2, 3, 4, 6, 6, 8, 9, 10, 12, 12, 14, 15, 16, 18, 18, 20, 21, 22, 24, 24, 26, 27, 28, 30, 30, 32, 33, 34, 36, 36, 38, 39, 40, 42, 42, 44, 45, 46, 48, 48]'''

#Sum of Multiples for 5 up to 50 = 225
for i in range(1, 50):
    if i % 5 == 0:
        multiples.append(i)
        
'''The following values just got added to the multiples list:
5, 10, 15, 20, 25, 30, 35, 40, 45

The current value of multiples (after being sorted) is [2, 3, 4, 5, 6, 6, 8, 9, 10, 10, 12, 12, 14, 15, 15, 16, 18, 18, 20, 20, 21, 22, 24, 24, 25, 26, 27, 28, 30, 30, 30, 32, 33, 34, 35, 36, 36, 38, 39, 40, 40, 42, 42, 44, 45, 45, 46, 48, 48]'''

#Sum of Multiples for 6 = 216
for i in range(1, 50):
    if i % 6 == 0:
        multiples.remove(i)
        
'''The following values just got removed from the multiples list:
6, 12, 18, 24, 30, 36, 42, 48

The current value of multiples (after being sorted) is [2, 3, 4, 5, 6, 8, 9, 10, 10, 12, 14, 15, 15, 16, 18, 20, 20, 21, 22, 24, 25, 26, 27, 28, 30, 30, 32, 33, 34, 35, 36, 38, 39, 40, 40, 42, 44, 45, 45, 46, 48]'''

#Sum of Multiples for 10 = 100
for i in range(1, 50):
    if i % 10 == 0:
        multiples.remove(i)
        
'''The following values just got removed from the multiples list:
10, 20, 30, 40

The current value of multiples (after being sorted) is [2, 3, 4, 5, 6, 8, 9, 10, 12, 14, 15, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 38, 39, 40, 42, 44, 45, 45, 46, 48]'''

#Sum of Multiples for 15 = 90
for i in range(1, 50):
    if i % 15 == 0:
        multiples.remove(i)
        
'''The following values just got removed from the multiples list:
15, 30, 45

The current value of multiples (after being sorted) is [2, 3, 4, 5, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28, 32, 33, 34, 35, 36, 38, 39, 40, 42, 44, 45, 46, 48]'''

#The current sum of multiples is 827.

Do you see it? If you compare the final version of the multiples list to the list I wrote out above with all of the multiples of 2, 3, and 5 under 50, they're off by 1 number, and their sums are off by 30. But we subtracted it the right amount of times, so what gives? Well, what do you get when you multiply all 3 of our factors, 2, 3, and 5, together? BOOM, it's 30. So it seems we have to re-add the sum of multiples under the limit of a number all 3 factors, in this case 30. So, if we have three factors, our solution would look like this:

def solution(lim, a, b, c):
    return SoM(lim, a) + SoM(lim, b) + SoM(lim, c) - SoM(lim, a * b) - SoM(lim, a * c) - SoM(lim, b * c) + SoM(lim, a * b * c)

solution(50, 2, 3, 5)
#returns 857
solution(1000, 2, 3, 5)
#returns 366832

Phew! That is a good answer for if we have 3 factors, but this is an exercise in generalization, so we need to generalize what is happening here so we can take in a list of factors of any size and get the proper solution.

The first step is to actually understand what is happening here. It turns out this is a well established idea in set theory and combinatorics. Specifically, we're dealing with something called the inclusion-exclusion principle, which is a counting technique which generalizes the the method of obtaining the number of elements in the union of X finite sets. In set theory, with two sets, this is symbolically expressed as:

Or, with three sets, it can be expressed as:

As the number of sets increases, we need to add all the combinations of odd amounts of elements, and we need to subtract all the combinations of even amounts of elements. By doing this we can generalize this function. Here's what a generalized inclusion-exclusion function that correctly counts the sums up to a given limit should look like:

from math import prod
from itertools import combinations as co

def inex(lim, mults):
    ans = 0
    for i in range(len(mults)):
        for j in co(mults, i + 1):
            ans += (-1)**i * SoM(lim, prod(list(j)))
    return ans

This function above is doing a lot at once, so let's break it down a bit. First we import prod from the math library, which allows us to get the product of all of the elements in a list. Next we import combinations from the itertools library, which allows us to get every combination of elements in a list. The function itself takes in the limit, and a list of multiples. It first needs to loop through i from 0 to the length of the list, so if there are 3 elements in the mults list, as an example, the i values the loop will go through are 0, 1, and 2. For each i value we go through, we need to add or subtract, depending on whether i is even or odd as discussed earlier, every combination of products of elements of the mults list. I do this with the nifty (-1)**i, which will be negative when i is odd and positive when i is even. That initially seems backwards from what we need, but the amount of elements in the combination is increased by one in the nested for loop. Let's walk through an example for some clarity here:

from math import prod
from itertools import combinations as co

lim = 1000
mults = [2, 3, 5]

def inex(lim, mults):
    ans = 0
    for i in range(len(mults)):
        for j in co(mults, i + 1):
            lis = list(j)
            value = (-1)**i * SoM(lim, prod(lis))
            print(i, lis, value)
            ans += value
    return ans
    
inex(lim, mults)
'''
0 [2] 249500
0 [3] 166833
0 [5] 99500
1 [2, 3] -83166
1 [2, 5] -49500
1 [3, 5] -33165
2 [2, 3, 5] 16830
366832
'''

I added a print line to show what's happening here, but basically based on the i value, the size of the new list, j, is being dictated, and then every possibility at that size is being generated. Once it's generated, the product of all the elements in list j is being calculated, and then we're adding or subtracting the sum of multiples of that product based on whether the current i value is even or odd, as per the inclusion exclusion principle.

For a long while I was satisfied with this answer. The sum of multiples function runs in O(1), and the inclusion exclusion function runs in O(N^2) time. That runtime isn't ideal, but the inclusion exclusion principle requires it, and ideally the user keeps the number of factors low. But there was a part of me that kept lingering on the list size problem since it was forced to run at O(N^2). I started thinking about different examples to see if there was any way to simplify a list of given multiples, and I realized that if all the multiples in a given list weren't co-prime, not only could we simplify the list, we had to simplify the list. Let's look at an example to show what I mean.

If the given list has the values 3, 6, and 8, and the limit is, say, 30, then the list of factors should be the following: 3, 6, 8, 9, 12, 15, 16, 18, 21, 24, 27, and their sum is 159. But look what happens when we run the code above (the version that prints out comments) with this new example:

lim = 30
mults = [3, 6, 8]

inex(lim, mults)
'''
0 [3] 135
0 [6] 60
0 [8] 48
1 [3, 6] -18
1 [3, 8] -24
1 [6, 8] 0
2 [3, 6, 8] 0
201
'''

Well that's obviously off. So what happened? The result is larger than it should be which tells us that in some way, we over counted. But how? We know the inclusion exclusion principle is correct, so it has to do with our list. Let's look at the factors:

#3's factors: 3, 6, 9, 12, 15, 18, 21, 24, 27
#6's factors: 6, 12, 18, 24
#8's factors: 8, 16, 24

By looking at these lists, we can see that 6 doesn't have any unique factors from 3. So it's redundant. For posterity's sake, lets try the exercise above, but instead of with the multiples being 3, 6, and 8, let's take away the 6:

lim = 30
mults = [3, 8]

inex(lim, mults)
#0 [3] 135
#0 [8] 48
#1 [3, 8] -24
159

Well look at that, that's exactly the answer we were hoping for. So, in order to generalize an answer from any given list, not simply those where all its members are co-prime to each other, we need a function that removes any redundant factors. My first draft at this looked like this:

def cleanMults(mults):
    a = mults
    for i in range(len(mults) - 1):
        for j in range(i + 1, len(mults)):
            if mults[j] % mults[i] == 0:
                a.remove(mults[j])
    return a

This did the job, but like the inclusion exclusion function, it also runs in O(N^2), where N is the number of multiples. Initially I thought it couldn't be improved upon since every element in the list needs to be compared to every other element, but then I realized that because of our specific function, removing numbers that are multiples of other elements, if we keep track of the greatest common denominator (GCD for short), we only have to run through the list forward and backward once a piece, for a run time of O(M * N), where N is the size of the list, and M is the number of divisors found within the list. Here's the updated function I came up with:

from math import gcd

def cleanMults(mults):
    divisors = []
    for d in (1, -1):
        seen = 1
        #the list forwards, then the list backwards...
        for a in mults[::d]:
            if seen % a != 0:
                seen = a * seen // gcd(a, seen)
            else:
                divisors.append(a)
    return [a for a in mults if all(d == a or a % d != 0 for d in divisors)]

This function runs through the list twice, once from beginning to end, and then from end to beginning. For each run through the list, it tracks a variable named 'seen' which acts as the GCD among all of the numbers in the mults list seen so far. It is then used to decide what the divisors are. Once the list of divisors is decided, the last line decides which of the numbers in the mult list gets returned, based on whether each number in the mults list is either equal to d (the given divisor) or the mult, when modded by d, does not equal 0. If at least one of those two statements is true for every divisor in the divisor list, then the mult in question is included.

What's going on here is that by checking if the number from mult, when modded by a divisor, equals 0 or not, we're seeing if it is a multiple of another number from the list. Since we went through the work of identifying the divisors, we don't need to check every number against every other number, only against this smaller list of divisors. The second clause, which checks if they're equal, is really just dealing with the edge case, since we wouldn't want to eliminate a number only because it exists in the divisors list. The return line is just an elegant way of expressing a nested for loop.

Alright! Now we have a sum of multiples function that runs in O(1), no matter how large the limit is, an inclusion exclusion function which runs in O(N^2) based on the number of numbers in the given list of numbers, and a function which cleans up that list of multiples for both optimization and correctness that runs in O(N * M), where N is the length of the given list, and M is the number of divisors found in that list. When put all together, it's still ultimately O(N^2), but that N is the size of the list of factors, rather than the limit given, and N is reduced to its absolute minimum at O(N * M) speed before running. Not only is it a vast, vast improvement on the run time of the original problem, but it's been generalized to deal with larger lists, as well as lists that include divisors of other numbers in the list, should that accidentally occur.

Here's my final code for the problem!!

'''If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.Find the sum of all the multiples of 3 or 5 below 1000.
Link: https://projecteuler.net/problem=1'''

#Imports
import time 
from math import prod, gcd
from itertools import combinations as co

#Build a Sum of Multiples function
def SoM(lim, mul):
    n = (lim - 1) //mul
    return (n * (n+1) * mul) // 2
    
#Build an inclusion-exclusion function
def inex(lim, mults):
    ans = 0
    for i in range(len(mults)):
        for j in co(mults, i + 1):
            ans += (-1)**i * SoM(lim, prod(list(j)))
    return ans

#Build a clean-Multiples function
def cleanMults(mults):
    divs = []
    for d in (1, -1):
        seen = 1
        #the list forwards, then the list backwards...
        for a in mults[::d]:
            if seen % a != 0: 
                seen = a * seen // gcd(a, seen)
            else: 
                divs.append(a)
    return [a for a in mults if all(d == a or a % d != 0 for d in divs)]
    
#Build a toString function
def toString(mults):
    lm = list(mults)
    if len(lm) == 1: 
        return str(lm[0])
    s = 'or ' + str(lm[-1])
    for i in range(len(lm) -2, -1, -1):
        s = str(lm[i]) + ', ' + s
    return s
    
#Sum of multiples (of 3 and 5) function
def SumOfMults(lim, mults):
    #Declare variables
    s = time.time()
    
    #Solve the problem
    ans = str(inex(lim, cleanMults(mults)))
    l, m = str(lim), toString(mults)
    
    #Print the results
    print('The sum of all of the multiples ')
    print('of ' + m + ' below ' + l + ' is ' + ans + '.')
    print('This took ' + str(time.time() - s) + ' seconds to calculate.')
    
#Run the program
lim = 1000
mults = [3, 5]
SumOfMults(lim, mults)

There was just something about this problem that spoke to me, no matter how much I worked on it, I always felt like there was something more elegant right around the corner. I've solved a lot of project euler problems since tackling this one, but this one is still my favorite because of how much deeper I was able to dive into it.

Deep Dive on Project Euler's Problem 1

Recent Posts

Comments