Prerequisites: good algebra skills, basic number theory (definition of a prime number), a little bit of long multiplication (remember that from 5th grade?)
Overview: This tutorial explains more precisely what it means when computer scientists say that finding the prime factors of a large number, or breaking a mathematical encryption system, is a "hard" problem. We begin with the definition of big-O notation, and what it means to say that problems and the methods to solve them have "complexity" of O(n^{2}), O(n^{3}log(n)), O(2^{n}), etc. Then we show how mathematical problems are classified as "hard" if the best known methods to solve them have complexity that is exponential in n, for example, O(2^{n}). We explain the categories of problems described as "P", "NP", "NP-complete", and the significance of the question of whether P=NP.
Most students learn the method for long multiplication in school, which can be used to multiply, for example, a pair of three-digit numbers together:
           347
         x 296
        ------
          2082
         3123
        694
        ------
        102712
The same method could be used to multiply a 17-digit number by a 13-digit number:
                 23985310958726984
       x             3295618303278
    ------------------------------
                191882487669815872
               167897176711088888
               47970621917453968
              71955932876180952
             00000000000000000
            71955932876180952
          191882487669815872
          23985310958726984
        143911865752361904
       119926554793634920
      215867798628542856
      47970621917453968
     71955932876180952
    ------------------------------
     [whatever the answer is, goes here]
When you do long multiplication on paper, the only basic operations that you're doing in your head are: multiplying a pair of single-digit numbers together, and adding a one-digit number to another number (when you multiply two numbers and get a result of 10 or more, you have a tens digit to carry over to the next column). The process of solving the problem is just a long series of steps, where each step involves exactly one of those "basic operations" -- either multiplying two one-digit numbers together, or adding a one-digit number to another number. Suppose that you can do exactly one of these basic operations per second -- now the question is, how many of those basic operations do you need to perform, to multiply an m-digit number and an n-digit number?
In the multiplication problems above, the numbers written in blue are the intermediate steps -- each row is the result of multiplying the 17-digit number by a single digit in the 13-digit number below it. And when you're multiplying the 17-digit number by a certain digit of the 13-digit number, what you're really doing is multiplying each digit of the 17-digit number by that digit of the 13-digit number, and then possibly carrying over a tens digit to add to the next column (remember long multiplication?). So to get the digits written in blue, you have to do exactly 13 x 17 single-digit multiplication operations.
You also have to do some addition operations to get the blue numbers, whenever a multiplication gives you a result of at least 10 and you have a tens digit to carry over into the next column -- but the number of addition operations will vary depending on which particular 13-digit and 17-digit numbers you're multiplying together, because you don't always have to carry over a tens digit. The number of addition operations required could be as low as zero -- if the numbers you're multiplying are all 1's, then the result of each single-digit multiplication will be 1, and there will never be anything to carry. On the other hand, the number of addition operations could be as high as (17 - 1) x 13 -- if the numbers you're multiplying are all 9's, then you will always have an extra digit to carry over and add to the result, except when you're starting a new row and multiplying the rightmost digit of the 17-digit number, in which case you don't have anything to carry. In any case, the number of addition operations required for this phase is no greater than (17 - 1) x 13.
In the last step of long multiplication, you have to add up all the rows in blue to obtain the result, which involves a number of addition operations. But to count the addition operations, we're going to take a shortcut, the reasons for which will become clear eventually. When you're adding up the blue columns, you start with the rightmost column, add up the digits, then possibly carry over a tens digit to the next column. Each time you add up a single column, you start with the tens digit that you carried over from the previous column (if there was one), add the first digit in the column, then add the next digit to the running result, and so on until you reach the bottom of the column; then you write the result at the bottom and carry the tens digit over to the next column if necessary. So each blue digit only gets added to anything once -- meaning the number of addition operations can't be more than the number of blue digits. Since some of the blue rows have 17 digits and some have 18, that means the number of addition operations is at most (17 + 1) x 13.
So we know that the number of basic operations required (multiplying one-digit numbers, and adding two-digit and one-digit numbers together) is at most 17 x 13 multiplication operations and (17 - 1) x 13 addition operations to get the blue numbers, plus another (17 + 1) x 13 operations to add all the blue numbers together and get the answer -- for a total of 17 x 13 x 3 "basic" operations required, in a worst-case scenario requiring the maximum number of steps. Thus if it takes one second to do each basic operation, the time taken is proportional to mn, the product of the numbers of digits in each of the numbers. This answers the original question: if you can multiply a 100-digit number by a 100-digit number in one hour on average, then it would take about 8 hours, on average, to multiply a 200-digit number by a 400-digit number, using the same technique.
This is why it was possible to ignore the fact that there aren't really (17 + 1) x 13 addition operations in the stage of adding up all the blue numbers. Because we know that the algorithm required 17 x 13 multiplication steps up to that point, we already know that the time required is at least proportional to mn. Whether or not the last phase required (17 + 1) x 13 addition operations, or required a lot less, that wouldn't have changed the fact that the algorithm time was roughly proportional to mn. All we needed to make sure of, was that the last phase did not require a number of operations that was more than proportional to mn, which would have meant that the time taken to run the whole algorithm would have been more than proportional to mn.
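The operation counting above can be made concrete with a short sketch (the function name and step-charging conventions here are my own, not from the tutorial): a schoolbook multiplier instrumented to count single-digit multiplications and additions as it goes.

```python
def long_multiply(a: int, b: int):
    """Schoolbook multiplication, counting the 'basic operations':
    single-digit multiplications plus the additions (including carries)."""
    a_digits = [int(d) for d in str(a)][::-1]   # least-significant digit first
    b_digits = [int(d) for d in str(b)][::-1]
    result = [0] * (len(a_digits) + len(b_digits))
    ops = 0
    for i, bd in enumerate(b_digits):
        carry = 0
        for j, ad in enumerate(a_digits):
            ops += 1            # one single-digit multiplication...
            ops += 2            # ...plus (at worst) adding the carry and the running row
            total = ad * bd + carry + result[i + j]
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(a_digits)] += carry
    value = sum(d * 10 ** k for k, d in enumerate(result))
    return value, ops

product, ops = long_multiply(347, 296)   # ops == 3 * 3 * 3 == 27
```

For an m-digit by n-digit multiplication this counter always reports 3mn -- it charges the worst case (one multiplication plus two additions) at every digit pair, which is exactly the 17 x 13 x 3 bound derived above.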
This illustrates what computer scientists care about the most in figuring out the complexity of a problem: what the running time is roughly proportional to, in terms of the size of the input. (We will give a more precise definition later.) It might seem like an odd thing not to care whether the time required by the algorithm is mn, or 2mn, or 3(m+1)n, or whatever, as long as it's roughly proportional to mn. Renting time on a supercomputer can be expensive; if you invent a new algorithm that takes exactly half as long as the old one (i.e. it takes mn steps instead of 2mn steps), isn't that a big deal? But what computer scientists care about the most, is how long it takes to run an algorithm like this one if m and n are very large. If the running time of one algorithm is roughly proportional to mn, and the running time of another algorithm is roughly proportional to mn^{2}, then no matter what the actual formulas are for their running times, if you make m and n large enough, the running time of the mn algorithm will be less. This is true, for example, even if the running time for the first algorithm is 100000m(n+600) (which is roughly proportional to mn for large values of m and n), and the running time for the second algorithm is 0.0002mn^{2}. For small values of m and n, the second algorithm might have a smaller running time, but if you make m and n large enough, the running time of the first algorithm will always be smaller, as long as m and n are above some lower limit. So, for large enough values of m and n, virtually all algorithms with running time proportional to mn, would be considered "faster than" algorithms with running time proportional to mn^{2}. And then only within the category of algorithms that are proportional to mn, would you distinguish between an algorithm that takes, say, 2mn steps and an algorithm that takes 4mn steps.
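The claim about those two example formulas can be checked directly. This snippet plugs in the running times 100000m(n+600) and 0.0002mn^{2} from the paragraph above and searches for the point past which the first is always smaller:

```python
# Running-time formulas from the text: the first is roughly proportional
# to m*n, the second to m*n**2 (constants as given in the paragraph above).
def t1(m, n): return 100000 * m * (n + 600)
def t2(m, n): return 0.0002 * m * n ** 2

# For modest inputs the m*n**2 algorithm looks much faster...
assert t2(10, 1000) < t1(10, 1000)

# ...but double n until the m*n algorithm wins (m cancels out of the
# comparison, so any fixed m gives the same crossover).
n = 1
while t1(1, n) >= t2(1, n):
    n *= 2
print(n)   # → 536870912, the first power of two past the crossover near n = 5 * 10**8
```

The crossover is enormous (around half a billion), but it exists -- and past it, the mn algorithm wins forever.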
We've been talking about functions such as 3m(n+1) that are "roughly proportional to" simpler functions like mn, for sufficiently large values of m and n. Here we give a formal definition of what that means. For simplicity, we'll look at functions of just one variable, n.
A function f(n) is in the set O(g(n)) (pronounced "big-O of g(n)") if there is a number n_{1} and a constant k such that for all n > n_{1}, f(n) < kg(n). In other words, f(n) is in O(g(n)) if there is a function proportional to g(n) such that that function acts as an upper bound on f(n), once n gets above n_{1}.
Another way of saying that f(n) is in the set O(g(n)) is to say that f does not grow faster than g(n). Often, instead of saying "f is in the set O(g(n))", we would just say, "f is O(g(n))" (again, this would be pronounced "f is big-O of g of n").
Now, we can revisit something that I said in the discussion of the long multiplication problem. I said that in the final addition step, there could be up to (17 + 1) x 13 addition steps, and I emphasized that this is not "more than proportional to mn". This might have seemed like cheating -- (17 + 1) x 13 is greater than 17 x 13, isn't it? But even though (m + 1) x n is greater than mn, it's not more than proportional to mn, because (m + 1) x n < 2mn as long as m is greater than 1, so there is a formula 2mn which is proportional to mn and which still acts as an upper bound on (m + 1) x n, for m > 1. What we showed was that the running time of that last step, and in fact, of the algorithm as a whole, was O(mn). On the other hand, you could say that the function mn^{2} is "more than proportional" to mn, because no matter what constant k you select, you can always make n large enough that mn^{2} > kmn for all values of n above a certain number. mn^{2} is not O(mn). We would say that mn^{2} grows faster than mn. So if we had been computing the number of steps required in the multiplication problem, and we had found out that the last stage of the problem required mn^{2} addition steps, that would have meant the running time was no longer O(mn).
(By the way, when you talk about big-O notation without referring to a specific function -- generally, when you write the big O without parentheses next to it -- you actually write out the word "big", as in "big-O notation". But when you're saying that one function is big-O of another specific function -- i.e. you're writing the big O with parentheses next to it -- you don't write the word "big", you just write "f(n) is O(g(n))". However, that's still pronounced "f of n is big-O of g of n".)
You can illustrate the concepts of big-O notation and "complexity" using examples that are much simpler than long multiplication. For example, suppose that your input is a list of n one-digit numbers, and you want to find out if the number 7 appears in that list. Your algorithm does the obvious thing: it checks each number in the list to see if it's equal to 7, and if it finds a 7, it reports "Yes" and stops running; otherwise, if it gets to the end of the list and hasn't found a 7, it reports "No". Checking one number in the list counts as a single "step". If there happen to be no 7's in the list, then the algorithm has to check the entire list, so the algorithm could conceivably take n steps (but it certainly can't take more than that). So the algorithm is O(n).
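The search just described fits in a few lines. Here is a minimal sketch (the function name is mine), with the step counter built in:

```python
def contains_seven(digits):
    """Check each number in the list; each check counts as one step."""
    steps = 0
    for d in digits:
        steps += 1
        if d == 7:
            return True, steps   # found a 7 -- stop immediately
    return False, steps          # no 7 anywhere: the worst case, n steps

assert contains_seven([3, 1, 7, 2]) == (True, 3)
assert contains_seven([0] * 1000) == (False, 1000)   # worst case: all n checked
```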
You could even have an algorithm that always takes the same number of steps, no matter what the input. Suppose that your algorithm looks at the list of numbers and immediately outputs, "I don't know." (one step), and then prints "I give up." (second step), and quits. The algorithm always takes two steps to complete. That's not a very useful algorithm, but we can talk about its running time. Its running time is constant, i.e. always the same -- 2 -- regardless of the length n of the input. You could say that it's O(1) -- because 2 < 3*1, so 2 is bounded by something that's proportional to 1. But normally you'd simply say that it's a constant-time algorithm.
(This example illustrates that what computer scientists refer to as a "function" or an "algorithm" is different from what mathematicians usually mean when they use the same words. In mathematics, a function or an algorithm generally finds an answer to something -- e.g. the number of times that 37 goes into a particular number, or whether a given number is prime or not. But in computer science, a function or an algorithm simply "does something" -- that can include solving a problem such as determining whether a given number is prime, but it can also include clearing the computer screen, or rebooting the computer -- both of which are simply tasks, which don't find a "solution" to anything. The algorithm in this example printed two lines, and then gave up -- that's "doing something", so to a computer scientist, it counts as an algorithm.)
Now that we've given the formal definition, there are a few other subtle differences between what big-O formally means, and the way it is normally used. For example, suppose that you've determined that an algorithm would take 2n^{2} + 2n steps to run. You could say that the algorithm is O(n^{2} + n). You could say that, but you never would. The reason is that if an algorithm is O(n^{2} + n), then it's also O(n^{2}) -- because n^{2} + n < 2n^{2} for all n > 1, so if a function is bounded by something proportional to n^{2} + n, then it's also bounded by something proportional to n^{2}. So it's simpler, and means the same thing, to say that the algorithm is O(n^{2}).
Similarly, if you found that the running time of an algorithm was given by a polynomial in n (where n is, as always, the length of the input), you would say that the running time was big-O of the highest-degree term in the polynomial (with that term's constant factor removed). So if an algorithm has a running time of 4n^{3} + 6n^{2} - 2n + 4, then the running time of the algorithm is O(n^{3}).
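You can verify that claim numerically for this polynomial: with the constant k = 5 (any k > 4 works), there is a threshold n_{1} past which 4n^{3} + 6n^{2} - 2n + 4 is bounded by 5n^{3}. A quick check:

```python
def f(n):
    return 4 * n**3 + 6 * n**2 - 2 * n + 4

# With k = 5, find the smallest n1 after which f(n) <= 5 * n**3 holds
# (checked here over a finite range; algebra shows it holds for all n >= n1).
n1 = next(n for n in range(1, 50)
          if all(f(m) <= 5 * m**3 for m in range(n, 2000)))
print(n1)   # → 6
```

So f(n) is O(n^{3}) with k = 5 and n_{1} = 6 in the formal definition given earlier.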
You would also not need to write that an algorithm was O(log_{3}(n)), because, by the change-of-base rule for logarithms, log_{3}(n) is just log(n) divided by the constant factor log(3). So if a function is O(log_{3}(n)), it is also O(log(n)) in any other base, and you would simply write O(log(n)) without specifying the base.
For functions that are exponential in n, we tend to group those together as "exponential" functions, even though they are not equivalent -- not all exponential functions are big-O of each other. 10^{n} grows faster than 2^{n}. However, when computer scientists look at running times of different algorithms, exponential running time is considered so horrendously bad, that we usually don't distinguish between the different categories of badness. If an algorithm's running time is exponential, then we just call it exponential, without specifying exactly which exponential formula gives the running time.
To summarize the conventions that are followed when using big-O notation:
You could, technically, say... | but instead, you would normally say... | because... |
O(n^{2} + n) | O(n^{2}) | n^{2} + n is bounded above by 2n^{2} as long as n > 1 |
O(2n^{3}) | O(n^{3}) | if something is bounded above by a function proportional to 2n^{3}, then since anything proportional to 2n^{3} is also proportional to n^{3}, it's simpler to write n^{3} |
O(1) | constant time | it's a simpler way of saying the algorithm always takes the same time to run |
O(log_{3}(n)) | O(log(n)) | If a function is O(log_{3}(n)), then it's also O(log_{10}(n)), O(log_{2}(n)), and so on for any other base of the logarithm, so it's simpler to write log(n) |
O(2^{n}) | exponential | An algorithm that takes exponential time to run, is considered so horrendously bad, that you usually don't need to quantify exactly how bad it is. |
Finally, if an algorithm is O(n^{2}), then you could also say that it's O(n^{3}), or O(n^{5}), because if the running time is bounded above by a constant times n^{2}, then it's also bounded by a constant times n^{3} or a constant times n^{5}. But you would just say that the algorithm is O(n^{2}).
We gave the example of an algorithm that takes a list of n digits as its input, and checks to see if the digit 7 is in the list. We showed that in the worst case, this algorithm would take O(n) time, because you'd have to check all n digits. If your input consists of 1,000 digits, then the worst-case running time is 10 times longer than if your input consists of 100 digits.
On the other hand, if the digits are random, and each has a 1-in-10 chance of being a 7, then the average running time for a string of 1,000 digits is not 10 times worse than the average running time for a string of 100 digits. This is because, since each digit has a 1-in-10 chance of being a 7, you will only have to check about 10 digits, on average, before you find a 7. This is true whether your input consists of 100 digits or 1,000 digits -- either way, you'll only have to check about 10 of them. The odds of not getting a single 7 anywhere in your input, so that you really do have to check all 100 digits (or all 1,000 digits) of your input, are microscopic -- that's just the worst-case scenario. So the average running time of the algorithm is always going to be less than 10. (The reason is that if you were checking an infinite number of digits, then the average wait time before you found a 7, would be exactly 10 steps -- so if you're forced to stop after checking 1,000 digits, then the average wait time will be slightly less than 10.) On average, the algorithm is constant-time -- 10 steps or less. But in the worst case, the algorithm is O(n) -- it's possible, although very unlikely, that you'd have to check all n numbers in the input.
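A quick Monte Carlo simulation backs this up (the simulation setup and function name are my own). Instead of building whole input lists, it draws random digits one at a time, which is equivalent for counting the steps the search makes:

```python
import random

def steps_to_find_seven(n):
    """Draw random digits one at a time (each has a 1-in-10 chance of
    being a 7) and count how many checks the search algorithm makes."""
    for i in range(1, n + 1):
        if random.randrange(10) == 7:
            return i
    return n        # no 7 in the whole input: the rare worst case

random.seed(0)
avgs = {n: sum(steps_to_find_seven(n) for _ in range(20000)) / 20000
        for n in (100, 1000, 100000)}
print(avgs)   # every average hovers just under 10, regardless of n
```

Whether the input is 100 digits or 100,000, the average stays near 10 steps -- constant-time on average, even though the worst case is O(n).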
Whether a computer scientist is interested in the average running time of an algorithm, or the worst-case running time, depends on the context. But unless stated otherwise, when they say that an algorithm is O(n^{2}), that refers to the worst-case running time.
For many algorithms, the worst-case running time and the average running time are both in the same big-O category anyway. Take the long-multiplication example. We showed that in the worst possible case (where every single-digit multiplication generates a result of 10 or more, and you have to carry a tens digit into the next column), there were (17 x 13) multiplication operations and (17 - 1) x 13 addition operations to generate the numbers in blue, and then possibly another (17 + 1) x 13 addition operations to total up all the blue rows to get the answer -- so the worst-case scenario takes O(mn) time. But even in the best-case scenario, where the digits were all 1's and could be multiplied without carrying anything, you still require (17 x 13) multiplication operations to generate the rows in blue -- every digit in the first number has to be multiplied exactly once by every digit in the second number. So the average running time is O(mn) too, because there's no way of getting around the mn single-digit multiplication operations that you have to do.
Remember that when we say the algorithm time is a function of m and n, m and n are the lengths of the inputs (i.e. the number of digits in the two numbers you're multiplying), not the sizes of the numbers themselves. If the algorithm to multiply two numbers A and B required a length of time proportional to AB, that would mean that just adding a digit to the end of one of the numbers would make the problem take 10 times longer to solve, since you just made one of the numbers about 10 times greater! Fortunately, we know from experience that multiplying a 3-digit number by a 2-digit number does not take 10 times longer than multiplying a 2-digit number by a 2-digit number.
Similarly, sometimes an algorithm might take a number A as its input, and do something with every number from 1 to A -- for example, testing every number from 1 to A to see if it divides evenly into A. (Obviously the numbers greater than A/2 can't possibly divide evenly into A, but the algorithm is dumb and checks them anyway.) We know from long division that dividing one number into another number, to see if it divides evenly, is not "instantaneous", but suppose that you could magically check if one number divided into another number, in one step. Even so, it would take A steps to complete the algorithm. But this is not proportional to n. n is the length of the input, i.e. the number of digits in A. If A is an n-digit number, then A is between 10^{n-1} and 10^{n} - 1, so the running time of the algorithm is close to 10^{n} steps in the worst case. The running time of the algorithm may be proportional to A, but it's exponential in n, the length of the input.
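Here is that dumb algorithm as a sketch (names are mine), with the step counter showing that adding one digit to A multiplies the work by about 10:

```python
def count_divisor_checks(A):
    """The 'dumb' algorithm: test every number from 1 to A, pretending
    each divisibility test magically takes a single step."""
    steps = 0
    divisors = []
    for d in range(1, A + 1):
        steps += 1
        if A % d == 0:
            divisors.append(d)
    return divisors, steps

# The step count equals the VALUE of A, so each extra digit in A
# multiplies the running time by about 10 -- exponential in the length n.
for A in (9, 99, 999):
    print(len(str(A)), count_divisor_checks(A)[1])
```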
There is less danger of confusion in cases where the inputs are not really numbers. For example, when we looked at the algorithm that checked a list of 1,000 digits to see if one of them was a 7, we were considering the input to be a list of 1,000 digits, not a 1,000-digit number. The numbers were just being treated as symbols, so we might as well have said that the input was a string made up of 10 different symbols (heart, diamond, circle, etc.) and the algorithm's task was to check whether the string contained a heart or not.
In the long-multiplication example that we used, we showed that the usual algorithm for multiplying an m-digit number A, and an n-digit number B, was O(mn).
You could do it differently, by starting with a running total of 0 and then adding A to the running total over and over again, B times. Obviously, this would be a very tedious way to multiply 9,618 by 4,134. The running time of the algorithm would be at least proportional to B -- which, as explained earlier, means that it would be exponential in the number of digits in B. If B were 13 digits long, B would have to be equal to at least 10^{12} and this algorithm would take at least 10^{12} steps.
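The repeated-addition method is easy to write down (this sketch and its names are mine), and the step counter makes the exponential blowup obvious:

```python
def multiply_by_repeated_addition(A, B):
    """Start with a running total of 0 and add A to it, B times."""
    total = 0
    steps = 0
    for _ in range(B):
        total += A
        steps += 1
    return total, steps

product, steps = multiply_by_repeated_addition(9618, 4134)
assert product == 9618 * 4134
assert steps == 4134    # proportional to the VALUE of B, not its length
```

Compare: long multiplication of these two numbers takes on the order of 4 x 4 basic operations, while repeated addition takes 4,134 steps -- and for a 13-digit B it would take at least 10^{12} steps.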
So, "running time" is a property of the algorithm used to solve a problem, not a property of the problem itself.
However, for certain problems, you can show that the running time can never be less than a certain big-O category. We used the example of a list of n digits, where the problem was to check the list to see if it contained any 7's. Obviously, any algorithm would have to check all n digits, so any algorithm would have to take at least n steps. This is kind of obvious, because for any problem where the input has length n and you have to look at the entire input to solve the problem, you need at least n steps to look at the whole input, and hence any algorithm will require at least n steps to solve the problem. On the other hand, there are problems that take an input of size n, where computer scientists have proven that even the fastest-possible algorithm will still be slower than O(n). For example, if you are given n numbers and assigned to sort them in ascending order, it only takes n steps to read in all the numbers, but it has been proven that the fastest-possible algorithm to sort the numbers (by comparing them to each other) still requires on the order of n log(n) steps in the worst case.
As stated, computer scientists divide algorithms into categories based on their running time as a function of the size of their input, such as O(n), O(n^{2}), O(log(n)), and so on. Polynomial running time algorithms are all the algorithms that are constant-time, O(n), O(n^{2}), O(n^{3}), or, in general, O(n^{k}) for some fixed power k.
Exponential running time algorithms are algorithms where the running time is exponential in n, such as those that are O(10^{n}). Multiplying two numbers A and B by starting with a running total of 0 and then adding A to the total over and over, B times, is an exponential-time algorithm because the running time is proportional to B, and B can be nearly as large as 10^{n}, where n is the number of digits in B.
A "hard" problem is a problem where all known algorithms to solve the problem take more than polynomial time to finish. For example, suppose that your input is a large number A, and you want to find all the factors of A. You could start with the numbers 2, 3, 4, ... and so on, all the way up to A/2, to see if each of those numbers divides into A. But even if you had an instantaneous, one-step method of dividing a number into A to see if it divided evenly, the running time of the algorithm would still be proportional to A/2, i.e. exponential in the length of the input, because the size of A is exponential in the number of digits in A. Even though this looks like a naive algorithm to solve the problem, it turns out that every other known algorithm for finding the prime factors of a large number also takes more than polynomial time -- the best known methods are much faster than this naive one, but still not polynomial. There is no known polynomial-time algorithm that can solve the problem. Hard problems are also called "intractable" problems.
Some "hard" problems are called "hard" because there is no known algorithm to solve them in polynomial time, but it's never been proven that no such algorithm exists. On the other hand, just as it has been proven that the problem of sorting a list of n numbers cannot be solved by any comparison-based algorithm in faster than n log(n) steps, there are problems for which it has been proven that no polynomial-time algorithm can ever exist, no matter how clever the algorithm.
A decision problem is basically a yes-or-no question. So "Find the factors of the number A" is not a decision problem. But an example that we gave above, where the input is a list of one-digit numbers and the problem is to find out if the list of digits contains a 7, is a decision problem. "Is the number A prime?" would be another example of a decision problem.