Complexity theory

Prerequisites: good algebra skills, basic number theory (definition of a prime number), a little bit of long multiplication (remember that from 5th grade?)

Overview: This tutorial explains more precisely what it means when computer scientists say that finding the prime factors of a large number, or breaking a mathematical encryption system, is a "hard" problem. We begin with the definition of big-O notation, and what it means to say that problems and the methods to solve them have "complexity" of O(n^2), O(n^3 log(n)), O(2^n), etc. Then we show how mathematical problems are classified as "hard" if the best known methods to solve them have complexity that is exponential in n, for example, O(2^n). We explain the categories of problems described as "P", "NP", "NP-complete", and the significance of the question of whether P=NP.

Illustrating "complexity" using long multiplication

Most students learn the method for long multiplication in school, which can be used to multiply, for example, a pair of three-digit numbers together:

      347
    x 296
     2082
    3123
    694
   102712

The same method could be used to multiply a 17-digit number by a 13-digit number:

             2398531095872698
              x 3295618303278
            19188248766981584
           16789717671108886
           4797062191745396
          7195593287618094
         0000000000000000
        7195593287618094
      19188248766981584
      2398531095872698
    14391186575236188
   11992655479363490
  21586779862854282
  4797062191745396
 7195593287618094
[whatever the answer is, goes here]

Complexity theory asks the question: how much longer does it take to multiply the 17-digit and the 13-digit number than it does to multiply the two 3-digit numbers together? And what kind of formula would tell you roughly how long it would take to multiply an m-digit number and an n-digit number together using the same technique, for large values of m and n? In particular, what is the required time proportional to -- is it proportional to m + n, or mn, or m^2n^2, or what? If the time required is roughly proportional to m + n, then that means if you can multiply a 100-digit number by a 100-digit number in 1 hour, then you can multiply a 200-digit number by a 400-digit number in about 3 hours, because m + n is 3 times greater in the second case. On the other hand, if the time required is roughly proportional to mn, then if you can multiply a 100-digit number by a 100-digit number in 1 hour, it would take about 8 hours to multiply a 200-digit number by a 400-digit number, because the product mn is 8 times greater.
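
To make that scaling concrete, here is a quick Python sketch (purely illustrative, not part of the original tutorial) that computes how much the two candidate cost formulas grow when you go from two 100-digit inputs to a 200-digit and a 400-digit input:

    # Compare how two candidate cost formulas scale as the input sizes grow.
    def ratio(cost, m1, n1, m2, n2):
        """How many times more work the formula predicts for (m2, n2) vs. (m1, n1)."""
        return cost(m2, n2) / cost(m1, n1)

    additive = lambda m, n: m + n        # time roughly proportional to m + n
    multiplicative = lambda m, n: m * n  # time roughly proportional to mn

    print(ratio(additive, 100, 100, 200, 400))        # 3.0 -> about 3 hours
    print(ratio(multiplicative, 100, 100, 200, 400))  # 8.0 -> about 8 hours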

When you do long multiplication on paper, the only basic operations that you're doing in your head are: multiplying a pair of single-digit numbers together, and adding a one-digit number to another number (when you multiply two numbers and get a result of 10 or more, you have a tens digit to carry over to the next column). The process of solving the problem is just a long series of steps, where each step involves exactly one of those "basic operations" -- either multiplying two one-digit numbers together, or adding a one-digit number to another number. Suppose that you can do exactly one of these basic operations per second -- now the question is, how many of those basic operations do you need to perform, to multiply an m-digit number and an n-digit number?

In the multiplication problems above, the rows between the two factors and the final answer are the intermediate steps -- we'll refer to them as the "blue" rows -- and each row is the result of multiplying the 17-digit number by a single digit of the 13-digit number below it. And when you're multiplying the 17-digit number by a particular digit of the 13-digit number, what you're really doing is multiplying each digit of the 17-digit number by that digit, and then possibly carrying over a tens digit to add to the next column (remember long multiplication?). So to get the digits in the blue rows, you have to do exactly 13 x 17 single-digit multiplication operations.

You also have to do some addition operations to get the blue numbers, whenever a multiplication gives you a result of at least 10 and you have a tens digit to carry over into the next column -- but the number of addition operations will vary depending on which particular 13-digit and 17-digit numbers you're multiplying together, because you don't always have to carry over a tens digit. The number of addition operations required could be as low as zero -- if the numbers you're multiplying are all 1's, then the result of each single-digit multiplication will be 1, and there will never be anything to carry. On the other hand, the number of addition operations could be as high as (17 - 1) x 13 -- if the numbers you're multiplying are all 9's, then you will always have an extra digit to carry over and add to the result, except when you're starting a new row and multiplying the rightmost digit of the 17-digit number, in which case you don't have anything to carry. In any case, the number of addition operations required for this phase is no greater than (17 - 1) x 13.

In the last step of long multiplication, you have to add up all the blue rows to obtain the result, which involves a number of addition operations. But to count the addition operations, we're going to take a shortcut, the reasons for which will become clear eventually. When you're adding up the blue columns, you start with the rightmost column, add up its digits, then possibly carry over a tens digit to the next column. Each time you add up a single column, you start with the digit that you carried over from the previous column (if there was one), add the first number in the column, then add the next number to the result, and so on until you get to the bottom of the column; then you write the result at the bottom and carry a digit over to the next column if necessary. So each blue digit only gets added to anything once -- meaning the number of addition operations can't be more than the number of blue digits. Since some of the blue rows have 17 digits and some have 18, that means the number of addition operations is at most (17 + 1) x 13.

So we know that the number of basic operations required (multiplying one-digit numbers, and adding a one-digit number to another number) is at most 17 x 13 multiplication operations plus (17 - 1) x 13 addition operations to get the blue numbers, plus another (17 + 1) x 13 operations to add all the blue numbers together and get the answer -- for a total of at most 17 x 13 x 3 "basic" operations, in a worst-case scenario requiring the maximum number of steps. Thus if it takes one second to do each basic operation, the time taken is proportional to mn, the product of the numbers of digits in the two numbers. This answers the original question: if you can multiply a 100-digit number by a 100-digit number in one hour on average, then it would take about 8 hours, on average, to multiply a 200-digit number by a 400-digit number, using the same technique.
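
To make the counting concrete, here is a small Python sketch (an illustration under the same digit-by-digit model, not the author's code) that performs schoolbook long multiplication on digit lists and tallies the single-digit multiplications and additions as it goes:

    def long_multiply(a, b):
        """Schoolbook long multiplication, counting the 'basic operations':
        single-digit multiplications, and additions of a carried digit or a
        column digit. Returns (product, mults, adds)."""
        a_digits = [int(d) for d in str(a)][::-1]   # least-significant digit first
        b_digits = [int(d) for d in str(b)][::-1]
        mults = adds = 0

        # Phase 1: build one intermediate ("blue") row per digit of b.
        rows = []
        for shift, bd in enumerate(b_digits):
            row, carry = [], 0
            for ad in a_digits:
                prod = ad * bd
                mults += 1
                if carry:
                    prod += carry               # add the carried tens digit
                    adds += 1
                row.append(prod % 10)
                carry = prod // 10
            if carry:
                row.append(carry)
            rows.append([0] * shift + row)      # shift the row by its place value

        # Phase 2: add up the blue rows, column by column.
        result, carry = [], 0
        for col in range(max(len(r) for r in rows)):
            total = carry
            for r in rows:
                if col < len(r) and r[col]:
                    total += r[col]             # each blue digit is added at most once
                    adds += 1
            result.append(total % 10)
            carry = total // 10
        while carry:
            result.append(carry % 10)
            carry //= 10

        product = int("".join(str(d) for d in reversed(result)))
        return product, mults, adds

    print(long_multiply(347, 296))  # (102712, 9, 15) -- exactly 3 x 3 = 9 single-digit multiplications

For an m-digit number times an n-digit number, the multiplication counter always comes out to exactly mn, and the addition counters together stay below 2mn -- which is the O(mn) behavior described above.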

This is why it was possible to ignore the fact that there aren't really (17 + 1) x 13 addition operations in the stage of adding up all the blue numbers. Because we know that the algorithm required 17 x 13 multiplication steps up to that point, we already know that the time required is at least proportional to mn. Whether the last phase required (17 + 1) x 13 addition operations, or a lot fewer, wouldn't have changed the fact that the algorithm's running time was roughly proportional to mn. All we needed to make sure of was that the last phase did not require a number of operations that was more than proportional to mn, which would have meant that the time taken to run the whole algorithm was more than proportional to mn.

This illustrates what computer scientists care about the most in figuring out the complexity of a problem: what the running time is roughly proportional to, in terms of the size of the input. (We will give a more precise definition later.) It might seem like an odd thing not to care whether the time required by the algorithm is mn, or 2mn, or 3(m+1)n, or whatever, as long as it's roughly proportional to mn. Renting time on a supercomputer can be expensive; if you invent a new algorithm that takes exactly half as long as the old one (i.e. it takes mn steps instead of 2mn steps), isn't that a big deal? But what computer scientists care about the most is how long it takes to run an algorithm like this one if m and n are very large. If the running time of one algorithm is roughly proportional to mn, and the running time of another algorithm is roughly proportional to mn^2, then no matter what the actual formulas are for their running times, if you make m and n large enough, the running time of the mn algorithm will be less. This is true, for example, even if the running time for the first algorithm is 100000m(n+600) (which is roughly proportional to mn for large values of m and n), and the running time for the second algorithm is 0.0002mn^2. For small values of m and n, the second algorithm might have a smaller running time, but once m and n are above some lower limit, the running time of the first algorithm will always be smaller. So, for large enough values of m and n, virtually all algorithms with running time proportional to mn would be considered "faster than" algorithms with running time proportional to mn^2. And then only within the category of algorithms that are proportional to mn would you distinguish between an algorithm that takes, say, 2mn steps and an algorithm that takes 4mn steps.
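
A quick numeric check (an illustrative Python snippet using the two made-up formulas from the paragraph above) shows where the crossover happens:

    # The two hypothetical running-time formulas from the paragraph above.
    f1 = lambda m, n: 100000 * m * (n + 600)   # roughly proportional to mn
    f2 = lambda m, n: 0.0002 * m * n ** 2      # roughly proportional to mn^2

    for n in (100, 1_000_000, 100_000_000, 1_000_000_000):
        m = n
        winner = "f1" if f1(m, n) < f2(m, n) else "f2"
        print(f"m = n = {n:>13,}: f1 = {f1(m, n):.2e}, f2 = {f2(m, n):.2e} -> {winner} is smaller")
    # f2 wins for small inputs, but once n passes roughly 500 million, f1 is
    # always smaller -- the mn algorithm eventually beats the mn^2 algorithm.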

Formalizing concepts -- function "growth speeds", and big-O notation

We've been talking about functions such as 3m(n+1) that are "roughly proportional to" simpler functions like mn, for sufficiently large values of m and n. Here we give a formal definition of what that means. For simplicity, we'll look at functions of just one variable, n.

A function f(n) is in the set O(g(n)) (pronounced "big-O of g(n)") if there is a number n1 and a constant k such that for all n > n1, f(n) < kg(n). In other words, f(n) is in O(g(n)) if there is a function proportional to g(n) such that that function acts as an upper bound on f(n), once n gets above n1.
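
Here is a tiny Python illustration of the definition (the particular f, g, k, and n1 below are made-up examples, not taken from the text); it just tests the inequality f(n) < k*g(n) over a finite range of n:

    # f is in O(g) if there exist k and n1 such that f(n) < k*g(n) for all n > n1.
    def bound_holds(f, g, k, n1, up_to=10_000):
        """Check f(n) < k*g(n) for every n with n1 < n <= up_to."""
        return all(f(n) < k * g(n) for n in range(n1 + 1, up_to + 1))

    f = lambda n: 3 * n + 5   # f(n) = 3n + 5
    g = lambda n: n           # g(n) = n

    print(bound_holds(f, g, k=4, n1=5))   # True: 3n + 5 < 4n once n > 5
    print(bound_holds(f, g, k=3, n1=5))   # False: 3n + 5 is never less than 3n

(Of course, a finite check like this can't prove the bound for all n -- that takes the little bit of algebra in the comments -- but it's a handy way to sanity-check a guess.)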

Another way of saying that f(n) is in the set O(g(n)) is to say that f(n) does not grow faster than g(n). Often, instead of saying "f is in the set O(g(n))", we would just say "f is O(g(n))" (again, this would be pronounced "f is big-O of g of n").

Now, we can revisit something that I said in the discussion of the long multiplication problem. I said that in the final addition step, there could be up to (17 + 1) x 13 addition steps, and I emphasized that this is not "more than proportional to mn". This might have seemed like cheating -- (17 + 1) x 13 is greater than 17 x 13, isn't it? But even though (m + 1) x n is greater than mn, it's not more than proportional to mn, because (m + 1) x n < 2mn as long as m is greater than 1, so there is a formula 2mn which is proportional to mn and which still acts as an upper bound on (m + 1) x n, for m > 1. What we showed was that the running time of that last step, and in fact, of the algorithm as a whole, was O(mn). On the other hand, you could say that the function mn^2 is "more than proportional" to mn, because no matter what constant k you select, you can always make n large enough that mn^2 > kmn for all values of n above a certain number. mn^2 is not O(mn). We would say that mn^2 grows faster than mn. So if we had been computing the number of steps required in the multiplication problem, and we had found out that the last stage of the problem required mn^2 addition steps, that would have meant the running time was no longer O(mn).

(By the way, when you talk about big-O notation without referring to a specific function -- generally, when you write the big O without parentheses next to it -- you actually write out the word "big", as in "big-O notation". But when you're saying that one function is big-O of another specific function -- i.e. you're writing the big O with parentheses next to it -- you don't write the word "big", you just write "f(n) is O(g(n))". However, that's still pronounced "f of n is big-O of g of n".)

Two trivial examples

You can illustrate the concepts of big-O notation and "complexity" using examples that are much simpler than long multiplication. For example, suppose that your input is a list of n one-digit numbers, and you want to find out if the number 7 appears in that list. Your algorithm does the obvious thing: it checks each number in the list to see if it's equal to 7, and if it finds a 7, it reports "Yes" and stops running; otherwise, if it gets to the end of the list and hasn't found a 7, it reports "No". Checking one number in the list counts as a single "step". If there happen to be no 7's in the list, then the algorithm has to check the entire list, so the algorithm could conceivably take n steps (but it certainly can't take more than that). So the algorithm is O(n).
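
As a sketch (illustrative Python, with the step counting made explicit), the algorithm looks like this:

    def contains_seven(digits):
        """Scan left to right, stopping at the first 7.
        Each comparison counts as one step; the worst case is len(digits) steps."""
        steps = 0
        for d in digits:
            steps += 1
            if d == 7:
                return True, steps
        return False, steps

    print(contains_seven([3, 1, 7, 4]))        # (True, 3)  -- stopped early
    print(contains_seven([3, 1, 4, 1, 5, 9]))  # (False, 6) -- had to check all n digits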

You could even have an algorithm that always takes the same number of steps, no matter what the input. Suppose that your algorithm looks at the list of numbers and immediately outputs "I don't know." (one step), and then prints "I give up." (second step), and quits. The algorithm always takes two steps to complete. That's not a very useful algorithm, but we can talk about its running time. Its running time is constant, i.e. always the same -- 2 -- regardless of the length n of the input. You could say that it's O(1) -- because 2 < 3*1, so 2 is bounded by something that's proportional to 1. But normally you'd simply say that it's a constant-time algorithm.

(This example illustrates that what computer scientists refer to as a "function" or an "algorithm" is different from what mathematicians usually mean when they use the same words. In mathematics, a function or an algorithm generally finds an answer to something -- e.g. the number of times that 37 goes into a particular number, or whether a given number is prime or not. But in computer science, a function or an algorithm simply "does something" -- that can include solving a problem such as determining whether a given number is prime, but it can also include clearing the computer screen, or rebooting the computer -- both of which are simply tasks, which don't find a "solution" to anything. The algorithm in this example printed two lines, and then gave up -- that's "doing something", so to a computer scientist, it counts as an algorithm.)

How to use big-O notation the way everybody else does

Now that we've given the formal definition, there are a few other subtle differences between what big-O formally means, and the way it is normally used. For example, suppose that you've determined that an algorithm takes 2n^2 + 2n steps to run. You could say that the algorithm is O(n^2 + n). You could say that, but you never would. The reason is that if an algorithm is O(n^2 + n), then it's also O(n^2) -- because n^2 + n < 2n^2 for all n > 1, so if a function is bounded by something proportional to n^2 + n, then it's also bounded by something proportional to n^2. So it's simpler, and means the same thing, to say that the algorithm is O(n^2).

Similarly, if you found that the running time of an algorithm was given by a polynomial in n (where n is, as always, the length of the input), you would say that the running time was big-O of the highest-degree term in the polynomial (with that term's constant factor removed). So if an algorithm has a running time of 4n^3 + 6n^2 - 2n + 4, then the running time of the algorithm is O(n^3).

You would also never need to write that an algorithm was O(log_3(n)), because, by the laws of logarithms, log_3(n) = log_3(10) * log_10(n). And log_10(n) is normally written just log(n). So for any function that is O(log_3(n)), we would write that it is just O(log(n)).
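
A one-line numeric check (illustrative Python) of that change-of-base identity:

    import math

    n = 12345
    print(math.log(n, 3))                    # log base 3 of n
    print(math.log(10, 3) * math.log10(n))   # log_3(10) * log_10(n) -- same value

Since the two differ only by the constant factor log_3(10), a function bounded by a multiple of one is bounded by a multiple of the other, which is why the base gets dropped inside the big-O.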

For functions that are exponential in n, we tend to group those together as "exponential" functions, even though they are not equivalent -- not all exponential functions are big-O of each other. 10^n grows faster than 2^n, for example. However, when computer scientists look at running times of different algorithms, exponential running time is considered so horrendously bad that we usually don't distinguish between the different categories of badness. If an algorithm's running time is exponential, then we just call it exponential, without specifying exactly which exponential formula gives the running time.

To summarize the conventions that are followed when using big-O notation:

You could, technically, say O(n^2 + n), but you would normally say O(n^2), because n^2 + n is bounded above by 2n^2 as long as n > 1.

You could, technically, say O(2n^3), but you would normally say O(n^3), because if something is bounded above by a function proportional to 2n^3, then since anything proportional to 2n^3 is also proportional to n^3, it's simpler to write n^3.

You could, technically, say O(1), but you would normally say "constant time", because it's a simpler way of saying the algorithm always takes the same time to run.

You could, technically, say O(log_3(n)), but you would normally say O(log(n)), because if a function is O(log_3(n)), then it's also O(log_10(n)), O(log_2(n)), and so on for any other base of the logarithm, so it's simpler to write log(n).

You could, technically, say O(2^n), but you would normally say "exponential", because an algorithm that takes exponential time to run is considered so horrendously bad that you usually don't need to quantify exactly how bad it is.

Finally, if an algorithm is O(n^2), then you could also say that it's O(n^3), or O(n^5), because if the running time is bounded above by a constant times n^2, then it's also bounded by a constant times n^3 or a constant times n^5. But you would just say that the algorithm is O(n^2).

Distinction between worst-case running time, and average running time

We gave the example of an algorithm that takes a list of n digits as its input, and checks to see if the digit 7 is in the list. We showed that in the worst case, this algorithm would take O(n) time, because you'd have to check all n digits. If your input consists of 1,000 digits, then the worst-case running time is 10 times longer than if your input consists of 100 digits.

On the other hand, if the digits are random, and each has a 1-in-10 chance of being a 7, then the average running time for a string of 1,000 digits is not 10 times worse than the average running time for a string of 100 digits. This is because, since each digit has a 1-in-10 chance of being a 7, you will only have to check about 10 digits, on average, before you find a 7. This is true whether your input consists of 100 digits or 1,000 digits -- either way, you'll only have to check about 10 of them. The odds of not getting a single 7 anywhere in your input, so that you really do have to check all 100 digits (or all 1,000 digits) of your input, are microscopic -- that's just the worst-case scenario. So the average running time of the algorithm is always going to be less than 10. (The reason is that if you were checking an infinite number of digits, then the average wait time before you found a 7, would be exactly 10 steps -- so if you're forced to stop after checking 1,000 digits, then the average wait time will be slightly less than 10.) On average, the algorithm is constant-time -- 10 steps or less. But in the worst case, the algorithm is O(n) -- it's possible, although very unlikely, that you'd have to check all n numbers in the input.
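
A quick simulation (illustrative Python, not part of the original text) makes the point: the average number of digits checked stays right around 10 no matter how long the list is, even though the worst case grows with n:

    import random

    def steps_to_find_seven(digits):
        """Number of digits examined before stopping (all of them if there is no 7)."""
        for i, d in enumerate(digits, start=1):
            if d == 7:
                return i
        return len(digits)

    random.seed(0)
    for n in (100, 1_000, 10_000):
        trials = [steps_to_find_seven([random.randrange(10) for _ in range(n)])
                  for _ in range(10_000)]
        print(f"n = {n:>6}: average = {sum(trials) / len(trials):.2f} steps, worst possible = {n}")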

Whether a computer scientist is interested in the average running time of an algorithm, or the worst-case running time, depends on the context. But unless stated otherwise, when they say that an algorithm is O(n2), that refers to the worst-case running time.

For many algorithms, the worst-case running time and the average running time are both in the same big-O category anyway. Take the long-multiplication example. We showed that in the worst possible case (where every single-digit multiplication generates a result of 10 or more, and you have to carry a tens digit into the next column), there were (17 x 13) multiplication operations and (17 - 1) x 13 addition operations to generate the numbers in blue, and then possibly another (17 + 1) x 13 addition operations to total up all the blue rows to get the answer -- so the worst-case scenario takes O(mn) time. But even in the best-case scenario, where the digits were all 1's and could be multiplied without carrying anything, you still require (17 x 13) multiplication operations to generate the rows in blue -- every digit in the first number has to be multiplied exactly once by every digit in the second number. So the average running time is O(mn) too, because there's no way of getting around the mn single-digit multiplication operations that you have to do.

Important -- if the inputs are numbers, then you still look at the lengths of the inputs, not the size of the numbers

Remember that when we say the algorithm time is a function of m and n, m and n are the lengths of the inputs (i.e. the number of digits in the two numbers you're multiplying), not the sizes of the numbers themselves. If the algorithm to multiply two numbers A and B required a length of time proportional to AB, that would mean that just adding a digit to the end of one of the numbers would make the problem take 10 times longer to solve, since you just made one of the numbers about 10 times greater! Fortunately, we know from experience that multiplying a 3-digit number by a 2-digit number does not take 10 times longer than multiplying a 2-digit number by a 2-digit number.

Similarly, sometimes an algorithm might take a number A as its input, and do something with every number from 1 to A -- for example, test every number from 1 to A to see if it divides evenly into A. (Obviously the numbers greater than A/2, other than A itself, can't possibly divide evenly into A, but the algorithm is dumb and checks them anyway.) We know from long division that dividing one number into another number, to see if it divides evenly, is not "instantaneous", but suppose that you could magically check whether one number divided into another number, in one step. Even so, it would take A steps to complete the algorithm. But this is not proportional to n. n is the length of the input, i.e. the number of digits in A. If A is an n-digit number, then A is between 10^(n-1) and 10^n - 1, so the running time of the algorithm is close to 10^n steps in the worst case. The running time of the algorithm may be proportional to A, but it's exponential in n, the length of the input.
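
Here's that deliberately dumb algorithm as an illustrative Python sketch, counting one "step" per divisibility check, so you can see the step count track A itself (and therefore roughly 10^n) rather than the number of digits n:

    def count_divisor_checks(A):
        """Check every number from 1 to A for divisibility, counting each check as one step."""
        steps = 0
        divisors = []
        for d in range(1, A + 1):
            steps += 1
            if A % d == 0:
                divisors.append(d)
        return divisors, steps

    for A in (9, 99, 999, 9999):      # 1-, 2-, 3-, and 4-digit inputs
        _, steps = count_divisor_checks(A)
        print(f"A = {A} ({len(str(A))} digits): {steps} steps")
    # Each extra digit multiplies the number of steps by about 10: the running
    # time is proportional to A, which is exponential in the length of A.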

There is less danger of confusion in cases where the inputs are not really numbers. For example, when we looked at the algorithm that checked a list of 1,000 digits to see if one of them was a 7, we were considering the input to be a list of 1,000 digits, not a 1,000-digit number. The numbers were just being treated as symbols, so we might as well have said that the input was a string made up of 10 different symbols (heart, diamond, circle, etc.) and the algorithm's task was to check whether the string contained a heart or not.

Different algorithms can take different amounts of time to solve the same problem

In the long-multiplication example that we used, we showed that the usual algorithm for multiplying an m-digit number A, and an n-digit number B, was O(mn).

You could do it differently, by starting with a running total of 0 and then adding A to the running total over and over again, B times. Obviously, this would be a very tedious way to multiply 9,618 by 4,134. The running time of the algorithm would be at least proportional to B -- which, as explained earlier, means that it would be exponential in the number of digits in B. If B were 13 digits long, B would have to be at least 10^12, and this algorithm would take at least 10^12 steps.
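
Here's what that looks like as an illustrative Python sketch; the loop runs B times, so the number of additions is B itself, not the number of digits in B:

    def multiply_by_repeated_addition(A, B):
        """Compute A * B by adding A to a running total, B times."""
        total = 0
        additions = 0
        for _ in range(B):
            total += A
            additions += 1
        return total, additions

    print(multiply_by_repeated_addition(9618, 4134))   # (39760812, 4134)
    # Even a 4-digit B costs thousands of additions; a 13-digit B would cost
    # at least 10^12 additions -- exponential in the length of B.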

So, "running time" is a property of the algorithm used to solve a problem, not a property of the problem itself.

However, for certain problems, you can show that the running time can never be less than a certain big-O category. We used the example of a list of n digits, where the problem was to check the list to see if it contained any 7's. Obviously, any algorithm would (in the worst case) have to check all n digits, so any algorithm would have to be at least O(n). This is kind of obvious, because for any problem where the input has length n and you have to look at the entire input to solve the problem, you need at least n steps to look at the whole input, and hence any algorithm will require at least n steps to solve the problem. On the other hand, there are problems that take an input of size n, where computer scientists have proven that even the fastest-possible algorithm will still be slower than O(n). For example, if you are given n numbers and asked to sort them in ascending order, it only takes n steps to read in all the numbers, but the fastest-possible algorithm to sort the numbers (at least, any algorithm that works by comparing the numbers to each other) will still be O(n log(n)).

Polynomial vs. exponential time

As stated, computer scientists divide algorithms into categories based on their running time as a function of the size of their input, such as O(n), O(n^2), O(n log(n)), O(10^n), etc. These categories are grouped together into two larger categories: algorithms that take polynomial running time, and algorithms that take exponential running time.

Polynomial running time algorithms are all the algorithms that are constant-time, O(n), O(n^2), O(n log(n)), O(squareroot(n)), etc. This might seem odd since expressions like n log(n) and squareroot(n) are not considered polynomials. What polynomial running time really means is that the running time is polynomial or better. For example, n log(n) is less than n^2, so an O(n log(n)) algorithm is considered a polynomial-time algorithm.

Exponential running time algorithms are algorithms where the running time is exponential in n, such as those that are O(10^n). Multiplying two numbers A and B by starting with a running total of 0 and then adding A to the total over and over, B times, is an exponential-time algorithm, because the running time is proportional to B, which can be as large as roughly 10^n, where n is the number of digits in B.

Definition of a "hard" problem

A "hard" problem is a problem where all known algorithms to solve the problem, take exponential time to finish. For example, suppose that your input is a large number A, and you want to find all the factors of A. You could start with the numbers 2, 3, 4, ... and so on, all the way up to A/2, to see if each of those numbers divides into A. But even if you had an instantaneous, one-step method of dividing a number into A to see if it divided evenly, the running time of the algorithm would still be proportional to A/2, i.e. exponential in the length of the input, because the size of A is exponential in the number of digits in A. Even though this looks like a naive algorithm to solve the problem, it turns out that all other algorithms to find the prime factors of a large number, also take exponential time. There is no known polynomial-time algorithm that can solve the problem. Hard problems are also called "intractable" problems.

Some "hard" problems are called "hard" because there is no known algorithm to solve them in polynomial time, but it's never been proven that no such algorithm exists. On the other hand, just as it has been proven that the problem of sorting a list of n numbers cannot be solve by any algorithm in faster than O(nlog(n)) time, there are problems for which it has been proven that they cannot be solved in polynomial time; any solution requires at least exponential time. Finding the factors of a large number, is a problem from the first category -- no one has ever found an algorithm to factor a large number in less than exponential time, but no one has proven that it can't be done. (There are examples of problems in the second category -- problems that have been proven to require at least exponential time -- but none that can be described simply.)

Definitions of "decision problem", "P", and "NP"

A decision problem is basically a yes-or-no question. So "Find the factors of the number A" is not a decision problem. But an example that we gave above, where the input is a list of one-digit numbers and the problem is to find out if the list of digits contains a 7, is a decision problem. "Is the number A prime?" would be another example of a decision problem.