June 27, 2017

Needle-in-a-Haystack Problems, and P vs. NP

Last week I wrote about needle-in-a-haystack problems, in which it’s hard to find the solution but if somebody tells you the solution it’s easy to verify. A commenter asked whether such problems are related to the P vs. NP problem, which is the most important unsolved problem in theoretical computer science. It turns out that they are related, and that needle-in-a-haystack problems are a nice framework for explaining the P vs. NP problem, which few non-experts seem to understand.

Stated simply, the P vs. NP problem asks whether needle-in-a-haystack computational problems can exist. To understand what this means, we need to take a short detour into computer science theoryland. It’s perfectly safe, but stay close to the group so you don’t wander off into a finite field…. Okay, let’s meet P and NP.

P is the set of problems that can be solved efficiently. The precise theoretical definition of “efficient” is a bit subtle: we define the “size” of a problem to be the number of characters needed to write down the problem; and we say a solution method is efficient if, when the problem is large, the method can finish in a length of time that is less than the problem size raised to some power.

For example, suppose we are given two N-digit numbers, A and B, and a 2N-digit number C, and we’re asked whether A times B equals C. It requires 4N characters to write down the problem, one character for each digit in each of the numbers. We can solve the problem by multiplying A and B, using the multiplication method we learned in school, and then comparing the result to C. The multiplication will take us roughly N-squared steps, and the comparison is faster than that. So the solution time will grow roughly as N-squared, and if N is large this is less than the third power of the problem size (i.e., less than 4N raised to the third power). Therefore multiplication is in P, which matches our intuition that we know how to multiply efficiently.

Here’s another example, the “traveling salesperson problem” (TSP): given a list of cities, a table showing the airfare between each pair of cities, and a total travel budget, is there an itinerary that visits every city on the list and returns to where it started, without exceeding the total travel budget? One way to solve this problem is to try out all of the possible itineraries, and see if there’s one that is cheap enough. That works, but it is not efficient, because its running time grows faster than any fixed power of the problem size. No efficient algorithm for the TSP is known. There might be one, but if there is one we haven’t discovered it yet. So we don’t know if the TSP is in P.

You might have guessed by now that P stands for “polynomial”, as in “solvable in polynomial time”. That is indeed the origin of the name P. And you might go on to guess that NP stands for “not polynomial” — but that would be wrong. If you must know, NP stands for “nondeterministic polynomial”. I won’t bother explaining what “nondeterministic” means here, because that has confused generations of students. Instead, let’s use an alternative, but equivalent, definition of NP.

NP is the set of problems having yes/no answers, for which, whenever the correct answer is “yes”, there is some “hint” that allows us to verify the “yes” answer efficiently. Think about the TSP: if somebody tells you a cheap itinerary, you can verify efficiently that that itinerary visits all of the cities, ends where it started, and costs less than the travel budget. So the TSP is in NP.

Multiplication is in NP as well. Given any hint (“booga booga”, say) we can verify that A times B equals C, by simply ignoring the hint, and then multiplying and comparing as above. By a similar argument, any problem in P must also be in NP.

You can see now how this connects with needles and haystacks. If a problem is in NP but not in P, it’s like a needle-in-a-haystack problem: we can verify the answer efficiently if we’re given a hint, but without a hint we can’t find the answer efficiently. So asking whether needle-in-a-haystack problems exist is exactly the same as asking whether P is different from NP.

Are P and NP different? We don’t know. Consider the Traveling Salesperson Problem. We know it’s in NP, because we can verify the answer efficiently given a hint, but we don’t know if it’s in P. We don’t know if the TSP can be solved efficiently — we haven’t found an efficient solution yet but we can’t rule out the possibility that somebody will discover one tomorrow. A lot of people have tried really hard for a long time to find an efficient solution, so most experts tend to assume that no efficient solution is possible — but the experts have been wrong before.

If, on the other hand, P and NP turn out to be equal, the consequences would be huge. For example, much of cryptography would collapse. Decrypting a message is supposed to be a needle-in-a-haystack problem—easy if you have a hint (i.e., the secret decryption key) but hard otherwise.

It’s annoying, to say the least, that such a basic question about information remains unanswered. For decades theorists have been trying to scale the P vs. NP mountain from various directions,without success. The rest of us have gotten accustomed to assuming that needle-in-a-haystack problems exist, and hoping for the best.

Comments

  1. It was long ago, so I don’t remember it perfectly, perhaps someone here will recognize it.

    In a Scientific American article that a reasearcher from Bell Labs (from memory) proved that for any switching matrix, of N inputs and N outputs, you don’t need N squared switches (one per input to the output), you need at most 12N or something (groups of lines can go into smaller matrices which eventually allows a connection between any row and column). The problem is by the time 12N is sufficiently less than doing NxN, – it was 48×48 where it would be equal if I remember the article – it becomes computationally infeasable to find even one actual setup that only uses 12N. It has to exist, but to find one without a gap or redundancy would fall into the NP arena.

    There are a number of strange proofs that say yes or no to questions like this without providing a method to solve any particular instance. Some contradiction in the logic is found that makes it clearly false, or it is easy to prove if you reduce A to X and B to Y, X equals Y, but you still can’t find B given an arbitrary A.

    So I’m not worried about cryptography at the moment even if someone solves the TSP or proves that P=NP through some strange, subtle, and indirect method.

    I also assume you mean the basis of PUBLIC KEY cryptography would collapse since they rely on NP problems. i don’t remember anything about symmetric ciphers being affected by P=NP, but it might be a gap in my information..

    Even knowing there must be a way to factor large numbers in N steps doesn’t immediately show how it can be accomplished. For a long time we wondered about Fermat’s last theorem.