P vs. NP is about finding algorithms, or computer programs, to solve particular math problems, and whether or not "good" algorithms exist to solve these problems. Good algorithms allow a computer to come up with a solution to a problem relatively quickly, even if the size of the input to the problem is very large. Problems for which computer scientists have found good algorithms are said to be in the "complexity class" P.
However, there are a large number of problems for which the best-known algorithms are not efficient, and can take an extremely long time to run on even the fastest computers. Some of these problems have very important applications to computer and industrial design, internet security, and resource allocation. Problems that have a good algorithm for checking a given possible solution but that don't necessarily have a good algorithm for finding solutions are in the complexity class NP.
The million-dollar question is whether the two complexity classes P and NP equal each other. P is contained in NP: Any problem that can be solved quickly by a computer can also have a particular possible answer quickly checked by a computer. The reverse — whether NP is contained in P — is unknown: We don't know whether problems that have a good algorithm for checking answers also have good algorithms for finding answers.
Most computer scientists and mathematicians think that the two classes are not equal: that there are some problems in NP that are not in P. Yet this has never been mathematically proven. Finding efficient algorithms for the hard problems in NP, and showing that P = NP, would dramatically change the world. On the other hand, finding a proof that no such algorithms exist, and that P ≠ NP, would likely involve a huge leap in our understanding of the nature and limitations of computers.
There are many useful problems that are in P: We know how to solve them relatively quickly, even for large inputs. For example, every year the National Resident Matching Program assigns recent medical-school graduates to hospital-residency programs using an algorithm Business Insider has described in the past. Graduates and programs each make rankings of their preferences. The algorithm takes each unmatched graduate, and tentatively tries to match them with their highest-ranked program.
If the program has space available, it tentatively accepts the graduate. If the program is full but prefers the current applicant to someone on its tentative acceptance list, the program kicks out the lowest-ranked tentative candidate and replaces him or her with the higher-ranked current applicant. The low-ranked candidate who got kicked out is added back to the unmatched graduate pool, where he or she will try to match with the next program on his or her ranking.
In the absolute worst-case scenario, we would have to try matching every graduate with every program that they have on their ranking list. In this scenario, a computer running this algorithm would have to perform a tentative match for each possible pair of applicant and program. This means that the largest number of tentative matches the computer would have to perform would be the number of applicants times the number of residency programs.
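The deferred-acceptance procedure described above can be sketched in Python. This is a simplified, hypothetical version for illustration (the function name `match_residents`, the sample graduates, and the capacities are all made up; the real NRMP algorithm also handles couples and other complications):

```python
def match_residents(grad_prefs, prog_prefs, capacity):
    """Deferred acceptance: graduates propose to programs in order of
    preference; programs tentatively hold the best applicants seen so far,
    up to capacity, kicking out lower-ranked tentative candidates."""
    # rank[p][g] = position of graduate g in program p's list (lower is better)
    rank = {p: {g: i for i, g in enumerate(prefs)}
            for p, prefs in prog_prefs.items()}
    tentative = {p: [] for p in prog_prefs}   # graduates each program holds
    next_choice = {g: 0 for g in grad_prefs}  # next program index g will try
    unmatched = list(grad_prefs)

    while unmatched:
        g = unmatched.pop()
        if next_choice[g] >= len(grad_prefs[g]):
            continue                          # g has exhausted their list
        p = grad_prefs[g][next_choice[g]]
        next_choice[g] += 1
        if g not in rank[p]:
            unmatched.append(g)               # program did not rank g at all
            continue
        tentative[p].append(g)
        if len(tentative[p]) > capacity[p]:
            # program is full: kick out its lowest-ranked tentative candidate
            worst = max(tentative[p], key=lambda x: rank[p][x])
            tentative[p].remove(worst)
            unmatched.append(worst)
    return tentative

print(match_residents(
    {"g1": ["A", "B"], "g2": ["A", "B"], "g3": ["B", "A"]},
    {"A": ["g2", "g1", "g3"], "B": ["g1", "g2", "g3"]},
    {"A": 1, "B": 2}))
# g2 gets the single slot at A; g1 and g3 end up at B
```

Each graduate proposes to each program at most once, so the total number of tentative matches is bounded by (graduates × programs), exactly the polynomial bound described above.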
This gives us a more formal definition of P. P stands for "polynomial time": An algorithm like residency matching can be run in a number of steps based on a power of the size of the input. Algorithms like this can be run relatively quickly on a computer — while it will take longer for a computer to run the algorithm for a larger input, the number of operations the computer needs to execute grows in a reasonable way. These are our "good" algorithms.
Not all problems have known good polynomial time algorithms. An example is the Traveling Salesman Problem. Here, we consider a set of cities that have some fixed distances between each other, and a salesman who starts in one city and needs to visit the other cities and then come home. The salesman's goal is to find the shortest such tour of the cities. Here's a simple example with four cities and their distances marked in blue (distances are clearly not to scale):
Here's one possible tour our salesman could take through the four cities:
Start at A, walk to B (which has a distance of 1), then across to D (distance 6), then to C (distance 3), then back to A (distance 5). Adding up the distances, the total length of this tour is 1 + 6 + 3 + 5 = 15 units.
We're looking for the shortest path, however. At this point, we can frame this as a related yes-or-no question (formally, P and NP are defined in terms of yes-or-no questions, which are closely related to the kinds of problems we're looking at here): Is there a path shorter than the one above, a path with a total distance of less than 15?
Notice that it's very easy to check any particular route to see if it satisfies our criteria. First, make sure we actually visit all our cities. Second, add up the city-to-city distances to get the total length of the tour, and see if it's less than 15. The fact that it's easy to check a given possible answer means that the Traveling Salesman Problem is in NP.
Since checking a particular path is easy, one possible way to solve this problem is through brute force: Find the length of every possible route that starts and ends at A while visiting the three other cities. Here are all the possible routes in our simple example, with the loops indicated by the order in which we visit the cities (so that, for example, the route above is represented by ABDCA):
The path around the outside of the square, either going ABCDA or ADCBA, is the shortest path with a total length of 10 units.
For four cities, this approach is pretty easy: We had only six routes to consider. That number of routes is not an accident: We are starting at A, and from that point we have three cities to choose from for our first leg: B, C, or D. Once we've chosen the second city to visit, we have two cities left to choose from. After we've chosen a third city from those two, there's just one remaining city to visit. So our number of possible paths is 3 × 2 × 1 = 6.
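The brute-force search over the six routes can be written out directly. The distances AB = 1, BD = 6, DC = 3, and CA = 5 come from the tour in the text; BC = 2 and AD = 4 are assumed values (the original figure is not reproduced here), chosen so that the outer tour ABCDA has length 10, matching the result above:

```python
from itertools import permutations

# city-to-city distances; BC and AD are assumed values for illustration
dist = {("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 3,
        ("D", "A"): 4, ("A", "C"): 5, ("B", "D"): 6}

def d(x, y):
    # distances are symmetric, so look up the pair in either order
    return dist.get((x, y)) or dist[(y, x)]

def tour_length(order):
    # total length of the loop A -> ... -> A
    stops = ("A",) + order + ("A",)
    return sum(d(a, b) for a, b in zip(stops, stops[1:]))

# brute force: try every ordering of the three remaining cities
tours = {("A",) + p + ("A",): tour_length(p) for p in permutations("BCD")}
best = min(tours, key=tours.get)
print(best, tours[best])   # the outer square, with total length 10
```

Checking any single tour is fast (just sum a few distances); the cost of this approach is purely in the number of tours enumerated.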
The same idea for counting paths works as we add more cities. If we have a Traveling Salesman Problem with five cities, we have 4 × 3 × 2 × 1 = 24 paths to look at. If we have six cities, we have 5 × 4 × 3 × 2 × 1 = 120 paths.
As we can already see with these small numbers of cities, the number of paths grows extremely quickly as we add more cities. While it's still easy to take a given path and find its length, the sheer number of possible paths makes our brute-force approach untenable. By the time we have 30 cities, the number of possible paths is about a 9 followed by 30 zeros. A computer that could check a trillion paths per second would take about 280 billion years to check every path, about 20 times the current age of the universe.
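The arithmetic behind those figures is easy to verify (the trillion-checks-per-second machine is of course hypothetical):

```python
import math

# a tour of n cities starting and ending at home visits the other
# n - 1 cities in some order, so there are (n - 1)! possible paths
paths = math.factorial(30 - 1)         # about 8.8 * 10**30 for 30 cities
checks_per_second = 10**12             # hypothetical very fast machine
years = paths / checks_per_second / (60 * 60 * 24 * 365)
print(f"{paths:.1e} paths, about {years / 1e9:.0f} billion years")
```

This is the hallmark of factorial (and exponential) growth: adding a single city multiplies the work by the number of cities, so no hardware improvement keeps pace for long.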
There are algorithms for the Traveling Salesman Problem that are much more efficient than this brute-force approach, but they all either provide some kind of approximate "good enough" solution that might not be the actual shortest path, or still have the number of needed calculations grow exponentially with the number of cities, taking an unacceptably long time for large numbers of cities. There is no efficient, polynomial time algorithm known for the problem.
It turns out that there are a large number of problems that are like this: It's easy to check whether a particular candidate solution works for us, but actually finding a solution is much more difficult. As mentioned above, problems for which it's easy to check whether a particular possible answer works are said to be in the class NP, for "nondeterministic polynomial time."
That "nondeterministic" refers to a computer that, roughly speaking, could make as many copies of itself as needed to solve a given problem. If we had such a computer, we could just make as many copies of the computer as there are possible solutions, and check each one of them in parallel.
P Versus NP And NP-Completeness
The big question is whether or not P = NP: do good, efficient, polynomial time algorithms exist to find solutions to problems that have good checking algorithms?
Answering a question like this at first glance seems excessively broad. There are many NP problems, like the Traveling Salesman, for which no polynomial time algorithm is known. Fortunately, there is a special subset of NP problems with a very interesting property: Finding a polynomial time algorithm for one of these problems would lead to a similarly quick algorithm for any NP problem. These problems are called NP-complete.
The basic idea is fairly straightforward. A problem is NP-complete if it would be possible to make a good algorithm for any NP problem using a "black box" that could solve the NP-complete problem quickly. That is, having an efficient algorithm for an NP-complete problem automatically leads to an efficient algorithm for any NP problem.
Many interesting problems, including the Traveling Salesman Problem, are NP-complete. This makes answering the million-dollar question somewhat easier: You either need to find an efficient algorithm for one NP-complete problem, or prove that no such algorithm exists for one particular such problem.
Most computer scientists think that the latter is true: It's likely that there are no polynomial time algorithms for NP-complete problems. But there is no formal proof of this yet, and any such proof would almost certainly involve radical new insights into the nature of computation and algorithms.
The practical implications of this problem, however, could be even more extreme than these theoretical aspects. Many hard problems in NP for which we don't have a good polynomial time algorithm are very useful. The Traveling Salesman Problem from above, for example, has applications in logistics and manufacturing: finding efficient road paths for deliveries, or the most efficient way for a robotic arm to place components on a circuit board. Being able to quickly solve the Traveling Salesman Problem would be very useful in these areas.
Prime Factors And Internet Security
One of the biggest NP problems for which we don't yet know an efficient algorithm is factoring integers. One of the most basic facts in number theory is that any number can be broken down into a unique product of prime numbers: numbers that are not divisible by any number other than themselves and one. For small numbers, finding factorizations is easy: 15 is 5 × 3; 12 is 2 × 2 × 3; 70 is 2 × 5 × 7.
For larger numbers, however, finding those prime factors can be much more difficult. To see why, consider the most naive possible way to go about factoring a number: Take every number between 2 and one less than the number, and divide the number we're interested in by the smaller number. If this division has no fractional part, or leaves no remainder, then the smaller number is a factor of our number.
This naive brute-force approach works fine for small numbers, but quickly breaks down as we add more digits. For a two-digit number, we have fewer than a hundred possible candidate factors. But for a 10-digit number, we have billions of numbers we need to try out to see if they are factors.
So, the number of possible factors increases exponentially with the number of digits. If we have a number that is the product of two prime numbers with around 100 digits each, the resulting number will have around 200 digits. Taking such a huge product and trying to find the two prime numbers going into that number using the naive brute-force approach would take a nearly unimaginable amount of computing power and time.
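The naive approach reads almost directly as code. This sketch adds one standard shortcut the text doesn't mention: candidate divisors only need to be checked up to the square root of the remaining number, which helps but still leaves the running time exponential in the number of digits:

```python
def trial_division(n):
    """Naive factoring: try every candidate divisor from 2 upward."""
    factors = []
    d = 2
    while d * d <= n:            # only need candidates up to sqrt(n)
        while n % d == 0:        # d divides n with no remainder
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)        # whatever remains is itself prime
    return factors

print(trial_division(15), trial_division(12), trial_division(70))
# [3, 5] [2, 2, 3] [2, 5, 7]
```

For a 200-digit product of two 100-digit primes, the smaller factor has about 100 digits, so even this square-root shortcut leaves on the order of 10^100 candidates to try.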
As with the Traveling Salesman Problem, there are better algorithms than brute force for factoring extremely large integers. However, none of the currently known algorithms satisfy the gold standard of running in polynomial time. The integer-factoring problem is particularly important, since almost all security on the internet depends on this problem not being easily solvable. The RSA encryption algorithm allows computers to securely communicate with each other over the open internet.
A user randomly chooses two extremely large prime numbers and then multiplies them together. By publishing that product of the two primes along with another number of the user's choosing, other computers can encrypt messages for the user that can (probably) only be decrypted if someone knows the two prime numbers. As long as the user keeps those prime numbers secret, they will be the only person who can read messages encrypted with the publicly available product of the primes.
The security of RSA, which is used in many secure online communications, would be completely broken with an efficient integer-factoring algorithm. A hacker with the ability to take the public composite number and quickly find the secret prime factors of that number would be able to decrypt any supposedly secure message sent to the user, such as encrypted emails, or credit-card numbers sent to an online vendor.
Factoring is an NP problem, since a possible solution can be quickly checked: If I have some prime numbers that I think are the factors of a given number, I can just multiply those primes together and see if I actually get my number. This means that if there is an efficient polynomial time algorithm for NP-complete problems, large numbers can be factored quickly, and internet security based on RSA or similar protocols would fall apart.
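A toy, textbook-style RSA example shows both sides of the coin: checking and using a factorization is easy, and anyone who can factor the public modulus recovers the private key. These tiny numbers are for illustration only; real RSA uses primes hundreds of digits long plus additional padding schemes:

```python
# Toy textbook RSA (numbers far too small for real security)
p, q = 61, 53
n = p * q                        # public modulus: 3233
e = 17                           # public exponent, chosen by the user
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)              # private exponent (Python 3.8+ modular inverse)

m = 65                           # the message, as a number
c = pow(m, e, n)                 # anyone can encrypt with the public key
assert pow(c, d, n) == m         # the key holder decrypts easily

# An attacker who can factor n recovers the private key outright:
f = next(k for k in range(2, n) if n % k == 0)    # naive factoring
d_attacker = pow(e, -1, (f - 1) * (n // f - 1))
print(pow(c, d_attacker, n))     # prints 65, the original message
```

With 200-digit moduli the naive factoring step becomes infeasible, which is exactly what the security of the scheme rests on.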
Whether or not P = NP, the insights to be gained by answering this deep question would have tremendous theoretical and practical effects on our understanding of the abilities and limitations of computers, and an answer is certainly worth at least a million dollars.
Communications of the ACM, Vol. 52 No. 9, Pages 78-86
When editor-in-chief Moshe Vardi asked me to write this piece for Communications, my first reaction was the article could be written in two words: still open.
When I started graduate school in the mid-1980s, many believed that the quickly developing area of circuit complexity would soon settle the P versus NP problem: whether every algorithmic problem with efficiently verifiable solutions has efficiently computable solutions. But circuit complexity and other approaches to the problem have stalled, and we have little reason to believe we will see a proof separating P from NP in the near future.
Nevertheless, the computer science landscape has dramatically changed in the nearly four decades since Steve Cook presented his seminal NP-completeness paper "The Complexity of Theorem-Proving Procedures"10 in Shaker Heights, OH in early May, 1971. Computational power has dramatically increased, the cost of computing has dramatically decreased, not to mention the power of the Internet. Computation has become a standard tool in just about every academic field. Whole subfields of biology, chemistry, physics, economics and others are devoted to large-scale computational modeling, simulations, and problem solving.
As we solve larger and more complex problems with greater computational power and cleverer algorithms, the problems we cannot tackle begin to stand out. The theory of NP-completeness helps us understand these limitations and the P versus NP problem begins to loom large not just as an interesting theoretical question in computer science, but as a basic principle that permeates all the sciences.
So while we don't expect the P versus NP problem to be resolved in the near future, the question has driven research in a number of topics to help us understand, handle, and even take advantage of the hardness of various computational problems.
In this article I look at how people have tried to solve the P versus NP problem as well as how this question has shaped so much of the research in computer science and beyond. I will look at how to handle NP-complete problems and the theory that has developed from those approaches. I show how a new type of "interactive proof system" led to limits on approximation algorithms, and consider whether quantum computing can solve NP-complete problems (short answer: not likely). And I close by describing a new long-term project that will try to separate P from NP using algebraic-geometric techniques.
This article does not try to be totally accurate or complete either technically or historically, but rather informally describes the P versus NP problem and the major directions in computer science inspired by this question over the past several decades.
What is the P versus NP Problem?
Suppose we have a large group of students that we need to pair up to work on projects. We know which students are compatible with each other and we want to put them in compatible groups of two. We could search all possible pairings but even for 40 students we would have more than 300 billion trillion possible pairings.
In 1965, Jack Edmonds12 gave an efficient algorithm to solve this matching problem and suggested a formal definition of "efficient computation" (runs in time a fixed polynomial of the input size). The class of problems with efficient solutions would later become known as P for "Polynomial Time."
But many related problems do not seem to have such an efficient algorithm. What if we wanted to make groups of three students with each pair of students in each group compatible (Partition into Triangles)? What if we wanted to find a large group of students all of whom are compatible with each other (Clique)? What if we wanted to sit students around a large round table with no incompatible students sitting next to each other (Hamiltonian Cycle)? What if we put the students into three groups so that each student is in the same group with only his or her compatibles (3-Coloring)?
All these problems have a similar flavor: Given a potential solution, for example, a seating chart for the round table, we can validate that solution efficiently. The collection of problems that have efficiently verifiable solutions is known as NP (for "Nondeterministic Polynomial-Time," if you have to ask).
So P = NP means that for every problem that has an efficiently verifiable solution, we can find that solution efficiently as well.
We call the very hardest NP problems (which include Partition into Triangles, Clique, Hamiltonian Cycle and 3-Coloring) "NP-complete," that is, given an efficient algorithm for one of them, we can find an efficient algorithm for all of them and in fact any problem in NP. Steve Cook, Leonid Levin, and Richard Karp10, 24, 27 developed the initial theory of NP-completeness that generated multiple ACM Turing Awards.
In the 1970s, theoretical computer scientists showed hundreds more problems NP-complete (see Garey and Johnson16). An efficient solution to any NP-complete problem would imply P = NP and an efficient solution to every NP-complete problem.
Most computer scientists quickly came to believe P ≠ NP and trying to prove it quickly became the single most important question in all of theoretical computer science and one of the most important in all of mathematics. Soon the P versus NP problem became an important computational issue in nearly every scientific discipline.
As computers grew cheaper and more powerful, computation started playing a major role in nearly every academic field, especially the sciences. The more scientists can do with computers, the more they realize some problems seem computationally difficult. Many of these fundamental problems turn out to be NP-complete. A small sample:
- Finding a DNA sequence that best fits a collection of fragments of the sequence (see Gusfield20).
- Finding a ground state in the Ising model of phase transitions (see Cipra8).
- Finding Nash equilibria with specific properties in a number of environments (see Conitzer9).
- Finding optimal protein threading procedures.26
- Determining if a mathematical statement has a short proof (follows from Cook10).
In 2000, the Clay Math Institute named the P versus NP problem as one of the seven most important open questions in mathematics and has offered a million-dollar prize for a proof that determines whether or not P = NP.
What If P = NP?
To understand the importance of the P versus NP problem let us imagine a world where P = NP. Technically we could have P = NP, but not have practical algorithms for most NP-complete problems. But suppose in fact we do have very quick algorithms for all these problems.
Many focus on the negative, that if P = NP then public-key cryptography becomes impossible. True, but what we will gain from P = NP will make the whole Internet look like a footnote in history.
Since all the NP-complete optimization problems become easy, everything will be much more efficient. Transportation of all forms will be scheduled optimally to move people and goods around quicker and cheaper. Manufacturers can improve their production to increase speed and create less waste. And I'm just scratching the surface.
Learning becomes easy by using the principle of Occam's razor: we simply find the smallest program consistent with the data. Near-perfect vision recognition, language comprehension and translation, and all other learning tasks become trivial. We will also have much better predictions of weather, earthquakes, and other natural phenomena.
P = NP would also have big implications in mathematics. One could find short, fully logical proofs for theorems, but such formal proofs are usually extremely long. Instead we can use the Occam's razor principle to recognize and verify mathematical proofs as typically written in journals, and then find proofs of theorems that have reasonable-length proofs, say under 100 pages. A person who proves P = NP would walk home from the Clay Institute not with one $1 million check but with seven (actually six, since the Poincaré Conjecture appears solved).
Don't get your hopes up. Complexity theorists generally believe P ≠ NP and such a beautiful world cannot exist.
Approaches to Showing P ≠ NP
Here, I present a number of ways we have tried and failed to prove P ≠ NP. The survey of Fortnow and Homer14 gives a fuller historical overview of these techniques.
Diagonalization. Can we just construct an NP language L specifically designed so that every single polynomial-time algorithm fails to compute L properly on some input? This approach, known as diagonalization, goes back to the 19th century.
In 1874, Georg Cantor7 showed the real numbers are uncountable using a technique known as diagonalization. Given a countable list of reals, Cantor showed how to create a new real number not on that list.
Alan Turing, in his seminal paper on computation,38 used a similar technique to show that the Halting problem is not computable. In the 1960s complexity theorists used diagonalization to show that given more time or memory one can solve more problems. Why not use diagonalization to separate NP from P?
Diagonalization requires simulation and we don't know how a fixed NP machine can simulate an arbitrary P machine. Also a diagonalization proof would likely relativize, that is, work even if all machines involved have access to the same additional information. Baker, Gill and Solovay6 showed no relativizable proof can settle the P versus NP problem in either direction.
Complexity theorists have used diagonalization techniques to show some NP-complete problems like Boolean formula satisfiability cannot have algorithms that use both a small amount of time and memory,39 but this is a long way from P ≠ NP.
Circuit Complexity. To show P ≠ NP it is sufficient to show some NP-complete problem cannot be solved by relatively small circuits of AND, OR, and NOT gates (the number of gates bounded by a fixed polynomial in the input size).
In 1984, Furst, Saxe, and Sipser15 showed that small circuits cannot solve the parity function if the circuits have a fixed number of layers of gates. In 1985, Razborov31 showed the NP-complete problem of finding a large clique does not have small circuits if one only allows AND and OR gates (no NOT gates). If one extends Razborov's result to general circuits one will have proved P ≠ NP.
Razborov later showed his techniques would fail miserably if one allows NOT gates.32 Razborov and Rudich33 developed a notion of "natural" proofs and gave evidence that our limited techniques in circuit complexity cannot be pushed much further. And, in fact, we haven't seen any significantly new circuit lower bounds in the past 20 years.
Proof Complexity. Consider the set of Tautologies, the Boolean formulas θ of variables over ANDs, ORs, and NOTs such that every setting of the variables to True and False makes θ true, for example the formula (x AND y) OR (NOT x) OR (NOT y).
A literal is a variable or its negation, such as x or NOT x. A formula, like the one here, is in Disjunctive Normal Form (DNF) if it is the OR of ANDs of one or more literals.
If a formula θ is not a tautology, we can give an easy proof of that fact by exhibiting an assignment of the variables that makes θ false. But if θ were indeed a tautology, we don't expect short proofs. If one could prove that tautologies do not have short proofs, that would imply P ≠ NP.
Resolution is a standard approach to proving tautologies of DNFs by finding two clauses of the form (ψ1 AND x) and (ψ2 AND NOT x) and adding the clause (ψ1 AND ψ2). A formula is a tautology exactly when one can produce an empty clause in this manner.
In 1985, Wolfgang Haken21 showed that tautologies that encode the pigeonhole principle (n + 1 pigeons in n holes means some hole has more than one pigeon) do not have short resolution proofs.
Since then complexity theorists have shown similar weaknesses in a number of other proof systems including cutting planes, algebraic proof systems based on polynomials, and restricted versions of proofs using the Frege axioms, the basic axioms one learns in an introductory logic course.
But to prove P ≠ NP we would need to show that tautologies cannot have short proofs in an arbitrary proof system. Even a breakthrough result showing tautologies don't have short general Frege proofs would not suffice in separating NP from P.
Dealing with Hardness
So you have an NP-complete problem you just have to solve. If, as we believe, P ≠ NP you won't find a general algorithm that will correctly and accurately solve your problem all the time. But sometimes you need to solve the problem anyway. All hope is not lost. Here, I describe some of the tools one can use on NP-complete problems and how computational complexity theory studies these approaches. Typically one needs to combine several of these approaches when tackling NP-complete problems in the real world.
Brute Force. Computers have gotten faster, much faster since NP-completeness was first developed. Brute force search through all possibilities is now possible for some small problem instances. With some clever algorithms we can even solve some moderate size problems with ease.
The NP-complete traveling salesperson problem asks for the smallest distance tour through a set of specified cities. Using extensions of the cutting-plane method we can now solve, in practice, traveling salesperson problems with more than 10,000 cities (see Applegate3).
Consider the 3SAT problem: solving Boolean formula satisfiability where formulas are in the form of the AND of several clauses, with each clause the OR of three literals (variables or negations of variables). 3SAT remains NP-complete, but the best algorithms can in practice solve SAT problems on about 100 variables. We have similar results for other variations of satisfiability and many other NP-complete problems.
But for satisfiability on general formulae and on many other NP-complete problems we do not know algorithms better than essentially searching all the possibilities. In addition, all these algorithms have exponential growth in their running times, so even a small increase in the problem size can kill what was an efficient algorithm. Brute force alone will not solve NP-complete problems no matter how clever we are.
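A brute-force satisfiability checker makes the exponential blow-up concrete: it tries all 2^n assignments, fine for a handful of variables and hopeless for hundreds. The clause encoding below follows the common DIMACS convention; real SAT solvers are vastly cleverer than this sketch:

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    """Try every True/False assignment. Clauses are lists of nonzero ints
    (DIMACS style): k means variable k, -k means NOT variable k."""
    for bits in product([False, True], repeat=n_vars):
        # the formula holds if every clause has at least one true literal
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None    # no satisfying assignment exists

# (x1 OR x2 OR NOT x3) AND (NOT x1 OR x3 OR x2) AND (NOT x2 OR NOT x3 OR NOT x1)
clauses = [[1, 2, -3], [-1, 3, 2], [-2, -3, -1]]
print(brute_force_sat(clauses, 3))   # (False, False, False) satisfies this
```

At 100 variables this loop would face 2^100 assignments, which is why competitive solvers rely on clever pruning and learning rather than enumeration.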
Parameterized Complexity. Consider the Vertex Cover problem: find a set of k "central people" such that for every compatible pair of people, at least one of them is central. For small k we can determine whether a central set of people exists efficiently, no matter the total number n of people we are considering. For the Clique problem, by contrast, the problem can still be difficult even for small k.
Downey and Fellows11 developed a theory of parameterized complexity that gives a fine-grained analysis of the complexity of NP-complete problems based on their parameter size.
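A classic illustration of the idea (a sketch, not Downey and Fellows' own formulation) is the bounded search tree for Vertex Cover: one endpoint of any uncovered edge must be central, so branch on the two choices, giving roughly 2^k · |edges| work instead of something like n^k:

```python
def vertex_cover(edges, k):
    """Return a vertex cover of size <= k, or None if none exists.
    Bounded search tree: branch on the two endpoints of some edge."""
    if not edges:
        return set()          # every edge is already covered
    if k == 0:
        return None           # edges remain but the budget is exhausted
    u, v = edges[0]
    # one of u, v must be in the cover: try both choices
    for pick in (u, v):
        rest = [e for e in edges if pick not in e]
        sub = vertex_cover(rest, k - 1)
        if sub is not None:
            return sub | {pick}
    return None

print(vertex_cover([("a", "b"), ("a", "c"), ("a", "d")], 1))  # {'a'}
print(vertex_cover([("a", "b"), ("b", "c"), ("a", "c")], 1))  # None: a triangle needs 2
```

The exponential part depends only on the parameter k, not on n, which is exactly the fine-grained distinction parameterized complexity studies.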
Approximation. We cannot hope to solve NP-complete optimization problems exactly but often we can get a good approximate answer. Consider the traveling salesperson problem again with distances between cities given as the crow flies (Euclidean distance). This problem remains NP-complete but Arora4 gives an efficient algorithm that gets very close to the best possible route.
Consider the MAX-CUT problem of dividing people into two groups to maximize the number of incompatibles between the groups. Goemans and Williamson17 use semi-definite programming to give a division of people within a .878567 factor of the best possible.
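The Goemans-Williamson algorithm requires semidefinite programming, but a much simpler local search already shows the flavor of approximation. This sketch only guarantees that at least half of all edges end up cut, far short of the .878567 factor:

```python
def local_search_max_cut(vertices, edges):
    """Greedy local search for MAX-CUT: move a vertex to the other side
    of the cut whenever doing so cuts more of its edges. Each move
    strictly increases the cut, so the loop terminates, and at a local
    optimum at least half of all edges are cut."""
    side = {v: 0 for v in vertices}       # start with everyone in group 0
    improved = True
    while improved:
        improved = False
        for v in vertices:
            same = sum(1 for a, b in edges
                       if v in (a, b) and side[a] == side[b])
            cut = sum(1 for a, b in edges
                      if v in (a, b) and side[a] != side[b])
            if same > cut:                # switching v cuts more edges
                side[v] ^= 1
                improved = True
    return side

verts = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
side = local_search_max_cut(verts, edges)
cut_size = sum(1 for a, b in edges if side[a] != side[b])
print(cut_size)   # the 4-cycle is bipartite, so all 4 edges get cut
```

The gap between this easy 1/2 guarantee and the semidefinite-programming .878567 factor is precisely the kind of gap approximation theory, and the hardness results below, tries to pin down.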
Heuristics and Average-Case Complexity. The study of NP-completeness focuses on how algorithms perform on the worst possible inputs. However the specific problems that arise in practice may be much easier to solve. Many computer scientists employ various heuristics to solve NP-complete problems that arise from the specific problems in their fields.
While we create heuristics for many of the NP-complete problems, Boolean formula Satisfiability (SAT) receives more attention than any other. Boolean formulas, especially those in conjunctive normal form (CNF), the AND of ORs of variables and their negations, have a very simple description and yet are general enough to apply to a large number of practical scenarios, particularly in software verification and artificial intelligence. Most natural NP-complete problems have simple efficient reductions to the satisfiability of Boolean formulas. In competition these SAT solvers can often settle satisfiability of formulas of one million variables.
Computational complexity theorists study heuristics by considering average-case complexity: how well can algorithms perform on average on instances generated by some specific distribution.
Leonid Levin28 developed a theory of efficient algorithms over a specific distribution and formulated a distributional version of the P versus NP problem.
Some problems, like versions of the shortest vector problem in a lattice or computing the permanent of a matrix, are hard on average exactly when they are hard on worst-case inputs, but neither of these problems is believed to be NP-complete. Whether similar worst-to-average reductions hold for NP-complete sets is an important open problem.
Average-case complexity plays an important role in many areas of computer science, particularly cryptography, as discussed later.
Interactive Proofs and Limits of Approximation
Previously, we saw how sometimes one can get good approximate solutions to NP-complete optimization problems. Many times though we seem to hit a limit on our ability to even get good approximations. We now know that we cannot achieve better approximations on many of these problems unless P = NP and we could solve these problems exactly. The techniques to show these negative results came out of a new model of proof system originally developed for cryptography and to classify group-theoretic algorithmic problems.
As mentioned earlier, we don't expect to have short traditional proofs of tautologies. But consider an "interactive proof" model where a prover Peggy tries to convince a verifier Victor that a formula θ is a tautology. Victor can ask Peggy randomly generated questions and need only be convinced with high confidence. Quite surprisingly, these proof systems have been shown to exist not only for tautologies but for any problem computable in a reasonable amount of memory.
A variation known as a "probabilistically checkable proof system" (PCP), where Peggy writes down an encoded proof and Victor can make randomized queries to the bits of the proof, has applications to approximation. The "PCP Theorem" optimizes the parameters of such systems; in its strong form it shows that every language in NP has a PCP where Victor uses a tiny number of random coins and queries only three bits of the proof.
One can use this PCP theorem to show the limitations of approximation for a large number of optimization questions. For example, one cannot approximate the largest clique in a group of n people to within a multiplicative ratio of nearly n^(1-ε), for any ε > 0, unless P = NP. See Madhu Sudan's recent article in Communications for more details and references on PCPs.36
One can do even better assuming a "Unique Games Conjecture," which states that there exist PCPs for NP problems with some stronger properties. Consider the MAX-CUT problem of dividing people into two groups, discussed earlier. If the unique games conjecture holds, one cannot do better than the .878567 factor given by the Goemans-Williamson approximation algorithm.17,25 Recent work shows how to get a provably best approximation for essentially any constraint satisfaction problem assuming this conjecture.30
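The Goemans-Williamson algorithm needs semidefinite programming, but a far simpler heuristic already illustrates what an approximation guarantee means: local search on an unweighted graph, which always cuts at least half the edges. This is a sketch under the assumption that `edges` lists distinct vertex pairs; it is not the Goemans-Williamson method.

```python
def local_search_max_cut(n, edges):
    """Local search for MAX-CUT on an unweighted graph: repeatedly move
    any vertex whose switch of sides increases the cut.  At a local
    optimum every vertex has at least half its incident edges cut, so
    the cut contains at least half of all edges -- a 1/2-approximation,
    well below the 0.878... of Goemans-Williamson."""
    side = [0] * n
    improved = True
    while improved:              # each flip raises the cut, so this halts
        improved = False
        for v in range(n):
            cut = sum(1 for a, b in edges
                      if v in (a, b) and side[a] != side[b])
            uncut = sum(1 for a, b in edges
                        if v in (a, b) and side[a] == side[b])
            if uncut > cut:      # flipping v cuts strictly more edges
                side[v] ^= 1
                improved = True
    return side, sum(1 for a, b in edges if side[a] != side[b])

_, cut_size = local_search_max_cut(3, [(0, 1), (1, 2), (0, 2)])
print(cut_size)   # 2, the best possible cut of a triangle
```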
Back to Top
In "What If P = NP?" we saw the nice world that arises when we assume P = NP. But we expect P ≠ NP to hold in very strong ways. We can use strong hardness assumptions as a positive tool, particularly to create cryptographic protocols and to reduce or even eliminate the need of random bits in probabilistic algorithms.
Cryptography. We take it for granted these days, the little key or lock on our Web page that tells us that someone listening to the network won't get the credit card number I just sent to an online store or the password to the bank that controls my money. But public-key cryptography, the ability to send secure messages between two parties that have never privately exchanged keys, is a relatively new development based on hardness assumptions of computational problems.
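The flavor of public-key cryptography can be sketched with Diffie-Hellman key exchange, whose security rests on the conjectured average-case hardness of the discrete logarithm problem. The parameters below are toy-sized for illustration; real deployments use groups of 2048 or more bits, or elliptic curves.

```python
import secrets

# Public parameters: a prime modulus and a base, known to everyone.
p = 2**127 - 1   # a Mersenne prime; toy size, real use needs far larger groups
g = 3

a = secrets.randbelow(p - 2) + 1   # Alice's secret exponent
b = secrets.randbelow(p - 2) + 1   # Bob's secret exponent

A = pow(g, a, p)   # Alice transmits A in the clear
B = pow(g, b, p)   # Bob transmits B in the clear

# Each party combines its own secret with the other's public value.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob   # both computed g**(a*b) mod p

# An eavesdropper sees p, g, A, and B; recovering the shared key from
# them is the discrete logarithm / Diffie-Hellman problem.
```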
If P = NP then public-key cryptography is impossible. Assuming P ≠ NP is not enough to get public-key protocols; instead we need strong average-case assumptions about the difficulty of factoring or related problems.
We can do much more than just public-key cryptography using hard problems. Suppose Alice's husband Bob is working on a Sudoku puzzle and Alice claims she has a solution to the puzzle (solving an n × n Sudoku puzzle is NP-complete). Can Alice convince Bob that she knows a solution without revealing any piece of it?
Alice can use a "zero-knowledge proof," an interactive proof with the additional feature that the verifier learns nothing other than some property holds, like a Sudoku puzzle having a solution. Every NP search problem has a zero-knowledge proof under the appropriate hardness assumptions.
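Zero-knowledge protocols are typically built from cryptographic commitments. Here is a minimal hash-based commitment sketch in Python, assuming SHA-256 behaves like a random function; a full zero-knowledge Sudoku proof would open many such commitments selectively in response to the verifier's random challenges.

```python
import hashlib
import secrets

def commit(value: bytes):
    """Commit to a value: publish H(nonce || value), keep (nonce, value).
    Hiding: the digest reveals essentially nothing about the value.
    Binding: changing the value after the fact would require a
    SHA-256 collision."""
    nonce = secrets.token_bytes(16)
    digest = hashlib.sha256(nonce + value).hexdigest()
    return digest, nonce

def verify_opening(digest, nonce, value: bytes) -> bool:
    """Check that a revealed (nonce, value) matches the commitment."""
    return hashlib.sha256(nonce + value).hexdigest() == digest

digest, nonce = commit(b"5")                    # Alice commits to a cell value
assert verify_opening(digest, nonce, b"5")      # an honest opening verifies
assert not verify_opening(digest, nonce, b"6")  # a changed value does not
```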
Online poker is generally played through some "trusted" Web site, usually somewhere in the Caribbean. Can we play poker over the Internet without a trusted server? Using the right cryptographic assumptions, not only poker but any protocol that uses a trusted party can be replaced by one that uses no trusted party and the players can't cheat or learn anything new beyond what they could do with the trusted party.b
Eliminating Randomness. In the 1970s we saw a new type of algorithm, one that used random bits to aid in finding a solution to a problem. Most notably we had probabilistic algorithms35 for determining whether a number is prime, an important routine needed for modern cryptography. In 2004, we discovered we don't need randomness at all to efficiently determine if a number is prime.2 Does randomness help us at all in finding solutions to NP problems?
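The probabilistic primality tests mentioned above are short enough to sketch. Below is the Miller-Rabin test, a close relative of the Solovay-Strassen test;35 a composite number survives a single round with probability at most 1/4.

```python
import random

def is_probably_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test.  A composite n
    survives one round with probability at most 1/4, so 20 rounds
    leave an error probability below 4**-20."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:          # write n - 1 as d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # a is a witness that n is composite
    return True

print(is_probably_prime(2**61 - 1))  # True: a Mersenne prime
print(is_probably_prime(561))        # False: a Carmichael number
```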
Truly independent and uniform random bits are either very difficult or impossible to produce (depending on your beliefs about quantum mechanics). Computer algorithms instead use pseudorandom generators to generate a sequence of bits from some given seed. The generators typically found on our computers usually work well but occasionally give incorrect results both in theory and in practice.
We can create theoretically better pseudorandom generators in two different ways, one based on the strong hardness assumptions of cryptography and the other based on worst-case complexity assumptions. I will focus on this second approach.
We need to assume a bit more than P ≠ NP, roughly that NP-complete problems cannot be solved by smaller than expected AND-OR-NOT circuits. A long series of papers showed that, under this assumption, any problem with an efficient probabilistic algorithm also has an efficient algorithm that uses a pseudorandom generator with a very short seed, a surprising connection between hard languages and pseudo-randomness (see Impagliazzo23). The seed is so short we can try all possible seeds efficiently and avoid the need for randomness altogether.
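Seed enumeration can be illustrated with Freivalds' randomized check of a matrix product. In this toy the "seed" directly names the random vector, so enumerating all seeds takes exponential time in n; the point of the hardness-based construction sketched above is that a pseudorandom generator's seed is so short that the same enumeration becomes efficient.

```python
from itertools import product

def freivalds_once(A, B, C, r):
    """One round of Freivalds' check: is A(Br) == Cr?
    If AB == C this always holds; if AB != C it fails for at least
    half of all 0/1 vectors r."""
    n = len(A)
    Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
    ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
    Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
    return ABr == Cr

def derandomized_check(A, B, C):
    """Decide AB == C deterministically by running the randomized test
    on every possible seed and taking a majority vote."""
    n = len(A)
    votes = sum(freivalds_once(A, B, C, r)
                for r in product([0, 1], repeat=n))
    return votes > 2 ** n // 2

A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
print(derandomized_check(A, B, [[19, 22], [43, 50]]))  # True: C = AB
print(derandomized_check(A, B, [[19, 22], [43, 51]]))  # False
```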
Thus complexity theorists generally believe having randomness does not help in solving NP search problems and that NP-complete problems do not have efficient solutions, either with or without using truly random bits.
While randomness doesn't seem necessary for solving search problems, the unpredictability of random bits plays a critical role in cryptography and interactive proof systems and likely cannot be avoided in these scenarios.
Back to Top
Could Quantum Computers Solve NP-Complete Problems?
While we have randomized and nonrandomized efficient algorithms for determining whether a number is prime, these algorithms usually don't give us the factors of a composite number. Much of modern cryptography relies on the fact that factoring or similar problems do not have efficient algorithms.
In the mid-1990s, Peter Shor34 showed how to factor numbers using a hypothetical quantum computer. He also developed a similar quantum algorithm to solve the discrete logarithm problem. The hardness of discrete logarithm on classical computers is also used as a basis for many cryptographic protocols. Nevertheless, we don't expect that factoring or finding discrete logarithms are NP-complete. While we don't think we have efficient algorithms to solve factoring or discrete logarithm, we also don't believe we can reduce NP-complete problems like Clique to the factoring or discrete logarithm problems.
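To see why factoring is believed classically hard, note that the obvious algorithm, trial division, runs in time exponential in the number of digits. This is only a sketch of the naive method; the best known classical algorithms, such as the number field sieve, are faster but still superpolynomial, while Shor's quantum algorithm is polynomial in the digit count.

```python
def trial_division(n: int):
    """Factor n by trial division: about sqrt(n) iterations, which is
    exponential in the number of digits of n."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # what remains is prime
    return factors

print(trial_division(2**32 + 1))  # [641, 6700417]: Euler's famous factorization
```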
So could quantum computers one day solve NP-complete problems? Unlikely.
I'm not a physicist, so I won't address the question of whether these machines can actually be built at a large enough scale to solve factoring problems larger than we can with current technology (about 200 digits). After billions of dollars of funding for quantum computing research, we still have a long way to go.
Even if we could build these machines, Shor's algorithm relies heavily on the algebraic structures of numbers that we don't see in the known NP-complete problems. We know that his algorithm cannot be applied to generic "black-box" search problems so any algorithm would have to use some special structure of NP-complete problems that we don't know about. We have used some algebraic structure of NP-complete problems for interactive and zero-knowledge proofs but quantum algorithms would seem to require much more.
Lov Grover19 did find a quantum algorithm that works on general NP problems but that algorithm only achieves a quadratic speed-up and we have evidence that those techniques will not go further.
Meanwhile quantum cryptography, using quantum mechanics to achieve some cryptographic protocols without hardness assumptions, has had some success both in theory and in practice.
Back to Top
A New Hope?
Ketan Mulmuley and Milind Sohoni have presented an approach to the P versus NP problem through algebraic geometry, dubbed Geometric Complexity Theory, or GCT.29
This approach seems to avoid the difficulties mentioned earlier, but requires deep mathematics that could require many years or decades to carry through.
In essence, they define a family of high-dimensional polygons Pn based on group representations on certain algebraic varieties. Roughly speaking, for each n, if Pn contains an integral point, then any circuit family for the Hamiltonian path problem must have size at least n^(log n) on inputs of size n, which implies P ≠ NP. Thus, to show that P ≠ NP it suffices to show that Pn contains an integral point for all n.
Although all that is necessary is to show that Pn contains an integral point for all n, Mulmuley and Sohoni argue that this direct approach would be difficult and instead suggest first showing that the integer programming problem for the family Pn is, in fact, in P. Under this approach, there are three significant steps remaining:
1. Prove that the LP relaxation solves the integer programming problem for Pn in polynomial time;
2. Find an efficient, simple combinatorial algorithm for the integer programming problem for Pn; and
3. Prove that this simple algorithm always answers "yes."
Since the polygons Pn are algebro-geometric in nature, solving (1) is thought to require algebraic geometry, representation theory, and the theory of quantum groups. Mulmuley and Sohoni have given reasonable algebro-geometric conditions that imply (1). These conditions have classical analogues that are known to hold, based on the Riemann Hypothesis over finite fields (proved by André Weil for curves in the 1940s, and in general by Pierre Deligne in 1974). Mulmuley and Sohoni suggest that an analogous Riemann Hypothesis-like statement is required here (though not the classical Riemann Hypothesis).
Although step (1) is difficult, Mulmuley and Sohoni have provided definite conjectures based on reasonable mathematical analogies that would solve (1). In contrast, the path to completing steps (2) and (3) is less clear. Despite these remaining hurdles, even solving the conjectures involved in (1) could provide some insight to the P versus NP problem.
Mulmuley and Sohoni have reduced a question about the nonexistence of polynomial-time algorithms for all NP-complete problems to a question about the existence of a polynomial-time algorithm (with certain properties) for a specific problem. This should give us some hope, even in the face of problems (1)–(3).
Nevertheless, Mulmuley believes it will take about 100 years to carry out this program, if it works at all.
Back to Top
This survey focused on the P versus NP problem, its importance, our attempts to prove P ≠ NP, and the approaches we use to deal with the NP-complete problems that nature and society throw at us. Much of the work mentioned required a long series of mathematically difficult research papers that I cannot hope to adequately cover in this short article. The field of computational complexity also goes well beyond the P versus NP problem, into areas I haven't discussed here. In "Further Reading," a number of references are presented for those interested in a deeper understanding of the P versus NP problem and computational complexity.
The P versus NP problem has gone from an interesting problem related to logic to perhaps the most fundamental and important mathematical question of our time, whose importance only grows as computers become more powerful and widespread. The question has even hit popular culture, appearing in television shows such as The Simpsons and Numb3rs. Yet many people know only the basic principles of P versus NP, and I hope this survey has given you a small feeling for the depth of research inspired by this mathematical problem.
Proving P ≠ NP would not be the end of the story; it would just show that NP-complete problems don't have efficient algorithms for all inputs, and many questions might remain. Cryptography, for example, would require that a problem like factoring (not believed to be NP-complete) be hard for randomly drawn composite numbers.
Proving P ≠ NP might not be the start of the story either. Weaker separations remain perplexingly difficult, for example showing that Boolean-formula Satisfiability cannot be solved in near-linear time or showing that some problem using a certain amount of memory cannot be solved using roughly the same amount of time.
None of us truly understands the P versus NP problem; we have only begun to peel the layers around this increasingly complex question. Perhaps we will see a resolution of the P versus NP problem in the near future, but I almost hope not. The P versus NP problem continues to inspire and boggle the mind, and continued exploration of this problem will lead us to ever new complexities in that truly mysterious process we call computation.
Back to Top
Recommendations for a more in-depth look at the P versus NP problem and the other topics discussed in this article:
- Steve Homer and I have written a detailed historical view of computational complexity.14
- The 1979 book of Garey and Johnson still gives the best overview of the P versus NP problem with an incredibly useful list of NP-complete problems.16
- Scott Aaronson looks at the unlikely possibility that the P versus NP problem is formally independent.1
- Russell Impagliazzo gives a wonderful description of five possible worlds of complexity.22
- Sanjeev Arora and Boaz Barak have a new computational complexity textbook with an emphasis on recent research directions.5
- The Foundations and Trends in Theoretical Computer Science journal and the Computational Complexity columns of the Bulletin of the European Association of Theoretical Computer Science and SIGACT News have many wonderful surveys on various topics in theory including those mentioned in this article.
- Read the blog Computational Complexity and you will be among the first to know about any updates of the status of the P versus NP problem.13
Back to Top
Thanks to Rahul Santhanam for many useful discussions and comments. Josh Grochow wrote an early draft. The anonymous referees provided critical advice. Some of the material in this article has appeared in my earlier surveys and my blog.13
Back to Top
1. Aaronson, S. Is P versus NP formally independent? Bulletin of the European Association for Theoretical Computer Science 81 (Oct. 2003).
2. Agrawal, M., Kayal, N., and Saxena, N. PRIMES is in P. Annals of Mathematics 160, 2 (2004), 781–793.
3. Applegate, D., Bixby, R., Chvátal, V., and Cook, W. On the solution of traveling salesman problems. Documenta Mathematica, Extra Volume ICM III (1998), 645–656.
4. Arora, S. Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. J. ACM 45, 5 (Sept. 1998), 753–782.
5. Arora, S. and Barak, B. Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge, 2009.
6. Baker, T., Gill, J., and Solovay, R. Relativizations of the P =? NP question. SIAM Journal on Computing 4, 4 (1975), 431–442.
7. Cantor, G. Ueber eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen. Crelle's Journal 77 (1874), 258–262.
8. Cipra, B. This Ising model is NP-complete. SIAM News 33, 6 (July/Aug. 2000).
9. Conitzer, V. and Sandholm, T. New complexity results about Nash equilibria. Games and Economic Behavior 63, 2 (July 2008), 621–641.
10. Cook, S. The complexity of theorem-proving procedures. In Proceedings of the 3rd ACM Symposium on the Theory of Computing. ACM, NY, 1971, 151–158.
11. Downey, R. and Fellows, M. Parameterized Complexity. Springer, 1999.
12. Edmonds, J. Paths, trees and flowers. Canadian Journal of Mathematics 17 (1965), 449–467.
13. Fortnow, L. and Gasarch, W. Computational complexity; http://weblog.fortnow.com.
14. Fortnow, L. and Homer, S. A short history of computational complexity. Bulletin of the European Association for Theoretical Computer Science 80 (June 2003).
15. Furst, M., Saxe, J., and Sipser, M. Parity, circuits and the polynomial-time hierarchy. Mathematical Systems Theory 17 (1984), 13–27.
16. Garey, M. and Johnson, D. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, NY, 1979.
17. Goemans, M. and Williamson, D. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM 42, 6 (1995), 1115–1145.
18. Goldreich, O. Foundations of cryptography: A primer. Foundations and Trends in Theoretical Computer Science 1, 1 (2005), 1–116.
19. Grover, L. A fast quantum mechanical algorithm for database search. In Proceedings of the 28th ACM Symposium on the Theory of Computing. ACM, NY, 1996, 212–219.
20. Gusfield, D. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
21. Haken, A. The intractability of resolution. Theoretical Computer Science 39 (1985), 297–305.
22. Impagliazzo, R. A personal view of average-case complexity theory. In Proceedings of the 10th Annual Conference on Structure in Complexity Theory. IEEE Computer Society Press, 1995, 134–147.
23. Impagliazzo, R. and Wigderson, A. P = BPP if E requires exponential circuits: Derandomizing the XOR lemma. In Proceedings of the 29th ACM Symposium on the Theory of Computing. ACM, NY, 1997, 220–229.
24. Karp, R. Reducibility among combinatorial problems. In Complexity of Computer Computations. R. Miller and J. Thatcher, Eds. Plenum Press, 1972, 85–103.
25. Khot, S., Kindler, G., Mossel, E., and O'Donnell, R. Optimal inapproximability results for MAX-CUT and other 2-variable CSPs? SIAM Journal on Computing 37, 1 (2007), 319–357.
26. Lathrop, R. The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Engineering 7, 9 (1994), 1059–1068.
27. Levin, L. Universal'nyie perebornyie zadachi (Universal search problems: in Russian). Problemy Peredachi Informatsii 9, 3 (1973), 265–266. Corrected English translation.37
28. Levin, L. Average case complete problems. SIAM Journal on Computing 15 (1986), 285–286.
29. Mulmuley, K. and Sohoni, M. Geometric complexity theory I: An approach to the P vs. NP and related problems. SIAM Journal on Computing 31, 2 (2001), 496–526.
30. Raghavendra, P. Optimal algorithms and inapproximability results for every CSP? In Proceedings of the 40th ACM Symposium on the Theory of Computing. ACM, NY, 2008, 245–254.
31. Razborov, A. Lower bounds on the monotone complexity of some Boolean functions. Soviet Mathematics-Doklady 31 (1985), 485–493.
32. Razborov, A. On the method of approximations. In Proceedings of the 21st ACM Symposium on the Theory of Computing. ACM, NY, 1989, 167–176.
33. Razborov, A. and Rudich, S. Natural proofs. Journal of Computer and System Sciences 55, 1 (Aug. 1997), 24–35.
34. Shor, P. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Journal on Computing 26, 5 (1997), 1484–1509.
35. Solovay, R. and Strassen, V. A fast Monte-Carlo test for primality. SIAM Journal on Computing 6 (1977), 84–85. See also erratum 7:118, 1978.
36. Sudan, M. Probabilistically checkable proofs. Commun. ACM 52, 3 (Mar. 2009), 76–84.
37. Trakhtenbrot, R. A survey of Russian approaches to Perebor (brute-force search) algorithms. Annals of the History of Computing 6, 4 (1984), 384–400.
38. Turing, A. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 42 (1936), 230–265.
39. van Melkebeek, D. A survey of lower bounds for satisfiability and related problems. Foundations and Trends in Theoretical Computer Science 2 (2007), 197–303.
Back to Top
Lance Fortnow (email@example.com) is a professor of electrical engineering and computer science at Northwestern University's McCormick School of Engineering, Evanston, IL.
Back to Top
Figure. The software written for this illustration makes a stylized version of a network graph that draws connections between elements based on proximity. The graph constantly changes as the elements sort themselves.
Figure. NP can be seen as a graph where every element is connected to every other element. Over these pages a deconstruction of the graph is shown.
Back to top
©2009 ACM 0001-0782/09/0900 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2009 ACM, Inc.
Jinsei Yamaguchi, October 08, 2009 09:47
I have already solved this problem. The final answer is "P vs NP problem vanishes". For the definition of "vanish" and the proof, see my Japanese site http://www.int2.info/news1.htm
Moshe Vardi, October 09, 2009 11:10
Not reading Japanese, it is hard for me even to understand what the claim is. I'd suggest to Jinsei Yamaguchi to submit his result to a scholarly publication for review.
Michael Wojcik, November 13, 2009 03:57
I'd like to congratulate Dr Fortnow and the CACM editors for one of the best pieces I've read in CACM in years. This article is engaging and clear, on a topic of broad theoretical and practical interest.
Andras Olah, March 10, 2010 07:31
A problem is only ever a matter of debate.
The only thing we cannot control is time.
When you solve a problem, its importance turns out to be relative; what matters most is defining the problem correctly and judging its importance against the time available.
If a calculation is needed today but I only get accurate results months later, the results no longer matter tomorrow. Unless my life depends on it, because then it certainly gets calculated! :)
There are no insoluble problems! It is just not always worth devoting that much time.
Anonymous, March 15, 2011 10:26
Solution of P versus NP problem.
Here is the link......kindly have a look:
Anonymous, April 02, 2011 06:01
"P=NP" doesn't imply "public-key cryptography becomes impossible". Nor does it imply "short, fully logical proofs". This is because the order for polynomial time algorithms can still be astronomical.
Search, for example, for "inefficient" at http://portal.acm.org/citation.cfm?id=803896
("Linear time algorithm for isomorphism of planar graphs", by Hopcroft / Wong)
I don't think that is a good point to make.
Anonymous, May 15, 2011 12:55
Thank you for an excellent article! (And, if I am correctly using a cultural term from a generation younger than mine "respect" for the "New Hope" reference.)
My own intuition is, that P <> NP, but I would love to be proved wrong.
One line, however, struck me.
> Nevertheless, Mulmuley believes it will take about 100 years to
> carry out this program, if it works at all.
This is the piece in which I have a strong disagreement. I believe that the calculation of "about 100 years" fails to adequately take into account the advances in machine-based theorem proving.
At the risk of sounding "singularitian", I would be willing to wager that the problem will:
EITHER be solved by 2050
OR prove unsolvable, and not be solved by 2500
P Albert, May 17, 2011 09:16
To the two posters who provided solutions to the P vs NP problem, you solved a different P vs NP problem than the one that everyone else is discussing. :) Good work, though!
Anonymous, March 19, 2012 03:53
The real issue of P vs NP is it's framed incorrectly. If one thinks of the costs in terms of physics I imagine many problems would disappear. This is the problem when being caught up in the mathematics without a sound understanding of the physics of the world itself. You've framed the problem incorrectly to begin with so you end up pursuing stupid questions.
Anonymous, September 16, 2012 11:56
This has been very broad coverage of the topic. Thanks to the editors for the same.
View More Comments