The Most Important Unsolved Problem in Computer Science

When the Clay Mathematics Institute put individual $1-million prize bounties on seven unsolved mathematical problems, they may have undervalued one entry—by a lot. If mathematicians were to resolve, in the right way, computer science’s “P versus NP” question, the result could be worth worlds more than $1 million—they’d be cracking most online-security systems, revolutionizing science and even mechanistically solving the other six of the so-called Millennium Problems, all of which were chosen in the year 2000. It’s hard to overstate the stakes surrounding the most important unsolved problem in computer science.

P versus NP concerns the apparent asymmetry between finding solutions to problems and verifying solutions to problems. For example, imagine you’re planning a world tour to promote your new book. You pull up Priceline and start testing routes, but each one you try blows your total trip budget. Unfortunately, as the number of cities grows on your worldwide tour, the number of possible routes to check skyrockets exponentially, rapidly making it infeasible even for computers to exhaustively search through every case. But when you complain, your agent writes back with a solution sequence of flights. You can easily verify whether or not their route stays in budget by simply checking that it hits every city and summing the fares to compare against the budget limit. Notice the asymmetry here: finding a solution is hard, but verifying a solution is easy.

The P versus NP question asks whether this asymmetry is real or an illusion. If you can efficiently verify a solution to a problem, does that imply that you can also efficiently find a solution? Perhaps a clever shortcut can circumvent searching through zillions of potential routes. For example, if your agent instead wanted you to find a sequence of flights between two specific remote airports while obeying the budget, you might also throw up your hands at the similarly immense number of possible routes to check, but in fact, this problem contains enough structure that computer scientists have developed a fast procedure (algorithm) for it that bypasses the need for exhaustive search.

You might think this asymmetry is obvious: of course one would sometimes have a harder time finding a solution to a problem than verifying it. But researchers have been surprised before in thinking that that’s the case, only to discover last-minute that the solution is just as easy. So every attempt in which they try to resolve this question for any single scenario only further exposes how monumentally difficult it is to prove one way or another. P versus NP also rears its head everywhere we look in the computational world well beyond the specifics of our travel scenario—so much so that it has come to symbolize a holy grail in our understanding of computation.

In the subfield of theoretical computer science called complexity theory, researchers try to pin down how easily computers can solve various types of problems. P represents the class of problems they can solve efficiently, such as sorting a column of numbers in a spreadsheet or finding the shortest path between two addresses on a map. NP represents the class of problems for which computers can verify solutions efficiently. Our book tour problem, called the Traveling Salesperson Problem by academics, lives in NP because we have an efficient procedure for verifying that our agent’s solution worked.

Notice that NP actually contains P as a subset because solving a problem outright is one way to verify a solution to it. For example, how would you verify that 27 x 89 = 2,403? You would solve the multiplication problem yourself and check that your answer matches the claimed one. We typically depict the relationship between P and NP with a simple Venn diagram:

Venn diagram shows one large circle labeled “NP” encompassing a smaller one labeled “P.” The entire circle is labeled “Problems with solutions that computers can verify easily.” The area inside of P is labeled “Problems with solutions that computers can find easily.” The area in NP but outside of P is labeled “Problems with solutions that computers can verify but not find easily.”
Credit: Amanda Montañez

The region inside of NP but not inside of P contains problems that can’t be solved with any known efficient algorithm. (Theoretical computer scientists use a technical definition for “efficient” that can be debated, but it serves as a useful proxy for the colloquial concept.) But we don’t know if that’s because such algorithms don’t exist or we just haven’t mustered the ingenuity to discover them. Here’s another way to phrase the P versus NP question: Are these classes actually distinct? Or does the Venn diagram collapse into one circle? Do all NP problems admit efficient algorithms? Here are some examples of problems in NP that are not currently known to be in P:

  • Given a social network, is there a group of a specified size in which all of the people in it are friends with one another?
  • Given a varied collection of boxes to be shipped, can all of them be fit into a specified number of trucks?
  • Given a sudoku (generalized to n x n puzzle grids), does it have a solution?
  • Given a map, can the countries be colored with only three colors such that no two neighboring countries are the same color?

Ask yourself how you would verify proposed solutions to some of the problems above and then how you would find a solution. Note that approximating a solution or solving a small instance (most of us can solve a 9 x 9 sudoku) doesn’t suffice. To qualify as solving a problem, an algorithm needs to find an exact solution on all instances, including very large ones.

Each of the problems can be solved via brute-force search (e.g., try every possible coloring of the map and check if any of them work), but the number of cases to try grows exponentially with the size of the problem. This means that if we call the size of the problem n (e.g., the number of countries on the map or the number of boxes to pack into trucks), then the number of cases to check looks something like 2n. The world’s fastest supercomputers have no hope against exponential growth. Even when n equals 300, a tiny input size by modern data standards, 2300 exceeds the number of atoms in the observable universe. After hitting “go” on such an algorithm, your computer would display a spinning pinwheel that would outlive you and your descendants.

Thousands of other problems belong on our list. From cell biology to game theory, the P versus NP question reaches into far corners of science and industry. If P = NP (i.e., our Venn diagram dissolves into a single circle) and we obtain fast algorithms for these seemingly hard problems, then the whole digital economy would become vulnerable to collapse. This is because much of the cryptography that secures such things as your credit card number and passwords works by shrouding private information behind computationally difficult problems that can only become easy to solve if you know the secret key. Online security as we know it rests on unproven mathematical assumptions that crumble if P = NP.

Amazingly, we can even cast math itself as an NP problem because we can program computers to efficiently verify proofs. In fact, legendary mathematician Kurt Gödel first posed the P versus NP problem in a letter to his colleague John von Neumann in 1956, and he expressed (in older terminology) that P = NP “would have consequences of the greatest importance. Namely, it would obviously mean that … the mental work of a mathematician concerning yes-or-no questions could be completely replaced by a machine.”

If you’re a mathematician worried for your job, rest assured that most experts believe that P does not equal NP. Aside from the intuition that sometimes solutions should be harder to find than to verify, thousands of the hardest NP problems that are not known to be in P have sat unsolved across disparate fields, glowing with incentives of fame and fortune, and yet not one person has designed an efficient algorithm for a single one of them.

Of course, gut feeling and a lack of counterexamples don’t constitute a proof. To prove that P is different from NP, you somehow have to rule out all potential algorithms for all of the hardest NP problems, a task that appears out of reach for current mathematical techniques. In fact, the field has coped by proving so-called barrier theorems, which say that entire categories of tempting proof strategies to resolve P versus NP cannot succeed. Not only have we failed to find a proof but we also have no clue what an eventual proof might look like.