October 22, 2018

Needle-in-a-Haystack Problems

Sometimes the same idea comes flying at you from several directions at once, and you start seeing that idea everywhere. This has been happening to me lately with needle-in-a-haystack problems, a concept that is useful but often goes unrecognized.

A needle-in-a-haystack problem is a problem where the right answer is very difficult to determine in advance, but it’s easy to recognize the right answer if someone points it out to you. Faced with a big haystack, it’s hard to find the needle; but if someone tells you where the needle is, it’s easy to verify that they’re right.

This idea came up in a class discussion of Beth Noveck’s Wiki-Government paper. We were talking about how government can use crowdsourcing to learn from outside experts, and somebody pointed out that crowdsourcing can work well for needle-in-a-haystack problems. If somebody in the crowd knows the answer, they can announce it, and other participants in the discussion will recognize that the proposed answer is correct. This is the basis for the open-source maxim that “given enough eyeballs, all bugs are shallow”. It’s also the theory underlying the Peer to Patent project, in which distributed experts tried to find the best prior art document that would invalidate a patent application.

The idea came up again in a discussion of online passwords. A highly secure system doesn’t remember your password, because an attacker who managed to observe the stored password could impersonate you. But if the system doesn’t store your password, how can it verify that the password you enter is correct?

The answer is that the system can create a needle-in-a-haystack problem to which your password is the only right answer. By remembering this problem rather than your password, the system will be able to recognize your password when it sees it, but an attacker who learns what the problem is will have great difficulty determining your password. There are standard cryptographic tricks for generating this kind of needle-in-a-haystack problem.
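As an illustration of one such trick (a minimal sketch using Python’s standard library, not the exact scheme any particular system uses), the system can store a random salt plus a slow one-way hash of the password. Recomputing the hash from a candidate password and comparing is easy; recovering the password from the stored values is the haystack:

```python
import hashlib
import hmac
import os

def make_verifier(password: str) -> tuple[bytes, bytes]:
    """Turn a password into a needle-in-a-haystack problem:
    a random salt plus a slow one-way hash of the salted password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def check_password(candidate: str, salt: bytes, digest: bytes) -> bool:
    """Recognizing the needle is easy: recompute the hash and compare
    (in constant time, to avoid leaking information through timing)."""
    attempt = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, 100_000)
    return hmac.compare_digest(attempt, digest)
```

The server stores only `(salt, digest)`; an attacker who steals them must grind through the haystack of possible passwords, while the server verifies a login with a single hash computation.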

The idea came up a third time when I read a paper with the eye-catching title “Why Most Published Research Findings Are False”. The paper explains why the vast majority of research findings in a field might be false, even if the researchers do everything right; and it suggests that this is the case for some areas of medical research.

To see how this might happen, imagine a needle-in-a-haystack problem that has one right answer and a million wrong answers. Suppose that our procedure for verifying the right answer when we see it is not flawless but instead is wrong 0.01% of the time. Now if we examine all of the 1,000,000 wrong answers, we will incorrectly classify 100 of them (0.01% of 1,000,000) as correct. In total, we’ll find 101 “correct” answers, only one of which is actually correct. The vast majority of “correct” answers — more than 99% of them — will be wrong.
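The arithmetic above is easy to check directly:

```python
wrong_answers = 1_000_000
fp_rate = 0.0001  # the 0.01% verification error rate from the example

false_positives = wrong_answers * fp_rate    # wrong answers that pass verification
flagged_correct = false_positives + 1        # plus the one real needle
fraction_wrong = false_positives / flagged_correct

print(round(false_positives), round(flagged_correct), fraction_wrong)
```

So 100 of the 101 answers flagged as “correct” are wrong, a fraction just over 99%.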

Any research field that is looking for needles in haystacks will tend to suffer from this problem, especially if it relies on statistical studies, which by definition can never yield 100% confidence.

Which brings us back to crowdsourcing. If we’re not perfect at recognizing proposed answers as correct — and no human process is perfect — then a crowdsourcing approach to needle-in-a-haystack problems could well yield wrong answers most of the time.

How can we avoid this trap? There are only two ways out: reduce the size of the haystack, or improve our procedure for evaluating candidate answers. Here crowdsourcing could be helpful. If we’re lucky, vigorous discussion within the crowd will help to smoke out the false positives. It’s not just access to experts that helps, but dialog among the experts.

Comments

  1. Anonymous says:

    P = find a needle in a haystack efficiently; NP = recognize a needle efficiently.

  2. Yuliy Pisetsky says:

    You note that even if it is easy to verify that a crowdsourced solution is correct, the risk of false positives is significant.

    However, if the solutions are also being published to the same wide audience, the risk of misclassification goes down significantly, provided that the verification is actually cheap.

    The problem brought up in the Ioannidis paper manifests itself because the verification cost is nontrivial: you have to run your own study.

  3. Anonymous says:

    This is why part of the scientific process is the independent replication of research results. That .01% chance of a false positive becomes .000001% with one independent replication, and .0000000001% with just two.

    • That’s true, but it’s rare for a single study to give you 0.01% chance of false positive (FP). If, instead, you insist on 1% FP, then replicating the study will only get you to 0.01% FP. Or if you use 5% FP — which I understand is common in some fields — then even with the initial study plus two independent replications, you still get to about 0.01% false positive rate.

      • And even this kind of statistical approximation is optimistic, because the payoffs for attempting to replicate experiments are highly variable, and may depend on the result. (If, for example, you discover that your professor’s result doesn’t replicate…)

        Software is a special case, because many bugs are relatively easy to reproduce, and in general discovering bugs results in accolades from peers. At the other end of the gamut are results generated in big-science projects, which are more elephant-in-the-dark-room than needle in a haystack.
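The replication arithmetic quoted in this thread is straightforward to verify (a quick sketch; the per-study false-positive rates are the ones the commenters assume, and the replications are taken to be fully independent):

```python
def fp_after_replications(single_study_fp: float, replications: int) -> float:
    """With independent replications, the false-positive probabilities
    multiply: the original study plus each replication must all err."""
    return single_study_fp ** (1 + replications)

fp_after_replications(0.0001, 1)  # ~1e-8, i.e. 0.000001%
fp_after_replications(0.01, 1)    # ~1e-4, i.e. 0.01%
fp_after_replications(0.05, 2)    # ~1.25e-4, about 0.01%
```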

  4. Anonymous says:

    Why are comments closed after only 8 days now? I only check this site once a week or so. Please at least raise it back up to the two weeks or so that it had been, if not get rid of that “feature” entirely.

  5. Anonymous says:

    Yes, the wisdom of the crowds is like quantum computing. Linux is like Shor’s factorization algorithm.