Leonard is so lucky! He's just asked a very complicated question and he's not getting an over-confident and misleading answer. Granted, he was likely hoping for an easier one! But let's dive into it.
"Does": that auxiliary verb packs a punch. How do we know whether something does or doesn't work? It would be great if that were simple, but unfortunately it's not.
I talk a lot here at Statistically Funny about the need for trials and systematic reviews of them to help us find the answers to these questions. But whether we're talking about trials or other forms of research, statistical techniques are needed to help make sense of what emerges from a study.
Too often, this aspect of research leads us down the garden path. It's common for people to rely only, or largely, on a statistical significance test of the null hypothesis: the assumption that there is no difference. So if a result is within the range that could occur by chance alone, the null hypothesis stands. But if it falls outside that range, it's "statistically significant."
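To make "the range that could occur by chance alone" a bit more concrete, here's a minimal sketch of one way to test a null hypothesis: a permutation test. Real studies often use other tests (t-tests, chi-squared tests and so on), so treat this as an illustration of the logic rather than how any particular study did it. The function name and the data are hypothetical. The idea: if there were really no difference between the groups, the group labels wouldn't matter. So we shuffle the labels many times, see how big a difference chance alone produces, and count how often it's at least as big as the one actually observed.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Returns the proportion of label shuffles that produce a difference in means
    at least as large as the observed one, i.e. an estimated p-value.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, any split of the pooled data is equally likely.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical outcome scores for two small groups.
treated = [5.1, 6.3, 5.8, 7.0, 6.1]
control = [4.9, 5.2, 5.5, 4.7, 5.0]
print(permutation_p_value(treated, control))
```

A small p-value only says the data would be surprising if there were no difference at all. It doesn't say how big or how important the difference is, or whether the study was any good.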
However, a statistically significant result - especially from a single study - is often misunderstood, and it contributes to over-confidence about what we know. It's not a magic wand that reveals the truth. I wrote about testing for statistical significance in some detail this week over at my Scientific American blog, Absolutely Maybe. Leonard's statistician is a Bayesian: you can find out more about that, too, in my post.
As chance would have it, there was also a lot of discussion this week in response to a paper published while I was writing that post. It called for a tightening of the threshold for significance, which isn't really the answer either. Thomas Lumley puts that into great perspective over at his wonderful blog, Biased and Inefficient: a very valuable read.
"It": now this part should be easy, right? Actually, this can be particularly tricky. The treatment you're considering may not be much like the one that was studied. Even if it's a prescription drug, the dose or regimen you're facing might not be the same as the one used in studies. Or it might be used in conjunction with another intervention that could affect how it works.
Then there's the question of whether "it" is even what it says it is. Unlike prescription drugs, the contents of herbal remedies and dietary supplements aren't closely regulated to ensure that what it says on the label is what's inside. That was also recently in the news, and covered in detail here by Emily Willingham.
If it's a non-drug intervention, it's actually highly likely that the articles and other reports of the research never make clear exactly what "it" is. Paul Glasziou had a brainwave about this: he's started HANDI, the Handbook of Non-Drug Intervention. When a systematic review shows that something works, the HANDI team wants to dig out all the details and make sure we all know exactly what "it" is.
For example, if you heard that drinking water before meals can help you lose weight, and you want to try it, HANDI helpfully points out that what it actually means is drinking half a liter of water before every meal AND having a low-calorie diet. HANDI is new, so there aren't many "it"s explained yet. But you can see them here.
"Work": this one really needs to get specific. As I point out in the slides from a talk I gave this month, you really need to be thinking about each possible outcome separately - and thinking about the possible adverse effects too. There can be complicated trade-offs between effects, and the quality of the evidence is going to vary for each of them.
Think of it this way: if you do a survey with 150 questions in it, some of the questions are going to get more answers than others. For example, if you had 400 survey respondents, they might all have answered the first easy question, and there could be virtually no answers to a hard question near the end. So reporting that "a survey of 400 people found…" when describing the answer to that later question would be seriously misleading.
Then there's the question of how much it works for that particular outcome. Does a sliver of a benefit count as "working" to you? That might be enough for the person answering your question, but not enough for you - especially if there are risks, costs or inconvenience involved.
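Here's a sketch of how "statistically significant" and "a sliver of a benefit" can both be true at once. The numbers are made up purely for illustration: a hypothetical trial with 100,000 people in each group, where 10% of the control group and 9.6% of the treatment group have the bad outcome. The test is an ordinary two-proportion z-test using a normal approximation, chosen here only because it's simple; the function name is hypothetical.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions (illustrative sketch).

    Uses the pooled-proportion normal approximation, which is only reasonable
    when the event counts are large.
    Returns (difference in proportions a - b, two-sided p-value).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, p_value


# Hypothetical trial: control arm first, treatment arm second (made-up numbers).
risk_difference, p_value = two_proportion_z_test(10_000, 100_000, 9_600, 100_000)
number_needed_to_treat = 1 / risk_difference

print(f"Absolute risk reduction: {risk_difference:.1%}")
print(f"p-value: {p_value:.4f}")
print(f"Number needed to treat: {number_needed_to_treat:.0f}")
```

With these invented numbers, the p-value comes out well under 0.05, yet the absolute risk reduction is only 0.4 percentage points: about 250 people would need to be treated for one of them to avoid the outcome. Whether that counts as "working" is exactly the judgment call the significance test can't make for you.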
And there's the question of who it worked for in the research. Whether or not research results apply to a person in your situation can be straightforward, but it might not be.
And how high did researchers set the bar? Did the treatment have to be superior to doing nothing, or to doing something else - or is the information coming from comparing it to something that itself may not be all that effective? You might think that can't possibly happen, but it happens more often than you'd expect. You can find out about this here at Statistically Funny, where I tackle the issue of drugs that are "no worse (more or less)."
Finally, one of the most common trip-ups of all: did they really measure the outcome, or a proxy for it? If it's a proxy for the real thing, how good is it? The use of surrogate measures or biomarkers is increasing fast: you can learn more about why this can lead to an unreliable answer here.
So while there are many who might have told Leonard, "Yes, it's been proven to work in clinical trials" in a few seconds flat, I wonder how long it would take his statistician to answer the question? There are no stupid questions, but beware of the too-simple answer.