Deborah Mayo presented a new version of her critique of the Strong Likelihood Principle. Since a few friends have asked me what the Likelihood Principle is, I'll try to explain here why it is important (and, if I'm in the mood, why it is not). But please keep in mind that I'm no expert on these matters, and this is just my understanding of them. I'll be happy to be corrected on any of my views.
Roughly speaking, the (weak) Likelihood Principle states that an inference should be made only with the data at hand, i.e., it shouldn't depend on what could have happened but didn't. It is known, and uncontroversial, that frequentist statistics violates the (weak) Likelihood Principle. To see why, consider this. What is random in frequentism is the sample estimator, however defined, and the sampling distribution depends on what could have happened (but didn't). So you get some data and ask how extreme your data are under a null distribution for your estimator (a statistic). But think about it: your decision, say, to reject a hypothesis, depends on considering what could have happened in repetitions of the experiment that were never performed. Sure, the data at hand are an important piece, but they are meaningless without the sampling distribution.
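A concrete illustration (a standard textbook example, not one from Mayo's post): suppose we observe 9 heads and 3 tails. Under a fixed-n binomial design and under a "toss until the 3rd tail" negative-binomial design, the likelihoods for the data are proportional, so the LP says the inference should be identical; yet the two designs give different p-values, because the set of outcomes that "could have happened" differs. A quick sketch:

```python
from math import comb

# Data: 9 heads, 3 tails. Null hypothesis: fair coin, p = 0.5.

# Design A (binomial): n = 12 tosses fixed in advance.
# p-value = P(X >= 9) for X ~ Binomial(12, 0.5)
p_binom = sum(comb(12, k) for k in range(9, 13)) / 2**12

# Design B (negative binomial): toss until the 3rd tail appears.
# p-value = P(at least 9 heads before the 3rd tail):
# P(K = k) = C(k+2, 2) * 0.5**(k+3); sum the tail far enough to converge.
p_nbinom = sum(comb(k + 2, 2) * 0.5 ** (k + 3) for k in range(9, 200))

print(round(p_binom, 4))   # 0.073  -> not significant at the 0.05 level
print(round(p_nbinom, 4))  # 0.0327 -> significant at the 0.05 level
```

Under both designs the likelihood of the observed data is proportional to p^9 (1-p)^3; only the sampling distributions, and hence the p-values, differ.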
As is well known, Bayesian inference doesn't violate the (weak) LP. Given your prior, once you get the data you'll end up with the same posterior, no matter what you think could have happened in another experiment.
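A small sketch of that invariance, using the same hypothetical coin data (9 heads, 3 tails) and a flat prior: the binomial and negative-binomial likelihoods differ only by a multiplicative constant, so after normalizing, the posterior is identical under both stopping rules:

```python
from math import comb

grid = [i / 1000 for i in range(1, 1000)]  # grid over p in (0, 1)

def posterior(likelihood):
    # Flat prior: the posterior kernel is just the likelihood; normalize on the grid.
    kernel = [likelihood(p) for p in grid]
    z = sum(kernel)
    return [k / z for k in kernel]

# Same data, two stopping rules; the kernels differ only by a constant factor.
binom_lik  = lambda p: comb(12, 9) * p**9 * (1 - p)**3  # fixed n = 12
nbinom_lik = lambda p: comb(11, 2) * p**9 * (1 - p)**3  # stop at the 3rd tail

post_a = posterior(binom_lik)
post_b = posterior(nbinom_lik)
print(max(abs(a - b) for a, b in zip(post_a, post_b)))  # 0.0 -- same posterior
```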
But a principle is just that. A principle. It’s a normative appeal about how things should be. And you may well think that things should be different. So, the fact that frequentism violates the WLP isn’t that bad, as long as you don’t think the principle is an important one to be followed.
The major contention is that Birnbaum, in 1962, published a paper in which he claimed to prove that the LP is implied by the Conditionality Principle plus the Sufficiency Principle. And since most statisticians agree with both principles, they should then also follow the Likelihood Principle, i.e., it would be a really important principle to follow. I never read Birnbaum's original paper, so take what I'm saying here with a grain of salt.
The Sufficiency Principle says that if T is a sufficient statistic for theta, and if in two independent experiments T(x1) = T(x2), then the evidence* about theta provided by the two experiments is the same.
To understand what a sufficient statistic is, think about a uniform distribution with parameters 0 and theta (theta > 0), i.e., X ~ U(0, theta). Suppose I get some data, x1, x2, x3, …, xn, and I want to make inferences about theta. Let's define T as max(x1, x2, …, xn). Once I have T, I can discard the rest of the data, because there is no further information about theta in it. Since a uniform distribution generates values between 0 and theta, T is the maximum value observed in my sample: theta may be higher than T, but it certainly isn't lower, and nothing else in the data tells me anything more about the value of theta. In this sense we say that T is a sufficient statistic for theta.
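A quick numerical check of this, on hypothetical simulated data: the likelihood of the whole sample under U(0, theta) can be computed either from the raw data or from (n, T) alone, and the two always agree:

```python
import random

random.seed(42)
data = [random.uniform(0, 5) for _ in range(20)]  # hypothetical sample, true theta = 5

def likelihood_full(theta, xs):
    # U(0, theta) density is 1/theta on [0, theta]; zero if any point exceeds theta.
    return theta ** -len(xs) if all(x <= theta for x in xs) else 0.0

n, t = len(data), max(data)  # T = max(x1, ..., xn), the sufficient statistic

def likelihood_from_t(theta, n, t):
    # Same likelihood, computed from (n, T) alone -- no other data needed.
    return theta ** -n if t <= theta else 0.0

for theta in [3.0, 4.0, 5.0, 6.0, 10.0]:
    assert likelihood_full(theta, data) == likelihood_from_t(theta, n, t)
```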
The Conditionality Principle says that, once we perform an experiment, I may think about what could've happened but didn't, but looking only at what could've happened in this particular experiment, not in other experiments that weren't realized. The classic example: if a coin flip decides which of two measuring instruments you will use, your inference should condition on the instrument you actually used, not average over the one you didn't. This is quite sensible, of course.
The Weak Likelihood Principle, as I mentioned above, says that all the information in the sample is reflected in the likelihood function.
Now we can move to the Strong Likelihood Principle (SLP) as (arguably) proved by Birnbaum. Birnbaum's result says that if you follow both the CP and the SP, then you must follow the SLP; if you don't follow the SLP, then you must be violating either the CP or the SP. This is quite embarrassing for frequentism, or at least it seems so, since it doesn't seem sensible to violate either the CP or the SP. As a Bayesian, I'm supposedly free of this problem (but see this, for instance). Still, things may be complicated. In what follows, I'll explain my problem with the LP (the SLP?).
The mainstream approach to studying causality in experiments is the Neyman-Rubin potential outcomes (PO) framework. But, reading Rubin's main papers on this approach, it seems to me that it clearly violates the (S?)LP by considering what could have happened, but didn't, in other potential experiments. In fact, the whole idea of a potential outcome is based on thinking about what could've happened if a unit had (not) received a treatment, instead of only what actually happened. Now, I'm not sure of my interpretation that the potential outcomes approach violates the (S?)LP, especially because Rubin is a Bayesian himself and probably agrees with the LP. I even asked about it at Deborah Mayo's blog and at Simon Jackman's blog, but didn't get any clear answer (Mayo said it was a good question! and Jackman said I was spot on!).
So, if I must violate either the SP, the CP, or the idea that experiments allow me to estimate average causal effects, I'd rather stick with experiments and abandon the CP, for instance. In fact, it seems to me that the potential outcomes framework violates the CP, though I'm less sure of this than of the fact that it violates the LP. To see why the PO framework would violate the CP, think of this: the CP says we shouldn't focus on other experiments that weren't performed, but, as far as I understand it, the PO approach says we should consider what could have happened in other experiments (actually, in every conceivable experiment). So we're violating the CP.
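To make the PO setup concrete, here is a minimal simulation (entirely hypothetical data, not anything from Rubin's papers): each unit carries two potential outcomes, Y(0) and Y(1), but we observe only one per unit; randomized assignment still lets us recover the average causal effect:

```python
import random

random.seed(1)
n = 100_000
y0 = [random.gauss(0, 1) for _ in range(n)]        # outcome if untreated
y1 = [y + 2 for y in y0]                           # outcome if treated: true effect = 2
treat = [random.random() < 0.5 for _ in range(n)]  # randomized assignment

# We observe only ONE potential outcome per unit -- the other is counterfactual:
obs = [y1[i] if treat[i] else y0[i] for i in range(n)]

mean = lambda xs: sum(xs) / len(xs)
ate_hat = (mean([obs[i] for i in range(n) if treat[i]])
           - mean([obs[i] for i in range(n) if not treat[i]]))
print(round(ate_hat, 2))  # close to the true average causal effect of 2
```

The estimator works precisely because randomization makes the treated and control groups comparable on the unobserved counterfactuals, which is the "what could have happened" reasoning the post is worrying about.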
*I should define evidence, but it happens (or so it seems, see here) that the term is vague even among statisticians, which is rather problematic.