Basically, the author (Chen) argues that differences in how languages grammatically encode the future affect how much their speakers save (recall that saving is just postponing present consumption). A good critical summary can be found here (by Tom Pepinsky). And here’s the abstract:
Languages differ widely in the ways they partition time. In this paper I test the hypothesis that languages which grammatically distinguish between present and future events (what linguists call strong-FTR languages) lead their speakers to take fewer future-oriented actions. First, I show how this prediction arises naturally when well-documented effects of language on cognition are merged with models of decision making over time. Then, I show that consistent with this hypothesis, speakers of strong-FTR languages save less, hold less retirement wealth, smoke more, are more likely to be obese, and suffer worse long-run health. This is true in every major region of the world and holds even when comparing only demographically similar individuals born and living in the same country. While not conclusive, the evidence does not seem to support the most obvious forms of common causation. Implications of these findings for theories of intertemporal choice are discussed.
I don’t have much to add to the substantive discussion of the paper, since I haven’t had time to read it in detail, nor do I know anything about linguistics. However, I’d like to comment on the regression technique used in part of the paper, conditional logistic regression, since Tom Pepinsky said that political scientists don’t understand it very well.
Conditional logistic regression is a way to estimate a fixed-effects model with logistic regression. I used it early in the writing of my PhD thesis, before I learned about hierarchical Bayes. It allows you to condition on the fixed effects, but you’re not able to estimate them (i.e., you can’t report the fixed effects). That’s one of the reasons to estimate a Bayesian hierarchical model instead (you can see the estimates of the random effects).
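To see how this works in the simplest case, here is a minimal sketch with simulated matched-pair data (the simulation and all names are mine, not the paper’s): with one case and one control per pair, the conditional likelihood reduces to a logistic regression on within-pair covariate differences, so the pair intercepts never need to be estimated at all.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
n_groups, beta_true = 500, 1.5

# Each group (pair) gets its own intercept -- the "fixed effect".
alpha = rng.normal(0.0, 2.0, n_groups)
x = rng.normal(size=(n_groups, 2))           # one covariate per pair member
y = rng.binomial(1, expit(alpha[:, None] + beta_true * x))

# Only discordant pairs (exactly one success) are informative once we
# condition on the within-pair total.
disc = y.sum(axis=1) == 1
# Covariate difference, oriented as case minus control.
d = np.where(y[disc, 0] == 1, x[disc, 0] - x[disc, 1], x[disc, 1] - x[disc, 0])

# Conditional log-likelihood = logistic regression on the differences with
# no intercept: the pair intercepts have cancelled out of the likelihood.
def negloglik(beta):
    return -np.sum(np.log(expit(beta[0] * d)))

beta_hat = minimize(negloglik, x0=np.zeros(1)).x[0]
print(beta_hat)  # should land near beta_true = 1.5
```

Note that the 500 alphas are simulated but never appear in the estimation; that is exactly why conditional logit cannot report them.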
Anyway, back to the paper: they say they interacted all categories of the controls with the fixed effects (I don’t know how to do this in Stata, but I’ll assume they did), resulting in more than a million categories. Since the number of parameters is huge and the sample size is at most 150,000 (in one of the models), the first question is how it’s possible to estimate more parameters than you have data points.
The answer, I guess, lies in the way conditional logistic regression is estimated. The math is hard, but if I remember correctly, the fixed effects are conditioned out of the likelihood, and you end up estimating only the parameters which are not fixed effects, conditional on the fixed effects. However, it’s still weird to condition on a number of unknown parameters that is bigger than the number of data points.
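To make the “conditioned out” step concrete, here is the standard derivation in my own notation (not the paper’s). Within group $g$, the model has a group intercept $\alpha_g$, and the group’s likelihood contribution, conditional on its observed number of successes, is

```latex
P(y_{ig} = 1 \mid x_{ig})
  = \frac{\exp(\alpha_g + x_{ig}\beta)}{1 + \exp(\alpha_g + x_{ig}\beta)},
\qquad
L_g(\beta)
  = \frac{\exp\bigl(\sum_i y_{ig}\, x_{ig}\beta\bigr)}
         {\sum_{d \in S_g} \exp\bigl(\sum_i d_i\, x_{ig}\beta\bigr)},
```

where $S_g$ is the set of 0/1 vectors summing to the observed group total $k_g = \sum_i y_{ig}$. Before simplification, the numerator and every term of the denominator carry the same factor $e^{\alpha_g k_g}$, so $\alpha_g$ cancels. That is why the fixed effects never need to be estimated, and also why they can’t be reported.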
Second, I think they really should use some kind of regularized regression or hierarchical Bayes, which is obviously suited to this, although that many parameters may make it practically impossible (or so I think, even using Stan).
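A sketch of the regularized route, on simulated data of my own (not the paper’s setup): instead of conditioning the group dummies out, include them in the design matrix and shrink them with an L2 penalty, which is the penalized-likelihood cousin of putting a normal prior on the group effects.

```python
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_groups, per_group, beta_true = 200, 10, 1.0
g = np.repeat(np.arange(n_groups), per_group)    # group label per observation
alpha = rng.normal(0.0, 1.0, n_groups)           # true group effects
x = rng.normal(size=g.size)
y = rng.binomial(1, expit(alpha[g] + beta_true * x))

# Design matrix: the covariate plus one dummy column per group.
dummies = np.eye(n_groups)[g]
X = np.column_stack([x, dummies])

# The L2 penalty shrinks the 200 dummy coefficients toward zero instead of
# conditioning them out, so they remain estimable (and reportable).
fit = LogisticRegression(penalty="l2", C=1.0, fit_intercept=False,
                         max_iter=1000).fit(X, y)
print(fit.coef_[0, 0])  # slope estimate; the shrunken dummies are coef_[0, 1:]
```

With a million interacted categories this exact approach would be a serious computational problem, which is the caveat in the paragraph above; but at moderate scale it gives you both the slope and the (shrunken) group effects.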
Third, and perhaps most important, he can’t interpret the model he’s fitting, because of the way he’s fitting it. It would be much better to fit a simpler model, with a varying intercept by country and wave (or person), and the controls interacting among themselves (but not with the varying intercepts). In fact, I don’t think it makes sense to interact the controls with the varying intercepts (the so-called fixed effects in his paper). Maybe he’s doing this to approximate a varying slope by country and wave, but if that’s the case, he should fit a varying-slope, varying-intercept model. This is pretty straightforward with hierarchical Bayes.
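For concreteness, the kind of varying-intercept, varying-slope model I have in mind looks like this (my notation; grouping by country-wave is my assumption about what his fixed effects index):

```latex
y_i \sim \mathrm{Bernoulli}\Bigl(\operatorname{logit}^{-1}\bigl(\alpha_{j[i]} + \beta_{j[i]}\,\mathrm{FTR}_i + x_i\gamma\bigr)\Bigr),
\qquad
\begin{pmatrix}\alpha_j \\ \beta_j\end{pmatrix}
  \sim \mathrm{N}\!\left(\begin{pmatrix}\mu_\alpha \\ \mu_\beta\end{pmatrix}, \Sigma\right),
```

where $j[i]$ indexes the country-wave group of respondent $i$, $\mathrm{FTR}_i$ is the strong-FTR indicator, and $x_i$ are the controls. The group-level slopes $\beta_j$ are shrunk toward a common mean, which is the varying-slope behavior the interacted fixed effects seem to be trying to approximate.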
Last, but not least, since he’s making multiple comparisons, he should correct the p-values accordingly. Again, this is pretty straightforward with hierarchical Bayes (actually, it’s kind of automatic!), but he could also do it with classical techniques.
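On the classical side, a Holm–Bonferroni step-down correction is only a few lines (a sketch with made-up p-values, not the paper’s):

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm-Bonferroni step-down correction.
    Returns a boolean array: True where the null is rejected."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)                    # test the smallest p first
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (m - rank):     # threshold shrinks as rank grows
            reject[idx] = True
        else:
            break                            # stop at the first failure
    return reject

# Toy example: 4 hypotheses, only the first two survive the correction.
print(holm([0.001, 0.011, 0.03, 0.04]))  # → [ True  True False False]
```

Holm is uniformly more powerful than plain Bonferroni and needs no independence assumptions, so it is a reasonable default when you are not going fully hierarchical.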
All this is to say that I think it’s possible to improve the analysis and increase our confidence in the fitted model*. But, again, these are my quick reactions based on a somewhat sloppy reading of the paper.
* As far as I could see, there was no measure of model fit besides R-squared in the linear models (there were linear models as well).