Hume, Kant, Causality in Statistics and the no-free-lunch theorems

Note: In what follows, I take some notes on how learning is possible. My notes try to connect Hume, Kant and Nietzsche with statistics (Bayesian inference and causal models) and computer science (machine learning results like the no-free-lunch theorems, and machine-learning models). Be advised that things are confused in my mind, I’m no expert in any of these fields and I’m writing in English.


Hume famously said that

“When we look about us towards external objects, and consider the operation of causes, we are never able, in a single instance, to discover any power or necessary connexion; any quality, which binds the effect to the cause, and renders the one an infallible consequence of the other. We only find, that the one does actually, in fact, follow the other.” (add reference).

With this and other statements, Hume is arguing that causality, understood as a necessary connection between cause and effect, can never be observed or proved, since nothing in our sensory experience shows us the inner connection between things. See also the following quote from Hume:

“When I see, for instance, a billiard-ball moving in a straight line towards another; even suppose motion in the second ball should by accident be suggested to me, as the result of their contact or impulse; may I not conceive, that a hundred different events might as well follow from the cause? … All these suppositions are consistent and conceivable.”

However, as a practical matter, we do act as if learning is possible. Don’t we know that we can’t fly, or that when I write in my native language, Portuguese, the people around me understand me, or that if a car hits me when I cross a street, it will harm me (I may even be killed)? In fact, if we look at modern science, we see so many claims of discovered causality, so many cause-and-effect relationships being established, that it is quite hard to reconcile Hume’s view with that of modern science. In a sense, the “surprise” is even bigger when we consider several recent developments in statistics on how to properly identify causal effects from data. I’m thinking here of the Neyman-Rubin causal model and Pearl’s causal models.

This surprise isn’t new, of course. At the time Hume wrote these lines, Newtonian mechanics was gaining more and more ground as the paradigmatic science, which of course seems to contradict Hume’s view. Kant famously discussed this and attempted to solve Hume’s conundrum, or so he said. Kant reframed Hume’s question of how learning is possible as “How are synthetic a priori propositions possible?”

Says Kant in the Prolegomena (§ 30):

the pure concepts of the understanding have no meaning when they depart from objects of experience and refer to things in themselves (noumena). They serve, as it were, only to spell out appearances, so that we may be able to read them as experience (my translation, from the original and from the English and Portuguese translations).

By this, he means that experience is only possible because we have mental faculties that allow us to understand the data. In more modern terms, better suited to our discussion, it’s the transcendental that allows us to learn from experience.

Whether or not Kant solved Hume’s problem is debatable. Nietzsche, for instance, criticized Kant’s solution by arguing that Kant didn’t really find the mental faculties (the transcendental), but invented them. And maybe habit (Hume’s own solution to his own problem), as a practical illusion, is what allows us to learn (but if it is an illusion, do we really learn? And what do we mean by learning, after all?).

My purpose in bringing up these questions here is not to offer an exegesis of these authors and work out what they really meant. Rather, I’d like to bring in some developments in statistics and my experience with programming (computer science) to look at these questions from a new point of view and, maybe, simultaneously illuminate both the philosophical problems and the scientific questions on causality and learning.

I don’t have the time right now to develop the statistics and programming side of the issue. So, for now, we can consider that I have only posed the problem. But to avoid merely posing the problem, I’ll make some remarks on what I’m thinking that connect both statistics and programming to these issues.

About the Nietzsche-Kant controversy (if I may call it so), I’d like to point to (neither in favor of Kant nor against Nietzsche) the “grue riddle”, which I first came across through Scott Aaronson. My discussion on this point follows him almost verbatim. Aaronson discusses how we learn that a hypothesis is true, or at least more likely than other hypotheses in a probabilistic sense, and then presents the following problem. Assume we’re asked to consider the hypothesis “All emeralds are green”. Then, Aaronson asks: why do we favor this hypothesis over, say, “All emeralds are green before 2030, and then blue afterwards”?

Of course the second hypothesis is more complicated. But think about it a little further. If, in our language, we didn’t have words for blue and green, but only a word grue, meaning “green before 2030, and then blue afterwards”, and a word bleen, meaning “blue before 2030, then green afterwards”, then the hypothesis “all emeralds are grue” is manifestly simpler than the alternative hypothesis “all emeralds are grue before 2030, then bleen afterwards”. How can we make sense of this riddle?
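
To make the symmetry concrete, here is a minimal Python sketch. It assumes a toy measure of complexity (the number of distinct color words a hypothesis needs to describe emeralds before and after 2030), and the names H_GREEN, H_GRUE, to_grue_vocab and description_length are made up for this illustration; none of it comes from Aaronson.

```python
# Each hypothesis is (color before 2030, color after 2030), written in the
# green/blue vocabulary.
H_GREEN = ("green", "green")  # "all emeralds are green"
H_GRUE = ("green", "blue")    # "green before 2030, and then blue afterwards"


def to_grue_vocab(hypothesis):
    """Restate a green/blue hypothesis using only the words grue and bleen."""
    before, after = hypothesis
    # Before 2030, grue things look green and bleen things look blue.
    before_word = "grue" if before == "green" else "bleen"
    # After 2030, the roles swap: bleen things look green, grue things look blue.
    after_word = "bleen" if after == "green" else "grue"
    return (before_word, after_word)


def description_length(description):
    """Toy complexity measure (an assumption): number of distinct words used."""
    return len(set(description))


for name, hyp in [("green hypothesis", H_GREEN), ("grue hypothesis", H_GRUE)]:
    print(name,
          "| green/blue vocabulary:", description_length(hyp),
          "| grue/bleen vocabulary:", description_length(to_grue_vocab(hyp)))
```

Under this toy measure, which hypothesis counts as the simple one flips with the vocabulary, while both hypotheses say exactly the same thing about every emerald seen before 2030.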

If we follow what Kant is saying, we can think of our language and categories as the only practical way of understanding our experience. Or we can think, against Kant and with Nietzsche, that instead of asking why such and such judgments are possible, we should ask why they’re necessary. And they’re necessary because that is how we learn and use our learning in our day-to-day activities.

But this answer is unsatisfactory in light of the modern achievements of computer science, mostly in the field of machine learning, and even of science itself, with its claims to have proven causality. Here, the Bayesian approach may help us a little bit.

Bayes’ theorem is claimed (add reference) to be the rational way of learning from experience (data). I’m not so sure things are that simple, but let me add this bit. There is one assumption that, as Aaronson says, is fundamental for Bayes’ theorem to work: the future must resemble the past in some way for any learning to be possible. If the past says nothing about the future, then no learning is possible.
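
As a rough illustration of Bayesian updating, here is a small Python sketch. The two coin hypotheses, the prior and the three observed tosses are all made up for the example. The assumption that every toss, past and future, shares the same bias is where “the future resembles the past” enters.

```python
from fractions import Fraction

# Hypothetical hypotheses: each fixes P(heads) for every toss, past and future.
BIAS = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}


def likelihood(bias, tosses):
    """Probability of the observed tosses (1 = heads, 0 = tails) given a bias."""
    result = Fraction(1)
    for toss in tosses:
        result *= bias if toss == 1 else 1 - bias
    return result


def posterior(prior, tosses):
    """Bayes' theorem over a finite set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood(BIAS[h], tosses) for h in prior}
    evidence = sum(unnormalized.values())
    return {h: weight / evidence for h, weight in unnormalized.items()}


def predict_heads(beliefs):
    """Probability that the next toss is heads, averaged over the hypotheses."""
    return sum(beliefs[h] * BIAS[h] for h in beliefs)


prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
beliefs = posterior(prior, [1, 1, 1])  # three heads observed
print(beliefs["biased"])       # 27/35
print(predict_heads(beliefs))  # 97/140
```

The prediction for the next toss moves from 5/8 under the prior to 97/140 only because each hypothesis ties future tosses to past ones. If a hypothesis said nothing about the future, the posterior would have nothing to predict with.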

The need for this assumption was perceived long ago by early Bayesians like de Finetti, when the concept of exchangeability was introduced. To understand the importance of this concept, think of the lottery. What do past numbers say about the probability of new numbers being drawn? In general, nothing: the numbers of the lottery are considered to be truly independent and, in this case, no learning is possible. And that’s why de Finetti introduced the concept of exchangeability: in order to learn, we need a weaker (less restrictive) condition than independence, one that admits some form of dependence.
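
The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: binary draws, and for the exchangeable case an unknown bias with a uniform prior, for which the probability of a sequence with s ones in n draws is s!(n-s)!/(n+1)!. The function names are mine.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def sequence_probability(seq):
    """P(seq) for binary draws whose unknown bias has a uniform prior."""
    n, ones = len(seq), sum(seq)
    return Fraction(factorial(ones) * factorial(n - ones), factorial(n + 1))


def predictive_exchangeable(seq):
    """P(next draw = 1 | seq), computed as P(seq followed by 1) / P(seq)."""
    seq = tuple(seq)
    return sequence_probability(seq + (1,)) / sequence_probability(seq)


def predictive_independent(seq, p=Fraction(1, 2)):
    """Lottery-like case: known probability p, so the past is ignored."""
    return p


# Exchangeable: every reordering of the same draws is equally probable.
print({sequence_probability(s) for s in permutations((1, 1, 0))})

# But the draws are not independent: seeing ones raises the next prediction.
print(predictive_exchangeable(()), predictive_exchangeable((1, 1, 1)))

# Truly independent draws with a known probability: nothing is learned.
print(predictive_independent(()), predictive_independent((1, 1, 1)))
```

In the exchangeable case the order of the observations does not matter, yet the prediction still moves from 1/2 to 4/5 after three ones. In the independent case it stays at 1/2 whatever was observed.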

However, I’m not sure that exchangeability (with the problems that it has) is enough to justify that learning is possible. Consider the so-called no-free-lunch theorems in machine learning. Roughly speaking, these theorems formalize Hume’s claim that, given my experience, any number of hypotheses are equally plausible. A little more formally: given n data points, and taking the misclassification rate as my loss function, if I evaluate algorithms by this loss on out-of-sample data, averaged over all possible ways of labeling the unseen points, there is no a priori reason to think that any algorithm will perform better than the others. One interpretation is that there is no universally better algorithm. All algorithms are equal, in the sense that it is always possible to find a set of k data points where one algorithm performs better than another.
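
A toy version of this claim can be checked by brute force. The sketch below is an illustration under my own assumptions, not a statement of the theorems themselves: a domain of five inputs with binary labels, three inputs used for training and two held out, and two made-up learners (one predicts the majority training label, the other the minority label). It averages the held-out misclassification rate over a chosen set of target functions.

```python
from fractions import Fraction
from itertools import product

DOMAIN = range(5)  # five possible inputs
TRAIN = [0, 1, 2]  # inputs whose labels the learner sees
TEST = [3, 4]      # unseen inputs used for evaluation


def majority_learner(train_labels):
    """Always predict the most common training label."""
    guess = int(2 * sum(train_labels) > len(train_labels))
    return lambda x: guess


def minority_learner(train_labels):
    """Always predict the least common training label."""
    guess = 1 - int(2 * sum(train_labels) > len(train_labels))
    return lambda x: guess


def average_test_error(learner, targets):
    """Mean misclassification rate on TEST, averaged over the given targets."""
    total = Fraction(0)
    for target in targets:
        model = learner([target[x] for x in TRAIN])
        mistakes = sum(model(x) != target[x] for x in TEST)
        total += Fraction(mistakes, len(TEST))
    return total / len(targets)


ALL_TARGETS = list(product([0, 1], repeat=len(DOMAIN)))  # all 32 labelings
CONSTANT_TARGETS = [(0,) * len(DOMAIN), (1,) * len(DOMAIN)]

for learner in (majority_learner, minority_learner):
    print(learner.__name__,
          average_test_error(learner, ALL_TARGETS),
          average_test_error(learner, CONSTANT_TARGETS))
```

Averaged over all 32 labelings, both learners have a held-out error of exactly 1/2, however sensible or perverse they look. Once the admissible targets are narrowed to the two constant labelings, the tie breaks: the majority learner makes no errors and the minority learner gets everything wrong.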

My take on these results is that theory is necessary to reduce the hypothesis space and also to treat as unlikely or uninteresting the regions of the sample space where some algorithm would perform better than our preferred one. So, connecting all the dots: I need theory, language and some assumptions (like the future resembling the past) for learning to be possible at all. And this means that, strictly speaking, Hume was right. But, for practical purposes, we can consider that learning does happen and causality can be proved and grounded, on the basis of how we humans organize our experience (and given Hume’s practical take on things, it’s not even clear that my answer isn’t Humean in spirit).

This is more or less what I intend to talk about. However, I still want to bring the Neyman-Rubin causal model into this conversation (and maybe Pearl’s). I understand things are confused in my mind, and probably here too.


About Manoel Galdino

Corinthians supporter, Bayesian, and PhD in Political Science from USP.

2 responses to Hume, Kant, Causality in Statistics and the no-free-lunch theorems

  1. Glauco Peres da Silva said:

    I hope there is no problem in a long comment in Portuguese to this post.
    As I see it, there are two sets of arguments addressed here: 1) the causal inference models, which he lays out, are ways of extracting general rules from statistically tractable evidence; 2) the philosophical debates about ‘understanding’ and ‘explanation’, which in the text he seems to me to merge into a single thing, as ‘learning’. So right there we could already discuss the text in 3 different areas: in each of the two presented and in the connection between them. There is a whole world there. That text by Wesley Salmon that I mentioned on your post discusses this world 2, in which he says it is possible to overcome Hume’s propositions. I still don’t know how he does that.
    In any case, given that some more recent texts exist, it bothers me a little that he frames things in terms of Hume and Kant and not anyone more contemporary. I may be mistaken, but as I see it this only reinforces my point: we have no training in this area, and we deal with the topic (in this case, causality) relying much more on the 1st set of arguments above.
    Like the author, I won’t claim to master all this. Far from it. I probably know much less than Manoel. I only know that this difference between understanding and explanation exists for some contemporary philosophers who are concerned with causality. Explanation can refer to questions of the type “how-to-do”, “what the meaning” or “why”. Among the latter would be the scientific ones, although scientific questions do not cover all questions of the “why” type. On the side of ‘understanding’ there would be other forms, such as the empathic, the symbolic, the “goal-oriented” and the scientific. The discussion about causality would sit in the middle of all this…
    I find both Rogério’s and Manoel’s expositions very worthwhile, so that we can think about the extent to which these discussions bear on our work. Congratulations to you both, for the initiative as well as for the content of the texts.

  2. I don’t know if I understood the difference between understanding and explanation, but I wonder whether this distinction would apply to computers. Deep Blue, which played Kasparov and won, learned to play chess well. But can we say it had an understanding, or even an explanation, of playing chess? Or again, Watson, the IBM computer that won at Jeopardy, managed to answer the game’s questions successfully (on average, better than humans). But did it understand the questions? And is our brain’s process of “understanding” things really so different from these machine cases? Or is our brain simply still far ahead of these examples, because our “algorithm” is much better (being more flexible and suited to many different tasks)? See, for example, this link:
