Probability
# Interpreting small probabilities
I suspect that there is a human limit to the size of a probability we find meaningful. Just as certain numbers are too large for humans to comprehend, some events are so improbable that their probabilities are nearly meaningless to us. Similarly, Bernoulli conjectured that people neglect small-probability events.
Nicholas Bernoulli who can be held liable as the creator of the Petersburg gamble suggested that more than five tosses of heads are morally impossible. This proposition is experimentally tested through the elicitation of subjects' willingness-to-pay for various truncated versions of the Petersburg gamble that differ in the maximum payoff. In fact, the experimental data show that all versions of the Petersburg gamble which allow for more than six repeated tosses of tails elicit the same willingness-to-pay. From this evidence it is concluded that subjects neglect those outcomes in the Petersburg gamble which occur with a probability smaller than or equal to one in sixty-four, because, given this level, the alternative explanations seem implausible. (Neugebauer 2010, p. 3)
# Kolmogorov Axioms
- The probability of an event is a non-negative real number
- The probability that at least one of the possible events happens is 1
- Given a set of mutually exclusive events, the probability that at least one of them happens is the sum of their individual probabilities
As a result, we get:
- $0 \leq P(A) \leq 1$
- $P(\lnot A) = 1 - P(A)$
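As a sketch of why the complement rule holds: $A$ and $\lnot A$ are mutually exclusive and together cover every possibility, so the second and third axioms give

$$1 = P(A \cup \lnot A) = P(A) + P(\lnot A) \quad \Longrightarrow \quad P(\lnot A) = 1 - P(A)$$

Since $P(\lnot A) \geq 0$ by the first axiom, it also follows that $P(A) \leq 1$.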
# Joint probability
The joint probability $P(A \cap B)$ is the probability of both A and B happening. On a Venn diagram, it is the intersection of the areas of the two events.
# Marginalization Rule
For some random variable X
$$P(A) = \sum_{x \in \mathcal{X}}P(A \cap X = x)$$
For example, to roll some even number,
$$P(\textrm{even}) = \sum_{i=1}^6 P(i \cap \textrm{even}) = 0 + \frac 1 6 + 0 + \frac 1 6 + 0 + \frac 1 6$$
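The die example can be checked numerically. This is only a minimal sketch: it assumes a fair six-sided die, and the names `p_face` and `p_joint_even` are made up for illustration.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each face has probability 1/6.
faces = range(1, 7)
p_face = {i: Fraction(1, 6) for i in faces}

def p_joint_even(i):
    """P(X = i and the roll is even): all of face i's mass if i is even, else 0."""
    return p_face[i] if i % 2 == 0 else Fraction(0)

# Marginalize over X by summing the joint probabilities across all faces.
p_even = sum(p_joint_even(i) for i in faces)
print(p_even)  # 1/2
```

Exact fractions are used instead of floats so the sum comes out as exactly one half.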
# Union of Events
Given events A and B, the probability that at least one of them occurs is
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
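A quick check of the union rule on a small sample space may help. The sketch below assumes a fair die with equally likely faces; the events `A` and `B` and the helper `prob` are illustrative choices, not part of the notes.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                # assumed: fair die, equally likely faces
A = {i for i in outcomes if i % 2 == 0}    # even roll: {2, 4, 6}
B = {i for i in outcomes if i > 3}         # roll greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event), len(outcomes))

lhs = prob(A | B)                          # P(A or B), computed directly
rhs = prob(A) + prob(B) - prob(A & B)      # P(A) + P(B) - P(A and B)
print(lhs, rhs)  # 2/3 2/3
```

Subtracting $P(A \cap B)$ corrects for the outcomes 4 and 6, which would otherwise be counted twice.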
# Conditional Probability
The probability of A given B has occurred is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)P(A)}{P(B)}$$
Deriving from this, we get,
- $P(A \cap B) = P(A \mid B) P(B)$
- $P(A \cap B) = P(B \mid A) P(A)$
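These identities can be illustrated on a fair die. This is a sketch under that assumption, with hypothetical helper names `prob` and `cond`; here $A$ is "even roll" and $B$ is "roll greater than 3".

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # assumed: fair die, equally likely faces
A = {2, 4, 6}                # even roll
B = {4, 5, 6}                # roll greater than 3

def prob(event):
    return Fraction(len(event), len(outcomes))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given); requires P(given) > 0."""
    return prob(event & given) / prob(given)

p_a_given_b = cond(A, B)                    # definition of conditional probability
via_bayes = cond(B, A) * prob(A) / prob(B)  # same quantity via P(B | A) P(A) / P(B)
print(p_a_given_b, via_bayes)  # 2/3 2/3
```

Multiplying `cond(A, B)` by `prob(B)` recovers `prob(A & B)`, which is the product rule listed above.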
# Independence
Events A and B are independent if
$$P(A \cap B) = P(A)P(B)$$
or, equivalently when $P(B) > 0$,
$$P(A \mid B) = P(A)$$
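To make the definition concrete, here is a small sketch that tests the product condition on a fair die. The die, the chosen events, and the helper `independent` are assumptions for illustration only.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # assumed: fair die, equally likely faces

def prob(event):
    return Fraction(len(event), len(outcomes))

def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_two = {1, 2}
above_three = {4, 5, 6}

print(independent(even, at_most_two))  # True:  1/6 == 1/2 * 1/3
print(independent(even, above_three))  # False: 1/3 != 1/2 * 1/2
```

Learning that the roll is at most two leaves the chance of an even roll at one half, while learning that it is above three raises that chance to two thirds.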
# Expected Values
If we have a random variable $X$ that can take values $x \in \mathcal{X}$, we define the expectation of $X$:
$$\mathbb{E}[X] = \sum_{x \in \mathcal{X}} P(X=x)x$$
Additionally,
- For functions that depend on a random variable: $\mathbb{E}[f(X)] = \sum_{x \in \mathcal{X}} P(X=x)f(x)$
- Linearity: $\mathbb{E}[\alpha f(X) + \beta g(X)] = \alpha \mathbb{E}[f(X)] + \beta \mathbb{E}[g(X)]$
- In general, $\mathbb{E}[f(X)g(X)] \neq \mathbb{E}[f(X)] \mathbb{E}[g(X)]$
- Conditional expectation: $\mathbb{E}[X \mid Y=y] = \sum_{x \in \mathcal{X}} P(X=x \mid Y=y)\,x$
- $\mathbb{E}[\mathbb{E}[X \mid Y]] = \mathbb{E}[X]$ (tower property, law of total expectation, iterated expectation rule)
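The properties above can be sanity-checked on a small example. The sketch assumes $X$ is a fair die roll and takes $Y$ to be the parity of $X$; the helper names `expect` and `conditional_dist` are hypothetical.

```python
from fractions import Fraction

p = {x: Fraction(1, 6) for x in range(1, 7)}  # assumed: X is a fair die roll

def expect(f=lambda x: x, dist=p):
    """E[f(X)] = sum over x of P(X = x) * f(x)."""
    return sum(prob * f(x) for x, prob in dist.items())

def conditional_dist(condition):
    """Distribution of X given that condition(X) holds."""
    total = sum(prob for x, prob in p.items() if condition(x))
    return {x: prob / total for x, prob in p.items() if condition(x)}

def is_even(x):
    return x % 2 == 0

def is_odd(x):
    return x % 2 == 1

e_x = expect()                                        # E[X]
e_x_sq = expect(lambda x: x * x)                      # E[X^2]
e_given_even = expect(dist=conditional_dist(is_even)) # E[X | Y = even]
e_given_odd = expect(dist=conditional_dist(is_odd))   # E[X | Y = odd]

# Tower property: average the conditional expectations over Y.
p_even = sum(prob for x, prob in p.items() if is_even(x))
tower = p_even * e_given_even + (1 - p_even) * e_given_odd

print(e_x, tower)        # 7/2 7/2
print(e_x_sq, e_x ** 2)  # 91/6 49/4
```

The last line shows that the expectation of a product need not equal the product of expectations, even when both factors are the same variable.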
# Bayes’ Theorem
See also: Naive Bayes
Let $c$ be the class label and $x$ be the measurement (evidence)
$$P(c \mid x) = \frac{P(x \mid c)P(c)}{P(x)}$$
- $P(c \mid x)$: posterior probability, the probability of $c$ given $x$ (after the measurement)
- $P(c)$: prior probability
- $P(x \mid c)$: class-conditional probability (likelihood of $c$ on $x$)
- $P(x)$: unconditional probability (a.k.a. marginal likelihood or expectedness of evidence)
Alternate formulations:
# Expanded
$$P(c \mid x) = \frac{P(x \mid c)P(c)}{P(x \mid c)P(c)+P(x \mid \lnot c)P(\lnot c)}$$
# Multiple Hypotheses
Suppose $c_1, \dots, c_n$ is an exhaustive and mutually exclusive set of possibilities
$$P(c_i \mid x) = \frac{P(x \mid c_i)P(c_i)}{P(x \mid c_1)P(c_1)+\dots+P(x \mid c_n)P(c_n)}$$
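The multiple-hypotheses form translates fairly directly into code. The following is a minimal sketch, not a reference implementation: the function name `posteriors` and the example numbers (a 1% prior, a 90% likelihood under $c$, and a 5% likelihood under $\lnot c$) are invented for illustration.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Return P(c_i | x) for each hypothesis c_i.

    priors[i] is P(c_i) and likelihoods[i] is P(x | c_i). The hypotheses are
    assumed to be mutually exclusive and exhaustive, so the priors sum to 1.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(x), obtained by marginalizing over the hypotheses
    return [j / evidence for j in joint]

# Hypothetical numbers: two hypotheses, c and not-c.
priors = [Fraction(1, 100), Fraction(99, 100)]
likelihoods = [Fraction(9, 10), Fraction(5, 100)]

post = posteriors(priors, likelihoods)
print(post[0], post[1])  # 2/13 11/13
```

With only two hypotheses this reduces to the expanded formulation; even with a likelihood of 90%, the low prior keeps the posterior for $c$ at about 15%.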
# Interpretations of Probability
A = draws from a normal deck, B = draws of a face card. $P(B \mid A) = \frac 3 {13}$ means:
Objective interpretations: Probability values are determined by factors independent of our beliefs.
Subjective interpretations: Probability values reflect individual degrees of belief, and vary from person to person.
We can evaluate these interpretations against the following criteria (Wesley Salmon):

Admissible. Probability values must satisfy the axioms of the probability calculus (the Kolmogorov axioms). This is also called coherence.

Ascertainable. Probability values must be values that we can determine (or else they are useless).

Applicable. Probability values must be reliable as a “guide to life”. They must be values that we can justifiably use to make decisions.

Classical: number of B over number of A is $3/13$
- Problem: assumes cases of A are equipossible (equal probability)
- This seems circular
- Fails ascertainability, admissibility, and applicability

Finite frequency: The proportion of B in a long series of draws is exactly $3/13$.
- Is admissible and ascertainable
- Not applicable: how does this work for single case probabilities?

Limiting frequency: The limiting frequency in an infinite series of draws would be $3/13$.
- Is admissible and applicable
- Not ascertainable: there may be no limiting frequency
- Again, does not work for single case probabilities

Long-run propensity: The setup A has a disposition to produce long sequences in which B happens with frequency $3/13$.
- Assumes that long-run frequencies have an underlying cause through an experimental arrangement/setup
- Not ascertainable: no improvement on the limiting frequency interpretation
- Not explanatory: the tendency or disposition adds nothing to our understanding
- Not all probabilities can be interpreted as propensities (no causal relation)

Logical: A partially entails B, with degree of entailment $3/13$.
- $P(B \mid A)$ measures the “proportion” of A that overlaps with B

Epistemic: The evidence that A happened provides objective support of degree $3/13$ that B happened.
- Logical and epistemic probabilities might only exist in some cases
- It is very unlikely that the priors/likelihoods we need can be computed a priori (from pure logic)

Subjective (actual degree of belief): Somebody believes with degree $3/13$ that A will produce B.
- Credences can be measured (or even defined) by studying your actions, especially your betting behaviour
- Problem: actual degrees of belief are not admissible, since people commit probabilistic fallacies all the time
- This can lead to bad betting combinations (see Dutch Book examples) in which you are guaranteed to lose money
- Key assumption: expected utility (EU) and expected monetary value (EMV) are equivalent for small but non-trivial amounts of money
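The Dutch Book idea mentioned above can be sketched with a toy example. Everything here is a hypothetical setup: an agent whose credences in A and in not-A are both 3/5 (so they sum to more than 1), who treats credence times stake as a fair price for a bet that pays the stake if the event occurs.

```python
from fractions import Fraction

# Hypothetical incoherent credences: they sum to 6/5 instead of 1.
credence = {"A": Fraction(3, 5), "not A": Fraction(3, 5)}
stake = Fraction(1)  # each bet pays this amount if its event occurs

# The agent regards credence * stake as a fair price and buys both bets.
total_paid = sum(c * stake for c in credence.values())

def net_payoff(outcome):
    """Agent's winnings minus total price paid, given which event occurred."""
    winnings = sum(stake for event in credence if event == outcome)
    return winnings - total_paid

results = {outcome: net_payoff(outcome) for outcome in credence}
for outcome, net in results.items():
    print(outcome, net)  # each line shows a net of -1/5
```

Exactly one of the two bets pays out whichever way things turn out, so the agent collects 1 after paying 6/5 and loses 1/5 in every case.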

Subjective (idealized credence): An idealized version of someone – with coherent probabilities – believes with degree $3/13$ that A will produce B.
- Fixes admissibility, as we require it
- Not applicable: how can we justify using personal probabilities to make decisions if there are no constraints on one’s prior probabilities?
- Another problem:
  - The meaning or concept of probability does not essentially involve desires or preferences
  - For example, an enlightened Zen Buddhist monk can have probabilities but no desires
  - Thus, by Peterson, any theory that creates a necessary (definitional) link between probability and preference/desire must be wrong