Meetup summary

2025-06-27 - Intro to probability - part 1

Agenda:

This should serve as the basic prerequisite for future probability-oriented sessions. (For example, most of it is prerequisite material for information theory and machine learning applications.) The current list is provisional and will have to be trimmed down based on interest. We’ll define important terms and give a convincing derivation/“physicist’s proof” for interesting results. We might state some theorems/results for the general multivariate case, but will likely avoid derivations unless people are fresh on their linear algebra. (That would be a good follow-up session.)

  • Event space
  • Axioms of probability
  • Conditional probability
  • Bayes’ Rule (see the worked calculation under Examples below)
  • Chain rule of probability
  • Conditional chain rule, Bayes’ rule, etc. (You should be comfortable carrying an extra conditioning event through any of these identities.)
  • Independence
    • Of two events. (Defined as P(A ∩ B) = P(A)P(B) rather than via conditional probability, which breaks when the conditioning event has probability zero; the two definitions agree in the happy case.)
    • Pairwise independence of a set of events
    • Mutual independence of a set (this is typically what is meant by “independence” if not otherwise stated)
    • Conditional independence
  • Law of total probability (partitioning)
  • Reminder: Counting forms the basis for all generalizations of probability. Start with uniform distribution and count events (implies equal weighting). We won’t go into counting rules today, as we’re already experts in that area. 😎
  • Discrete RVs
    • PMF
    • CDF
  • Continuous/mixed RVs
    • PDF (defined as a limiting case: the derivative of the CDF, i.e., the limit of its difference quotient)
    • CDF (defined the same way as in the discrete case).
    • Can always express a mixed RV as a PDF by using the Dirac delta “distribution”.
    • Rule of thumb: when deriving some complex/non-intuitive property or trying to prove something, start with the CDF, which allows you to reason in “probability space”. You can always differentiate to get PDFs. (See the worked derivation under Examples below.)
  • Multivariate RVs
    • Joint distribution (PMF/PDF/CDF)
    • Marginal PDF
    • Marginal CDF
  • Expectation
    • Formalization of a “mean”
    • Definition for discrete RV
    • Definition for continuous RV
    • Single RV linearity (multiply/add constants)
    • LOTUS
    • Multiple RV linearity (requires joint distribution; invoke LOTUS)
    • Conditional expectation (should be thought of as a function of the RV being conditioned on)
      • Law of total expectation
    • Of independent RVs
  • Variance
    • Definition
    • Expansion due to linearity of expectation
    • Law of total variance
    • Of independent RVs (derived via the general case of covariance)
  • Covariance
    • Definition
    • Linear expansion
    • Rules for sums/scales of covariance. (These give the variance of a sum, whether or not the terms are independent.)
    • Covariance inequality (relies on Cauchy-Schwarz)
    • Correlation (just standardize and take the covariance; see the sketch under Examples below)
  • Derived RVs
    • Single-variable method of transformations
    • Multivariate MOT (analogous, but uses the Jacobian)
    • Special case: sum of random variables (convolution). Works for both discrete and continuous RVs. (See the dice example under Examples below.)
  • Foundational processes/distributions
    • Bernoulli process/RV, binomial RV, geometric RV, negative binomial RV, etc.
    • Categorical process, multinoulli (categorical) RV, multinomial RV.
    • Poisson process (limiting case of binomial), Poisson distribution, exponential distribution, Erlang/Gamma distribution. (See the numerical check under Examples below.)
    • Gaussian distribution (different limiting case of binomial, but the derivation is long and we won’t get into it today; also arises from the CLT)
  • Moment generating functions
    • Equivalent to a two-sided Laplace transform, so it does not exist when the RV doesn’t have finite moments (e.g., a Cauchy RV has no MGF).
  • Characteristic functions
    • Equivalent to a Fourier transform, so it always exists, but it cannot always be used to easily recover moments from the series expansion.
  • Basic estimators
    • Definition
    • Estimator bias
    • Estimator variance
    • Estimator MSE
    • Sample mean
    • “Naive” sample variance
    • Unbiased sample variance. (See the bias simulation under Examples below.)
  • Foundational inequalities
    • Union bound
    • Markov inequality
    • Chebyshev inequality
    • Cauchy-Schwarz inequality (for expectations, by analogy with the coordinate-free vector version)
    • Jensen’s inequality (I don’t know a general proof—this will just be an intuitive argument)
    • Gibbs’ inequality (preview for information theory—won’t drill into entropy yet)
  • Inference preview (not planning to go deep here; will need to dedicate future sessions to particular areas)
    • Classical vs Bayesian perspective in a nutshell (raw MLE vs explicit priors, underlying parameter is an (unknown) constant vs RV)
    • Conjugate priors (e.g., beta-binomial)
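
Examples:

A few sketches to anchor the agenda items that reference them. These are illustrative examples with made-up numbers and names, not material from the session itself.

Bayes’ rule and the law of total probability. A toy diagnostic-test calculation (all the rates below are invented):

    # P(disease | positive test) via Bayes' rule. Every number is made up.
    prior = 0.01  # P(disease)
    sens = 0.95   # P(positive | disease)
    fpr = 0.05    # P(positive | no disease)

    # Law of total probability gives the denominator P(positive).
    evidence = sens * prior + fpr * (1 - prior)
    posterior = sens * prior / evidence  # Bayes' rule
    print(posterior)  # ~0.161: a positive test is far from conclusive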
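
CDF-first derivation. A worked instance of the “start with the CDF” rule of thumb (example chosen here for illustration): let X be Uniform(0, 1) and Y = X². Then F_Y(y) = P(Y ≤ y) = P(X ≤ √y) = √y for y in [0, 1], and differentiating gives the PDF f_Y(y) = 1/(2√y). Reasoning in “probability space” first means you never have to guess the change-of-variables factor.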
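
Sum of RVs as a convolution. A minimal numpy sketch (variable names are mine): the PMF of the sum of two independent dice is the convolution of their PMFs.

    import numpy as np

    # PMF of a fair six-sided die over outcomes 1..6
    die = np.full(6, 1 / 6)

    # PMF of the sum of two independent dice: convolve the PMFs.
    # Index i of the result corresponds to the sum i + 2.
    pmf_sum = np.convolve(die, die)

    for total, p in enumerate(pmf_sum, start=2):
        print(total, round(p, 4))

    assert abs(pmf_sum.sum() - 1) < 1e-12  # a PMF must sum to 1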
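
Poisson as a limit of the binomial. A quick numerical check (parameter values are arbitrary) that Binomial(n, λ/n) approaches Poisson(λ) as n grows:

    import numpy as np
    from scipy.stats import binom, poisson

    lam = 3.0
    k = np.arange(10)
    for n in (10, 100, 10_000):
        # Binomial with n trials but a fixed expected count lam
        gap = np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam))
        print(n, gap.max())  # the largest PMF difference shrinks as n grows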
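
Correlation as standardized covariance. A sketch (data generated arbitrarily) checking that standardizing two samples and taking their covariance reproduces numpy’s correlation coefficient:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1_000)
    y = 0.5 * x + rng.normal(size=1_000)  # correlated with x by construction

    # Standardize: subtract the mean, divide by the standard deviation.
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()

    # Covariance of the standardized samples = correlation.
    print(np.mean(zx * zy))         # manual version
    print(np.corrcoef(x, y)[0, 1])  # numpy's version, for comparison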
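
Naive vs. unbiased sample variance. A simulation (sample size and trial count chosen arbitrarily) showing the bias of the divide-by-n estimator and how dividing by n − 1 removes it:

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 5, 200_000
    samples = rng.normal(size=(trials, n))  # true variance is 1

    naive = samples.var(axis=1, ddof=0)     # divide by n
    unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1

    print(naive.mean())     # ~ (n - 1) / n = 0.8: biased low
    print(unbiased.mean())  # ~ 1.0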

Notes:

We went through the list linearly. A small group of us made it through the covariance section (sans covariance inequality). We’ll need to figure out where to pick up next time, depending on whether we want to get through this stuff as fast as possible or make sure that everybody is caught up.