Meetup summary
2025-09-26 - Intro to probability - part 4
Recommended reading:
None.
Agenda:
Picking up the probability intro series from last time. As per our running tradition, I expect to get through only a small subset of the agenda.
- Derived RVs
- Multivariate method of transformations (analogous to single-variable MOT but uses Jacobian)
- Special case: sum of random variables (convolution). Works for both discrete and continuous RVs.
- Foundational processes/distributions
- Bernoulli process/RV, binomial RV, geometric RV, negative binomial RV, etc.
- Multinomial (categorical) process, multinoulli.
- Poisson process (limiting case of binomial), Poisson distribution, exponential distribution, Erlang/Gamma distribution
- Gaussian distribution (different limiting case of binomial, but the derivation is long and we won’t get into it today; also arises from the CLT)
- Moment generating functions
- Equivalent to a two-sided Laplace transform, so it does not exist when the RV doesn't have all finite moments (e.g., heavy-tailed distributions).
- Characteristic functions
- Equivalent to a Fourier transform, so it always exists, but it cannot always be used to easily recover moments from the series expansion (the moments themselves may not exist).
- Basic estimators
- Definition
- Estimator bias
- Estimator variance
- Estimator MSE
- Sample mean
- “Naive” sample variance
- Unbiased sample variance
- Foundational inequalities
- Union bound
- Markov inequality
- Chebyshev inequality
- Cauchy-Schwarz inequality for expectations (covered earlier)
- Jensen’s inequality (I don’t know a general proof—this will just be an intuitive argument)
- Gibbs’ inequality (preview for information theory—won’t drill into entropy yet)
- Inference preview (not planning to go deep here; will need to dedicate future sessions to particular areas)
- Classical vs Bayesian perspective in a nutshell (raw MLE vs explicit priors, underlying parameter is an (unknown) constant vs RV)
- Conjugate priors (e.g., beta-binomial)
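Several of the agenda items lend themselves to quick simulation. As one illustrative sketch (my own, not material from the session), here's the bias of the "naive" sample variance versus the unbiased (Bessel-corrected) estimator:

```python
import random

# Draw many small samples from a known distribution and compare the "naive"
# variance estimator (divide by n) against the unbiased one (divide by n - 1).
# True variance of Uniform(0, 1) is 1/12 ~ 0.0833; the naive estimator's
# expectation is (n - 1)/n * sigma^2 = 4/5 * 1/12 = 1/15 ~ 0.0667.
random.seed(0)
n, trials = 5, 200_000
naive_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    naive_sum += ss / n
    unbiased_sum += ss / (n - 1)

print(naive_sum / trials)     # ~ 0.0667 (biased low by the factor (n-1)/n)
print(unbiased_sum / trials)  # ~ 0.0833 (close to the true 1/12)
```

The bias shrinks as n grows, which is why it matters most for small samples like the n = 5 used here.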
Notes:
Most attendees missed the previous coverage of the single-variable MOT, so we rederived and covered that.
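As a numerical sanity check of the single-variable MOT (with an illustrative choice of distribution and map, not necessarily the one from the session):

```python
import math
import random

# Check the single-variable method of transformations (MOT) numerically:
# if X ~ Exp(1) and Y = g(X) = X**2 (monotonic on x >= 0), then
#   f_Y(y) = f_X(g_inv(y)) * |d/dy g_inv(y)| = exp(-sqrt(y)) / (2 * sqrt(y)).
def f_Y(y):
    return math.exp(-math.sqrt(y)) / (2.0 * math.sqrt(y))

random.seed(1)
samples = [random.expovariate(1.0) ** 2 for _ in range(500_000)]

# Empirical probability that Y lands in [a, b] ...
a, b = 0.5, 2.0
empirical = sum(a <= y <= b for y in samples) / len(samples)

# ... vs. the MOT density integrated over [a, b] (midpoint rule).
steps = 10_000
width = (b - a) / steps
integral = sum(f_Y(a + (i + 0.5) * width) for i in range(steps)) * width

print(empirical, integral)  # the two agree to roughly two decimal places
```

Sanity check: by monotonicity, P(Y in [0.5, 2]) = P(X in [sqrt(0.5), sqrt(2)]) = exp(-sqrt(0.5)) - exp(-sqrt(2)), which both numbers above approximate.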
I also noted that the monotonicity requirement is not nearly as constraining as it seems; in most practical applications you don't even need the partitioning trick. Since the MOT is most often (indirectly) used to constrain model parameters where the underlying sampler/training algorithm is unconstrained by default (e.g., when sampling from a posterior in a Bayesian generative model), the mapping is usually something simple and monotonic by design, e.g., an exponential map to enforce positivity. This got us onto a tangent where we discussed numerical stability in probability sampling (for example, multiplying raw probabilities vs. summing log-probabilities, and why you should usually prefer the latter). We then got into a further tangent into various properties of the exponential and logarithm functions. (I didn't capture any further notes since this was ad-hoc, but if you're interested in this topic and missed this session, feel free to ask about it.)
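I didn't record the exact expressions from the stability tangent, but the standard illustration (with made-up numbers) is that products of many small probabilities underflow double precision while sums of their logs stay representable:

```python
import math

# Multiplying 100 probabilities of 1e-5 each gives 1e-500, far below the
# smallest representable double (~5e-324), so the product underflows to 0.0.
probs = [1e-5] * 100

direct = 1.0
for p in probs:
    direct *= p
print(direct)  # 0.0 -- underflowed; the true value is 1e-500

# The equivalent log-space computation is a sum of moderate numbers.
log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # -1151.29... = log(1e-500), no underflow
```

This is the reason likelihoods, posteriors, and sampling weights are almost always accumulated in log space.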
We’re putting the probability session on hold and returning to AC for now due to interest. I still want to cover the prerequisites for reinforcement learning before we get into the Thrill Digger competition, but we can go back to the pattern we previously used in AC and derive/cover what we need in an on-demand fashion. I think that gives better motivation and will help things stick.
tags: