Summary

A structured roadmap covering the three mathematical pillars required for AI/ML: Statistics & Probability (reasoning under uncertainty), Linear Algebra (structure of data and models), and Calculus (learning as optimization). Each section explains not just what the concepts are, but why they matter in ML contexts — loss functions arise from MLE, backpropagation is the chain rule, and PCA is eigendecomposition. Includes a personal learning resource stack: 3Blue1Brown → Coursera Imperial → Khan Academy → ISL → MML textbook.


Key Points

  • Statistics & Probability: Expected value → loss functions; Bayes’ theorem → probabilistic models; MLE → MSE and cross-entropy arise naturally as negative log-likelihoods; distributions (Normal, Binomial) → data generation assumptions
  • Linear Algebra: All ML is matrix operations; scalars/vectors/matrices/tensors form the data hierarchy; SVD for numerical stability; PCA for dimensionality reduction; eigenvalues for convergence analysis
  • Calculus: Gradient = direction of steepest ascent; chain rule = backpropagation; Jacobian/Hessian for deep learning; local minima/saddle points explain training failures
  • Learning sequence: intuition first (3Blue1Brown) → structured courses → statistics (Khan Academy) → applied theory (ISL) → unified view (MML textbook)
  • Convexity guarantees that any local minimum is global — rare in deep learning but conceptually important for understanding why convergence guarantees are hard to obtain in non-convex settings
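The MLE → MSE connection in the first bullet can be checked numerically. This is a minimal sketch under an assumed toy setup (a linear model `y_hat = w * x` with fixed Gaussian noise variance, data values invented for illustration): the negative log-likelihood is an affine function of the MSE, so both are minimized by the same slope.

```python
import math

# Toy dataset, roughly y = 2 * x (values are illustrative, not from the source).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 2.0, 3.9, 6.1]

def mse(w):
    """Mean squared error of the linear model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def neg_log_likelihood(w, sigma=1.0):
    """Negative log-likelihood assuming y ~ Normal(w * x, sigma^2)."""
    n = len(xs)
    const = n * math.log(sigma * math.sqrt(2 * math.pi))
    return const + sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)

# Scan candidate slopes: the w minimizing the NLL also minimizes the MSE,
# because NLL = const + (n / (2 * sigma^2)) * MSE is monotone in MSE.
ws = [i / 100 for i in range(150, 251)]
best_mse = min(ws, key=mse)
best_nll = min(ws, key=neg_log_likelihood)
assert best_mse == best_nll  # same argmin: MSE is Gaussian MLE in disguise
```

The same reasoning with a Bernoulli likelihood instead of a Gaussian one yields cross-entropy, which is why neither loss is an arbitrary choice.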

Insights

  • The “why does this matter in ML” framing for each concept is the article’s main contribution over standard math textbooks — connecting variance to model generalization or determinants to singularity in one sentence is more valuable than derivations
  • The learning path (visual → structured → applied → unified) mirrors the technical book reading strategy from another vault note: intuition first, rigor second
  • MLE as the origin of loss functions is probably the single most unifying insight in ML theory: once you understand that training is MLE, cross-entropy and MSE stop being arbitrary choices and become principled derivations
  • Saddle points being “common in high-dimensional spaces” is a non-obvious fact — the popular mental model of “getting stuck in local minima” is actually less accurate than “navigating saddle points,” which has different implications for optimizer design
  • This roadmap pairs directly with the DevOps 6-month roadmap: both argue for fundamentals-first, ordered learning over tool-hopping
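The saddle-point insight above can be illustrated with a toy sketch. The function `f(x, y) = x**2 - y**2` is an assumed textbook example (not from the source): it has a saddle at the origin, and plain gradient descent started exactly on the stable axis converges to the saddle, while any tiny off-axis perturbation escapes.

```python
# f(x, y) = x^2 - y^2 has a saddle point at the origin:
# a minimum along x, a maximum along y.
def grad(x, y):
    return 2 * x, -2 * y  # (df/dx, df/dy)

def descend(x, y, lr=0.1, steps=200):
    """Plain gradient descent with a fixed learning rate."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# Starting exactly on the x-axis, descent converges to the saddle (0, 0)...
x1, y1 = descend(1.0, 0.0)
# ...but a tiny perturbation off the axis escapes along the -y^2 direction.
x2, y2 = descend(1.0, 1e-6)
assert abs(x1) < 1e-6 and y1 == 0.0
assert abs(y2) > 1.0
```

In high dimensions there are exponentially many such escape directions, which is one reason noisy optimizers such as SGD navigate saddles more reliably than the "stuck in a local minimum" mental model suggests.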

Connections

Raw Excerpt

Almost everything in machine learning is a matrix operation. Data, parameters, activations, and gradients are all vectors, matrices, or tensors.
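The excerpt's claim can be made concrete with a minimal sketch (pure Python, all names hypothetical): the forward pass of one dense layer is nothing more than a matrix-vector product, a vector addition, and an elementwise nonlinearity.

```python
def matvec(W, x):
    """Matrix-vector product: the core operation behind every dense layer."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def dense_layer(W, b, x):
    """Forward pass of one dense layer: relu(W @ x + b)."""
    z = [wx + b_i for wx, b_i in zip(matvec(W, x), b)]
    return [max(0.0, z_i) for z_i in z]  # ReLU activation

W = [[1.0, -2.0], [0.5, 0.5]]  # parameters: a matrix
b = [0.0, 1.0]                 # bias: a vector
x = [3.0, 1.0]                 # input data: a vector

y = dense_layer(W, b, x)       # activations: another vector
assert y == [1.0, 3.0]
```

Gradients have the same shapes as the parameters they update, so training, like inference, reduces to the matrix and tensor operations the excerpt describes.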