(2024/2025) Applied Statistics

The notes are taken from the books required for the course:

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer Series in Statistics. Springer New York, 2009.
G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning: with Applications in Python. Springer Texts in Statistics. Springer New York, 2013.
R.A. Johnson and D.W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall, 2007.
Course slides.

You can view or download the PDF here. The source code for the notes is in the notes folder.

Course Syllabus

Introduction to statistical learning. The bias-variance tradeoff. Maximum Likelihood Estimation.
Curse of dimensionality and dimension reduction. Principal Component Analysis and its probabilistic counterpart. PCA by singular value decomposition.
Unsupervised classification. Hierarchical clustering, K-means clustering, Gaussian mixture models and the EM algorithm.
Supervised classification. Linear and Quadratic discriminant analysis.
Linear Models. Simple and multiple linear regression. Fitting the model via ordinary least squares, assessing the accuracy of the coefficient estimates, assessing the accuracy of the model, prediction intervals. Qualitative predictors and interactions.
Logistic regression.
Model selection and regularization: subset selection, shrinkage methods (ridge regression and lasso), dimension reduction methods.
Resampling methods. Cross-validation. The bootstrap.
Tree-based methods. Classification and regression trees. Bagging, random forests.
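
As a small taste of one syllabus topic, "PCA by singular value decomposition", here is a minimal sketch in Python with NumPy. It is not taken from the course notes: the function name `pca_svd`, the synthetic data, and the choice of two components are all assumptions made for illustration.

```python
import numpy as np


def pca_svd(X, n_components):
    """Illustrative PCA via SVD (hypothetical helper, not from the notes)."""
    # Center each column so the SVD describes variation around the mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: X_centered = U @ diag(s) @ Vt, with singular values sorted descending.
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Rows of Vt are the principal directions (loadings).
    components = Vt[:n_components]
    # Squared singular values, scaled by n - 1, give the sample variances along them.
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    # Scores: coordinates of the centered data in the principal-component basis.
    scores = X_centered @ components.T
    return scores, components, explained_variance


# Synthetic data, assumed purely for the example.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
scores, components, explained_variance = pca_svd(X, n_components=2)
```

The explained variances obtained this way should agree with the largest eigenvalues of the sample covariance matrix, which is the link between the SVD route and the eigendecomposition route to PCA.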