(2023/2024) Applied Statistics

The notes are taken from the books required for the course:

You can view or download the PDF here. The source code is also available in the notes folder.

To report any issue with the notes, use the appropriate section.

Course Syllabus

According to the official course syllabus:

  1. Exploring a multivariate dataset.
    • Descriptive statistics and graphical displays.
    • The geometry of a multivariate sample.
    • Sample mean, covariance and correlation.
    • Generalized variance and total variance.
    • The metric induced by the covariance matrix.
  2. Data representation and dimensional reduction.
    • The analysis of the covariance structure, principal component analysis (PCA).
  3. Discrimination, classification, clustering.
    • Statistical classification: model, misclassification costs and prior probability.
    • Bayesian supervised classification and the Fisher approach to discriminant analysis.
    • Cross-validation for the evaluation of a classification function.
    • Alternative approaches to classification: CART, support vector machines.
    • Similarity measures.
    • Unsupervised classification; hierarchical and nonhierarchical methods.
    • DBSCAN.
    • K-means and K-medoids.
    • Multidimensional scaling.
  4. Inference about mean vectors.
    • The multivariate normal distribution, the Wishart distribution, the F distribution.
    • Hotelling's $T^2$ test.
    • Confidence regions and simultaneous comparisons of component means.
    • The Bonferroni method for multiple comparisons.
    • Familywise Error Rate and False Discovery Rate.
    • Comparisons of several multivariate means.
    • ANOVA and MANOVA.
    • Inference for linear models.
    • Beyond ordinary least squares: ridge regression, lasso, regularized least squares.
    • Random effects and mixed effects linear models.