About MILD

While practical methods for data science and machine learning are advancing very rapidly, their theoretical foundations often lag behind; MILD aims to fill this gap.

The researchers in MILD bring expertise from several disciplines. Graduate students involved in MILD will receive cross-disciplinary training and be exposed to ideas from all of the following areas.

Probability Theory
Probability theory provides a wealth of tools that are invaluable for work in this area. Examples include concentration of measure in high dimensions, sub-Gaussian and sub-exponential random variables, random matrix theory, and uniform deviation inequalities.
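As a small illustration of concentration of measure, the sketch below empirically checks Hoeffding's inequality for the mean of bounded random variables (the constants and sample sizes here are arbitrary choices for the demo, not anything prescribed by MILD):

```python
import math
import random

# Hoeffding's inequality for i.i.d. variables in [0, 1]:
#   P(|sample mean - p| >= t) <= 2 * exp(-2 * n * t**2)
random.seed(0)
n, p, t, trials = 200, 0.5, 0.05, 2000

deviations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    if abs(mean - p) >= t:
        deviations += 1

empirical = deviations / trials
bound = 2 * math.exp(-2 * n * t**2)
print(empirical, bound)  # the empirical tail sits well below the bound
```

The bound is distribution-free: it holds for any random variables taking values in [0, 1], which is what makes it useful for the uniform deviation inequalities mentioned above.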

Mathematical Statistics
Mathematical statistics applies probabilistic techniques to provide rigorous backing for various statistical methods. Examples include optimality of model selection, consistency of estimation, tight confidence sets for unknown parameters, and methods for analyzing high-dimensional data.
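A classic example of a rigorously backed confidence set is the 95% interval for an unknown mean. The sketch below (a toy simulation with a known variance, chosen only to keep the interval formula simple) estimates how often the interval covers the true parameter:

```python
import math
import random

# Coverage of the 95% normal confidence interval for an unknown mean mu,
# assuming the variance sigma**2 is known.
random.seed(1)
mu, sigma, n, trials, z = 3.0, 2.0, 50, 1000, 1.96

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean = sum(sample) / n
    half = z * sigma / math.sqrt(n)  # half-width of the interval
    if mean - half <= mu <= mean + half:
        covered += 1

coverage = covered / trials
print(coverage)  # close to the nominal level 0.95
```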

Learning Theory
Learning theory gives a theoretical foundation to many of the ideas in machine learning. Examples include measuring the complexity of hypothesis classes, proving sample complexity bounds for learning, and analyzing online learning methods.
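To give a flavor of such sample bounds: for a finite hypothesis class H, combining Hoeffding's inequality with a union bound shows that n >= ln(2|H|/delta) / (2 * eps^2) samples suffice for every hypothesis's empirical error to be within eps of its true error, with probability at least 1 - delta. A minimal sketch of that formula (the function name is our own):

```python
import math

def sample_bound(h_size: int, eps: float, delta: float) -> int:
    """Samples sufficient for uniform eps-accurate error estimates over a
    finite hypothesis class of size h_size, via Hoeffding + union bound."""
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps**2))

# Even a class of 1000 hypotheses needs only a few thousand samples.
print(sample_bound(1000, eps=0.05, delta=0.05))
```

Note the bound grows only logarithmically in |H|, which is why uniform convergence is feasible even for very large hypothesis classes.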

Information Theory
Information theory is a discipline that studies how to remove redundancy and represent information in the most efficient manner (compression), and how to add redundancy and convey information reliably through a noisy medium (error correction). The answers to these questions require fundamental information measures such as entropy that capture the amount of uncertainty in a probabilistic model. These techniques are used in making decisions under uncertainty, choosing informative features in machine learning, and comparing probabilistic models.
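The central measure mentioned above, Shannon entropy, is simple to compute: H(p) = -sum_i p_i log2 p_i, measured in bits. A minimal sketch:

```python
import math

def entropy(probs) -> float:
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin flip
print(entropy([1.0]))        # 0.0 bits: no uncertainty at all
print(entropy([0.25] * 4))   # 2.0 bits: a fair four-sided die
```

Entropy also answers the compression question directly: it is the average number of bits per symbol that any lossless code must use in the limit.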

Algorithm Design
Algorithm design is a discipline that studies how to efficiently analyze and manipulate data on a computer. These techniques are crucial for efficiently processing modern large-scale data sets, particularly in machine learning. Modern algorithms are often randomized, so tools from probability theory are especially useful.
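A classic randomized algorithm for large-scale data is reservoir sampling, which draws a uniform sample of k items from a stream of unknown length in a single pass and O(k) memory. A minimal sketch:

```python
import random

def reservoir_sample(stream, k: int, rng=random):
    """Return k items drawn uniformly at random from an iterable stream."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # Replace a reservoir slot with probability k / (i + 1),
            # which keeps every item equally likely to survive.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(10**6), 5)
print(sample)  # five uniformly chosen elements of the stream
```

The correctness argument is a short induction using exactly the probabilistic tools the paragraph mentions.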

Differential Privacy
Differential privacy provides provable privacy guarantees for the individuals whose data is used to train machine learning models. Examples include a variety of ML applications such as clustering, classification, and generative modeling.
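The simplest mechanism with such a provable guarantee is the Laplace mechanism: adding Laplace noise with scale 1/epsilon to a counting query (which has sensitivity 1) makes the released count epsilon-differentially private. A minimal sketch, using inverse-CDF sampling so it needs only the standard library:

```python
import math
import random

def laplace_noise(scale: float, rng=random) -> float:
    """Draw one Laplace(0, scale) variate via inverse-CDF sampling."""
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(true_count: int, epsilon: float, rng=random) -> float:
    """Release a count with epsilon-differential privacy; a counting
    query has sensitivity 1, so the noise scale is 1 / epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

random.seed(2)
print(private_count(100, epsilon=0.5))  # roughly 100, perturbed by a few
```

Smaller epsilon means stronger privacy but noisier answers; the noise is unbiased, so repeated releases average out to the true count.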