Ok, I almost forgot this blog existed, but we are back. It's been really a hectic year honestly. Let's get into it.
What is Probabilistic Machine Learning ? What is Machine Learning to start with ?
At its core, machine learning is about finding patterns in data. Let's say given some data $X$ , we want to either find structure within it, or learn a mapping from it to some output $Y$. here, we are basically looking for a function $f$ such that
$y = f(x) , x \in X , y \in Y$
In practice, $f$ is usually parameterized by some set of weights $W$ , whether it's a linear model, a decision tree, or a neural network, the idea remains same.
We tune $W$ until $f$ maps the inputs to the outputs, well enough, which is essentially an optimization problem.
So far so good
but the problem is real world is messy and often not so deterministic, and the data we collect is not the whole story.
The data $X$ and/ or $Y$ itself is a finite sample drawn from some much larger underlying distribution in real world.., and the patterns we find are patterns in that sample, not necessarily in the distribution itself.
This is where the generalization problem comes up, Any model (pattern) trained (found) on a finite sample is implicitly shaped by that sample, and may not reflect the full distribution the data came from.
Standard ML largely ignores this part. To understand better, we need to understand where the uncertainty actually comes from.
For examples (marked with %%), let's consider a cobot scenario where a robot is trying to estimate the human action (picking, placing, etc., ) based on their behavior (hand trajectory, body orientation, etc., )
Two fundamentally different sources of uncertainty exists,
1) Aleatoric - the uncertainty in the data,
This is the inherent randomness in the phenomenon we are trying to model,
%% Even though we are performing same action, the behaviors in the training samples can be slightly different because we are not deterministic machines !
so mathematically the problem becomes $p(y |x) $ instead of $y = f(x)$, essentially, the model needs to also learn how uncertain the point in $Y$ space for a given $x$, not just the value of $y$.
2)Epistemic - uncertainty in the model
This arises from lack of knowledge, either the model hasn't seen enough data, or hasn't seen the right data.
%% and if the robot sees a completely new action it has never encountered, that's epistemic.
This mainly lies in the model itself, so let's say we find the $w$, which was found on the data, but there could be many configurations of $W$ that fit the data equally well, this is underdetermination, so the probabilistic approach here is to treat $w$ as a distribution, rather than a single point in $W$. So now, there are $p(y|x)$ for each of the parameter value in $w$, the prediction becomes an average over all the prediction weighted by the probability of $w$ given the data.
$p(y | x , X , Y) = \int p(y | x , W)p(W | X , Y)$ dW
In practice, both types of uncertainty are present at same time. A nice probabilistic model should capture both :
Aleatoric, model learns the noise in the data, and Epistemic, model knows what it doesn't know.
Comments
Post a Comment