
Advanced Statistics Advanced Probability and Statistics Math Forum 
October 8th, 2019, 10:51 AM  #1 
Newbie Joined: May 2017 From: Germany Posts: 7 Thanks: 0  Problem with arguing about probability mass of general random variable
(I'm not sure if this is the right subforum, but I didn't see a better fit.) I have a problem with an exercise in a machine learning textbook. Solving it doesn't require any knowledge of machine learning, though, just advanced probability theory. It's a very simple exercise in principle, almost trivial, so my difficulty isn't a logical one but a technical one. First, the setup.

Let $\displaystyle X$ be an arbitrary set and $\displaystyle Y = \{0,1\}$. Let $\displaystyle P$ be a probability distribution over $\displaystyle X \times Y$ such that for each $\displaystyle x \in X$, the conditional probability $\displaystyle P_x(y) := P(y \mid x)$ is defined. Given a function $\displaystyle h : X \rightarrow Y$, we define the error (or "loss") as $\displaystyle L_P(h) = P(\{(x,y) \in X \times Y \mid h(x) \neq y\})$. (Here $\displaystyle X$ is the domain set and $\displaystyle Y$ the set of classes: we want to train a classifier to predict which class an element of $\displaystyle X$ belongs to, and we measure its quality via the loss function. But again, this isn't important for the exercise.)

The book mentions that, in order for the above definition to work, we assume that there is a $\displaystyle \sigma$-algebra over $\displaystyle X \times Y$ and that the set $\displaystyle \{(x,h(x)) \mid x \in X\}$ is in the algebra for every classifier $\displaystyle h$.

The function which minimizes the loss is the Bayes predictor $\displaystyle f$ defined via $\displaystyle f(x) = \begin{cases} 1 & P_x(1) \ge 0.5 \\ 0 & P_x(1) < 0.5 \end{cases}$. That's pretty obvious, and it's exactly what I'm supposed to prove formally: that $\displaystyle L_P(f) \le L_P(h)$ for all $\displaystyle h : X \rightarrow Y$.

Here's my current approach. Given any predictor $\displaystyle h : X \rightarrow Y$, we can divide $\displaystyle X = A \sqcup D$ where $\displaystyle A = \{x \in X \mid f(x) = h(x)\}$ and $\displaystyle D = \{x \in X \mid f(x) \neq h(x)\}$.
Then $\displaystyle L_P(h) = P(\{(x,y) \in A \times Y \mid h(x) \neq y\}) + P(\{(x,y) \in D \times Y \mid h(x) \neq y\})$ and $\displaystyle L_P(f) = P(\{(x,y) \in A \times Y \mid f(x) \neq y\}) + P(\{(x,y) \in D \times Y \mid f(x) \neq y\})$. Since the first summands are identical ($f = h$ on $A$), it suffices to show that $\displaystyle P(\{(x,y) \in D \times Y \mid f(x) \neq y\}) \le P(\{(x,y) \in D \times Y \mid h(x) \neq y\})$. Furthermore, since $\displaystyle f$ and $\displaystyle h$ disagree on the set $\displaystyle D$ (that's why it's called D), we have $\displaystyle h(x) = 1 - f(x)$ there, so we can rewrite this inequality as $\displaystyle P(\{(x,y) \in D \times Y \mid f(x) \neq y\}) \le P(\{(x,y) \in D \times Y \mid f(x) = y\})$ (since $\displaystyle h(x) \neq y \iff 1-f(x) \neq y \iff f(x) \neq 1-y \iff f(x) = y$).

And here's where I'm stuck. I just don't know how to argue about probabilities on $\displaystyle X \times Y$. If $\displaystyle X$ were discrete, this would be easy, but it may not be. Intuitively I would like to argue that, for every $\displaystyle x \in X$, the $\displaystyle (x,y)$ from the first set is at most as likely as the one from the second set, but I don't know how to do that since individual points may have probability 0.

My concrete questions are:
1. Is it even legal to define $\displaystyle P_x$ as the book does if $\displaystyle x$ may have probability 0, and if so, what's the theory behind that? (The quotient $\displaystyle P_x(y) = \frac{P(x,y)}{P(x)}$ may not be defined.)
2. Are the sets $\displaystyle \{(x,y) \in A \times Y \mid h(x) \neq y\}$ and $\displaystyle \{(x,y) \in D \times Y \mid h(x) \neq y\}$ even measurable, and if so, how do I prove that?
3. How do I finish the above argument? And if I can't, how else is this done?

The book is quite rigorous, so I'm assuming there is a way to do it properly and I'm just lacking competence in measure theory.

Last edited by sty silver; October 8th, 2019 at 11:10 AM.
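Edit: to convince myself the claim is at least true in the discrete case I mentioned, I wrote a quick numerical check. This is only a sketch: the finite $\displaystyle X$ and the joint distribution are made up, and here $\displaystyle P_x$ is unproblematic because every $\displaystyle x$ has positive mass.

```python
from itertools import product

# Hypothetical finite domain; Y = {0, 1}.
X = ["a", "b", "c"]

# Made-up joint distribution P over X x Y (masses sum to 1).
P = {
    ("a", 0): 0.10, ("a", 1): 0.30,
    ("b", 0): 0.25, ("b", 1): 0.05,
    ("c", 0): 0.12, ("c", 1): 0.18,
}

def P_x(x, y):
    """Conditional mass P(y | x) = P(x, y) / P(x); fine here since P(x) > 0."""
    px = P[(x, 0)] + P[(x, 1)]
    return P[(x, y)] / px

def loss(h):
    """L_P(h) = P({(x, y) : h(x) != y})."""
    return sum(p for (x, y), p in P.items() if h(x) != y)

# Bayes predictor: f(x) = 1 iff P_x(1) >= 0.5.
def f(x):
    return 1 if P_x(x, 1) >= 0.5 else 0

# Exhaustively check L_P(f) <= L_P(h) over all 2^|X| classifiers h.
for bits in product([0, 1], repeat=len(X)):
    table = dict(zip(X, bits))
    assert loss(f) <= loss(table.get) + 1e-12

print(f"L_P(f) = {loss(f):.2f}")  # prints L_P(f) = 0.27
```

Of course this says nothing about the measure-theoretic difficulties when $\displaystyle X$ is uncountable, which is exactly what my questions are about.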
October 8th, 2019, 12:16 PM  #2 
Senior Member Joined: Mar 2015 From: Universe 2.71828i3.14159 Posts: 132 Thanks: 49 Math Focus: Area of Circle 
What book is it???

October 8th, 2019, 01:43 PM  #3 
Newbie Joined: May 2017 From: Germany Posts: 7 Thanks: 0  Understanding Machine Learning, page 52, Exercise 7.


