My Math Forum: Problem with arguing about probability mass of general random variable

October 8th, 2019, 10:51 AM  #1
Newbie | Joined: May 2017 | From: Germany | Posts: 7 | Thanks: 0

Problem with arguing about probability mass of general random variable

(I'm not sure if this is the right sub-forum, but I didn't see a better fit.)

I have a problem with an exercise in a machine learning textbook. Solving it doesn't require any knowledge of machine learning, though, just of advanced probability theory. It's a very simple exercise in principle, almost trivial, so my difficulty isn't a logical one but a technical one.

First, the setup. Let $\displaystyle X$ be an arbitrary set and $\displaystyle Y = \{0,1\}$. Let $\displaystyle P$ be a probability distribution over $\displaystyle X \times Y$ such that for each $\displaystyle x \in X$, the conditional probability $\displaystyle P_x(y) := P(y \mid x)$ is defined. Given a function $\displaystyle h : X \rightarrow Y$, we define the error (or "loss") as

$\displaystyle L_P(h) = P(\{(x,y) \in X \times Y \mid h(x) \neq y\}).$

(Here $\displaystyle X$ is the domain set and $\displaystyle Y$ the set of classes; we want to train a classifier to predict which class an element of $\displaystyle X$ belongs to, and we measure its quality via the loss function. But again, this isn't important for the exercise.) The book mentions that, in order for the above definition to work, we assume that there is a $\displaystyle \sigma$-algebra over $\displaystyle X \times Y$ and that the set $\displaystyle \{(x,h(x)) \mid x \in X\}$ is in the algebra for every classifier $\displaystyle h$.

The function which minimizes the loss is the Bayes predictor $\displaystyle f$ defined via

$\displaystyle f(x) = \begin{cases} 1 & P_x(1) \ge 0.5 \\ 0 & P_x(1) < 0.5 \end{cases}$

That's pretty obvious, and it's exactly what I'm supposed to prove formally: that $\displaystyle L_P(f) \le L_P(h)$ for all $\displaystyle h : X \rightarrow Y$.

Here's my current approach. Given any predictor $\displaystyle h : X \rightarrow Y$, we can divide $\displaystyle X = A \sqcup D$, where $\displaystyle A = \{x \in X \mid f(x) = h(x)\}$ and $\displaystyle D = \{x \in X \mid f(x) \neq h(x)\}$. Then

$\displaystyle L_P(h) = P(\{(x,y) \in A \times Y \mid h(x) \neq y\}) + P(\{(x,y) \in D \times Y \mid h(x) \neq y\})$

and

$\displaystyle L_P(f) = P(\{(x,y) \in A \times Y \mid f(x) \neq y\}) + P(\{(x,y) \in D \times Y \mid f(x) \neq y\}),$

and since the first terms are obviously identical, we merely need to show that

$\displaystyle P(\{(x,y) \in D \times Y \mid f(x) \neq y\}) \le P(\{(x,y) \in D \times Y \mid h(x) \neq y\}).$

Furthermore, since $\displaystyle f$ and $\displaystyle h$ disagree on the set $\displaystyle D$ (that's why it's called D), we have $\displaystyle h(x) = 1 - f(x)$ there, so we can rewrite this inequality as

$\displaystyle P(\{(x,y) \in D \times Y \mid f(x) \neq y\}) \le P(\{(x,y) \in D \times Y \mid f(x) = y\})$

(since $\displaystyle h(x) \neq y \iff 1-f(x) \neq y \iff f(x) \neq 1 - y \iff f(x) = y$).

And here's where I'm stuck. I just don't know how to argue about probabilities on $\displaystyle X \times Y$. Obviously, if $\displaystyle X$ were discrete, this would be easy, but it may not be. Intuitively, I would like to argue that for every $\displaystyle x \in X$, the $\displaystyle (x,y)$ from the first set is at most as likely as the one from the second set, but I don't know how to do that, since individual points may have probability 0.
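In symbols, the pointwise claim I would like to make precise and then integrate over $\displaystyle D$ is: for every $\displaystyle x \in D$, since $\displaystyle f$ picks whichever label is more likely under $\displaystyle P_x$,

$\displaystyle P_x(\{y : f(x) \neq y\}) = \min\{P_x(0), P_x(1)\} \le P_x(\{y : h(x) \neq y\}),$

assuming expressions like $\displaystyle P_x(\{y : f(x) \neq y\})$ even make sense here.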
My concrete questions are:

1. Is it even legal to define $\displaystyle P_x$ when $\displaystyle x$ may have probability 0, as the book does, and if so, what's the theory behind that? (The naive formula $\displaystyle P_x(y) = \frac{P(x,y)}{P(x)}$ may not be defined, since the denominator can be 0.)

2. Are the sets $\displaystyle \{(x,y) \in A \times Y \mid h(x) \neq y\}$ and $\displaystyle \{(x,y) \in D \times Y \mid h(x) \neq y\}$ even measurable, and if so, how do I prove that?

3. How do I finish the above argument? And if I can't, how else do you do this?

The book is quite rigorous, so I'm assuming there is a way to do it properly and I'm just lacking competence in measure theory.
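PS: Writing out the intuition further, I suspect the identity I need, if it can be justified for a general $\displaystyle X$, is a disintegration of the loss over the marginal distribution $\displaystyle P_X$ of $\displaystyle P$ on $\displaystyle X$:

$\displaystyle L_P(h) = P(\{(x,y) \in X \times Y \mid h(x) \neq y\}) = \int_X P_x(\{y : h(x) \neq y\}) \, \mathrm{d}P_X(x).$

Given that, integrating the pointwise inequality above (equality on $\displaystyle A$, strict or weak inequality on $\displaystyle D$) would yield $\displaystyle L_P(f) \le L_P(h)$. But I don't know whether this identity is available at this level of generality.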
October 8th, 2019, 12:16 PM  #2
Senior Member | Joined: Mar 2015 | From: Universe 2.71828i3.14159 | Posts: 132 | Thanks: 49 | Math Focus: Area of Circle

What book is it???
October 8th, 2019, 01:43 PM  #3
Newbie | Joined: May 2017 | From: Germany | Posts: 7 | Thanks: 0

Understanding Machine Learning (Shalev-Shwartz and Ben-David), page 52, Exercise 7.
