My Math Forum  

My Math Forum > College Math Forum > Advanced Statistics

Advanced Statistics Advanced Probability and Statistics Math Forum

October 8th, 2019, 10:51 AM   #1
Joined: May 2017
From: Germany

Posts: 7
Thanks: 0

Problem with arguing about probability mass of general random variable

(I'm not sure if this is the right sub-forum, but I didn't see a better fit.)

I have a problem with an exercise in a machine learning textbook. Solving it doesn't require any knowledge of machine learning, though, just advanced probability theory. It's a very simple exercise in principle, almost trivial, so my difficulty isn't a logical one but a technical one.

So I'll first describe the setup. Let $\displaystyle X$ be an arbitrary set and $\displaystyle Y = \{0,1\}$. Let $\displaystyle P$ be a probability distribution over $\displaystyle X \times Y$ such that for each $\displaystyle x \in X$, the conditional probability $\displaystyle P_x(y) := P(y | x)$ is defined. Given a function $\displaystyle h : X \rightarrow Y$, we define the error (or "loss") as $\displaystyle L_P(h) = P(\{(x,y) \in X \times Y \mid h(x) \neq y\})$.
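(To make sure I understand the definition, here's how I'd compute $\displaystyle L_P(h)$ in the special case where $\displaystyle X$ is finite. The domain, distribution, and classifier below are invented for illustration and are not from the book.)

```python
# Toy computation of L_P(h) when X is finite (the general X need not be).
# The domain, joint distribution, and classifier are made-up examples.

X = ["a", "b", "c"]
Y = [0, 1]

# A joint distribution P over X x Y as a dict of point masses; entries sum to 1.
P = {("a", 0): 0.10, ("a", 1): 0.30,
     ("b", 0): 0.25, ("b", 1): 0.05,
     ("c", 0): 0.12, ("c", 1): 0.18}

def loss(h):
    """L_P(h) = P({(x, y) in X x Y : h(x) != y})."""
    return sum(p for (x, y), p in P.items() if h(x) != y)

print(loss(lambda x: 1))  # total mass of all pairs with y = 0
```

In the discrete case the loss is just the probability mass of the disagreement set, which is exactly the definition above.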

(Here $\displaystyle X$ is the domain set and $\displaystyle Y$ the set of classes, and we want to train a classifier to predict which category an element of $\displaystyle X$ belongs to and measure the quality via the loss function, but again this isn't important for the exercise.)

The book mentions that, in order for the above definition to work, we assume that there is a $\displaystyle \sigma$-algebra over $\displaystyle X \times Y$ and that the set $\displaystyle \{(x,h(x)) \mid x \in X\}$ is in the $\displaystyle \sigma$-algebra for every classifier $\displaystyle h$.

The function which minimizes the loss is the Bayesian predictor $\displaystyle f$ defined via $\displaystyle f(x) = \begin{cases} 1 & P_x(1) \ge 0.5 \\ 0 & P_x(1) < 0.5 \end{cases} $. That's pretty obvious, and it's exactly what I'm supposed to prove formally. So I'm supposed to prove that $\displaystyle L_P(f) \le L_P(h)$ for all $\displaystyle h : X \rightarrow Y$.
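(Again just as a sanity check for myself: when $\displaystyle X$ is finite, the claim can be verified by brute force over all $\displaystyle 2^{|X|}$ classifiers. Everything below is the same invented toy example, not anything from the book.)

```python
# Brute-force check (finite toy case) that the Bayes predictor minimizes L_P.
from itertools import product

X = ["a", "b", "c"]
Y = [0, 1]
P = {("a", 0): 0.10, ("a", 1): 0.30,
     ("b", 0): 0.25, ("b", 1): 0.05,
     ("c", 0): 0.12, ("c", 1): 0.18}

def P_x(x, y):
    """Conditional P(y | x) = P(x, y) / P(x); assumes the marginal P(x) > 0."""
    px = P[(x, 0)] + P[(x, 1)]
    return P[(x, y)] / px

def f(x):
    """Bayes predictor: pick the label with the larger conditional mass."""
    return 1 if P_x(x, 1) >= 0.5 else 0

def loss(h):
    return sum(p for (x, y), p in P.items() if h(x) != y)

# Enumerate every classifier h : X -> Y (as a lookup table) and compare losses.
best = min(loss(dict(zip(X, ys)).__getitem__) for ys in product(Y, repeat=len(X)))
assert abs(loss(f) - best) < 1e-12
```

Of course this only confirms the discrete intuition; it says nothing about the measure-theoretic case I'm stuck on below.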

Here's my current approach. Given any predictor $\displaystyle h : X \rightarrow Y$, we can divide $\displaystyle X = A \sqcup D$ where $\displaystyle A = \{x \in X \mid f(x) = h(x)\}$ and $\displaystyle D = \{x \in X \mid f(x) \neq h(x)\}$. Then $\displaystyle L_P(h) = P(\{(x,y) \in A \times Y \mid h(x) \neq y\}) + P(\{(x,y) \in D \times Y \mid h(x) \neq y\})$ and $\displaystyle L_P(f) = P(\{(x,y) \in A \times Y \mid f(x) \neq y\}) + P(\{(x,y) \in D \times Y \mid f(x) \neq y\})$, and since the first summands are obviously identical, we merely need to show that $\displaystyle P(\{(x,y) \in D \times Y \mid f(x) \neq y\}) \le P(\{(x,y) \in D \times Y \mid h(x) \neq y\})$. Furthermore, since $\displaystyle f$ and $\displaystyle h$ disagree on the set $\displaystyle D$ (that's why it's called $\displaystyle D$), we can rewrite this inequality as

$\displaystyle P(\{(x,y) \in D \times Y \mid f(x) \neq y\}) \le P(\{(x,y) \in D \times Y \mid f(x) = y\})$ (since on $\displaystyle D$ we have $\displaystyle h(x) = 1 - f(x)$, so $\displaystyle h(x) \neq y \iff 1-f(x) \neq y \iff f(x) \neq 1 - y \iff f(x) = y$, the last step because both sides take values in $\displaystyle \{0,1\}$)

And here's where I'm stuck. I just don't know how to argue about probabilities on $\displaystyle X \times Y$. Obviously if $\displaystyle X$ were discrete, this would be easy, but it may not be. Intuitively I would like to argue that, for every $\displaystyle x \in D$, the event from the first set is at most as likely as the one from the second set, but I don't know how to do that, since individual points may have probability 0.
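To be explicit, the argument I would like to make — assuming (and this is my assumption, not something the book states) that $\displaystyle P_x$ arises as a disintegration (a regular conditional probability) of $\displaystyle P$, and that $\displaystyle x \mapsto P_x(0)$ is measurable — would read:

```latex
% Sketch under the assumption that P disintegrates as
%   P(E) = \int_X P_x(\{y : (x,y) \in E\}) \, d\mu(x),
% where \mu is the marginal of P on X.
P(\{(x,y) \in D \times Y \mid f(x) \neq y\})
  = \int_D P_x(\{1 - f(x)\}) \, d\mu(x)
  = \int_D \min\{P_x(0), P_x(1)\} \, d\mu(x)
  \le \int_D \max\{P_x(0), P_x(1)\} \, d\mu(x)
  = \int_D P_x(\{f(x)\}) \, d\mu(x)
  = P(\{(x,y) \in D \times Y \mid f(x) = y\}).
```

The middle equalities just use that, by definition, $\displaystyle f(x)$ picks whichever of $\displaystyle 0,1$ has the larger conditional mass. But I don't know whether this disintegration assumption is actually justified, which is part of what I'm asking.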

My concrete questions are:

1. Is it even legal to define $\displaystyle P_x$ as the book does if $\displaystyle x$ may have probability 0, and if so, what's the theory behind that? (The formula $\displaystyle P_x(y) = \frac{P(x,y)}{P(x)}$ may not be defined, since the marginal probability $\displaystyle P(x)$ may be 0.)

2. Are the sets $\displaystyle \{(x,y) \in A \times Y | h(x) \neq y\}$ and $\displaystyle \{(x,y) \in D \times Y | h(x) \neq y\}$ even measurable, and if so how do I prove that?

3. How do I finish the above argument? And if I can't, how else do you do this?

The book is quite rigorous, so I'm assuming there is a way to do it properly and I'm just lacking competence in measure theory.

Last edited by sty silver; October 8th, 2019 at 11:10 AM.
sty silver
October 8th, 2019, 12:16 PM   #2
Senior Member
Joined: Mar 2015
From: Universe 2.71828i3.14159

Posts: 132
Thanks: 49

Math Focus: Area of Circle
What book is it???
tahirimanov19
October 8th, 2019, 01:43 PM   #3
Joined: May 2017
From: Germany

Posts: 7
Thanks: 0

Understanding Machine Learning, page 52 Exercise 7
sty silver


Copyright © 2019 My Math Forum. All rights reserved.