
Advanced Statistics Advanced Probability and Statistics Math Forum 
 LinkBack  Thread Tools  Display Modes 
August 29th, 2017, 12:26 AM  #1 
Newbie Joined: Aug 2017 From: Singapore Posts: 2 Thanks: 0  Prove a large number of samplings converge to a set of small number of samples
In a $k$dimensional space, given a set of $n$ points: $X=\{x_1, x_2,..., x_n\}$ which is sampled from an known distribution $P$. Given a set of $m$ points ($m << n$): $Y=\{y_1,y_2,...,y_m\}$ which is also drawn from $P$. Suppose $\{z_1, z_2,...,z_m\} \subset X$ are the closest points (in term of Euclidean distance) of $\{y_1,y_2,...,y_m\}$ respectively. A gap $g$ is defined as follows: $\displaystyle g = max\{z_1  y_1^2, z_2  y_2^2,..., z_m  y_m^2\}$ where $ab^2$ denotes the Euclidean distance between a, b. Prove that if $m$ is fixed, $g \rightarrow 0$ as $n \rightarrow \infty$. I don't know how to start this problem. Thank you. Last edited by thanhvinh0906; August 29th, 2017 at 01:00 AM. 
August 29th, 2017, 01:11 PM  #2 
Global Moderator Joined: May 2007 Posts: 6,555 Thanks: 599 
The applicable theorem is the law of large numbers, but I haven't tried to work it out. I'll leave that to you.

August 30th, 2017, 01:34 AM  #3  
Newbie Joined: Aug 2017 From: Singapore Posts: 2 Thanks: 0  Quote:
Could you please check whether my below proof is correct? I do it in a uniform distribution. Suppose $g = max\{z_1  y_1^2, z_2  y_2^2,..., z_m  y_m^2\} = \ z_{max}  y_{max} \$. For any $\varepsilon > 0$, put a hypersphere radius $\varepsilon$ around $y_{max}$ in that $y_{max}$ is the centroid (*). Suppose there is no sample of $S$ exists in the hypersphere as $n \rightarrow \infty$. (**) We project $y_{max}$ to dimension $k$, denoted as $y_{max,k}$. Suppose the boundary of dimension $k$ is $[a, b]$, i.e. $a \leq y_{max,k} \leq b$. According to the law of large numbers, $E[x_k]$ (the average of $X$ projected to dimension $k$) $\rightarrow (a + b)/2$ as $n \rightarrow \infty$. Now if there is no sample in the hypersphere: \begin{equation} \begin{split} E[x_k] \rightarrow \int_a^b \mathrm{\frac{x}{ba}}\,\mathrm{d}x = \int_a^{y_{max,k}  \varepsilon} \mathrm{\frac{x}{ba}}\,\mathrm{d}x + \int_{y_{max,k} + \varepsilon}^b \mathrm{\frac{x}{ba}}\,\mathrm{d}x \\ = \frac{a+b}{2}  \frac{2\varepsilon y_{max,k}}{b  a} \neq \frac{a + b}{2} \end{split} \end{equation} Equation (1) conflicts the law of large numbers, this indicates that (**) is false; so there exists at least one sample in the hypersphere. This statement accompanied by (*) indicates that $g = \ z_{max}  y_{max} \ \rightarrow 0$ as $n \rightarrow \infty$. Please share if you have another proof. Thank you. Last edited by skipjack; August 30th, 2017 at 04:38 PM.  
August 30th, 2017, 04:27 PM  #4 
Global Moderator Joined: May 2007 Posts: 6,555 Thanks: 599 
There may be a flaw in that as n increases, the set of x's changes and so would the set of z's. Therefore y_{max} may change with n. I had further thoughts after my first post. Qualitatively, a histogram, with narrower intervals for increasing n, of the x values should resemble the density function of the probability. The y values correspond to specific points of the density. The intervals (containing y values) of the histograms should eventually contain some x (therefore z) value close to the y value. I have never taken a course in statistics, although my specialty is probability theory, so I am not sure hot to develop a proof. Last edited by mathman; August 30th, 2017 at 04:29 PM. 

Tags 
converge, large, number, prove, samples, samplings, set, small 
Thread Tools  
Display Modes  

Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Small number at bottom right.  Azarashi  Algebra  10  December 12th, 2016 08:38 AM 
Number of factors of a large number  ishaanmj007  Number Theory  2  May 14th, 2015 05:07 PM 
Large number modulo  FloorPlay  Number Theory  2  October 30th, 2013 08:59 PM 
What are the number of rainfalls in each samples?  Jessicasamide  Algebra  1  November 24th, 2012 05:47 PM 
Number of unique after n samples  BigBugBuzz  Advanced Statistics  3  December 15th, 2010 01:25 PM 