Sets, functions and iterations

RPO

Dec 2019
Mont Saint Michel
I am new to algebra, and I would be very grateful if someone could tell me whether the process I have built is OK, and whether my attempt to apply a formula to two sets to create a new one is also OK.

Context

I have a database containing species records (e.g. 10 different species, with 100 rows per species; the columns are quantitative variables). I want to compute Euclidean distances (considering all variables) between 20 randomly sampled rows per species, for each species paired with species h. I want to bootstrap this calculation an increasing number of times to assess the effect of increasing the number of iterations on the linearity of the results (to be able to say: OK, we have reached linearity, so the results should be OK). The aim is to show a figure like this (1 line color = the Euclidean distance of 1 species to species h):




To explain the process, I illustrate it with the distance calculation between species a and species h:

We define the sets Ra and Rh as the original records of species a and species h.
\[R_\alpha = \left \{ 1,2,3,4,...,n | n\in \mathbb{N} \: and\: n\geq 21 \right \}\]

\[R_h = \left \{ 1,2,3,4,...,n | n\in \mathbb{N} \: and\: n\geq 21 \right \}\]

Then we define Sa and Sh as proper subsets of Ra and Rh, each composed of 20 records randomly sampled from Ra and Rh respectively, without replacement, so that the probability P(r) for records to be selected is:
\[P(r)=\frac{(N-n)!}{N!}\]

\[S_\alpha \subset R_\alpha \: and\: S_h \subset R_h\:, with\: n(S)=20\]
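For illustration, the sampling step could be sketched in Python as follows. This is only a sketch under assumptions: NumPy is used, the records of a species are stored as an array with one row per record, and the names `sample_records`, `R_a` and `S_a` as well as the random placeholder data are hypothetical.

```python
import numpy as np

def sample_records(records, size=20, rng=None):
    """Draw `size` distinct rows from `records` (sampling without replacement)."""
    rng = np.random.default_rng() if rng is None else rng
    # replace=False guarantees that no row index is drawn twice
    idx = rng.choice(records.shape[0], size=size, replace=False)
    return records[idx]

# Hypothetical data: 100 records of species a, 20 quantitative variables
rng = np.random.default_rng(0)
R_a = rng.normal(size=(100, 20))
S_a = sample_records(R_a, size=20, rng=rng)
```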

Then we define the following function to compute the mean Euclidean distance between all records of Sa and Sh:
\[f(x,y)=\frac{1}{n'}\sum_{j=1}^{n'}\left ( \sqrt{\sum_{i=1}^{n}(y_i-x_i)^{2}} \right )_j\]

With n = 20 (the number of variables) and n' = 20 (the number of randomly sampled records, i.e. the size of Sa and Sh).
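One possible reading of f(x, y) can be sketched in Python. The assumption here is that the j-th sampled record of Sa is paired with the j-th sampled record of Sh, which is what the single sum over j suggests; if every record of Sa is meant to be compared with every record of Sh, the computation would differ. The function name `mean_euclidean_distance` is hypothetical.

```python
import numpy as np

def mean_euclidean_distance(x, y):
    """Mean of the row-wise Euclidean distances between x and y.

    x and y are arrays of shape (n', n): n' paired records, n variables.
    Assumption: record j of x is compared with record j of y only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Inner sum over the n variables, then square root, for each record j
    distances = np.sqrt(((y - x) ** 2).sum(axis=1))
    # Outer sum over the n' records, divided by n'
    return distances.mean()

# Tiny check: the distances are 5 and 0, so the mean is 2.5
x_demo = np.array([[0.0, 0.0], [0.0, 0.0]])
y_demo = np.array([[3.0, 4.0], [0.0, 0.0]])
demo_value = mean_euclidean_distance(x_demo, y_demo)
```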

Then we define the set D (written d(a,h) below for the pair of species a and h), which contains the Euclidean distances between the records of Sa and Sh:
\[d_{(\alpha ,h)}=\left \{ f(x,y)|x\in S_\alpha \: and\: y\in S_h \right \}\]

Finally, we define the set B containing the numbers of iterations of the whole process, from the sampling event (with replacement between iterations, giving a probability P(r)=1/N for records to be selected between iterations) to the computation of set D. The following formula f(x) allows computing the set M:
\[B\approx \left \{ 1*1.6^x | x\in \mathbb{N}_0\: and\: 0\leq x\leq 20\right \}\]

\[f(x)=\frac{1}{n''}\sum_{l=1}^{n''}x_{l}\]

\[M_{(\alpha ,h)}=\left \{ f(x)|x\in D\: and\: n''\in B \right \}\]

The values in B are rounded.
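The whole iteration scheme could be sketched in Python as below. This is only a sketch under several assumptions: rounding is to the nearest integer, records are paired row by row when computing distances, each iteration draws fresh samples from the full record sets, and all names (`iteration_counts`, `one_iteration`, `bootstrap_means`) and the random placeholder data are hypothetical.

```python
import numpy as np

def iteration_counts(base=1.6, max_exp=20):
    """Set B: rounded values of base**x for x = 0..max_exp."""
    return [round(base ** x) for x in range(max_exp + 1)]

def one_iteration(R_a, R_h, size, rng):
    """Sample `size` rows from each species and return one value of f(x, y)."""
    S_a = R_a[rng.choice(R_a.shape[0], size=size, replace=False)]
    S_h = R_h[rng.choice(R_h.shape[0], size=size, replace=False)]
    return np.sqrt(((S_h - S_a) ** 2).sum(axis=1)).mean()

def bootstrap_means(R_a, R_h, counts, size=20, rng=None):
    """Set M: for each n'' in `counts`, the mean of n'' iteration results."""
    rng = np.random.default_rng() if rng is None else rng
    return [
        float(np.mean([one_iteration(R_a, R_h, size, rng) for _ in range(n)]))
        for n in counts
    ]

# Hypothetical data: 100 records per species, 20 quantitative variables
rng = np.random.default_rng(1)
R_a = rng.normal(loc=0.0, size=(100, 20))
R_h = rng.normal(loc=1.0, size=(100, 20))

B = iteration_counts()
# Only the first few iteration counts are used here to keep the demo quick
B_demo = B[:8]
M_demo = bootstrap_means(R_a, R_h, B_demo, size=20, rng=rng)
```

Plotting the values of M against the values of B, one line per species, would give the kind of figure described in the context above.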

Mainly, I am not sure that I have the right to build M(a,h) this way...

Could you please tell me if it is OK to call functions this way in sets? And could you tell me if you spot mistakes in the process?

Many thanks for your help!