relationship between sets of numbers

Jun 2015
2
0
australia
Hey guys,

I'm trying to identify the relationship between some numbers I have:

unknown: 393260 size: 26623 x: 67 y: 184
unknown: 403669 size: 27449 x: 69 y: 191
unknown: 407177 size: 27852 x: 71 y: 194
unknown: 410336 size: 28125 x: 73 y: 196
unknown: 415528 size: 28575 x: 75 y: 200
unknown: 423787 size: 29218 x: 78 y: 205
unknown: 429183 size: 29696 x: 80 y: 209
unknown: 432962 size: 29941 x: 81 y: 211
unknown: 435455 size: 30089 x: 81 y: 212
unknown: 449949 size: 30908 x: 82 y: 214
unknown: 454742 size: 31276 x: 84 y: 217
unknown: 458611 size: 31642 x: 86 y: 220
unknown: 461507 size: 31887 x: 88 y: 222
unknown: 467154 size: 32298 x: 90 y: 225

I am interested in finding a formula that produces the "unknown" value for a given size. I am not sure whether the x and y variables are necessary.
Does anyone have any idea?

:cold:

Thanks in advance.
 

CRGreathouse

Forum Staff
Nov 2006
16,046
936
UTC -5
Some crunching with R suggests that all of the variables are positively correlated with each other, but x is mostly redundant given the others, so a good model is (apparently)
unknown ~= 35525.99 + 21.78 * size - 1208.84 * y

Code:
# Data from the original post
unknown <- c(393260, 403669, 407177, 410336, 415528, 423787, 429183, 432962, 435455, 449949, 454742, 458611, 461507, 467154)
size <- c(26623, 27449, 27852, 28125, 28575, 29218, 29696, 29941, 30089, 30908, 31276, 31642, 31887, 32298)
x <- c(67, 69, 71, 73, 75, 78, 80, 81, 81, 82, 84, 86, 88, 90)
y <- c(184, 191, 194, 196, 200, 205, 209, 211, 212, 214, 217, 220, 222, 225)
df <- data.frame(unknown, size, x, y)
# Fit unknown as a linear function of size and y (x is left out as redundant)
lm.out <- lm(unknown ~ size + y, data = df)
summary(lm.out)
# Print the fitted coefficients on their own
lm.out
# Plot the residuals against the observed values to see how far off the fit is
lm.resid <- resid(lm.out)
plot(unknown, lm.resid)
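
The claims above (positive correlations, x being mostly redundant) can be checked directly. The following is only a sketch of one way to do that, not necessarily the crunching that was actually done: it prints the correlation matrix and compares the model with and without x. The names cors, fit_small, fit_full, r2_small and r2_full are made up for this example.

```r
# Data from the original post
unknown <- c(393260, 403669, 407177, 410336, 415528, 423787, 429183,
             432962, 435455, 449949, 454742, 458611, 461507, 467154)
size <- c(26623, 27449, 27852, 28125, 28575, 29218, 29696,
          29941, 30089, 30908, 31276, 31642, 31887, 32298)
x <- c(67, 69, 71, 73, 75, 78, 80, 81, 81, 82, 84, 86, 88, 90)
y <- c(184, 191, 194, 196, 200, 205, 209, 211, 212, 214, 217, 220, 222, 225)
df <- data.frame(unknown, size, x, y)

# Pairwise correlations between all four variables
cors <- cor(df)
print(round(cors, 4))

# Nested models: the same fit with and without x
fit_small <- lm(unknown ~ size + y, data = df)
fit_full  <- lm(unknown ~ size + x + y, data = df)

# R-squared can only go up when a term is added; the question is by how much
r2_small <- summary(fit_small)$r.squared
r2_full  <- summary(fit_full)$r.squared
print(c(without_x = r2_small, with_x = r2_full))

# F-test for whether the extra x term is worth keeping
print(anova(fit_small, fit_full))
```

If the two R-squared values are nearly identical and the F-test is not significant, that supports leaving x out of the model.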
 
Jun 2015
2
0
australia
I checked the formula out, but unfortunately it generates "unknown" values that are close but not quite exact.

:(

Great piece of software though!
Thanks for your help.
 

CRGreathouse

Quote: I checked the formula out, but unfortunately it generates "unknown" values that are close but not quite exact.
Right. Unless you overfit the data very, very badly, you won't get exact matches.
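
To make that trade-off concrete, here is a rough sketch. It swaps in a different technique for the "exact" side: an interpolating cubic spline through every (size, unknown) pair, which is one way of overfitting, not something proposed in the thread. The names fit, lin_err, exact and spl_err are made up for this example, and the size 30000 is an arbitrary illustrative value.

```r
# Data from the original post
unknown <- c(393260, 403669, 407177, 410336, 415528, 423787, 429183,
             432962, 435455, 449949, 454742, 458611, 461507, 467154)
size <- c(26623, 27449, 27852, 28125, 28575, 29218, 29696,
          29941, 30089, 30908, 31276, 31642, 31887, 32298)
y <- c(184, 191, 194, 196, 200, 205, 209, 211, 212, 214, 217, 220, 222, 225)

# Least-squares model from the thread: three coefficients for 14 points,
# so it cannot pass through every point
fit <- lm(unknown ~ size + y)
lin_err <- unknown - fitted(fit)
print(round(lin_err))

# Interpolating spline: one cubic piece per gap between data points,
# so it reproduces every tabulated value
exact <- splinefun(size, unknown, method = "natural")
spl_err <- unknown - exact(size)
print(max(abs(spl_err)))

# Between the known sizes the spline is only guessing (30000 is arbitrary)
print(exact(30000))
```

The spline matches the table exactly only because it has as many free parameters as there are data points; it says nothing reliable about sizes that are not in the table.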