My Math Forum  


Advanced Statistics Advanced Probability and Statistics Math Forum


August 27th, 2018, 02:52 AM   #1
Newbie
 
Joined: Jun 2017
From: italia

Posts: 13
Thanks: 2

Distribution-free test for outliers

My data are obtained by subtracting two vectors $V_{1}$ and $V_{2}$ in 3D space and taking the length of the difference:
$$v=\sqrt{(V_{1_{x}}-V_{2_{x}})^2+(V_{1_{y}}-V_{2_{y}})^2+(V_{1_{z}}-V_{2_{z}})^2}$$
I don't know the distribution of $v$, but it vaguely resembles a log-normal distribution with a very long tail. Since I have no justified assumption, I treat the distribution as unknown.

My current method for finding outliers is based on Chebyshev's inequality: I flag $v$ as an outlier if $v-\bar{v} > 10s$, where $\bar{v}$ and $s$ are the sample mean and sample standard deviation.
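A minimal C++ sketch of the rule as described, assuming a one-sided test and the sample standard deviation with $n-1$ in the denominator (the function names are hypothetical, not from any library):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sample mean of the data.
double sample_mean(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// Sample standard deviation with n - 1 in the denominator.
double sample_sd(const std::vector<double>& v, double mean) {
    double ss = 0.0;
    for (double x : v) ss += (x - mean) * (x - mean);
    return std::sqrt(ss / static_cast<double>(v.size() - 1));
}

// Indices i with v[i] - mean > k * s (one-sided rule; k = 10 in the post).
std::vector<std::size_t> chebyshev_outliers(const std::vector<double>& v, double k) {
    std::vector<std::size_t> idx;
    if (v.size() < 2) return idx;  // s is undefined for fewer than 2 points
    const double m = sample_mean(v);
    const double s = sample_sd(v, m);
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] - m > k * s) idx.push_back(i);
    return idx;
}
```

Keep in mind that an extreme point inflates both $\bar{v}$ and $s$, which can hide (mask) that same point.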

Is that a reasonably valid method?

I found a paper that explains the Bootlier test for outliers (https://www.econstor.eu/bitstream/10.../735352828.pdf), but it's not clear to me how to turn it into a working procedure.
Could anyone please explain the Bootlier test in practical terms?
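Here is how I understand the bootlier idea, written as a C++ sketch. It is my reading, not the paper's exact procedure, and all names plus the trimming count k are hypothetical choices. Draw many bootstrap resamples, compute for each the mean minus the trimmed mean (MTM), and look at the density of those MTM values. Resamples containing 0, 1, 2, ... copies of an outlier tend to land in separate clusters, so outliers make that density multimodal. The sketch measures multimodality with a critical bandwidth (the smallest Gaussian-kernel bandwidth giving a single mode); turning that number into a p-value needs a calibration step that is left out here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Mean minus trimmed mean (MTM): mean of all points minus the mean left after
// dropping the k smallest and k largest points. Requires v.size() > 2 * k.
double mean_minus_trimmed_mean(std::vector<double> v, std::size_t k) {
    std::sort(v.begin(), v.end());
    double all = 0.0, trimmed = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        all += v[i];
        if (i >= k && i < v.size() - k) trimmed += v[i];
    }
    return all / v.size() - trimmed / (v.size() - 2 * k);
}

// Draws B bootstrap resamples (with replacement) and returns the MTM of each.
std::vector<double> bootstrap_mtm(const std::vector<double>& data, std::size_t k,
                                  std::size_t B, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    std::vector<double> sample(data.size()), out;
    out.reserve(B);
    for (std::size_t b = 0; b < B; ++b) {
        for (double& x : sample) x = data[pick(gen)];
        out.push_back(mean_minus_trimmed_mean(sample, k));
    }
    return out;
}

// Number of local maxima of a Gaussian kernel density estimate with bandwidth h,
// evaluated on `grid` evenly spaced points covering [min - 3h, max + 3h].
// The grid limits accuracy when h is very small.
int count_modes(const std::vector<double>& x, double h, int grid = 512) {
    const double lo = *std::min_element(x.begin(), x.end()) - 3.0 * h;
    const double hi = *std::max_element(x.begin(), x.end()) + 3.0 * h;
    std::vector<double> f(grid, 0.0);
    for (int g = 0; g < grid; ++g) {
        const double t = lo + (hi - lo) * g / (grid - 1);
        for (double xi : x) {
            const double z = (t - xi) / h;
            f[g] += std::exp(-0.5 * z * z);
        }
    }
    int modes = 0;
    for (int g = 1; g + 1 < grid; ++g)
        if (f[g] > f[g - 1] && f[g] >= f[g + 1]) ++modes;
    return modes;
}

// Critical bandwidth: smallest h for which the KDE has at most one mode, found by
// bisection (for a Gaussian kernel the mode count does not increase with h).
double critical_bandwidth(const std::vector<double>& x, int iters = 50) {
    const double range = *std::max_element(x.begin(), x.end()) -
                         *std::min_element(x.begin(), x.end());
    if (range <= 0.0) return 0.0;  // all values equal: trivially unimodal
    double lo = 0.0, hi = range;
    while (count_modes(x, hi) > 1) hi *= 2.0;
    for (int i = 0; i < iters; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (count_modes(x, mid) > 1) lo = mid; else hi = mid;
    }
    return hi;
}
```

For data `v`, something like `critical_bandwidth(bootstrap_mtm(v, 2, 1000, 12345u))` gives the statistic; deciding whether it is "large" requires a reference distribution that this sketch does not compute.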
Cristiano
 
August 27th, 2018, 05:16 AM   #2
Senior Member
 
Joined: Oct 2009

Posts: 628
Thanks: 190

Quote:
Originally Posted by Cristiano
My data are obtained by subtracting two vectors $V_{1}$ and $V_{2}$ in 3D space and taking the length of the difference:
$$v=\sqrt{(V_{1_{x}}-V_{2_{x}})^2+(V_{1_{y}}-V_{2_{y}})^2+(V_{1_{z}}-V_{2_{z}})^2}$$
I don't know the distribution of $v$, but it vaguely resembles a log-normal distribution with a very long tail. Since I have no justified assumption, I treat the distribution as unknown.
Do we know the distribution of $V_1$ and $V_2$? Does it look like anything familiar to you, such as a normal distribution?
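Why this matters, under one illustrative assumption that is not known to hold here: if the three components of $V_1 - V_2$ are independent $N(0,\sigma^2)$, then $v/\sigma$ follows a chi distribution with 3 degrees of freedom (the Maxwell distribution), with density

$$f(v)=\sqrt{\frac{2}{\pi}}\,\frac{v^{2}}{\sigma^{3}}\,e^{-v^{2}/(2\sigma^{2})},\qquad v\ge 0.$$

Its right tail decays like $v^{2}e^{-v^{2}/(2\sigma^{2})}$, far faster than a log-normal tail, so a very long tail in the data would suggest that this assumption does not hold.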

Quote:
My current method for finding outliers is based on Chebyshev's inequality: I flag $v$ as an outlier if $v-\bar{v} > 10s$, where $\bar{v}$ and $s$ are the sample mean and sample standard deviation.
This is definitely a valid method, but it is likely far too conservative: you won't flag many outliers this way.
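To make "conservative" concrete, assuming the population mean $\mu$ and standard deviation $\sigma$ exist: the one-sided Chebyshev (Cantelli) inequality gives, for any such distribution,

$$P(X-\mu \ge k\sigma) \le \frac{1}{1+k^{2}}, \qquad k=10 \;\Rightarrow\; P \le \frac{1}{101} \approx 0.99\%.$$

With sample quantities there is a further limit (Samuelson's inequality): no point can lie more than $(n-1)/\sqrt{n}$ sample standard deviations above the sample mean,

$$x_i-\bar{x} \le \frac{n-1}{\sqrt{n}}\,s,$$

so with $k=10$ nothing at all can be flagged unless $n \ge 102$.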

The bootlier is a good test, and there are R scripts that implement it. Definitely don't write your own script; the implementations you can find online usually do the job very well.
https://github.com/jodeleeuw/Bootlie...ter/bootlier.R

Note that the bootlier does work well with skewed distributions, but only if you have a fairly large number of data points.

Another method you might want to try is the isolation forest. It is also distribution-free and works quite well.
There are many methods for finding outliers without distributional assumptions, but I really like these two.
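For one-dimensional data like $v$, an isolation forest is small enough to write without a library. Below is a minimal sketch of the standard algorithm (random split values drawn between the current min and max, path length normalised by $c(\psi)$), not any particular package's API; the names are hypothetical. 100 trees and a subsample size of 256 are commonly used defaults.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One node of an isolation tree; left == -1 marks a leaf.
struct Node {
    double split = 0.0;
    int left = -1, right = -1;
    std::size_t size = 0;  // training points that reached this node
};
using Tree = std::vector<Node>;

// c(n): average path length of an unsuccessful search in a binary search tree
// of n points; normalises path lengths and extends paths cut off at max depth.
double c_factor(double n) {
    if (n <= 1.0) return 0.0;
    if (n == 2.0) return 1.0;  // exact value for two points
    const double harmonic = std::log(n - 1.0) + 0.5772156649;  // approximates H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1.0) / n;
}

// Recursively grows an isolation tree on pts; returns the new node's index.
int build(Tree& t, const std::vector<double>& pts, int depth, int max_depth,
          std::mt19937& gen) {
    const int id = static_cast<int>(t.size());
    t.push_back(Node());
    t[id].size = pts.size();
    if (depth >= max_depth || pts.size() <= 1) return id;
    const auto mm = std::minmax_element(pts.begin(), pts.end());
    if (*mm.first == *mm.second) return id;  // all values equal: cannot split
    std::uniform_real_distribution<double> pick(*mm.first, *mm.second);
    const double s = pick(gen);
    std::vector<double> l, r;
    for (double p : pts) (p < s ? l : r).push_back(p);
    const int li = build(t, l, depth + 1, max_depth, gen);
    const int ri = build(t, r, depth + 1, max_depth, gen);
    t[id].split = s;  // index again: push_back may have reallocated t
    t[id].left = li;
    t[id].right = ri;
    return id;
}

// Path length h(x): edges from the root to x's leaf, plus c(leaf size).
double path_length(const Tree& t, double x) {
    int id = 0, depth = 0;
    while (t[id].left != -1) {
        id = (x < t[id].split) ? t[id].left : t[id].right;
        ++depth;
    }
    return depth + c_factor(static_cast<double>(t[id].size));
}

// Anomaly score s(x) = 2^(-E[h(x)] / c(psi)) for every point of data.
// Near 1: likely outlier; well below 0.5: normal. Needs data.size() >= 2.
std::vector<double> iforest_scores(const std::vector<double>& data, int n_trees,
                                   std::size_t psi, unsigned seed) {
    std::mt19937 gen(seed);
    psi = std::min(psi, data.size());
    const int max_depth = static_cast<int>(std::ceil(std::log2(static_cast<double>(psi))));
    std::vector<Tree> forest(static_cast<std::size_t>(n_trees));
    for (Tree& t : forest) {
        std::vector<double> sub = data;
        std::shuffle(sub.begin(), sub.end(), gen);
        sub.resize(psi);  // random subsample without replacement
        build(t, sub, 0, max_depth, gen);
    }
    std::vector<double> scores;
    for (double x : data) {
        double mean_h = 0.0;
        for (const Tree& t : forest) mean_h += path_length(t, x);
        mean_h /= n_trees;
        scores.push_back(std::pow(2.0, -mean_h / c_factor(static_cast<double>(psi))));
    }
    return scores;
}
```

Typical use would be `iforest_scores(v, 100, 256, seed)`, then inspecting the points with the highest scores.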
Micrm@ss

August 27th, 2018, 06:32 AM   #3
Newbie
 
Joined: Jun 2017
From: italia

Posts: 13
Thanks: 2

Quote:
Originally Posted by Micrm@ss
Do we know the distribution of $V_1$ and $V_2$? Does it look like anything familiar to you, such as a normal distribution?
I don't know how to check the distribution of the vectors.
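A rough first check, not a formal test: compute the sample skewness and excess kurtosis of each component of $V_1 - V_2$ (x, y and z separately). For normal data both should be near 0 when $n$ is large; a Q-Q plot or a formal normality test (e.g. Shapiro-Wilk) would be more reliable. The names below are hypothetical.

```cpp
#include <cmath>
#include <vector>

struct Moments {
    double skewness;
    double excess_kurtosis;
};

// Simple moment estimators (no small-sample bias correction):
// skewness g1 = m3 / m2^1.5, excess kurtosis g2 = m4 / m2^2 - 3.
Moments shape_moments(const std::vector<double>& x) {
    const double n = static_cast<double>(x.size());
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= n;
    double m2 = 0.0, m3 = 0.0, m4 = 0.0;
    for (double v : x) {
        const double d = v - mean;
        m2 += d * d;
        m3 += d * d * d;
        m4 += d * d * d * d;
    }
    m2 /= n;
    m3 /= n;
    m4 /= n;
    return {m3 / std::pow(m2, 1.5), m4 / (m2 * m2) - 3.0};
}
```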

Quote:
The bootlier is a good test, and there are R scripts that implement it. Definitely don't write your own script; the implementations you can find online usually do the job very well.
https://github.com/jodeleeuw/Bootlie...ter/bootlier.R
I need to include the code in my program (written in C++), but I found no source code, and I doubt I can write the program without a step-by-step explanation.

Quote:
Another method you might want to try is the isolation forest. It is also distribution-free and works quite well.
There are many methods for finding outliers without distributional assumptions, but I really like these two.
I found two small C++11 packages, but they don't compile (I use MSVC++ 2013).
Cristiano








Copyright © 2018 My Math Forum. All rights reserved.