
 Advanced Statistics Advanced Probability and Statistics Math Forum

August 27th, 2018, 01:52 AM   #1
Newbie

Joined: Jun 2017
From: italia

Posts: 13
Thanks: 2

Distribution-free test for outliers

My data are obtained by subtracting two vectors $V_{1}$ and $V_{2}$ in 3D space and taking the length of the difference: $$v=\sqrt{(V_{1_{x}}-V_{2_{x}})^2+(V_{1_{y}}-V_{2_{y}})^2+(V_{1_{z}}-V_{2_{z}})^2}$$ I don't know the distribution of $v$, but it vaguely resembles a very long-tailed log-normal distribution. Since I cannot justify any particular assumption, I treat the distribution as unknown.

My current method for finding outliers is based on Chebyshev's inequality: I say that $v$ is an outlier if $v-\bar{v} > 10 s$, where $\bar{v}$ is the sample mean and $s$ is the sample standard deviation. Is that method reasonably valid?

I found a paper that explains the Bootlier test for finding outliers: https://www.econstor.eu/bitstream/10.../735352828.pdf, but it's not clear to me how to turn that test into a working procedure. Could anyone please explain the Bootlier test in practical terms?

August 27th, 2018, 04:16 AM   #2
Senior Member

Joined: Oct 2009

Posts: 850
Thanks: 325

Quote:
 Originally Posted by Cristiano My data are obtained by subtracting two vectors $V_{1}$ and $V_{2}$ in 3D space: $$v=\sqrt{(V_{1_{x}}-V_{2_{x}})^2+(V_{1_{y}}-V_{2_{y}})^2+(V_{1_{z}}-V_{2_{z}})^2}$$ I don't know the distribution of $v$, but it vaguely resembles a very long-tailed log-normal distribution. Since I cannot justify any particular assumption, I treat the distribution as unknown.
Do we know the distribution of $V_1$ and $V_2$? Do they look like anything familiar to you, such as a normal distribution?

Quote:
 My current method for finding outliers is based on Chebyshev's inequality: I say that $v$ is an outlier if $v-\bar{v} > 10 s$ (sample mean and sample standard deviation).
This is definitely a valid method, but likely far too conservative: a threshold of ten standard deviations will flag very few points as outliers.
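To put a number on "conservative", here is the bound that the rule relies on, written out as a small worked example. It assumes only that $v$ has a finite mean $\mu$ and standard deviation $\sigma$; the caveat at the end about sample estimates is an addition of this note and was not raised in the thread. Chebyshev's inequality, and its one-sided form (Cantelli's inequality), state that for any $k>0$
$$P(|v-\mu| \ge k\sigma) \le \frac{1}{k^2}, \qquad P(v-\mu \ge k\sigma) \le \frac{1}{1+k^2}.$$
With $k=10$ this gives
$$P(v-\mu \ge 10\sigma) \le \frac{1}{101} \approx 0.0099,$$
so at most about 1% of values from any such distribution can exceed the threshold, and for most distributions the true fraction is far smaller than this worst-case bound. Note that both inequalities are stated for the true $\mu$ and $\sigma$; the rule above uses $\bar{v}$ and $s$, which are themselves pulled upward by any extreme values in the sample.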

The Bootlier test is a good choice, and there are R scripts that implement it. Definitely don't write your own script; the implementations you can find online usually do the job very well.
https://github.com/jodeleeuw/Bootlie...ter/bootlier.R

Note that the Bootlier test does work well with skewed distributions, but only if you have a generous number of data points.
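If a ready-made script is not an option (for example, when the test has to live inside a C++ program), the bootstrap step can be sketched as follows. This is a minimal sketch based on my understanding of the Bootlier plot as usually described: resample the data with replacement, compute "mean minus trimmed mean" for each resample, and inspect the resulting values, which tend to look multimodal when outliers are present. The function names, the trimming count `k`, the number of resamples `B`, and the seed are all hypothetical choices for illustration, not taken from the linked paper or the R script. The formal unimodality test applied to these values is not included here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sample mean minus the k-trimmed mean (the k smallest and k largest
// values are dropped before averaging). Takes the data by value because
// it sorts its own copy.
double meanMinusTrimmedMean(std::vector<double> x, std::size_t k) {
    assert(x.size() > 2 * k);  // something must remain after trimming
    std::sort(x.begin(), x.end());
    const double mean =
        std::accumulate(x.begin(), x.end(), 0.0) / static_cast<double>(x.size());
    const std::ptrdiff_t skip = static_cast<std::ptrdiff_t>(k);
    const double trimmed =
        std::accumulate(x.begin() + skip, x.end() - skip, 0.0) /
        static_cast<double>(x.size() - 2 * k);
    return mean - trimmed;
}

// Draws B bootstrap resamples (same size as the data, with replacement)
// and returns the statistic computed on each of them.
std::vector<double> bootlierStatistics(const std::vector<double>& data,
                                       std::size_t k, std::size_t B,
                                       unsigned seed) {
    assert(data.size() > 2 * k);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    std::vector<double> stats;
    stats.reserve(B);
    std::vector<double> resample(data.size());
    for (std::size_t b = 0; b < B; ++b) {
        for (std::size_t i = 0; i < resample.size(); ++i) {
            resample[i] = data[pick(rng)];
        }
        stats.push_back(meanMinusTrimmedMean(resample, k));
    }
    return stats;
}
```

The returned values would then be plotted as a histogram or passed to a unimodality test; several separated clusters suggest that some resamples contain an extreme point and others do not.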

Another method you might want to try is the isolation forest. It is also distribution-free and works quite well.
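For scalar data such as $v$, the idea behind an isolation forest fits in a few lines: cut the data at random points and count how many cuts it takes to isolate a value; outliers are isolated after very few cuts. The sketch below is a simplified one-dimensional variant written for illustration, not a port of any particular library. It follows each point down its own lazily generated random trees and draws subsamples with replacement, both of which are simplifying assumptions. The function names and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Average path length of an unsuccessful binary-search-tree lookup among
// n points; used to normalise path lengths into a score.
double averagePathLength(std::size_t n) {
    if (n <= 1) return 0.0;
    if (n == 2) return 1.0;
    const double harmonic = std::log(static_cast<double>(n - 1)) + 0.5772156649;
    return 2.0 * harmonic - 2.0 * (n - 1.0) / static_cast<double>(n);
}

// Number of random cuts needed to isolate x from pts, capped at maxDepth.
// Points left over when the cap is hit are accounted for by the
// averagePathLength correction.
double pathLength(std::vector<double> pts, double x, std::size_t maxDepth,
                  std::mt19937& rng) {
    std::size_t depth = 0;
    while (pts.size() > 1 && depth < maxDepth) {
        const auto mm = std::minmax_element(pts.begin(), pts.end());
        const double lo = *mm.first;
        const double hi = *mm.second;
        if (lo == hi) break;  // identical values cannot be split further
        const double cut = std::uniform_real_distribution<double>(lo, hi)(rng);
        std::vector<double> side;  // points on the same side of the cut as x
        for (double p : pts) {
            if ((p < cut) == (x < cut)) side.push_back(p);
        }
        pts.swap(side);
        ++depth;
    }
    return static_cast<double>(depth) + averagePathLength(pts.size());
}

// Anomaly score in (0, 1] for every data point; values near 1 mean the
// point is isolated quickly, i.e. it is a likely outlier.
std::vector<double> isolationScores(const std::vector<double>& data,
                                    std::size_t trees, std::size_t subsample,
                                    unsigned seed) {
    assert(data.size() >= 2 && trees > 0 && subsample >= 2);
    std::mt19937 rng(seed);
    const std::size_t psi = std::min(subsample, data.size());
    const std::size_t maxDepth = static_cast<std::size_t>(
        std::ceil(std::log2(static_cast<double>(psi))));
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    std::vector<double> scores;
    scores.reserve(data.size());
    for (double x : data) {
        double total = 0.0;
        for (std::size_t t = 0; t < trees; ++t) {
            std::vector<double> sample(psi);
            for (double& s : sample) s = data[pick(rng)];  // with replacement
            total += pathLength(sample, x, maxDepth, rng);
        }
        const double meanPath = total / static_cast<double>(trees);
        scores.push_back(std::pow(2.0, -meanPath / averagePathLength(psi)));
    }
    return scores;
}
```

A call such as `isolationScores(v, 200, 256, 1u)` returns one score per value; which score counts as "too high" still has to be chosen by the user, for example by inspecting the largest scores.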
There are many distribution-free methods for finding outliers, but I really like these two.

August 27th, 2018, 05:32 AM   #3
Newbie

Joined: Jun 2017
From: italia

Posts: 13
Thanks: 2

Quote:
 Originally Posted by Micrm@ss Do we know the distribution of $V_1$ and $V_2$? Do they look like anything familiar to you, such as a normal distribution?
I don't know how to check the distribution of the vectors.

Quote:
 The Bootlier test is a good choice, and there are R scripts that implement it. Definitely don't write your own script; the implementations you can find online usually do the job very well. https://github.com/jodeleeuw/Bootlie...ter/bootlier.R
I need to include the code in my program (written in C++), but I found no C++ source code, and I strongly doubt that I can write the program without a step-by-step explanation.

Quote:
 Another method you might want to try is the isolation forest. It is also distribution-free and works quite well. There are many distribution-free methods for finding outliers, but I really like these two.
I found 2 small packages in C++11, but they don't compile (I use MSVC++ 2013).
