My Math Forum How Close To The Normal Distribution Is My Data?


 January 25th, 2017, 06:47 PM #1 Senior Member   Joined: Oct 2013 From: New York, USA Posts: 645 Thanks: 85 How Close To The Normal Distribution Is My Data?
I have 351 numbers. About half are on each side of the mean: 169 are higher than the mean and 182 are lower. Here is how my data compares to the normal distribution in terms of what percent of the numbers are within 0.5, 1, and 2 standard deviations of the mean:

Within 0.5 standard deviations: exactly 1/3 of my numbers, 38.3% for the normal distribution
Within 1 standard deviation: 63.5% of my numbers, 68.3% for the normal distribution
Within 2 standard deviations: 97.2% of my numbers, 95.4% for the normal distribution

The farthest any of my numbers is from the mean is 2.5334 standard deviations above it, so all of my numbers are within 3 standard deviations of the mean. It's obvious that my numbers are less likely than the normal distribution to be within 1 standard deviation of the mean and more likely than the normal distribution to be between 1 and 2 standard deviations away from the mean.

The highest 35 numbers (about 10 percent of 351) are an average of 1.728 standard deviations above the mean. The lowest 35 numbers are an average of 1.598 standard deviations below the mean. Without looking at all the numbers, would you say the numbers are close to being normally distributed?
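The normal-distribution reference figures quoted in this post can be reproduced with a short sketch. It uses Python's standard-library `statistics.NormalDist`; the helper names `fraction_within` and `expected_tail_mean` are made up for illustration and are not from the thread. The second helper gives the theoretical counterpart of the "average of the highest 10 percent" figure, using the standard result that the mean of a standard normal above a cutoff c is pdf(c) divided by the tail probability.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def fraction_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def expected_tail_mean(tail_fraction: float) -> float:
    """Expected z-score of the highest `tail_fraction` of a normal distribution."""
    cutoff = std_normal.inv_cdf(1.0 - tail_fraction)
    return std_normal.pdf(cutoff) / tail_fraction


for k in (0.5, 1.0, 2.0, 3.0):
    print(f"within {k} SD: {fraction_within(k):.1%}")
print(f"mean z-score of the top 10%: {expected_tail_mean(0.10):.3f}")
```

By symmetry, the same tail figure (with the sign flipped) applies to the lowest 10 percent, so it can be set beside the 1.728 and 1.598 reported above.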
 January 26th, 2017, 06:32 AM #2 Senior Member   Joined: Dec 2012 From: Hong Kong Posts: 853 Thanks: 311 Math Focus: Stochastic processes, statistical inference, data mining, computational linguistics From your description, it sure sounds close, but you should do a chi-squared test for goodness-of-fit (or some other goodness-of-fit test) to make sure, since you didn't really provide a lot of info...
 January 27th, 2017, 10:58 AM #3 Senior Member   Joined: Oct 2013 From: New York, USA Posts: 645 Thanks: 85 I have no idea how to do a chi-square test. https://en.wikipedia.org/wiki/Pearso...i-squared_test says "the expected (theoretical) frequency of type i, asserted by the null hypothesis that the fraction of type i in the population is p_{i}" (the symbols might not look right). That sounds like it refers to whole numbers, like comparing the observed and expected number of heads from 20 coins, but that's not what I'm working with.
January 28th, 2017, 06:44 PM   #4
Senior Member

Joined: Dec 2012
From: Hong Kong

Posts: 853
Thanks: 311

Math Focus: Stochastic processes, statistical inference, data mining, computational linguistics
Quote:
 Originally Posted by EvanJ I have no idea how to do a chi-square test.
I would just copy code from the Internet and do it in R.

Joking aside...

Quote:
 https://en.wikipedia.org/wiki/Pearso...i-squared_test says "the expected (theoretical) frequency of type i, asserted by the null hypothesis that the fraction of type i in the population is p_{i}" (the symbols might not look right). That sounds like it refers to whole numbers, like comparing the observed and expected number of heads from 20 coins, but that's not what I'm working with.
What you're doing is quite similar, just the continuous analogue of that. I've looked at a few websites, and this link explains quite well what you should do. The basic idea is to make an empirical cdf from your data and determine the distance from your empirical cdf to the ideal normal cdf.
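The empirical-cdf idea can be sketched as follows. This is a minimal illustration, not the method from the linked page: it measures the largest vertical gap between the empirical cdf and a normal cdf fitted with the sample mean and standard deviation (the Kolmogorov-Smirnov distance). The function name `ks_distance` and the seeded sample are assumptions made for the example; the real 351 numbers would go in place of `sample`.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_distance(data):
    """Largest vertical gap between the empirical CDF of `data` and a
    normal CDF fitted with the sample mean and standard deviation."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    gap = 0.0
    for i, x in enumerate(sorted(data)):
        theoretical = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        gap = max(gap, theoretical - i / n, (i + 1) / n - theoretical)
    return gap


# Hypothetical stand-in for the real data: 351 seeded pseudo-random normal values.
rng = random.Random(0)
sample = [rng.gauss(50.0, 10.0) for _ in range(351)]
print(f"largest gap between empirical and fitted normal cdf: {ks_distance(sample):.4f}")
```

A smaller distance means a closer fit. Because the mean and standard deviation are estimated from the same data, the usual Kolmogorov-Smirnov critical values are too lenient; the Lilliefors variant of the test is the one designed for that situation.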

 January 28th, 2017, 07:02 PM #5 Senior Member   Joined: Oct 2013 From: New York, USA Posts: 645 Thanks: 85 Is there a way of taking an amount of numbers, mean, and standard deviation, and having a website generate what all the numbers would be if they were normally distributed?
January 28th, 2017, 07:09 PM   #6
Senior Member

Joined: Dec 2012
From: Hong Kong

Posts: 853
Thanks: 311

Math Focus: Stochastic processes, statistical inference, data mining, computational linguistics
Quote:
 Originally Posted by EvanJ Is there a way of taking an amount of numbers, mean, and standard deviation, and having a website generate what all the numbers would be if they were normally distributed?
The website wouldn't know what the bins you want are... C'mon, you can generate what you want in R or Excel
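For the question of generating what the numbers would be under a normal distribution, one possible sketch is to evaluate the inverse normal cdf at evenly spaced probabilities. The plotting positions (i + 0.5)/n are one common convention among several, and the mean of 100 and standard deviation of 15 are placeholders, since the thread does not give the actual values.

```python
from statistics import NormalDist


def ideal_normal_values(n, mean, sd):
    """Return n values placed at evenly spaced quantiles of Normal(mean, sd).

    Uses plotting positions (i + 0.5) / n for i = 0..n-1, which is an
    assumption; other conventions shift the extreme values slightly.
    """
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Placeholder mean and SD; substitute the values computed from the real data.
values = ideal_normal_values(351, 100.0, 15.0)
print("smallest:", round(values[0], 2))
print("median:  ", round(values[len(values) // 2], 2))
print("largest: ", round(values[-1], 2))
```

Sorting the real data and comparing it value by value with this list is the idea behind a normal quantile-quantile plot.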

January 29th, 2017, 07:11 AM   #7
Senior Member

Joined: Oct 2013
From: New York, USA

Posts: 645
Thanks: 85

Quote:
 Originally Posted by 123qwerty The website wouldn't know what the bins you want are... C'mon, you can generate what you want in R or Excel
I've never worked with bins. I have numbers in Excel, and I know how to make Excel do basics like standard deviations, but not statistical tests.

January 30th, 2017, 04:41 AM   #8
Senior Member

Joined: Dec 2012
From: Hong Kong

Posts: 853
Thanks: 311

Math Focus: Stochastic processes, statistical inference, data mining, computational linguistics
Quote:
 Originally Posted by EvanJ I've never worked with bins.
I'm sure you have; you just probably didn't know they were called bins. For example, when you create a histogram, you have intervals like (.5, 5.5], (5.5, 10.5], etc., and those are called bins.

Quote:
 I have numbers in Excel, and I know how to make Excel do basics like standard deviations, but not statistical tests.
The Excel functions you'll need for your current purpose are NORM.DIST and CHISQ.TEST. I'm sure you can find the necessary documentation online to learn to use them; if not, perhaps you could post a subset of the data and I can show you how it's done.
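The same binned comparison can be sketched outside Excel. The example below is only an illustration built from the percentages in post #1: the bins are distances from the mean (under 0.5, 0.5 to 1, 1 to 2, and over 2 standard deviations), and the counts 117, 223, and 341 are the reported fractions of 351 rounded to whole numbers, which is an assumption. The normal cdf plays the role of NORM.DIST, and the sum computed at the end is Pearson's chi-squared statistic, the quantity on which CHISQ.TEST bases its p-value.

```python
from statistics import NormalDist

n = 351
std_normal = NormalDist()

# Cumulative counts within 0.5, 1 and 2 SD of the mean, reconstructed from
# post #1 (1/3, 63.5% and 97.2% of 351, rounded to whole numbers: an assumption).
edges = [0.5, 1.0, 2.0]
within_counts = [117, 223, 341]

# Convert cumulative counts into per-bin counts; the last bin is "beyond 2 SD".
observed = []
previous = 0
for count in within_counts:
    observed.append(count - previous)
    previous = count
observed.append(n - previous)

# Expected per-bin counts under a normal distribution.
expected = []
previous_prob = 0.0
for k in edges:
    prob_within = std_normal.cdf(k) - std_normal.cdf(-k)
    expected.append(n * (prob_within - previous_prob))
    previous_prob = prob_within
expected.append(n * (1.0 - previous_prob))

# Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for label, o, e in zip(["<0.5", "0.5-1", "1-2", ">2"], observed, expected):
    print(f"{label:>6} SD: observed {o:4d}, expected {e:7.1f}")
print(f"chi-squared statistic: {statistic:.2f}")
```

With four bins and a mean and standard deviation estimated from the data, the statistic would be compared against a chi-squared distribution with somewhere between 1 and 3 degrees of freedom, whose 5% critical values are about 3.84 and 7.81. Finer bins built from the full data set would give a more informative test than these four coarse ones.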
