My Math Forum  

My Math Forum > High School Math Forum > Probability and Statistics

January 25th, 2017, 07:47 PM   #1

How Close To The Normal Distribution Is My Data?

I have 351 numbers. About half are on each side of the mean. 169 are higher than the mean and 182 are lower than the mean. Here is how my data compares to the normal distribution in terms of what percent of the numbers are within 0.5, 1, and 2 standard deviations of the mean:

Within 0.5 standard deviations: exactly 1/3 of my numbers, 38.3% for the normal distribution
Within 1 standard deviation: 63.5% of my numbers, 68.3% for the normal distribution
Within 2 standard deviations: 97.2% of my numbers, 95.4% for the normal distribution

The number farthest from the mean is 2.5334 standard deviations above it, so all of my numbers are within 3 standard deviations of the mean. My numbers are less likely than the normal distribution to be within 1 standard deviation of the mean, and more likely to be between 1 and 2 standard deviations away (97.2% - 63.5% = 33.7% of my numbers, versus 95.4% - 68.3% = 27.1% for the normal distribution). The highest 35 numbers (about 10 percent of 351) average 1.728 standard deviations above the mean, and the lowest 35 numbers average 1.598 standard deviations below it.
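For comparison, the normal-distribution reference values above, plus the average z-score of the top (or bottom) 10% of a normal distribution, can be computed with Python's standard-library `statistics.NormalDist`. This is a sketch, not part of the original post; the helper names are hypothetical, and the tail formula E[Z | Z > z_p] = pdf(z_p) / p is the standard mean of a truncated normal.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def upper_tail_mean(p: float) -> float:
    """Average z-score of the top fraction p of a normal distribution.

    Uses E[Z | Z > z_p] = pdf(z_p) / p, where z_p cuts off the top p.
    """
    z_p = std_normal.inv_cdf(1 - p)
    return std_normal.pdf(z_p) / p

# Figures reported in the post (351 numbers).
observed_within = {0.5: 1 / 3, 1: 0.635, 2: 0.972}
for k, obs in observed_within.items():
    print(f"within {k} SD: data {obs:.1%}, normal {fraction_within(k):.1%}")

# The top 35 of 351 numbers is roughly the top 10%.
p = 35 / 351
print(f"normal mean of top {p:.1%}: {upper_tail_mean(p):.3f} SD (data: +1.728 / -1.598)")
```

In the large-sample limit this gives roughly 1.76 standard deviations for the extreme 35 of 351, so both reported tail averages are somewhat less extreme than a normal distribution would produce.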

Without looking at all the numbers, would you say the numbers are close to being normally distributed?
EvanJ
 
January 26th, 2017, 07:32 AM   #2
From your description, it sure sounds close, but you should do a chi-squared goodness-of-fit test (or some other goodness-of-fit test) to check, since you didn't really provide a lot of info...
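A crude version of that test can even be run on the summary figures alone. This is a sketch, not a substitute for testing the raw numbers, and it rests on stated assumptions: the observed counts are reconstructed from the rounded percentages in the first post (1/3 -> 117, 63.5% -> 223, 97.2% -> 341 cumulative out of 351), the bins fold both sides of the mean together, and subtracting 2 degrees of freedom for the estimated mean and SD is the usual rule of thumb.

```python
from statistics import NormalDist

N = 351
std_normal = NormalDist()

# Bins on distance from the mean, in SDs: [0, 0.5), [0.5, 1), [1, 2), [2, inf).
edges = [0.0, 0.5, 1.0, 2.0, float("inf")]

# ASSUMPTION: counts reconstructed from the rounded percentages in the first
# post (1/3 -> 117, 63.5% -> 223, 97.2% -> 341 cumulative, out of 351).
cumulative = [117, 223, 341, 351]
observed = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

def within(k):
    """P(|Z| < k) for a standard normal Z (k may be infinite)."""
    return 1.0 if k == float("inf") else 2 * std_normal.cdf(k) - 1

expected = [N * (within(hi) - within(lo)) for lo, hi in zip(edges, edges[1:])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Rule of thumb: df = bins - 1 - (2 parameters, mean and SD, estimated from data)
df = len(observed) - 1 - 2
print("observed:", observed)
print("expected:", [round(e, 1) for e in expected])
print(f"chi-squared = {chi2:.2f} with df = {df}")
```

The statistic is then compared with the chi-squared critical value for that many degrees of freedom (for example Excel's CHISQ.INV.RT(0.05, df)). With only four folded bins the test has little resolution; binning the raw 351 numbers more finely would be more informative.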
123qwerty
January 27th, 2017, 11:58 AM   #3
I have no idea how to do a chi-square test. https://en.wikipedia.org/wiki/Pearso...i-squared_test says "the expected (theoretical) frequency of type i, asserted by the null hypothesis that the fraction of type i in the population is p_{i}" (the symbols might not look right). That sounds like it refers to counts of discrete outcomes, like comparing the observed and expected number of heads from 20 coins, but that's not what I'm working with.
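The quoted definition does carry over to continuous data once the numbers are grouped into intervals (bins). With N numbers and k bins, O_i is how many numbers fall in bin i = (a_i, b_i], and p_i is the normal probability of that bin, computed from the data's mean μ and standard deviation σ, with Φ the standard normal CDF:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = N p_i,
\qquad p_i = \Phi\!\left(\frac{b_i - \mu}{\sigma}\right) - \Phi\!\left(\frac{a_i - \mu}{\sigma}\right)
```

So the "expected frequency" E_i need not be a whole number; it is just N times a bin probability.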
EvanJ
January 28th, 2017, 07:44 PM   #4
Quote:
Originally Posted by EvanJ
I have no idea how to do a chi-square test.
I would just copy code from the Internet and do it in R.

Joking aside...

Quote:
https://en.wikipedia.org/wiki/Pearso...i-squared_test says "the expected (theoretical) frequency of type i, asserted by the null hypothesis that the fraction of type i in the population is p_{i}" (the symbols might not look right). That sounds like it refers to counts of discrete outcomes, like comparing the observed and expected number of heads from 20 coins, but that's not what I'm working with.
What you're doing is quite similar; it's just the continuous analogue of that. I've looked at a few websites, and this link explains quite well what you should do. The basic idea is to build an empirical CDF from your data and measure the distance from your empirical CDF to the ideal normal CDF.
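The empirical-CDF distance described above is usually called the Kolmogorov-Smirnov statistic D. A minimal stdlib sketch, assuming the mean and SD are estimated from the data itself (which strictly calls for Lilliefors critical values rather than the plain KS table); the function name and the sample numbers are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def ks_distance_to_normal(data):
    """Kolmogorov-Smirnov distance D: the largest vertical gap between the
    empirical CDF of `data` and a normal CDF with the data's own mean and SD."""
    xs = sorted(data)
    n = len(xs)
    ref = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = ref.cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

sample = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0]  # hypothetical numbers
print(f"D = {ks_distance_to_normal(sample):.3f}")
```

A small D means the data's cumulative proportions track the normal curve closely; whether D is "small enough" for 351 numbers is what the critical-value table decides.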
123qwerty
January 28th, 2017, 08:02 PM   #5
Is there a way to take a count of numbers, a mean, and a standard deviation, and have a website generate what all the numbers would be if they were normally distributed?
EvanJ
January 28th, 2017, 08:09 PM   #6
Quote:
Originally Posted by EvanJ
Is there a way to take a count of numbers, a mean, and a standard deviation, and have a website generate what all the numbers would be if they were normally distributed?
The website wouldn't know what bins you want... C'mon, you can generate what you want in R or Excel.
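One way to get "what the numbers would be" for an idealized normal sample is to take evenly spaced normal quantiles. A minimal sketch, assuming the plotting position (i - 0.5)/n (other conventions exist); the function name is hypothetical, and the mean 100 and SD 15 are placeholders for the values computed from your data. In Excel, NORM.INV((i-0.5)/351, mean, sd) for i = 1..351 gives the same values.

```python
from statistics import NormalDist

def ideal_normal_values(n, mean, sd):
    """Return n values spaced like an idealized normal sample: the quantiles
    of Normal(mean, sd) at plotting positions (i - 0.5) / n, i = 1..n."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Hypothetical mean and SD; substitute the ones computed from your data.
ideal = ideal_normal_values(351, mean=100.0, sd=15.0)
within_one_sd = sum(abs(v - 100.0) < 15.0 for v in ideal) / len(ideal)
print(f"{within_one_sd:.1%} of the idealized values are within 1 SD")
```

Sorting your own 351 numbers and listing them beside these values (a Q-Q plot, if charted) shows where the data bends away from normal.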
123qwerty
January 29th, 2017, 08:11 AM   #7
Quote:
Originally Posted by 123qwerty
The website wouldn't know what the bins you want are... C'mon, you can generate what you want in R or Excel
I've never worked with bins. I have numbers in Excel, and I know how to make Excel do basics like standard deviations, but not statistical tests.
EvanJ
January 30th, 2017, 05:41 AM   #8
Quote:
Originally Posted by EvanJ
I've never worked with bins.
I'm sure you have; you just probably didn't know they were called bins. For example, when you create a histogram, you have intervals like (.5, 5.5], (5.5, 10.5], etc., and those are called bins.
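Counting values into right-closed bins like those can be sketched as below; the function name, data, and edges are hypothetical, chosen only to show how values landing exactly on an edge are handled. Excel's FREQUENCY function does a similar right-closed count.

```python
from bisect import bisect_left

def bin_counts(data, edges):
    """Count values into right-closed bins (edges[j], edges[j+1]], the same
    convention as the (.5, 5.5], (5.5, 10.5] example. Values at or below
    edges[0], or above edges[-1], are not counted."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        j = bisect_left(edges, x)  # index of the first edge >= x
        if 1 <= j < len(edges):
            counts[j - 1] += 1
    return counts

# Hypothetical data and edges, just to show the boundary handling.
print(bin_counts([1, 5, 6, 10, 11, 15, 20, 0], [0.5, 5.5, 10.5, 15.5]))  # [2, 2, 2]
```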

Quote:
I have numbers in Excel, and I know how to make Excel do basics like standard deviations, but not statistical tests.
The Excel functions you'll need for your current purpose are NORM.DIST and CHISQ.TEST. I'm sure you can find the necessary documentation online to learn to use them; if not, perhaps you could post a subset of the data and I can show you how it's done.
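For anyone working outside Excel, here is a rough Python equivalent of that workflow, offered as a sketch under stated assumptions. `expected_counts` plays the role of differencing NORM.DIST(x, mean, sd, TRUE) at consecutive bin edges, and `chi2_sf` is a hand-rolled series for the chi-squared upper tail standing in for a library call (all three names are hypothetical). As far as I know, Excel's CHISQ.TEST uses df = bins - 1 for a single column and does not subtract the 2 degrees of freedom used up by estimating the mean and SD; the `ddof` parameter here allows that adjustment.

```python
import math
from statistics import NormalDist

def expected_counts(edges, n, mean, sd):
    """Expected count for each bin (edges[j], edges[j+1]] if the n numbers
    were Normal(mean, sd). Same idea as differencing Excel's
    NORM.DIST(x, mean, sd, TRUE) at consecutive bin edges."""
    dist = NormalDist(mean, sd)
    return [n * (dist.cdf(hi) - dist.cdf(lo)) for lo, hi in zip(edges, edges[1:])]

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-squared(df), computed from
    the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= h / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(h) - h - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def chi_squared_test(observed, expected, ddof=0):
    """Pearson chi-squared statistic and p-value with df = bins - 1 - ddof.
    Use ddof=2 when the mean and SD were estimated from the same data."""
    df = len(observed) - 1 - ddof
    if df < 1:
        raise ValueError("need more bins than 1 + ddof")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, df)
```

For example, expected_counts([-math.inf, -1, 0, 1, math.inf], 351, 0, 1) gives the expected counts for 351 standardized numbers in four bins split at -1, 0, and 1 SD; pairing those with the observed counts in chi_squared_test(observed, expected, ddof=2) returns the statistic and p-value.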
123qwerty
Copyright © 2019 My Math Forum. All rights reserved.