March 12th, 2017, 01:59 AM  #1 
Member Joined: May 2015 From: Australia Posts: 59 Thanks: 6
Why is my Standard Deviation larger than the mean?
Hello, I have an assignment asking me to calculate the mean and standard deviation for weekly housing cost. All of the data came from 306 students who filled in the survey. I was given raw data in Excel. I used Excel to calculate the mean by typing '=average(' into the formula bar and selecting the appropriate category. I also used Excel to calculate the standard deviation by typing '=stdev(' into the formula bar and selecting the appropriate category. I attached a Notepad file with the raw data (I couldn't attach the Excel file with the raw data, but if you'd like to double-check my calculations, you can copy and paste the data into Excel). I don't know why the SD is larger than the mean. It doesn't make sense. For example, for the 'homecostwk' category, the mean is 84 dollars and the SD is 175 dollars. That means the data spread is from -91 dollars to 259 dollars. How is that possible when the raw data values aren't even negative? Did I do the calculations wrong? I had to calculate for other categories too, such as food, entertainment and mobile phone weekly costs. The SDs for those categories were also larger than the mean.
March 12th, 2017, 07:27 PM  #2 
Senior Member Joined: Sep 2016 From: USA Posts: 197 Thanks: 105 Math Focus: Dynamical systems, analytic function theory, numerics 
There is exactly zero relationship between the standard deviation and the mean. There is no reason to expect the standard deviation to be smaller than the mean in general. Can you explain why you think it should be smaller?
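A quick way to see that nothing forces the standard deviation below the mean is a tiny made-up data set. The values below are hypothetical and are not the survey data; `statistics.stdev` is used here because, like Excel's `STDEV`, it computes the sample standard deviation.

```python
import statistics

# Hypothetical weekly costs in dollars (NOT the survey data):
# mostly zeros with one large value, and no negative entries.
costs = [0, 0, 0, 0, 100]

mean = statistics.mean(costs)   # arithmetic mean
sd = statistics.stdev(costs)    # sample standard deviation (divides by n - 1)

print(f"mean = {mean}, sd = {sd:.2f}")
print("sd larger than mean:", sd > mean)
print("any negative values:", any(x < 0 for x in costs))
```

Every value is zero or positive, yet the standard deviation comes out more than twice the mean.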

March 12th, 2017, 07:37 PM  #3 
Member Joined: May 2015 From: Australia Posts: 59 Thanks: 6 
Thanks so much for your reply. I'm going to try to explain what I mean. When I posted this question, I thought that it was odd that the SD was larger than the mean. However, now that I think about it, it is okay for the SD to be larger than the mean. My concern is that the SD is too large for the mean. For example, regarding the weekly home costs, the mean is 84 dollars. The standard deviation is 175 dollars. From my understanding, the SD makes the data spread from 84 - 175 = -91 dollars to 84 + 175 = 259 dollars. This doesn't make any sense to me: how can one standard deviation below the mean be a negative value? It doesn't make sense to me because none of the raw data values are negative, AND how can it be negative when the data is about weekly home costs? Maybe I did the calculations incorrectly?
March 12th, 2017, 10:45 PM  #4 
Senior Member Joined: May 2016 From: USA Posts: 825 Thanks: 335 
Without seeing either the data or your spreadsheet, almost any answer is going to be a guess. If the number of data elements is small and you used the sample formula rather than the population formula for standard deviation, your computation of the standard deviation may be in error. Let's assume, however, that you made no error in your computation. Your concern would be valid only if negative deviations from the mean were approximately as numerous as positive deviations. But you may have a situation where there are many more negative deviations than positive deviations, in which case the average positive deviation must have a much larger absolute value than the average negative deviation. In other words, the data are skewed. In a sense, the standard deviation is a kind of average of the deviations around the mean, but the absolute values of the average positive deviation and average negative deviation may differ considerably from the standard deviation. If the data are heavily skewed, the standard deviation cannot be used in the way you are using it because it will overestimate deviations in one direction and underestimate them in the other direction. Socioeconomic data are frequently skewed; thus you need to test your data for that. Last edited by JeffM1; March 12th, 2017 at 10:52 PM. 
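The skew argument can be checked numerically. The sketch below uses a hypothetical right-skewed sample (not the survey data) and compares the number and average size of negative and positive deviations, then computes both the sample and the population standard deviation.

```python
import statistics

# Hypothetical right-skewed weekly costs in dollars (NOT the survey data).
costs = [0, 0, 0, 10, 20, 30, 40, 50, 150, 700]

mean = statistics.mean(costs)
deviations = [x - mean for x in costs]

neg = [d for d in deviations if d < 0]  # values below the mean
pos = [d for d in deviations if d > 0]  # values above the mean

avg_neg = sum(neg) / len(neg)  # average negative deviation
avg_pos = sum(pos) / len(pos)  # average positive deviation

sample_sd = statistics.stdev(costs)       # divides by n - 1, like Excel's STDEV
population_sd = statistics.pstdev(costs)  # divides by n, like Excel's STDEV.P

print(f"mean = {mean}")
print(f"{len(neg)} negative deviations, average {avg_neg:.2f}")
print(f"{len(pos)} positive deviations, average {avg_pos:.2f}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```

In this sample the negative deviations outnumber the positive ones four to one, so the few positive deviations have to be much larger, and the standard deviation sits between the two average magnitudes. As for the choice of formula, with 306 values the sample and population results differ only by a factor of sqrt(306/305), under 0.2%, so that choice alone would not account for an SD of 175.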
March 12th, 2017, 10:57 PM  #5 
Member Joined: May 2015 From: Australia Posts: 59 Thanks: 6 
Thanks for your reply. In my assignment I was instructed to use Microsoft Excel to calculate the mean AND standard deviation. This means I used the formulas "=average( )" and "=stdev( )" to work out the values. If it helps to get a better sense of the data I used, I have attached a Notepad file containing all 306 raw data values. The data could be skewed since the lowest data value is 0 dollars and the highest data value is 1154 dollars. The mean is 84 dollars and the SD is 175 dollars. Does this explain why one SD below the mean is a negative value? I'm still confused about why one SD below the mean is negative.
March 13th, 2017, 05:32 PM  #6 
Global Moderator Joined: May 2007 Posts: 6,379 Thanks: 542 
The distribution is very unsymmetrical. The standard deviation does not take that into account, so the extreme high values tend to make it bigger.
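The effect of a single extreme value can be seen by computing the standard deviation with and without it. The data below are hypothetical (not the survey data); only the maximum of 1154 is borrowed from the figure quoted in the thread.

```python
import statistics

# Hypothetical weekly costs in dollars (NOT the survey data).
# 1154 mimics the largest value reported in the thread.
costs = [0, 20, 40, 60, 80, 100, 1154]

sd_all = statistics.stdev(costs)         # sample SD with the extreme value

trimmed = sorted(costs)[:-1]             # drop the single largest value
sd_trimmed = statistics.stdev(trimmed)   # sample SD without it

print(f"SD with the extreme value:    {sd_all:.2f}")
print(f"SD without the extreme value: {sd_trimmed:.2f}")
```

Because deviations are squared before averaging, one very large value dominates the result.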

March 13th, 2017, 05:47 PM  #7 
Member Joined: May 2015 From: Australia Posts: 59 Thanks: 6 
Thanks for your reply. Does that mean there are lots of outliers in the data? The fact that one standard deviation below the mean gives a negative value doesn't really help me, I think. To give you a better idea of what I have to do, I had to create a summary table of the expense categories of last year's students. I couldn't attach a clear image of the table, so I had to create a link for it: https://www.dropbox.com/s/numj773zua...table.png?dl=0 Then I have to compare my own expenses (I'm basically a frugal university student =)) to the student cohort. The criterion is that I have to do 'an appropriate comparison'. For example, I could say my weekly income is below the mean. However, I don't know how to compare myself to the standard deviation. How could I utilise the standard deviation in my comparison? Any suggestions would be greatly appreciated. Last edited by pianist; March 13th, 2017 at 05:51 PM.
March 14th, 2017, 07:25 AM  #8 
Senior Member Joined: May 2016 From: USA Posts: 825 Thanks: 335 
Wow. You are making a number of assumptions that are either false or misleading. First, the standard deviation is NEVER negative. It is always non-negative. Second, if the standard deviation exceeds the mean, it does not necessarily mean that any value in the population is negative. There may or may not be negative values in the population. Only if the mean is zero and the standard deviation is positive does it mean that there necessarily are negative values. (What may be confusing you is that in z-scores, the mean is zero.) Third, if the mean is far from zero and the standard deviation exceeds the mean, it MAY indicate that the mean is not a helpful measure of central tendency and that either the median or mode provides a more meaningful summary.
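A short sketch of the second and third points, again on hypothetical data rather than the survey data: the raw values are all non-negative, the z-scores average to zero and therefore include negative values, and the median differs considerably from the mean.

```python
import statistics

# Hypothetical right-skewed weekly costs in dollars (NOT the survey data).
costs = [0, 0, 0, 10, 20, 30, 40, 50, 150, 700]

mean = statistics.mean(costs)
median = statistics.median(costs)
sd = statistics.pstdev(costs)  # population SD, never negative

# z-score: how many standard deviations a value lies from the mean.
z_scores = [(x - mean) / sd for x in costs]
z_mean = sum(z_scores) / len(z_scores)

print(f"mean = {mean}, median = {median}, sd = {sd:.2f}")
print(f"smallest raw value = {min(costs)}, smallest z-score = {min(z_scores):.2f}")
print(f"mean of z-scores = {z_mean:.2e}")
```

The negative numbers appear only after the mean has been subtracted; nothing in the original data is below zero.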
March 14th, 2017, 07:45 AM  #9 
Member Joined: May 2015 From: Australia Posts: 59 Thanks: 6 
Thanks for your reply. For my task, I have to compare myself to the student cohort in the table below: https://www.dropbox.com/s/numj773zua...table.png?dl=0 For example, in my comparison, I could say that my weekly income is 244 - 146 = 98 dollars below the mean. The table also gives the standard deviation of the student cohort. How could I use the SD of 322 dollars for weekly income to compare against my own figure? The standard deviation is provided in the table, so I don't think I should ignore it. However, I don't think I can compare a standard deviation to my individual data value of weekly income. Or is it possible to compare?
March 14th, 2017, 11:18 AM  #10 
Senior Member Joined: May 2016 From: USA Posts: 825 Thanks: 335 
Basically, you cannot. Income cannot be negative. So what that standard deviation tells you is that some percentage of the students have incomes well in excess of the mean. In this case, the mean is not particularly representative. If the table gave you the median, it would be far more informative. The implication here is that there are other tables that may break the information down in different ways. For example, there may be a different table that shows the data for the students not employed. You would probably find that the mean and the standard deviation were smaller, perhaps considerably smaller. You need to understand three things about the standard deviation. First, it is a kind of average of the deviations, so a small value means small deviations on average and a large value means large deviations on average. In that sense, it just tells you whether the mean is informative or not. In this example, the mean of the entire population is not very informative. Second, it is a very common way to average the deviations around the mean. So you need to understand what it is telling you about the mean. Third, many data sets are approximately "normally distributed." If so, the standard deviation lets you know what percentages of the population deviate by how much from the mean. In other words, the standard deviation gives you a lot of information for normally distributed data.
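For illustration only, the third point can be sketched with Python's `statistics.NormalDist`. The mean, SD and individual income are the figures quoted in the thread; the percentages apply only if the data are approximately normal, which the thread suggests is not the case here (the mean minus one SD is negative, and income cannot be).

```python
from statistics import NormalDist

# Weekly income figures quoted in the thread, in dollars.
cohort_mean = 244
cohort_sd = 322
my_income = 146

# Distance from the mean measured in standard deviations (a z-score).
z = (my_income - cohort_mean) / cohort_sd

# Share of a normal population within k standard deviations of the mean.
std_normal = NormalDist()  # mean 0, standard deviation 1
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(f"z-score of the individual income: {z:.2f}")
for k, share in within.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```

The z-score is plain arithmetic and can always be computed, but reading it as a percentile through the normal curve is an assumption that skewed data like these may not support.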
