June 26th, 2009, 01:41 AM  #1 
You have observed the waiting times of 10 customers who called your helpdesk. The average waiting time you calculated is 1 minute. However, you have realized that the highest value is an outlier, and you exclude it from your analysis. After you have excluded this outlier, how would the average waiting time change?
June 26th, 2009, 05:18 AM  #2 
That's pretty clear. Call the sum of the other nine S and the outlier O. Then S + O = 10 * 1. Can you solve for the new average?
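
As a rough sketch of that algebra: since S + O = 10 * 1, the new average is S / 9 = (10 - O) / 9, so the drop from the old average is (O - 1) / 9. The function name and the outlier values below are hypothetical, chosen only for illustration; the problem never states what O actually is.

```python
def mean_without_outlier(n, mean, outlier):
    """Mean of the remaining n - 1 values after one value is dropped."""
    total = n * mean                    # S + O, the sum of all n values
    return (total - outlier) / (n - 1)  # S / (n - 1)

# Hypothetical outlier values in minutes (not given in the problem).
for outlier in (2.0, 5.5, 10.0):
    new_mean = mean_without_outlier(10, 1.0, outlier)
    drop = 1.0 - new_mean               # equals (O - 1) / 9
    print(f"O = {outlier}: new average = {new_mean:.3f}, drop = {drop:.3f}")
```

Because the outlier is the highest value, it is larger than the 1-minute average, so the new average always comes out below 1 minute. The exact size of the drop depends on O.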

June 26th, 2009, 05:52 AM  #3 
It decreases, right? But do we know by how much?

June 26th, 2009, 06:09 AM  #4 
Can you solve the algebra?

June 26th, 2009, 06:39 AM  #5 
What variable am I solving for?

July 3rd, 2009, 11:32 AM  #6 
For O.

July 3rd, 2009, 09:36 PM  #7  
 
July 5th, 2009, 11:56 AM  #8 
The best way to deal with outliers is conservatively. First, you must determine whether the point really is an outlier. The first question to ask is: "Was there something different or wrong about the way that point was measured?" If you can answer yes, discard the point and take the mean of the other points in the normal manner. If you can't find anything unique about the way the measurement was made compared to the others, you have to use some kind of statistical test to determine whether it is an outlier. (Also, if you find that many points were not measured in a consistent manner, you need to find a better method of measurement.)

The Q test is a good first choice because it is very conservative in defining points as outliers. The ratio is Q = gap / range. Basically, you take the suspect point and determine the difference between it and the next closest point (the gap). You then divide this difference by the range of all data points (the suspect point included). If the quotient is above the threshold for the confidence level you seek (table at http://en.wikipedia.org/wiki/Qtest), then you can discard the point as an outlier and average the remaining points normally. It is important to note that the Q test can only be used once per data set.

Example: Say you have the following data set: 2, 2, 3, 4, 5, 5, 5, 5, 6, 11. The mean is 4.8, but 11 seems like a possible outlier. To apply the Q test, find the gap between it and the next closest point (6), which is 5. This is divided by the range of the whole data set (11 - 2 = 9): 5/9 ≈ 0.556. Compare this Q value with the thresholds for 10 data points (from the table). At the 90%, 95%, and 99% confidence levels they are 0.412, 0.466, and 0.568, respectively. Our value of 0.556 is between 0.466 and 0.568, and therefore we can say with 95% confidence that 11 is an outlier. We discard it and divide the sum of the remaining numbers (37) by 9. The resulting average is 4.11.

The interpretation of these Q test results is that 95% of the time this will be a closer representation of the real mean, while the remaining 5% of the time 11 should not have been declared an outlier and 4.8 would be closer to the actual mean. Hope this helps.
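
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the example above. The names `Q_CRIT_N10` and `q_statistic` are made up for this illustration, and the critical values are simply the three numbers quoted for 10 data points; for any other sample size you would need the matching row of the table.

```python
# Critical Q values for n = 10, copied from the example above.
Q_CRIT_N10 = {0.90: 0.412, 0.95: 0.466, 0.99: 0.568}

def q_statistic(data):
    """Return (suspect, Q) where Q = gap / range for the more isolated end."""
    xs = sorted(data)
    if len(xs) < 3:
        raise ValueError("need at least 3 points")
    data_range = xs[-1] - xs[0]          # range includes the suspect point
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = xs[1] - xs[0]              # gap at the low end
    gap_high = xs[-1] - xs[-2]           # gap at the high end
    if gap_high >= gap_low:
        return xs[-1], gap_high / data_range
    return xs[0], gap_low / data_range

data = [2, 2, 3, 4, 5, 5, 5, 5, 6, 11]
suspect, q = q_statistic(data)

# Confidence levels at which Q exceeds the critical value.
passed = [level for level, crit in sorted(Q_CRIT_N10.items()) if q > crit]

# Apply the test once: drop the suspect only if it passes at 95%.
trimmed = list(data)
if 0.95 in passed:
    trimmed.remove(suspect)
trimmed_mean = sum(trimmed) / len(trimmed)

print(f"suspect = {suspect}, Q = {q:.3f}, rejected at levels {passed}")
print(f"mean before = {sum(data) / len(data):.2f}, after = {trimmed_mean:.2f}")
```

The sketch checks both ends of the sorted data and treats whichever end has the larger gap as the suspect, which for this data set is the 11. It applies the test only once, in line with the note above.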

