
March 12th, 2015, 07:37 AM  #1 
Newbie Joined: Mar 2015 From: Canada Posts: 2 Thanks: 0  Average of numbers in bin ranges
Hi all,

I'm wondering if anyone can help me with the best way to find the average value of a set of measurements that are given in the following format.

bin1: -5 to 0: 50 samples
bin2: 0 to 5: 100 samples
bin3: 5 to 10: 133 samples
bin4: 10 to 15: 76 samples
bin5: 15 to 20: 39 samples

I've thought about calculating the average using the floor, midpoint, and ceiling of each range, then taking the average of the three as the best-guess average.

I've also thought about finding the average bin number and mapping it to the corresponding value, i.e. I use the following data set:

1: 50 samples
2: 100 samples
3: 133 samples
4: 76 samples
5: 39 samples

If I take the average of this sample set, I get 2.88 as the average bin number, indicating the 2nd bin (0 to 5) plus 0.88 of the range. 0.88 of the range = 0.88 x 5 = 4.4. So the average would be 0 + 4.4 = 4.4.

Does anyone else have any thoughts?
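For reference, here is a minimal sketch of one consistent way to map an average bin number back to a value. It assumes bin 1 spans -5 to 0, that every bin has width 5, and that each bin's samples are treated as sitting at the bin midpoint; the names `first_edge`, `avg_bin` and `avg_value` are just illustrative.

```python
# Counts from the post; bin 1 is assumed to span -5 to 0 (width 5).
counts = [50, 100, 133, 76, 39]
first_edge, width = -5.0, 5.0

total = sum(counts)

# Count-weighted average of the bin numbers 1..5.
avg_bin = sum(k * n for k, n in enumerate(counts, start=1)) / total

# Bin k is centred at first_edge + (k - 0.5) * width, so a fractional
# bin number is mapped to a value through the same linear relation.
avg_value = first_edge + (avg_bin - 0.5) * width

print(round(avg_bin, 2), round(avg_value, 2))
```

Under these assumptions the mapped value is the same as the count-weighted mean of the bin midpoints, because the midpoint is a linear function of the bin number.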
March 12th, 2015, 08:17 AM  #2 
Math Team Joined: Dec 2013 From: Colombia Posts: 7,649 Thanks: 2630 Math Focus: Mainly analysis and algebra 
It rather depends on what you want to use the measurement for. One average, the mode, is bin 3 because there are more samples there than anywhere else. Another average, the median, is also bin 3 because if all samples were lined up in size order, the middle sample would be one from that bin.

Since your ranges are all the same size (5), the mean of the midpoints is equal to the mean of the mean of the minima and the mean of the maxima, at 6.9.

I don't like your last approach at all (there is a way of getting something vaguely reasonable, but your answer isn't it; it's far too close to the top of the range for bin 3).

6.9 is obviously a reasonable result because
(a) it's in bin three, which is intuitively correct; and
(b) it's nearer the bottom of bin 3 than the top, which is again intuitively correct as the distribution is skewed in that direction.

Personally, if I needed a more accurate mean, I'd take a random sample of all items coming off the line. Failing that, you could take a random sample of, say, 10% of the items from each box.
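The three averages discussed in this reply can be checked with a short sketch. It assumes the bin edges are -5, 0, 5, 10, 15, 20 and treats every sample as lying at its bin midpoint for the mean; the variable names are illustrative only.

```python
counts = [50, 100, 133, 76, 39]
edges = [-5, 0, 5, 10, 15, 20]  # assumption: bin 1 spans -5 to 0

total = sum(counts)

# Mean of the midpoints, weighted by the number of samples in each bin.
mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
mean_mid = sum(m * n for m, n in zip(mids, counts)) / total

# Modal bin: the bin with the most samples (1-based bin number).
mode_bin = counts.index(max(counts)) + 1

# Median bin: the first bin whose cumulative count reaches half the total.
running = 0
median_bin = None
for k, n in enumerate(counts, start=1):
    running += n
    if running >= total / 2:
        median_bin = k
        break

print(mode_bin, median_bin, round(mean_mid, 1))
```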
March 12th, 2015, 08:47 AM  #3 
Global Moderator Joined: Nov 2006 From: UTC -5 Posts: 16,046 Thanks: 938 Math Focus: Number theory, computational mathematics, combinatorics, FOM, symbolic logic, TCS, algorithms 
Well, you can certainly say that the sum is between 1760 and 3750 by taking the smallest and largest values, respectively, and hence the mean is 4.422 to 9.423. Technically that's all you can be sure about.

But graphing the values I see that they seem to be single-peaked, falling off about linearly from the central value. You might suppose that the individual values follow the same trend. In that case we might model the data as

y = a + b|x - c|

where c is the most common value, a is the number of times the most common value appears, and b describes how quickly the tails fall off.

Under this model, we'd expect

(a + b|-5 - c|) + ... + (a + b|-1 - c|) + (a + b|0 - c|)/2

values in bin1 and so forth for the others. (I'm counting the endpoints as half for each neighbor.) Probably c is between 2 and 10, so this would be 5a + b(15 + 5.5c) for bin1.

Some numerics suggest that a around 36.3, b around -2.57, and c around 6.69 works well. In that case the model gives a mean of about 8.668, suggesting that the higher end is more accurate.
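The guaranteed bounds at the start of this post can be reproduced with a small sketch: put every sample at the bottom of its bin for the lower bound and at the top for the upper bound. The edges assume bin 1 spans -5 to 0, and the names `low_sum` and `high_sum` are illustrative. The sketch covers only the bounds, not the fitted model.

```python
counts = [50, 100, 133, 76, 39]
edges = [-5, 0, 5, 10, 15, 20]  # assumption: bin 1 spans -5 to 0

total = sum(counts)

# Smallest possible sum: every sample at the lower edge of its bin.
low_sum = sum(lo * n for lo, n in zip(edges[:-1], counts))

# Largest possible sum: every sample at the upper edge of its bin.
high_sum = sum(hi * n for hi, n in zip(edges[1:], counts))

print(low_sum, high_sum)
print(low_sum / total, high_sum / total)
```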
March 16th, 2015, 09:11 AM  #4 
Newbie Joined: Mar 2015 From: Canada Posts: 2 Thanks: 0 
Thanks v8archie and CRGreathouse for your responses. I did some more reading and refreshed my university statistics to find a solution after reading your comments, and I wanted to share what I decided to do.

The example that I provided isn't that representative of the sample size that I would actually be using; it was just something I could use to quickly explain the problem. The data set I would be given would be closer to a normal distribution, with the samples in bins. Something like this:

<10: 219 (I'm not 100% sure why there's this outlier)
10 - 20: 79
20 - 30: 291
30 - 40: 629
40 - 50: 723
50 - 60: 614
60 - 70: 190
>70: 30

What I decided to do is use the median of the data set, which as I understand it is very close to the mean if the data set is normally distributed. As I understand it, this would also mitigate the effect of the outlier data I have in the <10 bin.

In the above sample, the total number of samples is 2775, and the median would be at sample #1387.5. Sample #1387.5 lies within the 40 - 50 range, but I need to find which sample in the bin is #1387.5:

= 1387.5 - number of samples from 0 to 40
= 1387.5 - 219 - 79 - 291 - 629
= 169.5

So the median would be the 169.5th sample in the 40 to 50 bin, which has 723 samples.

Now to find the median value, I do the following. The range of the bin is 10 (= 50 - 40). Therefore the median = (169.5 / 723) * 10 + 40 = 42.34.

I don't expect this method to be perfect, but for my needs plus or minus 2 is pretty reasonable. Feel free to comment, and thanks for your help.
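The calculation above can be written as a small function. This is only a sketch of the same linear interpolation inside the median bin; the function name `binned_median` is made up for illustration, and it assumes equal-width bins with the open-ended "<10" bin treated as 0 to 10 and the ">70" bin as 70 to 80 (neither choice affects the result here, since the median falls in the 40 to 50 bin).

```python
def binned_median(counts, lower_edges, width):
    """Estimate the median from binned counts by linear interpolation."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples")
    target = total / 2  # position of the median sample, as in the post
    below = 0           # samples in all bins before the current one
    for n, lo in zip(counts, lower_edges):
        if below + n >= target:
            # Fraction of the way through this bin, scaled by its width.
            return lo + (target - below) / n * width
        below += n
    raise ValueError("counts and lower_edges are inconsistent")


counts = [219, 79, 291, 629, 723, 614, 190, 30]
lower_edges = [0, 10, 20, 30, 40, 50, 60, 70]  # assumed edges, see above

median = binned_median(counts, lower_edges, width=10)
print(round(median, 2))
```

The interpolation assumes samples are spread evenly within the median bin, which is the same assumption the hand calculation makes.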
