My Math Forum  

March 12th, 2015, 07:37 AM   #1
Joined: Mar 2015
From: Canada

Posts: 2
Thanks: 0

Average of numbers in bin ranges

Hi All

I'm wondering if anyone can help me with the best way to find the average value of a set of measurements that are given in the following format.

bin1: -5 to 0: 50 samples
bin2: 0 to 5: 100 samples
bin3: 5 to 10: 133 samples
bin4: 10 to 15: 76 samples
bin5: 15 to 20: 39 samples

I've thought about calculating the average using the floor, mid, and ceiling of each range, then taking the average of the 3 to find what the best guess average would be.

I've also thought about finding the average bin number and mapping it to the corresponding value, i.e. I use the following data set:

1: 50 samples
2: 100 samples
3: 133 samples
4: 76 samples
5: 39 samples

If I take the average of this sample set, I would get 2.88 as the average bin number, indicating the 2nd bin (0 to 5) plus 0.88 of the range. 0.88 of the range = 0.88 x 5 = 4.4. So the average would be 0 + 4.4 = 4.4.
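The arithmetic above can be checked with a short sketch. The variable names (`counts`, `lower_edges`, `bin_width`) are made up for illustration; the numbers come from the post. Without rounding the fraction to 0.88 first, the result is about 4.42 rather than 4.4.

```python
# Bin numbers 1..5 and their sample counts, taken from the post above.
counts = {1: 50, 2: 100, 3: 133, 4: 76, 5: 39}

total = sum(counts.values())
avg_bin = sum(k * n for k, n in counts.items()) / total

# Mapping described in the post: start at the lower edge of the bin given by
# the whole part, then add the fractional part times the bin width.
bin_width = 5
lower_edges = {1: -5, 2: 0, 3: 5, 4: 10, 5: 15}
whole = int(avg_bin)
estimate = lower_edges[whole] + (avg_bin - whole) * bin_width

print(round(avg_bin, 2), round(estimate, 2))
```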

Does anyone else have any thoughts?
March 12th, 2015, 08:17 AM   #2
Math Team
Joined: Dec 2013
From: Colombia

Posts: 7,649
Thanks: 2630

Math Focus: Mainly analysis and algebra
It rather depends on what you want to use the measurement for.

One average, the mode, is bin 3 because there are more samples there than anywhere else.

Another average, the median, is also bin 3 because if all samples were lined up in size order the middle sample would be one from that bin.
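As a rough check of the modal and median bins, here is a minimal sketch. The `bins` layout of (lower edge, upper edge, count) tuples is an assumption made for illustration.

```python
from itertools import accumulate

# (lower edge, upper edge, sample count) for each bin in the original post.
bins = [(-5, 0, 50), (0, 5, 100), (5, 10, 133), (10, 15, 76), (15, 20, 39)]
counts = [n for _, _, n in bins]

# Modal bin: the bin with the largest count (1-based bin number).
modal_bin = counts.index(max(counts)) + 1

# Median bin: the first bin whose cumulative count reaches half the total.
half = sum(counts) / 2
cumulative = list(accumulate(counts))
median_bin = next(i for i, c in enumerate(cumulative, start=1) if c >= half)

print(modal_bin, median_bin)
```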

Since your ranges are all the same size (5), the mean of the midpoints equals the average of the mean of the minima and the mean of the maxima, which is about 6.9.
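A small sketch of that calculation, assuming each sample is placed at the lower edge, the upper edge, or the midpoint of its bin (the tuple layout and names are illustrative only):

```python
# (lower edge, upper edge, sample count) for each bin in the original post.
bins = [(-5, 0, 50), (0, 5, 100), (5, 10, 133), (10, 15, 76), (15, 20, 39)]
total = sum(n for _, _, n in bins)

# Weighted means with every sample placed at the bin's minimum, maximum, or midpoint.
mean_of_minima = sum(lo * n for lo, _, n in bins) / total
mean_of_maxima = sum(hi * n for _, hi, n in bins) / total
mean_of_midpoints = sum((lo + hi) / 2 * n for lo, hi, n in bins) / total

print(round(mean_of_minima, 3), round(mean_of_midpoints, 3), round(mean_of_maxima, 3))
```

The midpoint mean sits exactly halfway between the other two because each midpoint is the average of its bin's edges.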

I don't like your last approach at all (there is a way of getting something vaguely reasonable, but your answer isn't it - it's far too close to the top of the range for bin 3).
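One plausible reading of "something vaguely reasonable" (an assumption, not necessarily what was meant here) is to keep the average bin number but treat bin number 1 as the midpoint of the first bin rather than as a lower edge. A sketch under that assumption, with illustrative names:

```python
# Bin numbers 1..5 and their sample counts, taken from the original post.
counts = {1: 50, 2: 100, 3: 133, 4: 76, 5: 39}

total = sum(counts.values())
avg_bin = sum(k * n for k, n in counts.items()) / total

# Assumption: bin number k corresponds to the midpoint of bin k, so bin 1
# maps to -2.5 and each further bin number adds one bin width (5).
bin_width = 5
first_midpoint = -2.5
estimate = first_midpoint + (avg_bin - 1) * bin_width

print(round(estimate, 2))
```

Under this mapping the result coincides with the midpoint mean, since the bin number is just a linear rescaling of the bin midpoint.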

6.9 is obviously a reasonable result because
a) it's in bin three, which is intuitively correct; and
b) it's nearer the bottom of bin 3 than the top, which is again intuitively correct as the distribution is skewed in that direction.

Personally, if I needed a more accurate mean, I'd take a random sample of all items coming off the line. Failing that, you could take a random sample of, say, 10% of the items from each box.
March 12th, 2015, 08:47 AM   #3
Global Moderator
Joined: Nov 2006
From: UTC -5

Posts: 16,046
Thanks: 938

Math Focus: Number theory, computational mathematics, combinatorics, FOM, symbolic logic, TCS, algorithms
Well, you can certainly say that the sum is between 1760 and 3750 by taking the smallest and largest values, respectively, and hence the mean is 4.422 to 9.422. Technically that's all you can be sure about.
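Written out, with every sample placed at the lower edge of its bin for the minimum and at the upper edge for the maximum (398 samples in total), the bounds work out as:

```latex
\[
\frac{(-5)(50) + 0(100) + 5(133) + 10(76) + 15(39)}{398} = \frac{1760}{398} \approx 4.422,
\qquad
\frac{0(50) + 5(100) + 10(133) + 15(76) + 20(39)}{398} = \frac{3750}{398} \approx 9.422
\]
```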

But graphing the values I see that they seem to be single-peaked, falling off about linearly from the central value. You might suppose that the individual values follow the same trend. In that case we might model the data as y = a + b|x - c| where c is the most common value, a is the number of times the most common value appears, and b describes how quickly the tails fall off.

Under this model, we'd expect (a + b|-5 - c|) + ... + (a + b|-1 - c|) + (a + b|0 - c|)/2 values in bin1 and so forth for the others. (I'm counting the endpoints as half for each neighbor.) Probably c is between 2 and 10, so this would be 5a + b(15 + 5.5c) for bin1.

Some numerics suggest that a around 36.3, b around -2.57, and c around 6.69 works well. In that case the model gives a mean of about 8.668, suggesting that the higher end is more accurate.
March 16th, 2015, 09:11 AM   #4
Joined: Mar 2015
From: Canada

Posts: 2
Thanks: 0

Thanks v8archie and CRGreathouse for your responses.

I did some more reading and refreshing of my university statistics to find a solution on this after reading your comments, and I wanted to share what I decided to do.

The example that I provided isn't that representative of the sample size that I would actually be using. It was just something that I could use to quickly explain the problem.

The data set I would be given would be closer to a normal distribution, with the samples in bins. Something like this.

Range      Samples
<10        219   <- I'm not 100% sure why there's this outlier
10 - 20     79
20 - 30    291
30 - 40    629
40 - 50    723
50 - 60    614
60 - 70    190
>70         30

What I decided to do is use the median of the data set, which as I understand it is very close to the mean if the data set follows a normal distribution. As I understand it, this would also mitigate the effect of the outlier data I have in the <10 bin.

In the above sample, the total number of samples is 2775, and the median would be sampled at #1387.5.

Sample #1387.5 lies within the 40 - 50 range, but I need to find which sample in the bin is #1387.5:

= 1387.5 - number of samples below 40
= 1387.5 - 219 - 79 - 291 - 629 = 169.5

The median would be the 169.5th sample in the 40 to 50 bin, which has 723 samples.

Now to find the median value, I do the following.

The width of the bin is 10 (= 50 - 40).
Therefore the median = (169.5 / 723) * 10 + 40 = 42.34
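The steps above amount to linear interpolation within the median bin, which assumes the samples are spread evenly across that bin. A minimal sketch follows; the function name `interpolated_median` and the use of `None` for open-ended bin edges are illustrative choices, not part of the original method.

```python
# (lower edge, upper edge, sample count); None marks an open-ended edge.
bins = [
    (None, 10, 219),
    (10, 20, 79),
    (20, 30, 291),
    (30, 40, 629),
    (40, 50, 723),
    (50, 60, 614),
    (60, 70, 190),
    (70, None, 30),
]


def interpolated_median(bins):
    """Estimate the median of binned data by interpolating inside the median bin."""
    total = sum(n for _, _, n in bins)
    target = total / 2  # position of the median sample, e.g. 1387.5 of 2775
    seen = 0            # samples counted in the bins below the current one
    for lo, hi, n in bins:
        if n > 0 and seen + n >= target:
            if lo is None or hi is None:
                raise ValueError("median falls in an open-ended bin")
            # Fraction of the way through this bin, scaled by the bin width.
            return lo + (target - seen) / n * (hi - lo)
        seen += n
    raise ValueError("no samples")


median = interpolated_median(bins)
print(round(median, 2))  # 42.34
```

The open-ended bins (<10 and >70) only contribute their counts here, so their unknown extent does not affect the estimate as long as the median falls in a closed bin.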

I don't expect this method to be perfect, but for my needs plus or minus 2 is pretty reasonable.

Feel free to comment, and thanks for your help.

Copyright © 2019 My Math Forum. All rights reserved.