

March 12th, 2015, 07:37 AM #1
Newbie   Joined: Mar 2015   From: Canada   Posts: 2   Thanks: 0

Average of numbers in bin ranges

Hi all,

I'm wondering if anyone can help me with the best way to find the average value of a set of measurements that are given in the following format:

bin1: -5 to 0: 50 samples
bin2: 0 to 5: 100 samples
bin3: 5 to 10: 133 samples
bin4: 10 to 15: 76 samples
bin5: 15 to 20: 39 samples

I've thought about calculating the average using the floor, midpoint, and ceiling of each range, then taking the average of the three to find the best-guess average.

I've also thought about finding the average bin number and mapping it to the proper value, i.e. using the following data set:

1: 50 samples
2: 100 samples
3: 133 samples
4: 76 samples
5: 39 samples

If I take the average of this sample set, I get 2.88 as the average bin number, indicating the 2nd bin (0 to 5) plus 0.88 of the range. 0.88 of the range = 0.88 x 5 = 4.4, so the average would be 0 + 4.4 = 4.4.

Does anyone else have any thoughts?

March 12th, 2015, 08:17 AM #2
Math Team   Joined: Dec 2013   From: Colombia   Posts: 7,675   Thanks: 2655   Math Focus: Mainly analysis and algebra

It rather depends on what you want to use the measurement for. One average, the mode, is bin 3 because there are more samples there than anywhere else. Another average, the median, is also bin 3, because if all samples were lined up in size order the middle sample would be one from that bin. Since your ranges are all the same size (5), the mean of the midpoints is equal to the mean of the mean of the minima and the mean of the maxima, at 6.9.

I don't like your last approach at all (there is a way of getting something vaguely reasonable, but your answer isn't it - it's far too close to the top of the range for bin 3).
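For reference, the floor, midpoint, and ceiling averages discussed in posts #1 and #2 can be checked with a short Python sketch. The helper name `weighted_mean` is made up for illustration, and the last step (treating bin number k as the midpoint of bin k) is only one possible way to map an average bin number back to a value; it is an assumption, not something spelled out in the thread.

```python
# Bins from post #1 as (low, high, count); the values are copied from the thread.
bins = [(-5, 0, 50), (0, 5, 100), (5, 10, 133), (10, 15, 76), (15, 20, 39)]

total = sum(n for _, _, n in bins)  # 398 samples in all


def weighted_mean(representative):
    """Mean of the binned data if every sample in a bin took one representative value."""
    return sum(representative(lo, hi) * n for lo, hi, n in bins) / total


floor_mean = weighted_mean(lambda lo, hi: lo)            # every sample at the bin floor
mid_mean = weighted_mean(lambda lo, hi: (lo + hi) / 2)   # every sample at the bin midpoint
ceil_mean = weighted_mean(lambda lo, hi: hi)             # every sample at the bin ceiling

# Average bin number, with bins numbered from 1 as in post #1.
mean_bin = sum(k * n for k, (_, _, n) in enumerate(bins, start=1)) / total

# Assumed mapping: bin number k stands for the midpoint of bin k.
width = 5
first_midpoint = bins[0][0] + width / 2
mapped = first_midpoint + (mean_bin - 1) * width

print(round(floor_mean, 3), round(mid_mean, 3), round(ceil_mean, 3))
print(round(mean_bin, 2), round(mapped, 3))
```

With equal-width bins, the midpoint mean sits exactly halfway between the floor and ceiling means, and under the assumed mapping the average bin number lands on the same value as the midpoint mean.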
6.9 is obviously a reasonable result because
a) it's in bin three, which is intuitively correct; and
b) it's nearer the bottom of bin 3 than the top, which is again intuitively correct as the distribution is skewed in that direction.

Personally, if I needed a more accurate mean, I'd take a random sample of all items coming off the line. Failing that, you could take a random sample of, say, 10% of the items from each box.

March 12th, 2015, 08:47 AM #3
Global Moderator   Joined: Nov 2006   From: UTC -5   Posts: 16,046   Thanks: 938   Math Focus: Number theory, computational mathematics, combinatorics, FOM, symbolic logic, TCS, algorithms

Well, you can certainly say that the sum is between 1760 and 3750 by taking the smallest and largest values, respectively, and hence the mean is 4.422 to 9.422. Technically that's all you can be sure about.

But graphing the values I see that they seem to be single-peaked, falling off about linearly from the central value. You might suppose that the individual values follow the same trend. In that case we might model the data as

y = a + b|x - c|

where c is the most common value, a is the number of times the most common value appears, and b describes how quickly the tails fall off.

Under this model, we'd expect

(a + b|-5 - c|) + ... + (a + b|-1 - c|) + (a + b|0 - c|)/2

values in bin1 and so forth for the others. (I'm counting the endpoints as half for each neighbor.) Probably c is between 2 and 10, so this would be 5a + b(15 + 5.5c) for bin1.

Some numerics suggest that a around 36.3, b around -2.57, and c around 6.69 works well. In that case the model gives a mean of about 8.668, suggesting that the higher end is more accurate.

March 16th, 2015, 09:11 AM #4
Newbie   Joined: Mar 2015   From: Canada   Posts: 2   Thanks: 0

Thanks v8archie and CRGreathouse for your responses. I did some more reading and refreshing of my university statistics to find a solution to this after reading your comments, and I wanted to share what I decided to do.
The example that I provided isn't that representative of the sample size that I would actually be using. It was just something that I could use to quickly explain the problem. The data set I would be given would be closer to a normal distribution, with the samples in bins. Something like this:

<10       219   <- I'm not 100% sure why there's this outlier
10 - 20    79
20 - 30   291
30 - 40   629
40 - 50   723
50 - 60   614
60 - 70   190
>70        30

What I decided to do is use the median of the data set, which as I understand it is very close to the mean if the data set is a normal distribution. As I understand it, this would also mitigate the effect of the outlier data I have in the <10 bin.

In the above sample, the total number of samples is 2775, so the median would be at sample #1387.5. Sample #1387.5 lies within the 40 - 50 range, but I need to find which sample in that bin is #1387.5:

= 1387.5 - number of samples from 0 to 40
= 1387.5 - 219 - 79 - 291 - 629
= 169.5

So the median would be the 169.5th sample in the 40 - 50 bin, which has 723 samples.

Now, to find the median value, I do the following. The range of the bin is 10 (= 50 - 40). Therefore the median = (169.5 / 723) * 10 + 40 = 42.34.

I don't expect this method to be perfect, but for my needs plus or minus 2 is pretty reasonable.

Feel free to comment, and thanks for your help.

Tags: average, bin, numbers, ranges
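The interpolated median worked out in post #4 can be sketched in Python as follows. The function name `interpolated_median` is made up for illustration, and the edges 0 and 80 for the open-ended "<10" and ">70" bins are assumptions (post #4 only implies the lower edge of 0); they do not change the result here because the median falls in the 40 - 50 bin. Like the hand calculation, the sketch assumes samples are spread evenly within the bin that contains the median.

```python
# Bins from post #4 as (low, high, count).
# The edges 0 and 80 of the open-ended bins are assumed for illustration only.
bins = [(0, 10, 219), (10, 20, 79), (20, 30, 291), (30, 40, 629),
        (40, 50, 723), (50, 60, 614), (60, 70, 190), (70, 80, 30)]


def interpolated_median(bins):
    """Estimate the median of binned data by linear interpolation inside the median bin."""
    total = sum(n for _, _, n in bins)
    target = total / 2          # position of the median sample (1387.5 here)
    below = 0                   # number of samples in all bins before the current one
    for lo, hi, n in bins:
        if n > 0 and below + n >= target:
            # Fraction of the way through this bin at which the median sample sits.
            return lo + (target - below) / n * (hi - lo)
        below += n
    raise ValueError("no samples in any bin")


median = interpolated_median(bins)
print(round(median, 2))  # 42.34
```

This only reproduces the arithmetic from the post; how close the estimate is to the true median depends on how evenly the samples really are spread within the 40 - 50 bin.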


Copyright © 2019 My Math Forum. All rights reserved.      