
Advanced Probability and Statistics Math Forum
February 14th, 2014, 11:46 PM  #1 
Newbie Joined: Feb 2014 Posts: 3 Thanks: 0  Real world problem, need help
Hi, I am new; I joined the forum specifically because I need help with a problem that I have to solve in the next few days. The problem is how to produce a 'good' ranking of the 'performance' of 150-200 sites (programs) based on the voting of several thousand people. The programs all do essentially the same thing, so they can be compared on subjective, qualitative merit.

Historically, we let members vote 1-10 for any number of programs they happened to be a member of, 10 being good. The problem was we ended up with a not very robust ranking, and I put this down to the distribution of votes: 98% were either 9 or 10, or 1 or 2. Members aren't good at giving a representative score; the votes are far too clustered (there was nothing to stop them voting 10 for every program, for example).

So I changed the voting method and made it live in order to have some data to work with, and when I devised the new voting, I was sure it would be possible to analyse the data and produce a ranking. Each member now votes by doing their own mini-ranking: they submit from 2-10 programs in their perceived order of merit. They have complete choice over which programs they rank and how many (from 2 to 10) they rank.

What I need to do is combine all of those mini-rankings into one overall ranking. If all programs were ranked roughly the same number of times overall, I might find it relatively easy, but they're not: there's a dynamic range of maybe 100:1 in the membership size of the programs, and hence in the number of our voters who happen to be a member of a given program.

I'm not looking for a formula, I guess (though if there were one, it would be a lot more efficient than I imagine the final solution will be). I expect it will be an algorithm, so I'm looking for mathematically sound suggestions for producing the overall ranking. There isn't an absolute measure of performance, but I am imagining there is one, and that it's possible to produce a true ranking from Program 1 (P1), the best, through P200.
Each person's vote will then consist of a sample of 2-10 of P[1..200] placed in their subjective order of merit, and combining all these mini-rankings should lead to a good approximation of the true overall ranking. P1 will be voted on a lot (say 40% of members will include P1 in their mini-ranking) and it will often be in the #1 position of those mini-rankings. P2 may have more or fewer members rank it, but in general the number of members ranking P[n] will tend to decrease as n increases.

I have a reasonable maths background but am rusty, and a rigorous proof or derivation of a good algorithm for something like this is beyond me. My thoughts so far:

If we're dealing with, say, P1-P20 (where a statistically high number of members will include those programs somewhere in their mini-rankings), we could probably look at the percentage of members that voted P[x] as #1 to identify P1. However, this seems to throw away most of the information in the mini-rankings.

So where the number of times P1-P20 are mini-ranked is fairly evenly distributed, I imagine what I think of as a 'ladder' scoring system (e.g. as used in chess): if P1 > P2 in someone's ranking (I'm using > for 'better'), P1 goes up one notch in the rankings and P2 goes down one notch. After many 'matches' between P[x] and P[y], an overall ranking takes shape. I think this would work iff P1-P20 appeared equally often in the mini-rankings. (Sorry if this is too verbose; I'm trying to be as clear as possible.)

A mini-ranking where a member ranked 10 programs is essentially 9 adjacent comparisons (MR1 vs MR2, MR2 vs MR3, MR3 vs MR4, and so on, where MRn is the nth position in the mini-ranking), but it implies 45 comparisons in total, so intuitively there is a lot of information available. One of my problems in thinking about it is how to assign weightings.
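To make the "a 10-item mini-ranking implies 45 comparisons" point concrete, here is a minimal sketch (the function name and ballot format are my own illustration, not anything from the thread) that expands a mini-ranking into every pairwise comparison it implies:

```python
from itertools import combinations

def pairwise_comparisons(mini_ranking):
    """Given a mini-ranking (list of program ids, best first), return
    every implied (winner, loser) pair.  A ranking of k programs yields
    k*(k-1)/2 comparisons, so a 10-item ballot yields 45."""
    # combinations() preserves list order, so the earlier (higher-ranked)
    # program always appears first in each pair.
    return list(combinations(mini_ranking, 2))

# A member ranks four programs, best first:
pairs = pairwise_comparisons(["P1", "P4", "P7", "P9"])
print(len(pairs))   # 6
print(pairs[0])     # ('P1', 'P4')
```

This is why the adjacent-comparisons view (9 comparisons) undercounts the information: the transitive closure of one ballot is what feeds any pairwise method.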
If 1000 (out of 2000 people that included P1) put P1 as their MR1 and the remainder put it as MR2 or MR3, we know that program is almost certainly #1, #2 or #3 in the fictitious 'true ranking'. But if P200 was included in only 10 mini-rankings and was MR1 in each case, I know that P200 is unlikely to be #1, #2 or #3 in the true ranking, even though 100% of the members that voted for it thought so. And if P4 always appears above P7 in the mini-rankings, one thing we can say with high confidence is that P4 is above P7 in the true ranking.

That leads me to think that maybe if I built up a table of comparisons of P[x] vs P[y], where y > x, recording the percentage of times P[x] 'won', I could combine that data into a good overall ranking. However, I can also see that I would have to weight these results by the number of times P[x] was ranked against P[y]: it's no use if only one person mini-ranked P200 > P1, making that percentage 100%. Intuitively we'd see that as an aberration, but I need to handle it mathematically or algorithmically. Although P1 vs P200 may have been voted on only once, P200 will have been mini-ranked many times against other P[n] and will have 'lost' most of those, so an algorithmic approach should make it clear that the lone P200 > P1 comparison deserves low weight in the overall ranking. I will also have the practical problem that some of those table cells will be empty as we get to the progressively poorer programs.

Sorry once again for being verbose. Were my maths better, I'm sure I could have stated the problem in strict mathematical notation and been less wordy, but as you can tell I am struggling even to state my objective rigorously. Can anyone give me insights, hints, or point me to similar problems (or just send me some computer code to do it, lol)? Many thanks to anyone who takes the time to read and consider this.
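One standard model that delivers exactly the weighting described above (every comparison a program ever took part in informs its score, so a lone fluke result carries little weight) is the Bradley-Terry model. The thread never names it, so this is a suggested technique, not the poster's method; the sketch below fits it with the simple minorization-maximization update, with all names my own:

```python
from collections import defaultdict
from itertools import combinations

def bradley_terry(ballots, iterations=200):
    """Fit a Bradley-Terry 'strength' to each program from mini-rankings.

    ballots: list of mini-rankings, each a list of program ids, best first.
    Returns {program: strength}; sorting by strength descending gives the
    overall ranking.  A single 'P200 beat P1' ballot is swamped by P200's
    many losses in its other comparisons."""
    wins = defaultdict(float)                          # comparisons won by p
    meetings = defaultdict(lambda: defaultdict(int))   # meetings[p][q] = times p met q
    for ballot in ballots:
        for winner, loser in combinations(ballot, 2):
            wins[winner] += 1
            meetings[winner][loser] += 1
            meetings[loser][winner] += 1

    strength = {p: 1.0 for p in meetings}
    for _ in range(iterations):
        new = {}
        for p in strength:
            # Expected-comparisons denominator of the MM update.
            denom = sum(n / (strength[p] + strength[q])
                        for q, n in meetings[p].items())
            new[p] = wins[p] / denom if denom else strength[p]
        # Renormalize so strengths don't drift (only ratios matter).
        scale = len(new) / sum(new.values())
        strength = {p: s * scale for p, s in new.items()}
    return strength

# Toy ballots: A usually beats B, both usually beat C.
strengths = bradley_terry([["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]])
print(sorted(strengths, key=strengths.get, reverse=True))  # ['A', 'B', 'C']
```

One caveat worth knowing: a program that never wins any comparison converges to strength 0, so in practice a small pseudo-win prior is usually added; that also papers over the empty-cell problem mentioned above.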
February 14th, 2014, 11:58 PM  #2 
Newbie Joined: Feb 2014 Posts: 3 Thanks: 0  Re: Real world problem, need help
Just spotted this post: viewtopic.php?f=24&t=46081. It looks similar; I will do some reading...
February 15th, 2014, 04:36 AM  #3 
Senior Member Joined: Oct 2013 From: New York, USA Posts: 673 Thanks: 88  Re: Real world problem, need help
You could rank each program by its average rank among the people who ranked it. The problem is that if two programs were each ranked by about 10 percent of the people, one program could beat the other in every direct comparison that happened to occur, yet the 'losing' program might have won if everybody had been required to compare those two programs. I don't know if there is a good answer. It's probably too late now, but you could have randomly divided the programs into groups of about equal size and run multiple rounds of voting to determine a winner.
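The average-rank suggestion above can be sketched in a few lines (toy ballots of my own, not the forum's data):

```python
from collections import defaultdict

def average_rank(ballots):
    """Order programs by their mean position (1 = best) across the
    ballots that include them.  Simple, but as noted above it ignores
    how strong the rest of each ballot was."""
    totals = defaultdict(int)   # sum of positions for each program
    counts = defaultdict(int)   # number of ballots including it
    for ballot in ballots:
        for position, program in enumerate(ballot, start=1):
            totals[program] += position
            counts[program] += 1
    return sorted(counts, key=lambda p: totals[p] / counts[p])

print(average_rank([["A", "B", "C"], ["B", "A", "C"], ["A", "B", "C"]]))
# ['A', 'B', 'C']
```

The weakness is visible in the code: a program's mean position says nothing about which programs it was positioned against, which is exactly the non-uniform-coverage problem the original poster describes.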

February 15th, 2014, 04:56 AM  #4 
Newbie Joined: Feb 2014 Posts: 3 Thanks: 0  Re: Real world problem, need help
What may not have come across clearly is that each member's mini-ranking is not absolute. They may be a member of 10 programs, but their #1 might actually be #20 or lower in the 'true' ranking. So what they put as #1 is not that interesting in itself; the information comes from the fact that they think their #1 is better than their #2. They are qualified to say that even if they're not a member of the best 20 programs, and so are not qualified to say their #1 should be THE overall #1. Make sense?

