My Math Forum  

February 14th, 2014, 11:46 PM   #1
Newbie
 
Joined: Feb 2014

Posts: 3
Thanks: 0

Real world problem, need help

Hi,

I am new, I joined the forum specifically because I need help with a problem that I have to solve in the next few days...

The problem is how to produce a 'good' ranking of the 'performance' of 150-200 sites (programs) based on the votes of several thousand people. The programs all do essentially the same thing, so they can be compared on subjective, qualitative merit.

Historically, we let members vote 1-10 for any number of programs they happened to be a member of, 10 being good. The problem was that we ended up with a not very robust ranking, and I put this down to the distribution of votes: 98% were either 9 or 10, or 1 or 2. Members aren't good at giving a representative score, so the votes are way too clustered (there was nothing to stop them voting 10 for every program, for example).

So I changed the voting method and made it live in order to have some data to work with and when I devised the new voting, I was just sure it would be possible to analyse the data and produce a ranking.

So each member now votes by doing their own mini-ranking - they submit from 2-10 programs in their perceived order of merit. They have complete choice over which programs they rank and how many from 2-10 they rank. What I need to do is combine all of those mini-rankings into one overall ranking.

If all programs were ranked roughly the same number of times overall, I might find it relatively easy but they're not. There's a dynamic range of maybe 100:1 in the membership size of the programs and hence the number of people amongst our voters that happen to be a member of those programs.

I'm not looking for a formula I guess (though if there were one it would be a lot more efficient than I imagine the final solution will be). I'm imagining it will be an algorithm so I'm looking for mathematically sound suggestions for producing the overall ranking.

There isn't an absolute measure of performance, but I am imagining there is, and that it's possible to produce a true ranking from Program 1 (P1), the best, through P200. Then each person's vote will consist of a sample of 2-10 of P[1,200] placed in their subjective order, and combining all these mini-rankings will lead to a good approximation of the true overall ranking.

P1 will be voted on a lot of times (say 40% of members will include P1 in their mini-ranking) and it will often be in the #1 position of the mini-rankings. P2 may have more or fewer members rank it, but the general trend is that the number of members ranking P[n] decreases as n increases.

So, I have a reasonable maths background but am rusty, and a rigorous proof or line of reasoning for a good algorithm for something like this is beyond me.

My thoughts so far:

If we're dealing with, say, P1-P20 (where a statistically high number of members will include those programs somewhere in their mini-rankings), we could probably look at the percentage of members that voted P[x] as #1 to get P1. However, this seems to be throwing away most of the information in the mini-rankings.

So I imagine that where the number of times P1-P20 are mini-ranked is fairly evenly distributed, we have what I think of as a 'ladder' scoring system (e.g. as used in chess), where if P1 > P2 in someone's ranking (I'm using > as 'better'), P1 goes up one notch in the rankings and P2 goes down one notch. After many 'matches' between P[x] and P[y], an overall ranking takes shape. And I think this would work if and only if P1-P20 appeared equally often in the mini-rankings.
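A minimal sketch of one way to make the 'ladder' idea concrete, using an Elo-style update (the rating scheme used in chess). The starting rating of 1500, the step size `k`, the function names and the choice to compare only adjacent positions are all assumptions made for illustration, not anything fixed by the voting data.

```python
def elo_update(ratings, winner, loser, k=16.0):
    """Apply one Elo-style update for a single 'winner beat loser' result."""
    r_w = ratings.get(winner, 1500.0)  # 1500 is an assumed starting rating
    r_l = ratings.get(loser, 1500.0)
    # Expected score of the winner under the logistic Elo curve.
    expected_w = 1.0 / (1.0 + 10 ** ((r_l - r_w) / 400.0))
    delta = k * (1.0 - expected_w)
    ratings[winner] = r_w + delta
    ratings[loser] = r_l - delta


def ladder_ratings(mini_rankings, k=16.0):
    """Rate programs from mini-rankings, each listed best first."""
    ratings = {}
    for ranking in mini_rankings:
        # Adjacent positions only: MR1 vs MR2, MR2 vs MR3, ...
        for better, worse in zip(ranking, ranking[1:]):
            elo_update(ratings, better, worse, k)
    return ratings


votes = [["A", "B", "C"], ["A", "C"], ["B", "C"]]
ratings = ladder_ratings(votes)
print(sorted(ratings, key=ratings.get, reverse=True))
```

Unlike a fixed 'one notch' step, an upset win over a higher-rated program moves the ratings more than an expected win. Note that the result depends on the order in which the votes are processed.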

(Sorry if this is too verbose, I'm trying to be as clear as possible)

A mini-ranking where a member has ranked 10 programs is essentially 9 comparisons of adjacent programs: their MR1 with MR2 (MRn being position n in the mini-ranking), MR2 with MR3, MR3 with MR4 and so on. But it's also 45 pairwise comparisons in total (10 × 9 / 2), so intuitively there is a lot of information available.
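To illustrate the count above, here is a small sketch that expands one mini-ranking into every (better, worse) pair it implies. The function name is hypothetical, and the mini-ranking is assumed to be a list ordered best first.

```python
from itertools import combinations


def pairwise_comparisons(mini_ranking):
    """Return every (better, worse) pair implied by one ordered mini-ranking."""
    # combinations() preserves input order, so the first element of each
    # pair is always the program the member ranked higher.
    return list(combinations(mini_ranking, 2))


ten_programs = ["P%d" % n for n in range(1, 11)]
print(len(pairwise_comparisons(ten_programs)))  # 45 pairs from 10 programs
```

A mini-ranking of only 2 programs yields a single pair, so longer mini-rankings contribute far more comparisons than short ones.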

One of my problems in thinking about it is how to give weightings. If 1000 (out of 2000 people that included P1) put P1 as their MR1 and the remainder put it as MR2 or MR3, we know that program is definitely #1, #2 or #3 in the fictitious 'true ranking'. But if P200 was included in 10 mini-rankings and was MR1 in each case, I know that P200 is 'unlikely' to be #1, #2 or #3 in the true ranking, even though 100% of the members that voted thought so.

If P4 always appears above P7 in the mini-rankings, one thing we can say with high confidence is that P4 is above P7 in the true ranking. So that leads me to think that maybe if I built up a table of comparisons of P[x] vs P[y], where y > x, and the percentage of times P[x] 'won', I could combine that data into a good overall ranking. However, I can also see that I would have to weight these results by the number of times P[x] was ranked against P[y]: it's no use if only one person had mini-ranked P200 > P1, so that the percentage is 100%.
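A rough sketch of such a comparison table, with one possible way to handle the weighting: additive smoothing, where a few pseudo-results are added to each side so that a pair compared once cannot reach 100%. The function names and the `prior` value are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def win_table(mini_rankings):
    """Count how often each program was ranked above each other program."""
    wins = defaultdict(int)  # wins[(x, y)] = times x was ranked above y
    for ranking in mini_rankings:
        for better, worse in combinations(ranking, 2):
            wins[(better, worse)] += 1
    return wins


def win_rate(wins, x, y, prior=1.0):
    """Smoothed share of the x-vs-y comparisons that x won.

    `prior` pseudo-wins are added to each side, which pulls pairs with
    few comparisons toward 0.5.
    """
    w_xy = wins.get((x, y), 0)
    w_yx = wins.get((y, x), 0)
    return (w_xy + prior) / (w_xy + w_yx + 2.0 * prior)


votes = [["P1", "P4", "P7"]] * 20 + [["P200", "P1"]]
table = win_table(votes)
print(win_rate(table, "P4", "P7"))    # 20 of 20 wins, smoothed to 21/22
print(win_rate(table, "P200", "P1"))  # 1 of 1 wins, smoothed to 2/3
```

With this smoothing, 20 wins out of 20 still score close to 1, while a lone win scores only 2/3, and a pair that was never compared scores exactly 0.5.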

For the latter case above, intuitively we'd see that was an aberration but I need to do that mathematically or algorithmically. Although P1 vs P200 may have only been voted once, P200 will have been mini-ranked many times against other P[n] and most times 'lost' so it should be clear with an algorithmic approach that the lone P1 vs P200 mini-ranking should have low weight in the overall ranking.
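One established model that behaves this way is the Bradley-Terry model, which gives each program a strength and sets the chance that x beats y to strength[x] / (strength[x] + strength[y]). The sketch below fits it with the standard iterative (minorise-maximise) update. It is offered as one candidate technique, not as the only answer. The `prior` pseudo-results against a virtual opponent of strength 1 are an added assumption to keep unbeaten or winless programs finite, and the function name and iteration count are likewise illustrative.

```python
from collections import defaultdict
from itertools import combinations


def bradley_terry(mini_rankings, prior=0.5, iterations=200):
    """Estimate a strength per program from ordered mini-rankings."""
    wins = defaultdict(float)   # wins[i] = comparisons won by i
    games = defaultdict(float)  # games[(i, j)] = comparisons between i and j
    opponents = defaultdict(set)
    for ranking in mini_rankings:
        for better, worse in combinations(ranking, 2):
            wins[better] += 1.0
            games[(better, worse)] += 1.0
            games[(worse, better)] += 1.0
            opponents[better].add(worse)
            opponents[worse].add(better)

    strength = {program: 1.0 for program in opponents}
    for _ in range(iterations):
        updated = {}
        for i in strength:
            # Virtual opponent of fixed strength 1: `prior` wins and
            # `prior` losses for every program (an assumed regulariser).
            denom = 2.0 * prior / (strength[i] + 1.0)
            for j in opponents[i]:
                denom += games[(i, j)] / (strength[i] + strength[j])
            updated[i] = (wins[i] + prior) / denom
        strength = updated
    return strength


votes = ([["P1", "P2", "P3"]] * 30 + [["P2", "P200"]] * 10
         + [["P3", "P200"]] * 10 + [["P200", "P1"]])
strength = bradley_terry(votes)
print(sorted(strength, key=strength.get, reverse=True))
```

In this made-up data the single P200 > P1 vote is outweighed by P200's twenty losses to P2 and P3, so P200 still ends up at the bottom without any hand-tuned weighting.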

I will have a practical problem that some of those rows will be empty as we get to the progressively poorer programs.


Sorry once again for being verbose. Were my maths better I'm sure I could have stated the problem in mathematically strict notation and been less verbose, but as you can tell I am even struggling to state my objective rigorously.

Can anyone give me insights or hints, or point me to similar problems (or just send me some computer code to do it lol)?

Many thanks to anyone that takes the time to read and consider this.
troyw is offline  
 
February 14th, 2014, 11:58 PM   #2
Newbie
 
Joined: Feb 2014

Posts: 3
Thanks: 0

Re: Real world problem, need help

Just spotted this post: viewtopic.php?f=24&t=46081
looks similar - I will do some reading...
troyw is offline  
February 15th, 2014, 04:36 AM   #3
Senior Member
 
Joined: Oct 2013
From: New York, USA

Posts: 673
Thanks: 88

Re: Real world problem, need help

You could rank each program by the average rank among people who ranked it. The problem is that if two programs were each ranked by about 10 percent of the people, one program could beat another in all of those comparisons but the losing program might have won if everybody had to compare those two programs. I don't know if there is a good answer. It's probably too late now, but you could have randomly divided the programs into groups with about equal numbers of programs and done multiple rounds of voting to determine a winner.
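A small sketch of the average-rank idea, assuming each mini-ranking is a list ordered best first. The function name is hypothetical. It also returns how many members ranked each program, since an average built on very few votes deserves less trust.

```python
from collections import defaultdict


def average_rank(mini_rankings):
    """Map each program to (average position, number of times ranked)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ranking in mini_rankings:
        for position, program in enumerate(ranking, start=1):
            totals[program] += position
            counts[program] += 1
    return {p: (totals[p] / counts[p], counts[p]) for p in totals}


votes = [["A", "B", "C"], ["B", "A"], ["C", "D"]]
for program, (avg, n) in sorted(average_rank(votes).items(), key=lambda kv: kv[1][0]):
    print(program, avg, n)
```

One caveat with this measure: a position only means something relative to the other programs in the same mini-ranking, so last place in a list of 2 and second place in a list of 10 both count as 2.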
EvanJ is offline  
February 15th, 2014, 04:56 AM   #4
Newbie
 
Joined: Feb 2014

Posts: 3
Thanks: 0

Re: Real world problem, need help

What may not have come across clearly is that each member's mini-ranking is not absolute. They may be a member of 10 programs but their #1 is actually #20 or lower in the 'true' ranking. So what they put as #1 is not that interesting, the data comes from the fact they think #1 is better than #2 - they are qualified to say that even if they're not a member of the best 20 programs and so not qualified to say their #1 should be THE overall #1 - make sense?
troyw is offline  