|July 13th, 2016, 02:46 AM||#1|
Joined: Jul 2016
I have a complex data mining task involving dog racing. The information is below:
Every race has 6 dogs running, and all of them always finish the race.
The race time is always 45 seconds, and there are 241 seconds in total between two races. [This means time has no effect on the data, since it is always the same.]
The odds for each dog are fixed for the duration of a race; they don't change.
I have the winner and second place of every race over a period of 3 months (historical data). The other dogs' ranks are not shown.
Regarding the odds, I have noticed the following:
The odds for winners are in the range 2.00 - 10.60
The odds for the combination winner + second place are in the range 10.50 - 134.00
In total there are 995 possible combinations of odds for all dogs (each of these 995 possibilities has 36 odds, 6x6, because it covers the winner and the second place).
How can I predict the winner and the second place based on this historical data?
|July 13th, 2016, 05:35 AM||#2|
Joined: Apr 2014
Do you have the climate details at the track for each race and the hardness of the ground? Dog weights?
I would suggest that the odds are irrelevant to the question; otherwise you'd just pick 1st and 2nd place based on the lowest and 2nd-lowest odds. All the historical odds would tell you is when the bookies got the places right.
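To make the "just pick the lowest odds" idea concrete, here is a minimal sketch of that baseline in Python. It assumes the odds are decimal odds keyed by dog number; the function names and the example values are made up for illustration and are not taken from the real data.

```python
def favourite_baseline(win_odds):
    """Pick the two dogs with the lowest win odds.

    win_odds: dict mapping dog number -> decimal win odds (assumed format).
    Returns (predicted_winner, predicted_second); ties go to the lower dog number.
    """
    ranked = sorted(win_odds, key=lambda dog: (win_odds[dog], dog))
    return ranked[0], ranked[1]


def hit_rate(history):
    """Fraction of past races in which the lowest-odds dog actually won.

    history: list of (win_odds, actual_winner) pairs.
    """
    if not history:
        return 0.0
    hits = sum(
        1 for win_odds, winner in history
        if favourite_baseline(win_odds)[0] == winner
    )
    return hits / len(history)


# Hypothetical odds for one race (illustrative values only).
example_odds = {1: 4.5, 2: 2.3, 3: 10.6, 4: 6.0, 5: 3.1, 6: 8.2}
print(favourite_baseline(example_odds))
```

Running `hit_rate` over the real history would show how often this baseline is right, which gives any other method a number to beat.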
I'd be looking at which dog beat which and under what conditions.
I'd also look at the individual finishing times and 'predict' their times based on the conditions at the time of the race. Do the dogs race more than once?
I don't know where your 995 combinations come from, but I can say the 1/36 figure is incorrect: there is a 1/6 chance for first place and a 1/5 chance for second place, so that's odds of 1 in 30.
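The 1 in 30 figure can be checked by listing every ordered (winner, second place) pair. This is only a counting sketch: the 1/30 probability holds only if all six dogs are equally likely to finish in any position, which real odds do not imply.

```python
from itertools import permutations

# Dogs are numbered 1 to 6; a dog cannot be both winner and second place.
dogs = range(1, 7)
outcomes = list(permutations(dogs, 2))  # ordered (winner, second) pairs

print(len(outcomes))      # number of distinct (winner, second) outcomes
print(1 / len(outcomes))  # chance of one specific outcome if all are equally likely
```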
|July 14th, 2016, 10:32 AM||#3|
Joined: Jul 2016
Thanks for your reply. I was wrong about the 36; you are right, it is 30. In total there are 995*30=29850 different possible races. Regarding the 995 rows: I searched on the odds, and across a total of 65000 races there are just 995 different rows. (Even if you search around 5000 consecutive entries, you still see 995 different rows.) So the odds repeat, but the winner and second place change.
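For anyone who wants to reproduce that count, here is a rough sketch of the search in Python. It assumes each race is stored as one row of odds values; the function name and the tiny 3-value rows are made up for illustration (the real rows would hold all the odds for the race).

```python
from collections import Counter


def count_distinct_odds_rows(races):
    """Count how often each distinct row of odds appears.

    races: iterable of odds rows, each a sequence of decimal odds.
    Rows are compared exactly, so the odds should be parsed the same way
    for every race (for example, always rounded to 2 decimals).
    """
    return Counter(tuple(row) for row in races)


# Made-up toy data standing in for the real history.
toy = [(2.0, 5.5, 10.6), (2.0, 5.5, 10.6), (3.1, 4.0, 7.2)]
rows = count_distinct_odds_rows(toy)

print(len(rows))       # number of distinct odds rows
print(len(rows) * 30)  # distinct rows times the 30 (winner, second) outcomes
```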
There are no dog weights, no finish times, no ground conditions, no race names and no dog names, just the dog numbers 1, 2, 3, 4, 5, 6. (The dimensions are: odds, dog nr, race_nr, race_time, first place and second place.)
It looks to me like there is a fixed set of odds (995 rows in total), and only the winners and second places change, based on a specific algorithm that guarantees the bookmaker wins. It looks like the data is fake, not real, just generated to simulate a dog race for a bookmaker.
Have a look at the odds data (attached image) to get a better idea.
D11 -> odds for Dog 1 to win
D12 -> odds for Dog 1 to win and Dog 2 to take second place
D22 -> odds for Dog 2 to win
D24 -> odds for Dog 2 to win and Dog 4 to take second place
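One way to test the suspicion that the results are generated is to group the races by their odds row and compare how often each dog actually wins with the probability its odds imply. The sketch below assumes decimal odds, takes only the six win odds (D11, D22, ..., D66) per race, and normalises 1/odds so the six values sum to 1, which removes the bookmaker's margin. The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def winner_frequency_vs_implied(races):
    """Compare observed win frequencies with odds-implied probabilities.

    races: iterable of (win_odds, winner), where win_odds is a sequence of
    six decimal odds (D11, D22, ..., D66) and winner is a dog number 1..6.
    Returns {win_odds: [(dog, implied_prob, observed_freq), ...]}.
    """
    winners_by_row = defaultdict(Counter)
    for win_odds, winner in races:
        winners_by_row[tuple(win_odds)][winner] += 1

    report = {}
    for win_odds, counts in winners_by_row.items():
        n = sum(counts.values())
        # Sum of 1/odds is above 1 when the bookmaker keeps a margin.
        total = sum(1.0 / o for o in win_odds)
        report[win_odds] = [
            (dog, (1.0 / o) / total, counts[dog] / n)
            for dog, o in enumerate(win_odds, start=1)
        ]
    return report


# Made-up toy history: one odds row seen four times.
toy_row = (2.0, 4.0, 8.0, 8.0, 8.0, 8.0)
toy_races = [(toy_row, 1), (toy_row, 1), (toy_row, 2), (toy_row, 3)]
for dog, implied, observed in winner_frequency_vs_implied(toy_races)[toy_row]:
    print(dog, round(implied, 3), round(observed, 3))
```

If the observed frequencies track the implied probabilities closely across all 995 rows, the winners are probably drawn at random from the odds, and the history will not help beat them. Large, stable gaps would be the thing to look into.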