
January 5th, 2017, 07:47 AM  #1 
Newbie Joined: Jan 2015 From: hong kong Posts: 3 Thanks: 0  Predictive modelling and understanding how important your last data point is
Hi all.

Using this as a simple example: there is a fairly basic computer game called Mr Jump. The aim of the game is to complete the level by finishing the course. Your performance is measured in percentage terms: if you complete the level you get 100%, and if you “die” a third of the way through you have completed 33%. The level is completed by jumping at the correct time to negotiate an obstacle course that is always the same. Obstacles include walls to jump over, big jumps, short jumps, series of successive jumps, etc.

Each time you play the game you learn something new, so the more you play, the more likely you are to complete the course. However, there are times when, due to a misclick or a loss of concentration, you can easily “die” early.

In the first 25 attempts, your % performance is as follows:

11 16 28 13 29 30 20 21 19 62 35 40 7 45 28 40 50 14 63 38 42 15 42 55 51

As you can see from the numbers, you are slowly improving. I have the following questions, and would like to know the appropriate maths to apply to answer them.

1. Predicting. How long will it take (statistically) to beat the game?
2. Weighting. How much do you learn each time you play? How much more important is the 25th performance compared with the 24th? And how much more important is the 24th compared with the 23rd, etc.?

Any info would be greatly appreciated.

Thanks
Marco
January 5th, 2017, 06:37 PM  #2 
Senior Member Joined: Oct 2013 From: New York, USA Posts: 573 Thanks: 79 
I'm not guaranteeing that a least squares regression line produces the best answer, but the regression equation is % performance = 0.2347998797*attempt number + 5.354915917. To the nearest whole number, it would take 403 attempts to beat the game. Note that a regression equation would predict impossible values over 100 if there were enough attempts. For example, the equation would predict 240% on the 1,000th attempt.

January 8th, 2017, 08:46 AM  #3 
Newbie Joined: Jan 2015 From: hong kong Posts: 3 Thanks: 0 
Hi. Can I please ask what type of regression you used there? Using simple linear regression, I get 100% on the 72nd attempt, from y = 1.1531x + 17.57.
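
For anyone wanting to check the two equations quoted in this thread, here is a minimal Python sketch using only the standard library. It fits an ordinary least squares line to the 25 scores from the first post, with performance as the response and attempt number as the predictor, and then repeats the fit with the two variables swapped. The helper name `fit_line` is just an illustrative choice. On this data the first fit gives the y = 1.1531x + 17.57 line, while the swapped fit gives coefficients matching those quoted in post #2, which suggests that equation may be attempt number regressed on performance rather than the other way round.

```python
# Performance percentages from the first post, for attempts 1..25.
scores = [11, 16, 28, 13, 29, 30, 20, 21, 19, 62, 35, 40, 7,
          45, 28, 40, 50, 14, 63, 38, 42, 15, 42, 55, 51]
attempts = list(range(1, len(scores) + 1))


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Performance regressed on attempt number.
slope, intercept = fit_line(attempts, scores)
# Attempt number at which the fitted line crosses 100%.
attempt_at_100 = (100 - intercept) / slope

# Attempt number regressed on performance (variables swapped).
swapped_slope, swapped_intercept = fit_line(scores, attempts)

print(f"performance = {slope:.4f} * attempt + {intercept:.2f}")
print(f"fitted line reaches 100% at attempt {attempt_at_100:.1f}")
print(f"attempt = {swapped_slope:.10f} * performance + {swapped_intercept:.9f}")
```

Either way, a straight line keeps rising past 100%, so the crossing point is a rough extrapolation rather than a forecast.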
January 31st, 2017, 08:51 AM  #4 
Senior Member Joined: Dec 2012 From: Hong Kong Posts: 853 Thanks: 311 Math Focus: Stochastic processes, statistical inference, data mining, computational linguistics  I got the same result. I don't know if this is still helpful (you might have beaten the game already?), but a 95% prediction interval would be (55.62466861, 87.34931404).

January 31st, 2017, 08:54 AM  #5 
Senior Member Joined: Dec 2012 From: Hong Kong Posts: 853 Thanks: 311 Math Focus: Stochastic processes, statistical inference, data mining, computational linguistics  This doesn't directly answer your question, but you might want to look into Studentised residuals, DFFITS, DFBETAS and Cook's distance, which measure how influential a data point is on your model.
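
As a rough illustration of the diagnostics mentioned above, the following Python sketch (standard library only) computes the leverage, internally studentised residual and Cook's distance of each attempt for a straight-line fit of performance on attempt number. It uses the usual closed-form expressions for simple linear regression; the variable names are illustrative, and in practice a statistics package would normally do this. DFFITS and DFBETAS are left out to keep the example short.

```python
import math

# Performance percentages from the first post, for attempts 1..25.
scores = [11, 16, 28, 13, 29, 30, 20, 21, 19, 62, 35, 40, 7,
          45, 28, 40, 50, 14, 63, 38, 42, 15, 42, 55, 51]
attempts = list(range(1, len(scores) + 1))
n = len(scores)
p = 2  # number of fitted coefficients: slope and intercept

# Least squares fit of performance on attempt number.
mean_x = sum(attempts) / n
mean_y = sum(scores) / n
sxx = sum((x - mean_x) ** 2 for x in attempts)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(attempts, scores))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [y - (slope * x + intercept) for x, y in zip(attempts, scores)]
mse = sum(e ** 2 for e in residuals) / (n - p)  # residual variance estimate

# Leverage: how far each attempt number sits from the mean attempt number.
leverage = [1 / n + (x - mean_x) ** 2 / sxx for x in attempts]

# Internally studentised residuals: residual scaled by its estimated
# standard deviation.
studentised = [e / math.sqrt(mse * (1 - h))
               for e, h in zip(residuals, leverage)]

# Cook's distance combines residual size and leverage into one number.
cooks = [r ** 2 * h / (p * (1 - h)) for r, h in zip(studentised, leverage)]

# Show the three attempts with the largest Cook's distance.
ranked = sorted(range(n), key=lambda i: cooks[i], reverse=True)
for i in ranked[:3]:
    print(f"attempt {attempts[i]:2d}: score={scores[i]:2d} "
          f"leverage={leverage[i]:.3f} "
          f"studentised={studentised[i]:+.2f} cook={cooks[i]:.3f}")
```

Points late in the sequence have higher leverage simply because they sit far from the mean attempt number, so a surprising score there moves the fitted line more than the same surprise in the middle.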
Last edited by 123qwerty; January 31st, 2017 at 08:58 AM. 
