
Advanced Statistics Advanced Probability and Statistics Math Forum 
July 16th, 2010, 04:07 AM  #1 
Newbie Joined: Jul 2010 Posts: 5 Thanks: 0  How to calculate simple linear model for noisy data?
I have data about the number of a website's hits per day. This data is very noisy. I've seen someone calculate a stepwise linear model in order to simplify the graph and highlight significant changes - like this: . What do you think is the right approach / method / tool to generate the linear model (red line)? PS: I posted in this forum because I assume statistics is involved in the process.
July 16th, 2010, 05:13 AM  #2 
Global Moderator Joined: Nov 2006 From: UTC 5 Posts: 16,046 Thanks: 938 Math Focus: Number theory, computational mathematics, combinatorics, FOM, symbolic logic, TCS, algorithms  Re: How to calculate simple linear model for noisy data?
I think a better method would be to plot a best-fit line through the data. Excel can do that, as can (for example) R.
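
For readers who use neither Excel nor R, here is a minimal sketch of the same idea in Python with NumPy. The function name `fit_line` and the synthetic hit counts are made up for illustration; only `numpy.polyfit` is a real library call.

```python
import numpy as np

def fit_line(hits):
    """Ordinary least-squares line through daily hit counts.

    Returns (slope, intercept), with day 0 being the first observation.
    """
    hits = np.asarray(hits, dtype=float)
    days = np.arange(len(hits))
    slope, intercept = np.polyfit(days, hits, deg=1)
    return slope, intercept

if __name__ == "__main__":
    # Synthetic, made-up data: a slow upward trend plus noise.
    rng = np.random.default_rng(0)
    hits = 100 + 2.0 * np.arange(60) + rng.normal(0, 15, size=60)
    slope, intercept = fit_line(hits)
    print(f"slope = {slope:.2f} hits/day, intercept = {intercept:.1f}")
```

A single line like this summarises the overall trend but cannot show sudden steps, which is what the original question is after.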

July 18th, 2010, 01:28 AM  #3 
Newbie Joined: Jul 2010 Posts: 5 Thanks: 0  Re: How to calculate simple linear model for noisy data?
The stepwise approach can be accommodated, but you need to make some assumptions to get it done. One set of assumptions: the steps occur at set breaks in time. With this assumption you only need to do a constant regression on each subset of the data (which just means using the mean to estimate the data on each subinterval). However, this is not satisfying. Since your data is a time series, there is a model that I believe is more appropriate for piecewise constant regression. Use an average of the last N data points. This will smooth the data, and jumps in the average will become more apparent. Use these jumps to determine where the breaks between piecewise approximations should occur, and then use the average on each interval to perform the regression there. Note that this second formulation is NOT a linear regression, as the fit does not change linearly in the data (due to movements in the jump points).
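
The second formulation could be sketched roughly as follows in Python with NumPy. This is one possible reading, not a definitive implementation: the function names, the `window` (the N above) and `threshold` parameters, and the rule "a break is where the mean of the next N points differs most from the mean of the previous N points" are all assumptions made for illustration.

```python
import numpy as np

def find_breaks(y, window=7, threshold=20.0):
    """Indices t where mean(y[t:t+window]) and mean(y[t-window:t])
    differ by more than `threshold`, keeping only the largest jump
    within each neighbourhood of +/- `window` points."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 2 * window:
        return []
    csum = np.concatenate(([0.0], np.cumsum(y)))
    t = np.arange(window, n - window + 1)
    before = (csum[t] - csum[t - window]) / window  # average of the last N points
    after = (csum[t + window] - csum[t]) / window   # average of the next N points
    jump = np.abs(after - before)
    breaks = []
    while jump.max() > threshold:
        i = int(jump.argmax())
        breaks.append(int(t[i]))
        # Suppress neighbouring candidates that belong to the same jump.
        jump[max(0, i - window):min(len(jump), i + window + 1)] = 0.0
    return sorted(breaks)

def piecewise_constant_fit(y, breaks):
    """Replace each segment between consecutive breaks by its mean."""
    y = np.asarray(y, dtype=float)
    fitted = np.empty_like(y)
    edges = [0, *breaks, len(y)]
    for start, stop in zip(edges[:-1], edges[1:]):
        fitted[start:stop] = y[start:stop].mean()
    return fitted

if __name__ == "__main__":
    # Synthetic, made-up data: two level shifts plus noise.
    rng = np.random.default_rng(1)
    levels = np.concatenate([np.full(40, 100.0), np.full(30, 160.0), np.full(50, 120.0)])
    hits = levels + rng.normal(0, 10, size=levels.size)
    breaks = find_breaks(hits, window=7, threshold=25.0)
    print("breaks at days:", breaks)
    print("segment levels:", np.unique(piecewise_constant_fit(hits, breaks)).round(1))
```

The threshold has to be chosen relative to the noise level: too low and ordinary fluctuations are reported as breaks, too high and genuine steps are missed.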
July 18th, 2010, 04:20 AM  #4 
Newbie Joined: Jul 2010 Posts: 5 Thanks: 0  Re: How to calculate simple linear model for noisy data?
Thank you for your great post. I will definitely do some reading on piecewise constant regression. I agree with you that in order to use the very simplified linear model it must be assumed that "the mean of the data basically stays the same and only changes from time to time". The core problem here of course is to find where these breaks happen (it will later be used in a 'news'-like fashion, notifying the user about these breaks). I guess I could use a threshold, since "jumps in the average will become more apparent", or is there a better way? By the way, on the slide where I saw this the author also wrote down the confidence level for each break. How might that fit into the model?
July 22nd, 2010, 05:54 AM  #5 
Newbie Joined: May 2009 Posts: 25 Thanks: 0  Re: How to calculate simple linear model for noisy data?
Do you expect any seasonality in your data? For example, is Saturday's volume likely to be greater than Monday's each week? If so, it is wrong to attempt to put a trend line through your raw data; instead you should seasonally adjust the data first. A simple way of doing that is to create a 7-day rolling average (7-day if you think your data is weekly seasonal) and then try to fit that to a linear model. Why do you expect the underlying trend in your data to be linear, by the way? Maybe the data reflects the amount of promotion you are getting. If that promotion is done in bursts, then I might expect a sawtooth shape to your underlying data.
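
A minimal sketch of the rolling-average suggestion, in Python with NumPy. The function name `weekly_adjusted_trend`, the `period` parameter, and the synthetic data are assumptions for illustration. Each average is dated at the last day of its window, which is a choice made here, not something the post specifies.

```python
import numpy as np

def weekly_adjusted_trend(hits, period=7):
    """Smooth with a trailing `period`-day rolling average, then fit a line.

    Returns (smoothed, slope, intercept). smoothed[j] is the mean of
    hits[j : j + period] and is dated at day j + period - 1.
    """
    hits = np.asarray(hits, dtype=float)
    smoothed = np.convolve(hits, np.ones(period) / period, mode="valid")
    days = np.arange(period - 1, len(hits))
    slope, intercept = np.polyfit(days, smoothed, deg=1)
    return smoothed, slope, intercept

if __name__ == "__main__":
    # Synthetic, made-up data: linear trend, a weekend bump, and noise.
    rng = np.random.default_rng(2)
    days = np.arange(84)
    weekend_bump = np.where(days % 7 >= 5, 40.0, 0.0)
    hits = 200 + 1.5 * days + weekend_bump + rng.normal(0, 10, size=days.size)
    smoothed, slope, intercept = weekly_adjusted_trend(hits)
    print(f"trend after weekly smoothing: {slope:.2f} hits/day")
```

Because every 7-day window contains each weekday exactly once, a fixed weekly pattern averages out and only the trend and noise remain. Dating the average at the end of the window shifts the fitted line by about three days relative to the raw data, which affects the intercept but not the slope.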
