My Math Forum  

July 16th, 2010, 04:07 AM   #1
How to calculate simple linear model for noisy data?

I have data on the number of hits per day for a website. The data is very noisy. I've seen someone fit a stepwise linear model in order to simplify the graph and highlight significant changes, like this:
[Image not preserved: plot of the noisy data with the stepwise model drawn as a red line]

What do you think is the right approach / method / tool to generate the linear model (red line)?

PS: I posted in this forum because I assume statistics is involved in the process.
stephanos is offline  
 
July 16th, 2010, 05:13 AM   #2
Re: How to calculate simple linear model for noisy data?

I think a better method would be to plot a best-fit line through the data. Excel can do that, as can (for example) R.
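
For anyone without Excel or R at hand, here is a minimal sketch of an ordinary least-squares line in plain Python. It assumes the days are simply numbered 0, 1, 2, ...; the helper name `fit_line` and the hit counts are made up for illustration and do not come from the original data.

```python
def fit_line(ys):
    """Ordinary least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept). The x-values are assumed to be day indices.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    # Sum of squared x deviations and sum of x*y cross deviations.
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up daily hit counts, for illustration only.
hits = [120, 135, 128, 150, 149, 160, 171]
slope, intercept = fit_line(hits)
print(f"hits ~ {intercept:.1f} + {slope:.2f} * day")
```

The slope is then the estimated change in hits per day over the whole period.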
CRGreathouse is offline  
July 18th, 2010, 01:28 AM   #3
Re: How to calculate simple linear model for noisy data?

The stepwise approach can be accommodated, but you need to make some assumptions to get it done.

One set of assumptions:

The steps occur at set breaks in time.

With this assumption you only need to do a constant regression on each of the subsets of data (which just means using the mean to estimate the data on each of the subintervals).
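
As a rough sketch of that fixed-break case in Python: the function name `piecewise_means`, the `breaks` argument (indices where a new step starts) and the numbers are all assumptions made for the example.

```python
def piecewise_means(ys, breaks):
    """Fit a step function with known break positions.

    `breaks` is a sorted list of indices at which a new segment starts
    (a hypothetical input format). Each point is replaced by the mean
    of the segment it falls in.
    """
    edges = [0] + list(breaks) + [len(ys)]
    fitted = []
    for start, end in zip(edges, edges[1:]):
        segment = ys[start:end]
        if not segment:
            raise ValueError("breaks must be strictly increasing and inside the data")
        mean = sum(segment) / len(segment)
        fitted.extend([mean] * len(segment))
    return fitted


# Made-up data with one step starting at index 2.
print(piecewise_means([1, 3, 10, 12, 14], [2]))  # [2.0, 2.0, 12.0, 12.0, 12.0]
```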

However, this is not satisfying. Since your data is a time series, there is a model which I believe is more appropriate for piecewise constant regression.

Use an average of the last N data points. This will smooth the data, and jumps in the average will become more apparent. Use these jumps to determine where the breaks between the piecewise approximations should occur, and then use the average on each interval to perform the regression there. Note that this second formulation is NOT a linear regression: the fitted values are not a linear function of the data, because the jump points themselves move with the data.
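
One possible way to turn this into code, as a sketch only. The rule used here for a "jump" is an assumption: compare the N-point average just after an index with the N-point average just before it, flag the index when the difference exceeds a threshold, and keep the largest jump within each run of adjacent flags. The names `trailing_means`, `find_breaks`, `stepwise_fit`, the values of N and the threshold, and the data are all hypothetical.

```python
def trailing_means(ys, n):
    """Average of the last n points; result[k] covers ys[k : k + n]."""
    return [sum(ys[k:k + n]) / n for k in range(len(ys) - n + 1)]


def find_breaks(ys, n, threshold):
    """Indices i where mean(ys[i:i+n]) and mean(ys[i-n:i]) differ by more
    than `threshold`. Within a run of adjacent candidates only the index
    with the largest jump is kept."""
    means = trailing_means(ys, n)
    breaks, best = [], None
    for i in range(n, len(ys) - n + 1):
        jump = abs(means[i] - means[i - n])
        if jump > threshold:
            if best is None or jump > best[1]:
                best = (i, jump)
        elif best is not None:
            breaks.append(best[0])
            best = None
    if best is not None:
        breaks.append(best[0])
    return breaks


def stepwise_fit(ys, n, threshold):
    """Detect breaks, then use the plain mean on each interval."""
    if n < 1 or len(ys) < n:
        raise ValueError("need n >= 1 and at least n data points")
    breaks = find_breaks(ys, n, threshold)
    edges = [0] + breaks + [len(ys)]
    fitted = []
    for start, end in zip(edges, edges[1:]):
        mean = sum(ys[start:end]) / (end - start)
        fitted.extend([mean] * (end - start))
    return breaks, fitted


# Made-up series whose level shifts from about 10 to about 20 at index 6.
hits = [10, 11, 9, 10, 10, 11, 20, 21, 19, 20, 20, 21]
breaks, fitted = stepwise_fit(hits, n=3, threshold=5)
print(breaks)  # [6]
```

Both N and the threshold have to be tuned to the noise level: a small N reacts quickly but flags noise, while a large N is steadier but locates the break less precisely.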
CLST is offline  
July 18th, 2010, 04:20 AM   #4
Re: How to calculate simple linear model for noisy data?

Thank you for your great post. I will definitely do some reading on piecewise constant regression.

I agree with you that in order to use the very simplified linear model it must be assumed that "the mean of the data basically stays the same and only changes from time to time". The core problem here of course is to find where these breaks happen (it will later be used in 'news'-like fashion, notifying the user about these breaks). I guess I could apply a threshold to the moving average, since "jumps in the average will become more apparent" - or is there a better way?

By the way, on the slide where I saw this the author also wrote down the confidence level for each break. How might that fit into the model?
stephanos is offline  
July 22nd, 2010, 05:54 AM   #5
Re: How to calculate simple linear model for noisy data?

Do you expect any seasonality in your data? For example, is Saturday's volume likely to be greater than Monday's each week?
If so, it is wrong to attempt to put a trendline through your raw data - instead you should seasonally adjust the data first. A simple way of doing that is to create a 7-day rolling average (7-day if you think your data is weekly seasonal) and then try to fit that to a linear model. Why do you expect the underlying trend in your data to be linear, by the way? Maybe the data reflects the amount of promotion you are getting - if that promotion is done in bursts then I might expect a saw-tooth shape to your underlying data.
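
A small sketch of the rolling-average-then-fit idea in plain Python. The data is synthetic (a linear trend of 2 hits per day plus a made-up weekly pattern), and the names `rolling_mean` and `fit_line` are just illustrative helpers.

```python
def rolling_mean(ys, window=7):
    """Mean of each run of `window` consecutive days."""
    if len(ys) < window:
        raise ValueError("need at least `window` data points")
    return [sum(ys[i:i + window]) / window for i in range(len(ys) - window + 1)]


def fit_line(ys):
    """Least-squares (slope, intercept) for points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Synthetic example: trend of +2 hits/day plus an invented weekly pattern.
weekly = [0, -5, -3, 2, 4, 20, 15]
hits = [100 + 2 * d + weekly[d % 7] for d in range(28)]

smoothed = rolling_mean(hits, 7)   # every window spans exactly one full week
slope, intercept = fit_line(smoothed)
print(f"trend: {slope:.2f} hits per day")
```

Because every 7-day window contains each weekday exactly once, the weekly pattern contributes the same amount to every smoothed value and no longer affects the fitted slope.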
CarlPierce is offline  