Case Study - Regression - Introduction
This case study demonstrates a simple linear regression, investigating the relationship between the clothing brand Benetton's advertising expenditure and its sales.
These data can be imported into R using the following code:
# Benetton's sales and advertising expenditure, in millions of dollars
Benetton <- data.frame(
  sales = c(1261.08, 1475.28, 1657.52, 2059.05, 2303.76,
            2512.64, 2751.46, 2787.67, 2939.13),
  advertising = c(43.60, 50.44, 59.01, 66.30, 82.36,
                  92.15, 100.51, 110.06, 111.51)
)
Fitting the linear regression
A linear regression of this data is created as follows:
- Insert > Regression (Analysis).
- Drag sales across as the Outcome variable.
- Drag advertising across as a Predictor(s) variable.
- Press Calculate.
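Outside Displayr, roughly the same model can be fitted in base R with `lm()`. The sketch below uses only base R and recreates the Benetton data frame so that it runs on its own; it is an equivalent ordinary least squares fit, not necessarily how Displayr computes the regression internally.

```r
# Recreate the Benetton data (both columns in millions of dollars)
Benetton <- data.frame(
  sales = c(1261.08, 1475.28, 1657.52, 2059.05, 2303.76,
            2512.64, 2751.46, 2787.67, 2939.13),
  advertising = c(43.60, 50.44, 59.01, 66.30, 82.36,
                  92.15, 100.51, 110.06, 111.51)
)

# Ordinary least squares: sales is the outcome, advertising the predictor
fit <- lm(sales ~ advertising, data = Benetton)

# Intercept and slope, comparable to the Estimate column in Displayr
coef(fit)

# Full coefficient table, including standard errors and R-squared
summary(fit)
```
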
Interpreting the regression
The regression in Displayr will give you the following table:
Looking at the coefficients (in the Estimate column), the estimated relationship between sales and advertising is:
Sales = 313 + 24 × Advertising
Thus, the regression indicates that:
- If Benetton does no advertising, it will have sales of $313 million (the data are in millions of dollars).
- For every dollar that Benetton spends on advertising, it will gain an extra $24 of sales.
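Both interpretations can be checked numerically with `predict()`: the prediction at zero advertising equals the intercept, and raising advertising by one unit raises predicted sales by the slope. This is a minimal base-R sketch; the advertising values 0, 100 and 101 are arbitrary illustrative inputs, not values from the case study.

```r
Benetton <- data.frame(
  sales = c(1261.08, 1475.28, 1657.52, 2059.05, 2303.76,
            2512.64, 2751.46, 2787.67, 2939.13),
  advertising = c(43.60, 50.44, 59.01, 66.30, 82.36,
                  92.15, 100.51, 110.06, 111.51)
)
fit <- lm(sales ~ advertising, data = Benetton)

# Predicted sales (millions of dollars) at three illustrative advertising levels
new_ad <- data.frame(advertising = c(0, 100, 101))
pred <- predict(fit, newdata = new_ad)

# With no advertising, the prediction equals the intercept (about 313)
pred[1]

# One extra unit of advertising adds the slope (about 24) to predicted sales
pred[3] - pred[2]
```

Zero advertising lies well outside the observed range of the data, so the first prediction is an extrapolation.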
Validating the regression
With a simple regression, the easiest way to judge whether the model is good is to plot the data together with the model. We can do this using Insert > Advanced > Regression > Plot > Prediction Plot. The dots on the plot show the observed data and the dashed red line shows the predictions. All the points are close to the dashed line, suggesting that the model fits the data well.
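A comparable plot can be drawn with base R graphics. This is a sketch, not Displayr's own Prediction Plot: it assumes that overlaying the fitted line with `abline()` is an adequate stand-in, and it writes to a temporary PDF file so that it also runs without a screen.

```r
Benetton <- data.frame(
  sales = c(1261.08, 1475.28, 1657.52, 2059.05, 2303.76,
            2512.64, 2751.46, 2787.67, 2939.13),
  advertising = c(43.60, 50.44, 59.01, 66.30, 82.36,
                  92.15, 100.51, 110.06, 111.51)
)
fit <- lm(sales ~ advertising, data = Benetton)

# Draw to a temporary PDF so the example works non-interactively
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Observed data as dots
plot(sales ~ advertising, data = Benetton,
     xlab = "Advertising ($ millions)", ylab = "Sales ($ millions)")

# Fitted regression line as a dashed red line
abline(fit, col = "red", lty = "dashed")

invisible(dev.off())
```
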
The R-Square statistic, which appears in the regression output as Multiple R Squared, is very high, at 0.9786. This indicates that the regression model has a near-perfect fit to the data. Thus, if the assumptions of the model are appropriate, the model will provide a highly accurate way of predicting sales from advertising.
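R-Square is the share of the variation in sales that the model explains, 1 - (residual sum of squares / total sum of squares). The base-R sketch below computes it by hand and compares it with the value that `summary()` reports for an `lm` fit.

```r
Benetton <- data.frame(
  sales = c(1261.08, 1475.28, 1657.52, 2059.05, 2303.76,
            2512.64, 2751.46, 2787.67, 2939.13),
  advertising = c(43.60, 50.44, 59.01, 66.30, 82.36,
                  92.15, 100.51, 110.06, 111.51)
)
fit <- lm(sales ~ advertising, data = Benetton)

ss_res <- sum(residuals(fit)^2)                           # unexplained variation
ss_tot <- sum((Benetton$sales - mean(Benetton$sales))^2)  # total variation
r_squared <- 1 - ss_res / ss_tot

r_squared
summary(fit)$r.squared  # the same value, as reported by lm
```
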
Because the regression produced no warnings, the model has passed Displayr's various automated tests of regression models.
However, in this case the model is not valid. It should not be relied upon for drawing any conclusions about how sales are influenced by advertising. There are two clues to this:
- If the model were true, the implication would be that Benetton should spend every dollar it can obtain on advertising. This is not a realistic prediction. However, this on its own does not invalidate the model. Most real-world models have similar interpretive problems, and can still be useful provided that interpretations are limited to data similar to those used to create the model (i.e., as advertising expenditure in the data ranges from about 40 to 120, the model should only be relied upon for predictions within that range).
- The fit of the model is too good. Such high values of R-Square almost always signify some kind of error in the assumptions.
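The advice to rely on the model only within the range of the observed data can be made explicit in code. `predict_in_range()` below is a hypothetical helper written for illustration, not a Displayr or base-R function: it returns a prediction only when the advertising value lies between the smallest and largest values in the data, and `NA` otherwise.

```r
Benetton <- data.frame(
  sales = c(1261.08, 1475.28, 1657.52, 2059.05, 2303.76,
            2512.64, 2751.46, 2787.67, 2939.13),
  advertising = c(43.60, 50.44, 59.01, 66.30, 82.36,
                  92.15, 100.51, 110.06, 111.51)
)
fit <- lm(sales ~ advertising, data = Benetton)

# Hypothetical helper: predict sales only inside the observed advertising range
predict_in_range <- function(model, advertising, data) {
  lo <- min(data$advertising)
  hi <- max(data$advertising)
  pred <- predict(model, newdata = data.frame(advertising = advertising))
  # Replace predictions that would require extrapolation with NA
  pred[advertising < lo | advertising > hi] <- NA
  unname(pred)
}

# Only the middle value (80) lies within the observed range
res <- predict_in_range(fit, c(0, 80, 500), Benetton)
res
```
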
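An investigation into how Benetton sets its advertising expenditure revealed that its advertising was always set at 4% of the previous year's sales. Thus, a core assumption of the model, that advertising causes sales, is not correct: advertising is determined by sales, not the other way around, so the strong correlation says nothing about the effect of advertising on sales.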
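The 4% rule can be checked against the data itself. The sketch assumes that the nine rows are consecutive years in chronological order, which the case study does not state explicitly; under that assumption, each year's advertising should equal 4% of the previous year's sales, up to rounding to two decimal places.

```r
Benetton <- data.frame(
  sales = c(1261.08, 1475.28, 1657.52, 2059.05, 2303.76,
            2512.64, 2751.46, 2787.67, 2939.13),
  advertising = c(43.60, 50.44, 59.01, 66.30, 82.36,
                  92.15, 100.51, 110.06, 111.51)
)
n <- nrow(Benetton)

# Advertising in years 2..n versus 4% of sales in years 1..(n - 1)
ad_this_year <- Benetton$advertising[-1]
four_pct_last_year <- 0.04 * Benetton$sales[-n]

# Differences are within the rounding error of two-decimal data
round(ad_this_year - four_pct_last_year, 4)
```
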