Statcon: Why we need the lower-order-terms ...

This is the second article in our series about non-hierarchical models. Here I want to point out the problem with non-hierarchical regression models, meaning models that include some higher-order terms without the corresponding lower-order terms.

Have you ever received such a message when fitting a linear model?

Design Expert - Warning

Or this?

JMP Warning

If you see one of these warnings, you probably tried to fit a model containing higher-order terms without the corresponding main effects. Typical examples are models containing a two-factor interaction ($x_1 \cdot x_2$) without the main effects ($x_1$ and $x_2$ themselves). The same applies to polynomial models that contain higher-order effects (quadratic, cubic, ...) but leave out the linear term.

So what is the problem with non-hierarchical models? Why these warnings? Why do we usually recommend respecting model hierarchy? The main argument is the following:

Non-hierarchical models are not invariant under location shifts

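Assume we are using a non-hierarchical model like $taste_i = \beta_0 + \beta_1 \cdot temp_i \cdot time_i$. We have a response $taste$ and two predictors, $temp$ and $time$. The model uses only the interaction of the two predictors to explain the response. The example data are listed below, together with the centered factors $ctemp$ and $ctime$ (each factor minus its mean):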

temp	time	taste	ctemp	ctime
°C	min	---	°C	min
190	10	2	5.8	-11.7
195	15	5	10.8	-6.7
200	20	3	15.8	-1.7
175	30	8	-9.2	8.3
180	30	6	-4.2	8.3
165	25	4	-19.2	3.3

Let's start by analysing the full model: $taste_i = b_0 + b_1 \cdot temp_i + b_2 \cdot time_i + b_{12} \cdot temp_i\cdot time_i + \epsilon_i$

The estimated model is:

But the variance inflation factors (VIFs) look rather problematic:

	VIF	VIF centered
temp	358	2.54
time	8990	2.75
temp*time	7074	1.72
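The VIFs can be recomputed from the data table with a short sketch. It assumes that the VIF of a predictor is the corresponding diagonal element of the inverse correlation matrix of the three predictors, and that the centered interaction is the product of the centered factors. The helper names `center`, `corr` and `vifs` are made up for this illustration and only the Python standard library is used.

```python
from math import sqrt

# Data from the table above.
temp = [190, 195, 200, 175, 180, 165]
time = [10, 15, 20, 30, 30, 25]


def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def corr(u, v):
    """Pearson correlation of two equally long sequences."""
    cu, cv = center(u), center(v)
    suv = sum(a * b for a, b in zip(cu, cv))
    return suv / sqrt(sum(a * a for a in cu) * sum(b * b for b in cv))


def vifs(x1, x2):
    """VIFs of x1, x2 and their product in the three-predictor model."""
    x12 = [a * b for a, b in zip(x1, x2)]
    r12, r13, r23 = corr(x1, x2), corr(x1, x12), corr(x2, x12)
    # Determinant of the 3x3 correlation matrix of the predictors.
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    # Diagonal of the inverse correlation matrix (cofactor / determinant).
    return {
        "temp": (1 - r23**2) / det,
        "time": (1 - r13**2) / det,
        "temp*time": (1 - r12**2) / det,
    }


vif_raw = vifs(temp, time)
vif_centered = vifs(center(temp), center(time))

for name in vif_raw:
    print(f"{name:10s} raw: {vif_raw[name]:10.1f}  centered: {vif_centered[name]:6.2f}")
```

With the raw factors every VIF comes out in the hundreds or thousands, while with the centered factors all three stay below 3.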

To avoid the multicollinearity problem, let us center the factors and then recalculate the model:

Of course the main effect estimates change but the estimate and p-value of the interaction are still the same! The VIFs are all below 3 now.
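That the interaction estimate does not change can be checked with a small sketch. It solves the normal equations of the full model in exact rational arithmetic, so the raw and the centered fit can be compared without rounding noise. The helpers `center`, `solve` and `fit_full_model` are hypothetical names for this illustration, and solving the normal equations directly is an assumption made for brevity, not how R or other packages fit the model.

```python
from fractions import Fraction

# Data from the table above.
temp = [190, 195, 200, 175, 180, 165]
time = [10, 15, 20, 30, 30, 25]
taste = [2, 5, 3, 8, 6, 4]


def center(values):
    """Subtract the exact mean from every value."""
    mean = Fraction(sum(values), len(values))
    return [v - mean for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_full_model(x1, x2, y):
    """Least-squares estimates (b0, b1, b2, b12) of the full model."""
    rows = [[Fraction(1), Fraction(a), Fraction(b), Fraction(a) * Fraction(b)]
            for a, b in zip(x1, x2)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


b_raw = fit_full_model(temp, time, taste)
b_centered = fit_full_model(center(temp), center(time), taste)

print("raw     :", [float(b) for b in b_raw])
print("centered:", [float(b) for b in b_centered])
print("interaction estimate identical:", b_raw[3] == b_centered[3])
```

The main-effect estimates differ between the two fits, but they are linked through the interaction: the raw temperature coefficient equals the centered one minus the interaction estimate times the mean of $time$.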

Results

Use centered data as it removes a multicollinearity problem.
The 2-factor-interaction is significant.
The centered temperature factor is not significant.

The question is: may we remove the temperature main effect now? Let us go one step further and see what happens if we remove both main effects.

As the model now contains only one term, multicollinearity is no longer a concern, so we might use the original data.
Estimate the pure interaction model: $taste_i = b_0 + b_{12} temp_i \cdot time_i + \epsilon_i$.

The estimated model is the following:

Non-hierarchical Model (computed with R)

The two-factor interaction seems to be significant at a significance level of 0.1 (a typical level in a screening situation).

Finally estimate the same model for the centered data (just for comparison):

Non-hierarchical Model for centered data (in R)

As you can see, the results change considerably. Although we did not touch the relation between $temp$, $time$ and $taste$ (we only subtracted the mean from each factor), the interaction is significant in the first model but not in the second.
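The comparison of the two interaction-only fits can be reproduced with a minimal sketch. It computes the t-statistic of the slope in a simple linear regression and compares it with 2.132, the two-sided 10% critical value of the t distribution with 4 degrees of freedom (6 observations minus 2 parameters). The helper names `center` and `slope_t_statistic` are hypothetical, and the critical value is hard-coded because the standard library has no t distribution.

```python
from math import sqrt

# Data from the table above.
temp = [190, 195, 200, 175, 180, 165]
time = [10, 15, 20, 30, 30, 25]
taste = [2, 5, 3, 8, 6, 4]

# Two-sided 10% critical value of the t distribution with 4 d.f.
T_CRIT = 2.132


def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope_t_statistic(x, y):
    """t-statistic of b1 in the simple regression y = b0 + b1 * x + e."""
    n = len(x)
    cx, cy = center(x), center(y)
    sxx = sum(a * a for a in cx)
    sxy = sum(a * b for a, b in zip(cx, cy))
    syy = sum(b * b for b in cy)
    slope = sxy / sxx
    rss = syy - slope * sxy            # residual sum of squares
    std_error = sqrt(rss / (n - 2) / sxx)
    return slope / std_error


# Interaction built from the original factors and from the centered factors.
raw_product = [a * b for a, b in zip(temp, time)]
centered_product = [a * b for a, b in zip(center(temp), center(time))]

t_raw = slope_t_statistic(raw_product, taste)
t_centered = slope_t_statistic(centered_product, taste)

print(f"raw data:      t = {t_raw:.2f}, significant: {abs(t_raw) > T_CRIT}")
print(f"centered data: t = {t_centered:.2f}, significant: {abs(t_centered) > T_CRIT}")
```

The response is identical in both fits; only the location of the factors differs, yet the t-statistic exceeds the critical value for the raw data and stays well below it for the centered data.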

What happened here and why might this be a problem?

Mathematical motivation

If you use an interaction model without the main effects, the model is not invariant to location shifts of the factors. If the factors in the interaction model $y_i = \beta_0 + \beta_1 x_i \cdot z_i$ are centered, the model is implicitly extended to a kind of full model:
$$ y_i = \beta_0 + \beta_1 (x_i - \bar{x}) \cdot (z_i - \bar{z}) = \beta_0 + \beta_1 (x_i \cdot z_i - \bar{x} \cdot z_i - x_i \cdot \bar{z} + \bar{x} \cdot \bar{z}) $$
You see that this new model contains the pure main effects in the terms $\bar{x} \cdot z_i$ and $x_i \cdot \bar{z}$, because $\bar{x}$ and $\bar{z}$ are just constants. Their coefficients are not free, however: they are tied to $\beta_1$, so the centered model is a different model from the uncentered one.

Why is this bad?

Two simple arguments:

1. Especially in the presence of quadratic effects, we often want to center variables to avoid multicollinearity problems, so this situation will occur frequently.
2. Inference should be independent of units. Think of temperature as a predictor: it should make no difference to the model whether you measure it in degrees Celsius or in kelvin. Yet in non-hierarchical models it does make a difference.
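The second argument can be written out for the taste example. Assuming the temperature is converted from degrees Celsius to kelvin by adding the constant 273.15, the interaction-only model becomes:

$$ \beta_0 + \beta_1 \cdot (temp_i + 273.15) \cdot time_i = \beta_0 + \beta_1 \cdot temp_i \cdot time_i + 273.15 \cdot \beta_1 \cdot time_i $$

In Celsius terms, the kelvin model therefore contains a main effect of $time$ whose coefficient is forced to be $273.15 \cdot \beta_1$. The two parameterizations describe different families of models, so their fits and tests need not agree. A hierarchical model avoids this, because a free $time$ coefficient absorbs the shift.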


2013-01-09
