This vignette explains the rational behind the demean() function.

We give recommendations how to analyze multilevel or hierarchical data structures, when macro-indicators (or level-2 predictors, or higher-level units, or more general: group-level predictors) are used as covariates and the model suffers from heterogeneity bias (Bell and Jones 2015).

Sample data used in this vignette

  • Variables:

    • QoL : Response (quality of life of patient)

    • phq4 : Patient Health Questionnaire, time-varying variable

    • hospital : Location of treatment, time-invariant variable, co-variate

    • education: Educational level, time-invariant variable, co-variate

    • ID : patient ID

    • time : time-point of measurement

Heterogeneity bias

Heterogeneity bias occurs when group-level predictors vary within and across groups, and hence fixed effects may correlate with group (or random) effects. This is a typical situation when analyzing longitudinal or panel data: Due to the repeated measurements of persons, the “person” (or subject-ID) is now a level-2 variable. Predictors at level-1 (“fixed effects”), e.g. self-rated health or income, now have an effect at level-1 (“within”-effect) and at higher-level units (level-2, the subject-level, which is the “between”-effect) (see also this posting). This inevitably leads to correlating fixed effects and error terms - which, in turn, results in biased estimates, because both the within- and between-effect are captured in one estimate.

You can check if your model may suffer from heterogeneity bias using the check_heterogeneity_bias() function:

check_heterogeneity_bias(qol_cancer, select = c("phq4", "education"), group = "ID")

Adressing heterogeneity bias: the Fixed Effects Regression (FE) approach

Fixed effects regression models (FE) are a popular approach for panel data analysis in particular in econometrics and considered as gold standard. To avoid the problem of heterogeneity bias, in FE all higher-level variance (and thus, any between-effects), “are controlled out using the higher-level entities themselves, included in the model as dummy variables” (Bell and Jones 2015). As a consequence, FE models are only able estimate within-effects.

To remove between-effects and only model within-effects, the data needs some preparation: de-meaning. De-meaning, or person-mean centering, or centering within clusters, takes away the higher-level mean from the regression equation, and as such, FE avoids estimating a parameter for each higher-level unit.

Computing the de-meaned and group-meaned variables

qol_cancer <- cbind(
  datawizard::demean(qol_cancer, select = c("phq4", "QoL"), group = "ID")

Now we have:

  • phq4_between: time-varying variable with the mean of phq4 across all time-points, for each patient (ID).

  • phq4_within: the de-meaned time-varying variable phq4.

A FE model is a classical linear model, where

  • Intercept is removed

  • time-invariant predictors are not allowed to be included

  • the group-level factor is included as predictor

  • time-varying predictors are de-meaned (“person-mean centered”, indicating the “within-subject” effect)

fe_model1 <- lm(
  QoL ~ 0 + time + phq4_within + ID,
  data = qol_cancer
# we use only the first two rows, because the remaining rows are
# the estimates for "ID", which is not of interest here...
model_parameters(fe_model1)[1:2, ]

# instead of removing the intercept, we could also use the
# de-meaned response...
fe_model2 <- lm(
  QoL_within ~ time + phq4_within + ID,
  data = qol_cancer
model_parameters(fe_model2)[2:3, ]

# we compare the results with those from the "lfe"-package for panel data
fe_model3 <- felm(
  QoL ~ time + phq4 | ID,
   data = qol_cancer

As we can see, the within-effect of PHQ-4 is -3.66, hence the mean of the change for an average individual case in our sample (or, the “net” effect), is -3.66.

But what about the between-effect? How do people with higher PHQ-4 score differ from people with lower PHQ-4 score? Or what about educational inequalities? Do higher educated people have a higher PHQ-4 score than lower educated people?

This question cannot be answered with FE regression. But: “Can one fit a multilevel model with varying intercepts (or coefficients) when the units and predictors correlate? The answer is yes. And the solution is simple.” (Bafumi and Gelman 2006)

Adressing heterogeneity bias: the Mixed Model approach

Mixed models include different levels of sources of variability (i.e. error terms at each level). Predictors used at level-1 that are varying across higher-level units will thus have residual errors at both level-1 and higher-level units. “Such covariates contain two parts: one that is specific to the higher-level entity that does not vary between occasions, and one that represents the difference between occasions, within higher-level entities” (Bell and Jones 2015). Hence, the error terms will be correlated with the covariate, which violates one of the assumptions of mixed models (iid, independent and identically distributed error terms) - also known and described above as heterogeneity bias.

But how can this issue be addressed outside the FE framework?

There are several ways how to address this using a mixed models approach:

  • Correlated group factors and predictors are no problem anyway, because partial pooling allows estimates of units o borrow strength from the whole sample and shrink toward a common mean (Shor et al. (2007)).

  • If predictor and group factors correlate, one can remove this correlation by group-meaning (or “mean within clusters”, Bafumi and Gelman 2006; Gelman and Hill 2007, Chap. 12.6.).

  • When time-varying predictors are “decomposed” into their time-varying and time-invariant components (demeaning), then mixed models can model both within- and between-subject effects (Bell, Fairbrother, and Jones 2019) - this approach is essentially a further development of a long-known recommendation by Mundlak (Mundlak 1978).

For now, we will follow the last recommendation and use the within- and between-version of phq4.

mixed_1 <- lmer(
  QoL ~ time + phq4_within + phq4_between + (1 | ID),
  data = qol_cancer

# compare to FE-model
model_parameters(fe_model1)[1:2, ]

As we can see, the estimates and standard errors are identical. The argument against the use of mixed models, i.e. that using mixed models for panel data will yield biased estimates and standard errors, is based on an incorrect model specification (Mundlak 1978). As such, when the (mixed) model is properly specified, the estimator of the mixed model is identical to the ‘within’ (i.e. FE) estimator.

As a consequence, we cannot only use the above specified mixed model for panel data, we can even specify more complex models including within-effects, between-effects or random effects variation. A mixed models approach can model the causes of endogeneity explicitly by including the (separated) within- and between-effects of time-varying fixed effects and including time-constant fixed effects.

mixed_2 <- lmer(
  QoL ~ time + phq4_within + phq4_between + education + (1 + time | ID),
  data = qol_cancer
# effects = "fixed" will not display random effects, but split the
# fixed effects into its between- and within-effects components.
model_parameters(mixed_2, effects = "fixed")

For more complex models, within-effects will naturally change slightly and are no longer identical to simpler FE models. This is no “bias”, but rather the result of building more complex models: FE models lack information of variation in the group-effects or between-subject effects. Furthermore, FE models cannot include random slopes, which means that fixed effects regressions are neglecting “cross-cluster differences in the effects of lower-level controls (which) reduces the precision of estimated context effects, resulting in (…) low statistical power” (Heisig, Schaeffer, and Giesecke 2017).

Conclusion: Complex Random Effects Within-Between Models

Depending on the structure of the data, the best approach to analyzing panel data would be a so called “complex random effects within-between” model (Bell, Fairbrother, and Jones 2019):

In R-code, the model is written down like this:

rewb <- lmer(
  QoL ~ time + phq4_within + phq4_between + education +
    (1 + time | ID) + (1 + phq4_within | ID),
  data = qol_cancer

What about time-constant predictors?

After demeaning time-varying predictors, “at the higher level, the mean term is no longer constrained by Level 1 effects, so it is free to account for all the higher-level variance associated with that variable” (Bell and Jones 2015).

Thus, time-constant categorical predictors, that only have a between-effect, can be simply included as fixed effects predictor (since they’re not constrained by level-1 effects). Time-constant continuous group-level predictors (for instance, GDP of countries) should be group-meaned, to have a proper “between”-effect (Gelman and Hill 2007, Chap. 12.6.).

The benefit of this kind of model is that you have information on within-, between- and other time-constant (i.e. between) effects or group-level predictors…

model_parameters(rewb, effects = "fixed")

… but you can also model the variation of (group) effects across time (and probably space), and you can even include more higher-level units (e.g. nested design or cross-classified design with more than two levels):

What about imbalanced groups, i.e. large differences in N per group?

See little example after this visual example…

A visual example

First, we generate some fake data that implies a linear relationship between outcome and independent variable. The objective is that the amount of typing errors depends on how fast (typing speed) you can type, however, the more typing experience you have, the faster you can type. Thus, the outcome measure is “amount of typing errors”, while our predictor is “typing speed”. Furthermore, we have repeated measurements of people with different “typing experience levels”.

The results show that we will have two sources of variation: Overall, more experienced typists make less mistakes (group-level pattern). When typing faster, typists make more mistakes (individual-level pattern).


n <- 5
b <- seq(1, 1.5, length.out = 5)
x <- seq(2, 2 * n, 2)

d <-, lapply(1:n, function(i) {
    x = seq(1, n, by = .2),
    y = 2 * x[i] + b[i] * seq(1, n, by = .2) + rnorm(21),
    grp = as.factor(2 * i)

d <- d %>%
  group_by(grp) %>%
  mutate(x = rev(15 - (x + 1.5 * as.numeric(grp)))) %>%

labs <- c("very slow", "slow", "average", "fast", "very fast")
levels(d$grp) <- rev(labs)

d <- cbind(d, datawizard::demean(d, c("x", "y"), group = "grp"))

Let’s look at the raw data…

Model 1: Linear relationship between typing errors and typing speed

We can now assume a (linear) relationship between typing errors and typing speed.

Looking at the coefficients, we have following model with a coefficient of -1.92.

m1 <- lm(y ~ x, data = d)

However, we have ignored the clustered structure in our data, in this example due to repeated measurements.

Model 2: Within-subject effect of typing speed

A fixed effects regression (FE-regression) would now remove all between-effects and include only the within-effects as well as the group-level indicator.

This returns the coefficient of the “within”-effect, which is 1.2, with a standard error of 0.07. Note that the FE-model does not take the variation between subjects into account, thus resulting in (possibly) biased estimates, and biased standard errors.

m2 <- lm(y ~ 0 + x_within + grp, data = d)
model_parameters(m2)[1, ]

Model 3: Between-subject effect of typing speed

To understand, why the above model 1 (m1) returns a biased estimate, which is a “weighted average” of the within- and between-effects, let us look at the between-effect now.

As we can see, the between-effect is -2.93, which is different from the -1.92 estimated in the model m1.

m3 <- lm(y ~ x_between, data = d)

Model 4: Mixed model with within- and between-subjects

Since FE-models can only model within-effects, we now use a mixed model with within- and between-effects.

We see, the estimate for the within-effects is not biased. Furthermore, we get the correct between-effect as well (standard errors differ, because the variance in the grouping structure is more accurately taken into account).

m4 <- lmer(y ~ x_between + x_within + (1 | grp), data = d)

Model 5: Complex Random-Effects Within-Between Model

Finally, we can also take the variation between subjects into account by adding a random slope. This model can be called a complex “REWB” (random-effects within-between) model. Due to the variation between subjects, we get larger standard errors for the within-effect.

m5 <- lmer(y ~ x_between + x_within + (1 + x_within | grp), data = d)

Balanced versus imbalanced groups

The “simple” linear slope of the between-effect (and also from the within-effect) is (almost) identical in “classical” linear regression compared to linear mixed models when the groups are balanced, i.e. when the number of observation per group is similar or the same.

Whenever group size is imbalanced, the “simple” linear slope will be adjusted. This leads to different estimates for between-effects between classical and mixed models regressions due to shrinkage - i.e. for larger variation of group sizes we find stronger regularization of estimates.

Hence, for mixed models with larger differences in number of observation per random effects group, the between-effect will differ from the between-effect calculated by “classical” regression models. However, this shrinkage is a desired property of mixed models and usually improves the estimates.

n <- 5
b <- seq(1, 1.5, length.out = 5)
x <- seq(2, 2 * n, 2)

d <-, lapply(1:n, function(i) {
    x = seq(1, n, by = .2),
    y = 2 * x[i] + b[i] * seq(1, n, by = .2) + rnorm(21),
    grp = as.factor(2 * i)

# create imbalanced groups
d$grp[sample(which(d$grp == 8), 10)] <- 6
d$grp[sample(which(d$grp == 4), 8)] <- 2
d$grp[sample(which(d$grp == 10), 9)] <- 6

d <- d %>%
  group_by(grp) %>%
  mutate(x = rev(15 - (x + 1.5 * as.numeric(grp)))) %>%

labs <- c("very slow", "slow", "average", "fast", "very fast")
levels(d$grp) <- rev(labs)

d <- cbind(d, datawizard::demean(d, c("x", "y"), group = "grp"))

# Between-subject effect of typing speed
m1 <- lm(y ~ x_between, data = d)

# Between-subject effect of typing speed, accounting for group structure
m2 <- lmer(y ~ x_between + (1 | grp), data = d)


Bafumi, Joseph, and Andrew Gelman. 2006. “Fitting Multilevel Models When Predictors and Group Effects Correlate.” In. Philadelphia, PA.

Bell, Andrew, Malcolm Fairbrother, and Kelvyn Jones. 2019. “Fixed and Random Effects Models: Making an Informed Choice.” Quality & Quantity 53: 1051–74.

Bell, Andrew, and Kelvyn Jones. 2015. “Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data.” Political Science Research and Methods 3 (1): 133–53.

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge; New York: Cambridge University Press.

Heisig, Jan Paul, Merlin Schaeffer, and Johannes Giesecke. 2017. “The Costs of Simplicity: Why Multilevel Models May Benefit from Accounting for Cross-Cluster Differences in the Effects of Controls.” American Sociological Review 82 (4): 796–827.

Mundlak, Yair. 1978. “On the Pooling of Time Series and Cross Section Data.” Econometrica 46 (1): 69.

Shor, Boris, Joseph Bafumi, Luke Keele, and David Park. 2007. “A Bayesian Multilevel Modeling Approach to Time-Series Cross-Sectional Data.” Political Analysis 15 (2): 165–81.