Princeton University Library Data and Statistical 


Panel Data


Panel data, also called longitudinal data or cross-sectional time series data, are data where multiple cases (people, firms, countries, etc.) were observed at two or more time periods. An example is the National Longitudinal Survey of Youth, in which each member of a nationally representative sample of young people was surveyed repeatedly over multiple years.

There are two kinds of information in cross-sectional time-series data: the cross-sectional information reflected in the differences between subjects, and the time-series or within-subject information reflected in the changes within subjects over time. Panel data regression techniques allow you to take advantage of these different types of information.

While it is possible to use ordinary multiple regression techniques on panel data, they may not be optimal. The coefficient estimates may be subject to omitted variable bias - a problem that arises when some variable that affects the dependent variable, and is correlated with the independent variables, is unknown or cannot be controlled for. With panel data, it is possible to control for some types of omitted variables even without observing them, by observing changes in the dependent variable over time. This controls for omitted variables that differ between cases but are constant over time. It is also possible to use panel data to control for omitted variables that vary over time but are constant between cases.

Using Panel Data in Stata

A panel dataset should have data on n cases, over t time periods, for a total of n × t observations. Data like this is said to be in long form. In some cases your data may come in what is called the wide form, with only one observation per case and variables for each different value at each different time period. To analyze data like this in Stata using commands for panel data analysis, you need to first convert it to long form. This can be done using Stata's reshape command. For assistance in using reshape, see Stata's online help or this web page.
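
As a minimal sketch, assume a hypothetical wide dataset with one row per person, an identifier called id, and income variables called inc2001, inc2002 and inc2003 (these names are illustrative only). Converting it to long form might look like this:

	. * id, inc and year are hypothetical names
	. reshape long inc, i(id) j(year)

After this, each id should have one row per year, with the year stored in the new variable year and the income values in inc.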

Stata provides a number of tools for analyzing panel data. The commands all begin with the prefix xt and include xtreg, xtprobit, xtsum and xttab - panel data versions of the familiar reg, probit, sum and tab commands.

To use these commands, first tell Stata that your dataset is panel data. You need to have a variable that identifies the case element of your panel (for example, a country or person identifier) and also a time variable that is in Stata date format. For information about Stata's date variable formats, see our Time Series Data in Stata page.
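
If your time variable is already an integer year, it can usually be used as it is. If it is stored as a string such as "2001m3", one possible conversion is sketched below, where datestr and mdate are hypothetical variable names:

	. * build a monthly Stata date from a string like "2001m3"
	. generate mdate = monthly(datestr, "YM")
	. format mdate %tm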

Sort your data by the panel variable and then by the date variable within the panel variable. Then you need to issue the tsset command to identify the panel and date variables. If your panel variable is called panelvar and your date variable is called datevar, the commands needed are:

	. sort panelvar datevar
	. tsset panelvar datevar 

If you prefer to use menus, use the command under Statistics > Time Series > Setup and Utilities > Declare Data to be Time Series.
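
More recent versions of Stata also provide xtset, which makes the same declaration and takes the panel variable followed by the time variable. A short sketch, using the same placeholder names plus a hypothetical variable called depvar:

	. xtset panelvar datevar
	. * show which time periods are observed for each case
	. xtdescribe
	. * split the variation in depvar into between and within components
	. xtsum depvar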

Fixed, Between and Random Effects models

Fixed Effects Regression

Fixed effects regression is the model to use when you want to control for omitted variables that differ between cases but are constant over time. It lets you use the changes in the variables over time to estimate the effects of the independent variables on your dependent variable, and is the main technique used for analysis of panel data.

The command for a linear regression on panel data with fixed effects in Stata is xtreg with the fe option, used like this:

	. xtreg dependentvar independentvar1 independentvar2 independentvar3 ... , fe

If you prefer to use the menus, the command is under Statistics > Cross-sectional time series > Linear models > Linear regression.

This is equivalent to generating dummy variables for each of your cases and including them in a standard linear regression to control for these fixed "case effects". It works best when you have relatively fewer cases and more time periods, as each dummy variable removes one degree of freedom from your model.
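
The equivalence can be checked directly. As a sketch with placeholder names y, x1 and x2, and assuming panelvar is numeric and your version of Stata supports factor-variable notation, either of the following should reproduce the slope coefficients from xtreg with the fe option:

	. * one dummy variable per case, entered with factor-variable notation
	. regress y x1 x2 i.panelvar
	. * the same model, with the case dummies absorbed rather than displayed
	. areg y x1 x2, absorb(panelvar)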

Between Effects

Regression with between effects is the model to use when you want to control for omitted variables that change over time but are constant between cases. It allows you to use the variation between cases to estimate the effects of the independent variables on your dependent variable.

The command for a linear regression on panel data with between effects in Stata is xtreg with the be option.
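
Following the same pattern as the fixed effects command, it would be used like this:

	. xtreg dependentvar independentvar1 independentvar2 independentvar3 ... , be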

Running xtreg with between effects is equivalent to taking the mean of each variable for each case across time and running a regression on the collapsed dataset of means. As this results in loss of information, between effects are not used much in practice. Researchers who want to look at time effects without considering panel effects generally will use a set of time dummy variables, which is the same as running time fixed effects.
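
A rough sketch of both ideas, with placeholder names y, x1 and x2 and assuming no missing values. Because collapse replaces the data in memory, preserve and restore are used to get the original data back:

	. preserve
	. * one row per case, holding the mean of each variable over time
	. collapse (mean) y x1 x2, by(panelvar)
	. regress y x1 x2
	. restore

The coefficients should match those from xtreg with the be option. Time dummy variables can be added to a fixed effects model with factor-variable notation, assuming datevar takes integer values:

	. xtreg y x1 x2 i.datevar, fe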

The between effects estimator is mostly important because it is used to produce the random effects estimator.

Random Effects

If you have reason to believe that some omitted variables may be constant over time but vary between cases, and others may be fixed between cases but vary over time, then you can include both types by using random effects. Stata's random-effects estimator is a weighted average of fixed and between effects.

The command for a linear regression on panel data with random effects in Stata is xtreg with the re option.
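
As with the other two models, it would be used like this:

	. xtreg dependentvar independentvar1 independentvar2 independentvar3 ... , re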

Choosing Between Fixed and Random Effects

The generally accepted way of choosing between fixed and random effects is running a Hausman test.

Statistically, fixed effects are always a reasonable thing to do with panel data (they always give consistent results) but they may not be the most efficient model to run. Random effects are a more efficient estimator, so they will generally give you smaller standard errors and therefore smaller P-values. You should run random effects if it is statistically justifiable to do so.

The Hausman test checks a more efficient model against a less efficient but consistent model to make sure that the more efficient model also gives consistent results.

To run a Hausman test comparing fixed with random effects in Stata, you need to first estimate the fixed effects model, save the coefficients so that you can compare them with the results of the next model, estimate the random effects model, and then do the comparison.

	. xtreg dependentvar independentvar1 independentvar2 independentvar3 ... , fe
	. estimates store fixed
	. xtreg dependentvar independentvar1 independentvar2 independentvar3 ... , re
	. estimates store random
	. hausman fixed random

The Hausman test tests the null hypothesis that the coefficients estimated by the efficient random effects estimator are the same as the ones estimated by the consistent fixed effects estimator. If the difference is not significant (Prob>chi2 larger than .05), it is safe to use random effects. If you get a significant P-value, however, you should use fixed effects.
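
The P-value is also left behind in r(p) after hausman runs, which can be convenient in do-files. A small sketch, assuming the fixed and random estimates have already been stored as shown above:

	. hausman fixed random
	. * r(p) holds the Prob>chi2 value reported in the output
	. display "Prob>chi2 = " r(p)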

Further Reading

Panel Data Analysis (fixed & random effects)
Multilevel Analysis
Further detail on the xtreg commands from
A technical description of the different options for the xtreg command.

Between estimators from
A discussion comparing the between estimator to the random effects estimator.

Testing for panel-level heteroskedasticity and autocorrelation from
Includes a user-written command that performs a simple test for serial correlation.

Introduction to Econometrics by James H. Stock and Mark W. Watson, 2003
This text has a good discussion of the theory behind panel data analysis, and was used in the preparation of this page. See in particular Chapter 8, Regression with Panel Data.
