Lag Selection in Time Series Data

When running regressions on time-series data, it is often important to include lagged values of the dependent variable as independent variables. In technical terminology, such a regression is called an autoregression, the single-equation case of a vector autoregression (VAR). For example, when trying to sort out the determinants of GDP, it is likely that last year's GDP is correlated with this year's GDP. If this is the case, GDP lagged by at least one year should be included on the right-hand side of the regression. If the variable in question is persistent (that is, values in the far past still affect today's values), more lags will be necessary.

To determine how many lags to use, several selection criteria are available. The two most common are the Akaike Information Criterion (AIC) and Schwarz's Bayesian Information Criterion (SIC/BIC/SBIC). These rules choose the lag length j to minimize: log(SSR(j)/n) + (j + 1)C(n)/n, where SSR(j) is the sum of squared residuals for the VAR with j lags and n is the number of observations; C(n) = 2 for AIC and C(n) = log(n) for BIC. Because log(n) exceeds 2 once n is 8 or more, BIC penalizes additional lags more heavily than AIC in all but the smallest samples.

Fortunately, in Stata 8 there is a single command that will do the math for any number of specified lags: varsoc. To get the AIC and BIC, simply type 'varsoc depvar' in the command window. By default Stata checks up to 4 lags; to check a different number, add ', maxlag(#)' after 'varsoc depvar', where # is the maximum number of lags. If the regression also has independent variables other than the lags, include them after the 'maxlag()' option by typing 'exog(varnames)'. The output marks the optimal number of lags with an asterisk. Then run the regression with that number of lags of the dependent variable on the right-hand side, together with the other independent variables.
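To make the selection rule concrete, here is a minimal Python sketch of the criterion log(SSR(j)/n) + (j + 1)C(n)/n described above. It is not how varsoc computes its statistics; the function names lag_criterion and select_lag are hypothetical, and the SSR values are made up purely for illustration.

```python
import math


def lag_criterion(ssr, n, j, kind="aic"):
    """Return log(SSR(j)/n) + (j + 1) * C(n) / n for one candidate lag length.

    C(n) = 2 for AIC and C(n) = log(n) for BIC, as in the text.
    """
    if kind == "aic":
        penalty = 2.0
    elif kind == "bic":
        penalty = math.log(n)
    else:
        raise ValueError("kind must be 'aic' or 'bic'")
    return math.log(ssr / n) + (j + 1) * penalty / n


def select_lag(ssr_by_lag, n, kind="aic"):
    """Return the lag length whose criterion value is smallest."""
    return min(ssr_by_lag, key=lambda j: lag_criterion(ssr_by_lag[j], n, j, kind))


# Hypothetical sums of squared residuals for 0 to 3 lags (illustration only).
ssr_by_lag = {0: 120.0, 1: 60.0, 2: 59.5, 3: 59.0}
n = 15  # number of observations

best_aic = select_lag(ssr_by_lag, n, "aic")
best_bic = select_lag(ssr_by_lag, n, "bic")
print("AIC picks lag", best_aic, "and BIC picks lag", best_bic)
```

With these made-up numbers, the first lag halves the SSR while further lags barely reduce it, so the penalty term outweighs the small gain in fit and both criteria settle on one lag.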
Example:

    varsoc y, maxlag(5) exog(x z)

    Selection order criteria

    endogenous variables: y
    exogenous variables: x z
    constant included in models

    Sample: 6 20                                                Obs = 15

    lag   LL        LR        df   p       FPE         AIC        HQIC       SBIC
    0     -45.854   .         .    .       39.70191    6.51381    6.5123     6.65542
    1     -35.849   20.009*   1    0.000   12.04354*   5.31319*   5.31118*   5.50201*
    2     -35.837   0.024     1    0.877   13.92282    5.44493    5.44241    5.68094
    3     -35.305   1.063     1    0.302   15.13169    5.50737    5.50435    5.79059
    4     -35.233   0.145     1    0.703   17.66201    5.63103    5.62751    5.96145
    5     -35.108   0.250     1    0.617   20.7534     5.74767    5.74365    6.1253

Every criterion (FPE, AIC, HQIC and SBIC) reaches its minimum at lag 1, as the asterisks show, so the optimal number of lags is 1 and the regression should look like:

    reg y l.y x z

(For further options with the varsoc command, see the Stata Time-Series manual.)

For more on lag selection, please see Time Series 101.
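As a quick check on how to read the asterisks, the short Python sketch below takes the AIC and SBIC columns from the example varsoc output and finds the lag with the smallest value in each. The helper name best_lag is hypothetical; the numbers are copied from the example, not recomputed.

```python
# AIC and SBIC columns copied from the example varsoc output, keyed by lag.
aic = {0: 6.51381, 1: 5.31319, 2: 5.44493, 3: 5.50737, 4: 5.63103, 5: 5.74767}
sbic = {0: 6.65542, 1: 5.50201, 2: 5.68094, 3: 5.79059, 4: 5.96145, 5: 6.1253}


def best_lag(criterion_by_lag):
    """Return the lag whose criterion value is smallest."""
    return min(criterion_by_lag, key=criterion_by_lag.get)


print("AIC minimum at lag", best_lag(aic))
print("SBIC minimum at lag", best_lag(sbic))
```

Both columns are smallest at lag 1, which is the row Stata marks with an asterisk.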
