Event Studies with Stata

An event study is used to examine reactions of the market to events of interest. A simple event study involves the following steps:

Cleaning the Data and Calculating the Event and Estimation Windows
Estimating Normal Performance
Calculating Abnormal and Cumulative Abnormal Returns
Testing for Significance
Testing Across All Events

This document is designed to help you conduct event studies using Stata. We assume that you already have data with a date variable, which we call "date", and a company identifier, which we have called "company_id". The commands below also use a variable holding each company's event date, which we call "event_date". If you need to prepare your data or want to try out the commands with our sample data, go to the data preparation page.

We also assume that you have a basic familiarity with Stata. If you need assistance with Stata commands, you can find out more about them here. Your task will be much easier if you enter the commands in a do file, which is a text file containing a list of Stata commands.

Cleaning the Data and Calculating the Event and Estimation Windows

It's likely that you have more observations for each company than you need. It's also possible that you do not have enough for some. Before you can continue, you must make sure that you will be conducting your analyses on the correct observations. To do this, you will need to create a variable, dif, that will count the number of days from the observation to the event date. This can be either calendar days or trading days.

For number of trading days:

sort company_id date
by company_id: gen datenum=_n
by company_id: gen target=datenum if date==event_date
egen td=min(target), by(company_id)
drop target
gen dif=datenum-td

For calendar days:

gen dif=date-event_date
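To see how the two definitions of dif differ, here is a small self-contained sketch on made-up data. The five dates, the single company, and the names datestr, dif_trading, and dif_calendar are assumptions for illustration only; they are not part of the sample data.

```stata
* Toy data: five trading days around a weekend (5-6 Jan 2008), one company
clear
input company_id str9 datestr
1 "02jan2008"
1 "03jan2008"
1 "04jan2008"
1 "07jan2008"
1 "08jan2008"
end
gen date = date(datestr, "DMY")
gen event_date = date("07jan2008", "DMY")  /* hypothetical event date */
format date event_date %td

* Trading days: same steps as above, stored in dif_trading
sort company_id date
by company_id: gen datenum = _n
by company_id: gen target = datenum if date == event_date
egen td = min(target), by(company_id)
drop target
gen dif_trading = datenum - td

* Calendar days
gen dif_calendar = date - event_date

list date dif_trading dif_calendar, noobs
```

On the Friday before the event (04jan2008), dif_trading is -1 while dif_calendar is -3, because the weekend counts as calendar days but not as trading days.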

As you can see, calculating the number of trading days is a little trickier than calendar days. For trading days, we first need to create a variable that counts the number of days within each company_id. Then we determine which observation occurs on the event date. We create a variable with the event date's day number on all of the observations within that company_id. Finally, we simply take the difference between the two, creating a variable, dif, that counts the number of days between each individual observation and the event day.

Next, we need to make sure that we have the minimum number of observations before and after the event date, as well as the minimum number of observations before the event window for the estimation window. Let's say we want 2 days before and after the event date (a total of 5 days in the event window) and 30 days for the estimation window, here the 30 days running from 60 to 31 days before the event. (You can of course change these numbers to suit your analysis.)

by company_id: gen event_window=1 if dif>=-2 & dif<=2
egen count_event_obs=count(event_window), by(company_id)
by company_id: gen estimation_window=1 if dif<-30 & dif>=-60
egen count_est_obs=count(estimation_window), by(company_id)
replace event_window=0 if event_window==.
replace estimation_window=0 if estimation_window==.

The procedure for determining the event and estimation windows is the same. First we create a variable that equals 1 if the observation is within the specified days. Second, we create another variable that counts how many observations, within each company_id, have a 1 assigned to them. Finally, we replace all the missing values with zeroes, creating a dummy variable. You can now determine which companies do not have a sufficient number of observations.

tab company_id if count_event_obs<5
tab company_id if count_est_obs<30

The tab command will produce a list of company_ids that do not have enough observations within the event and estimation windows, as well as the total number of observations for those company_ids. To eliminate these companies:

 drop if count_event_obs < 5
 drop if count_est_obs < 30

You should make sure the dataset has been saved under a different name before dropping any observations!

At this point you can also drop some variables you won't need any longer: count_event_obs and count_est_obs.
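One case the counts above cannot distinguish is an event date that matches no trading date for a company (for example, an announcement on a weekend). With the trading-day method, target is then missing for every observation of that company, so td and dif are missing too, and the company falls out under the count_event_obs<5 rule. A possible check, meant to be run before the drop commands, is sketched here; n_matched is a hypothetical variable name.

```stata
* Sketch: count, per company, the observations that fall exactly on the event date
egen n_matched = total(date == event_date), by(company_id)

* Companies listed here have no observation on their event date
tab company_id if n_matched == 0
drop n_matched
```

If companies show up, one option is to move their event_date to the next trading day before computing dif; whether that is appropriate depends on your study.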

Estimating Normal Performance

Now we are at the point where we can actually start an analysis. First we need a way to estimate Normal Performance. To do this, we will run a separate regression for each company using the data within the estimation window and save the alphas (the intercept) and betas (the coefficient of the independent variable). We will later use these saved regression equations to predict normal performance during the event window.

Note that ret, the dependent variable in our regression, is simply the CRSP variable for a given stock's return, while the independent variable market_return that we use to predict ret is the value-weighted return of an index for whatever exchange the stock trades on (vwretd in CRSP). Use the equivalent variables for your dataset.

set more off /* this command just keeps Stata from pausing after each screen of output */

gen predicted_return=.
egen id=group(company_id) 
 /* for multiple event dates, use: egen id = group(group_id) */
summarize id, meanonly
local N = r(max) /* highest value of id */
forvalues i=1(1)`N' {
	list id company_id if id==`i' & dif==0 /* show which company is being processed */
	reg ret market_return if id==`i' & estimation_window==1 
	predict p if id==`i'
	replace predicted_return = p if id==`i' & event_window==1 
	drop p
}

Here, we created a variable "id" that numbers the companies from 1 to however many there are. The local macro N holds the highest value of id, which is the number of company-event combinations that have complete data. The loop iterates over the companies, runs a regression in the estimation window for each, and then uses that regression to predict a 'normal' return in the event window.
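The loop above relies on predict, so the estimated intercepts and slopes are not kept as variables. If you want to inspect them, one possible sketch is the following. It assumes the id, estimation_window, and event_window variables created above; alpha, beta, and predicted_check are hypothetical names.

```stata
gen alpha = .
gen beta = .
summarize id, meanonly
local N = r(max)
forvalues i = 1/`N' {
	quietly reg ret market_return if id==`i' & estimation_window==1
	quietly replace alpha = _b[_cons] if id==`i'
	quietly replace beta = _b[market_return] if id==`i'
}
* Market-model prediction built by hand from the stored coefficients
gen predicted_check = alpha + beta*market_return if event_window==1
```

Run these lines together from a do file so that the local macro N is still defined when the loop starts. predicted_check should match predicted_return up to rounding, which gives a quick check on the loop.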

Abnormal and Cumulative Abnormal Returns

We can now calculate the abnormal and cumulative abnormal returns for our data. The daily abnormal return is computed by subtracting the predicted normal return from the actual return for each day in the event window. The sum of the abnormal returns over the event window is the cumulative abnormal return.

sort id date
gen abnormal_return=ret-predicted_return if event_window==1
by id: egen cumulative_abnormal_return = total(abnormal_return)

Here we simply calculate the abnormal return for each observation in the event window. Then we set the cumulative abnormal return equal to the sum of the abnormal returns for each company.
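In symbols, and assuming the market-model regression from the previous section, the two quantities can be written as follows. The notation (i for the company, t for the value of dif, hats for estimates) is introduced here for illustration.

```latex
AR_{it} = R_{it} - \left(\hat{\alpha}_i + \hat{\beta}_i R_{mt}\right), \qquad
CAR_i = \sum_{t=-2}^{2} AR_{it}
```

Here R_{it} is ret, R_{mt} is market_return, and the limits of the sum correspond to the 5-day event window used in this example.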

Testing for Significance

We are going to compute a test statistic, test, to check whether the average abnormal return for each stock is statistically different from zero.*

test = ((ΣAR)/N) / (AR_SD/sqrt(N))

where AR is the abnormal return, AR_SD is the abnormal return standard deviation, and N is the number of days in the event window. If the absolute value of test is greater than 1.96, then the average abnormal return for that stock is significantly different from zero at the 5% level. The value of 1.96 comes from the standard normal distribution with a mean of 0 and a standard deviation of 1: 95% of the distribution lies between -1.96 and +1.96.

sort id date
by id: egen ar_sd = sd(abnormal_return) 
gen test =(1/sqrt(5)) * ( cumulative_abnormal_return /ar_sd) /* 5 = number of days in the event window; change to match yours */
list company_id cumulative_abnormal_return test if dif==0
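The gen test line is an algebraic rearrangement of the formula given earlier. Writing CAR for the sum of the abnormal returns, s for ar_sd, and N for the number of days in the event window (symbols chosen here for illustration):

```latex
\text{test} = \frac{\left(\sum AR\right)/N}{s/\sqrt{N}}
            = \frac{CAR}{N}\cdot\frac{\sqrt{N}}{s}
            = \frac{1}{\sqrt{N}}\cdot\frac{CAR}{s}
```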

Note: this test uses the sample standard deviation. A less conservative alternative is to use the population standard deviation. To derive this from the sample standard deviation produced by Stata, multiply ar_sd by the square root of (n-1)/n; in our example, by the square root of 4/5.
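As a sketch of that adjustment, assuming the 5-day event window of this example (ar_sd_pop and test_pop are hypothetical variable names):

```stata
* Population standard deviation derived from the sample standard deviation
gen ar_sd_pop = ar_sd * sqrt(4/5)

* Same test statistic, computed with the population standard deviation
gen test_pop = (1/sqrt(5)) * (cumulative_abnormal_return / ar_sd_pop)
list company_id cumulative_abnormal_return test_pop if dif==0
```

Because ar_sd_pop is smaller than ar_sd, test_pop is larger in absolute value than the statistic based on the sample standard deviation.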

This will output the results of your event study into an Excel-readable spreadsheet file:

outsheet  company_id event_date cumulative_abnormal_return test using stats.csv if dif==0, comma names
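In Stata 13 and later, export delimited is an alternative to outsheet. A minimal sketch, assuming the same variables and the same file name stats.csv:

```stata
* Writes a comma-separated file with variable names in the first row
export delimited company_id event_date cumulative_abnormal_return test using stats.csv if dif==0, replace
```

The replace option overwrites an existing stats.csv, so leave it off if you would rather have Stata stop with an error.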

Testing Across All Events

Instead of, or in addition to, looking at the average abnormal return for each company, you probably want to calculate the cumulative abnormal return for all companies treated as a group. Here's the code for that:

reg cumulative_abnormal_return if dif==0, robust

The p-value on the constant from this regression will give you the significance of the cumulative abnormal return across all companies. This test is preferable to a t-test because it allows you to use robust standard errors.
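For comparison, the conventional one-sample t-test on the same one-row-per-company sample can be run as sketched below. This is only an illustration of the alternative mentioned above, not a replacement for the regression.

```stata
* Average cumulative abnormal return across companies
summarize cumulative_abnormal_return if dif==0

* One-sample t-test of the mean against zero (no robust standard errors)
ttest cumulative_abnormal_return == 0 if dif==0
```

The mean reported by ttest is the same number as the constant in the regression, so the two outputs can be compared directly.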

Note

* Thank you to Kim Baum for useful feedback

This page was last updated on May 20, 2008