# Log transformations

If the distribution of a variable has a positive skew, taking the natural logarithm of the variable sometimes helps it fit better into a model. A log transformation compresses large values more than small ones, so it pulls in a long right tail and tends to make a positively skewed distribution closer to normal. Also, when a change in the dependent variable is related to a percentage change in an independent variable, or vice versa, the relationship is better modeled by taking the natural log of either or both of the variables.
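To see the skew-reducing effect numerically, here is a minimal sketch in plain Python rather than Stata. It draws hypothetical log-normal data (not the nlsw88 sample) and compares the skewness before and after taking logs. The skewness is computed with `sample_skewness`, a moment-based helper written just for this illustration.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Moment-based sample skewness: mean cubed deviation divided by SD cubed."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(1)
# Hypothetical positively skewed data: log-normal draws, not the nlsw88 data
raw = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

For log-normal data the log is exactly normal, so the second value should be close to zero. In Stata, `summarize tenure, detail` reports a skewness statistic that can be used for the same before/after check.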

For example, I estimate a person's wage from their education, experience, and region of residence (the variables grade, tenure, and south). I use Stata's sample dataset nlsw88, an extract from the 1988 National Longitudinal Study of Young Women.

```
sysuse nlsw88
reg wage grade tenure south
```

The regression looks reasonable, but the distribution of tenure looks somewhat skewed.

`histogram tenure`

So I compute a natural log of tenure.

```
gen lntenure = ln(tenure)
histogram lntenure
```
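This step has one caveat. Stata's `ln()` returns missing for zero or negative arguments, so any observation with `tenure` equal to zero gets a missing `lntenure`. That observation then drops out of any regression that uses it. The sketch below mimics that behavior in Python, with `None` standing in for Stata's missing value. The helper `stata_like_ln` and the tenure values are hypothetical.

```python
import math

def stata_like_ln(x):
    """Natural log that returns None (standing in for Stata's missing '.') when x <= 0."""
    if x is None or x <= 0:
        return None
    return math.log(x)

# Hypothetical tenure values in years, including one zero
tenure = [0.0, 0.5, 2.0, 10.0]
lntenure = [stata_like_ln(t) for t in tenure]

n_dropped = sum(v is None for v in lntenure)
print(f"{n_dropped} of {len(tenure)} observations would have missing lntenure")
```

In Stata itself, running `count if tenure == 0` before transforming shows how many observations would be lost.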

It seems to have overshot a little, but the distribution looks closer to normal. I then run the regression with the logged tenure in place of tenure.

`reg wage grade lntenure south`

The R-squared is a little higher, so taking the natural log seems to have helped the model fit better.
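The R-squared comparison can be illustrated with synthetic data. This plain-Python sketch assumes a hypothetical outcome that depends linearly on ln(x). It fits a one-regressor OLS with an intercept once on x and once on ln(x), then compares the two R-squared values. The `r_squared` helper is defined here for illustration only.

```python
import math
import random

def r_squared(x, y):
    """R-squared of a simple OLS regression of y on x with an intercept (squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

random.seed(2)
# Hypothetical data where y is linear in ln(x), not in x itself
x = [random.uniform(0.5, 25.0) for _ in range(2_000)]
y = [1.0 + 0.8 * math.log(v) + random.gauss(0.0, 0.3) for v in x]

r2_raw = r_squared(x, y)
r2_log = r_squared([math.log(v) for v in x], y)
print(f"R-squared with x:     {r2_raw:.3f}")
print(f"R-squared with ln(x): {r2_log:.3f}")
```

A comparison like this is only meaningful when both regressions use the same dependent variable and the same observations. If logging drops some cases, the sample changes as well as the functional form.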

When the independent variable, but not the dependent variable, is logged, a one-percent change in the independent variable is associated with a change in the dependent variable of approximately 1/100 of the coefficient.

So a one-percent increase in tenure is associated with an increase in wage of 0.01 × 0.774, or about \$0.0077.
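The 1/100 rule is a first-order approximation. Under the level-log model, raising x by 1% changes the predicted y by exactly coefficient × ln(1.01). This short Python check uses the 0.774 coefficient from the text. It shows the two values are nearly identical for a 1% change but diverge for a large change such as a doubling.

```python
import math

b = 0.774  # coefficient on lntenure as reported in the text

approx = b / 100                    # rule of thumb: a 1% rise in x changes y by b/100
exact = b * math.log(1.01)          # exact change in predicted y for a 1% rise in x
doubling_exact = b * math.log(2.0)  # exact change for a 100% rise (doubling) in x

print(f"1% change, rule of thumb: {approx:.5f}")
print(f"1% change, exact:         {exact:.5f}")
print(f"doubling, exact:          {doubling_exact:.4f}")
```

For a 1% change, both values round to about 0.0077. For a doubling of tenure, the exact effect (0.774 × ln 2, about 0.54) is noticeably smaller than the 0.774 the rule of thumb would suggest.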

Next I examine the distribution of wage and find that it is very skewed.

`histogram wage`

So I take the natural log of wage and look at the distribution of the logged wage.

```
gen lnwage = ln(wage)
histogram lnwage
```

The distribution looks much more normal. Now I run the same regression with the logged wage as the dependent variable.

`reg lnwage grade tenure south`

When the dependent variable, but not the independent variable, is logged, a one-unit change in the independent variable is associated with a change in the dependent variable of approximately 100 times the coefficient, in percent.
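The 100 × coefficient rule is also an approximation. When ln(y) = a + b·x, a one-unit increase in x multiplies y by exp(b), which is an exact change of 100 × (exp(b) − 1) percent. The lnwage regression output is not reproduced here, so this Python sketch compares the two formulas for hypothetical coefficients. The helper names are illustrative only.

```python
import math

def pct_change_log_level(b, delta_x=1.0):
    """Exact percent change in y when x rises by delta_x, given ln(y) = a + b*x."""
    return 100.0 * (math.exp(b * delta_x) - 1.0)

def pct_change_rule_of_thumb(b, delta_x=1.0):
    """Approximate percent change: 100 times the coefficient per unit of x."""
    return 100.0 * b * delta_x

# Hypothetical coefficients, not taken from the nlsw88 regression output
for b in (0.05, 0.3):
    print(f"b = {b}: rule of thumb {pct_change_rule_of_thumb(b):.2f}%, "
          f"exact {pct_change_log_level(b):.2f}%")

approx_small = pct_change_rule_of_thumb(0.05)
exact_small = pct_change_log_level(0.05)
```

For b = 0.05 the rule gives 5% against an exact 5.13%. For b = 0.3 it gives 30% against about 35%, so the exact form matters once coefficients get large.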