Princeton University Data and Statistical Services
Princeton University Library

Log transformations

If the distribution of a variable has a positive skew, taking the natural logarithm of the variable sometimes helps it fit into a model, because a log transformation makes a positively skewed distribution closer to normal. Also, when a change in the dependent variable is related to a percentage change in an independent variable, or vice versa, the relationship is better modeled by taking the natural log of either or both of the variables.

For example, I estimate a person's wage based on education, experience, and region of residence using Stata's sample data nlsw88, an extract from the 1988 National Longitudinal Study of Young Women.

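		  * Load Stata's built-in nlsw88 sample data
		  sysuse nlsw88
		  * Regress hourly wage on grade completed, job tenure in years, and living in the South
		  reg wage grade tenure south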
		  

The results look reasonable, but when I look at the distribution of tenure, it appears somewhat positively skewed.

		  histogram tenure
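One way to put a number on the visual impression is to ask Stata for the skewness statistic. The sketch below is an addition to the original walkthrough: `summarize` with the `detail` option reports skewness (positive values indicate a longer right tail), and the `normal` option overlays a normal density on the histogram for comparison.

```stata
* Reload the data so the block runs on its own
sysuse nlsw88, clear

* Detailed summary statistics, including skewness and kurtosis
summarize tenure, detail
display "Skewness of tenure: " r(skewness)

* Histogram with a normal density overlaid for visual comparison
histogram tenure, normal
```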

So I compute the natural log of tenure.

		  gen lntenure=ln(tenure)
		  histogram lntenure
		  

The transformation seems to have overshot a little, but the distribution now looks closer to normal. Next I run the regression with the logged tenure.
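The regression command itself does not appear above; based on the fitted equation reported further down, it was presumably the one sketched here. The `count` line is an added check, not part of the original analysis: Stata's `ln()` returns missing for zero, so any observation with a recorded tenure of zero has no logged value and drops out of this regression.

```stata
* Setup repeated from the steps above so the block runs on its own
sysuse nlsw88, clear
gen lntenure = ln(tenure)

* Presumed command: the first model with tenure replaced by its log
reg wage grade lntenure south

* Added check: observations with a recorded tenure but no logged value
count if missing(lntenure) & !missing(tenure)
```

If the count is nonzero, the two regressions are fit on slightly different samples, which is worth keeping in mind when comparing their R-squared values.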

The R-squared is a little higher, so taking the natural log of tenure seems to have improved the fit of the model.

When the independent variable but not the dependent variable is logged, a one percent change in the independent variable is associated with a change in the dependent variable of approximately 1/100 times the coefficient.

predicted wage = -1.639 + 0.681 GRADE + 0.774 LNTENURE - 1.134 SOUTH

So a one percent increase in tenure is associated with an increase in the hourly wage of 0.01 × 0.774, or about $0.0077.
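The 1/100 rule is a linear approximation. As a quick check that is not part of the original page, the exact predicted change for a p percent increase in tenure is the coefficient times ln(1 + p/100), which Stata's `display` can evaluate using the coefficient 0.774 from the equation above.

```stata
* Rule-of-thumb change in hourly wage for a 1% increase in tenure
display 0.01 * 0.774

* Exact change implied by the model: coefficient * ln(1.01)
display 0.774 * ln(1.01)

* For a larger change, such as 10%, use ln(1.10) rather than 0.10
display 0.774 * ln(1.10)
```

The first two results both round to $0.0077. For the 10% case the exact figure is about $0.074, slightly below the $0.0774 that the rule of thumb would suggest.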

Now I examine wage and find that its distribution is very skewed.

		  histogram wage

So I take the natural log of wage and look at the distribution of logged wage.

		  gen lnwage=ln(wage)
		  histogram lnwage
		  

The distribution looks much more normal. Now I run the same regression with the logged wage as the dependent variable.

		  reg lnwage grade tenure south

When the dependent variable but not an independent variable is logged, a one-unit change in the independent variable is associated with a change in the dependent variable of approximately 100 times the coefficient, in percent.

predicted lnwage = 0.666 + 0.085 GRADE + 0.026 TENURE - 0.150 SOUTH

In these data, tenure is measured in years, so a one-year increase in tenure is associated with an increase in the wage of 100 × 0.026 percent, or about 2.6%.
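The 100 × coefficient rule is also an approximation, and it is accurate only when the coefficient is small. The exact percent change implied by a logged dependent variable is 100 × (exp(coefficient) - 1). The sketch below, added here as an illustration, evaluates it for the tenure and south coefficients from the equation above.

```stata
* Tenure: the approximation gives 2.6%; the exact value is slightly higher
display 100 * (exp(0.026) - 1)

* South: the approximation gives -15%; the exact value is closer to -13.9%
display 100 * (exp(-0.150) - 1)
```

The gap between the two readings grows with the size of the coefficient, so the exact formula matters more for south than for tenure.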

If both the dependent variable and an independent variable are logged, the coefficient is an elasticity: a one percent change in X is associated with a percentage change in Y equal to the coefficient.

predicted lnwage = 0.659 + 0.084 GRADE + 0.136 LNTENURE - 0.151 SOUTH

A one percent increase in tenure is estimated to be associated with about a 0.136% increase in wage.
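The command for this last model is not shown above; given the fitted equation, it was presumably a regression of lnwage on grade, lntenure, and south. The sketch below reproduces that step under this assumption and then checks the elasticity reading: the exact change for a 1% increase in tenure is 100 × (1.01^0.136 - 1).

```stata
* Setup repeated from the steps above so the block runs on its own
sysuse nlsw88, clear
gen lnwage = ln(wage)
gen lntenure = ln(tenure)

* Presumed command for the log-log model
reg lnwage grade lntenure south

* Exact percent change in wage for a 1% increase in tenure,
* using the coefficient 0.136 from the equation above
display 100 * (1.01^0.136 - 1)
```

The displayed value is very close to 0.136, which is why the coefficient can be read directly as an elasticity for small changes.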

This page was last updated on August 28, 2008