Log transformations

If the distribution of a variable has a positive skew, taking the natural logarithm of the variable sometimes helps it fit into a model. A log transformation makes a positively skewed distribution more nearly normal. Also, when a change in the dependent variable is related to a percentage change in an independent variable, or vice versa, the relationship is better modeled by taking the natural log of either or both of the variables.

For example, I estimate a person's wage based on education, experience, and region of residence using Stata's sample data nlsw88, an extract from the 1988 National Longitudinal Survey of Young Women.

    sysuse nlsw88
    reg wage grade tenure south

It looks OK, but when I look at the distribution of tenure, it looks somewhat skewed.

    histogram tenure

So I compute the natural log of tenure.

    gen lntenure=ln(tenure)
    histogram lntenure

It seems to have overshot a little, but the distribution looks somewhat normal. I try a regression with the logged tenure.

    reg wage grade lntenure south

The R-squared is a little higher, so taking the natural log seems to have helped the model fit better.

When the independent variable but not the dependent variable is logged, a one percent change in the independent variable is associated with a change in the dependent variable of 1/100 times the coefficient.

predicted wage = -1.639 + 0.681 GRADE + 0.774 LNTENURE - 1.134 SOUTH

So a one percent increase in tenure is associated with an increase in wage of 0.01 × 0.774, or about $0.0077.

Now I examine wage, and find that it is very skewed.

    histogram wage

So I take the natural log of wage, and look at the distribution of the logged wage.

    gen lnwage=ln(wage)
    histogram lnwage

The distribution looks much more normal. Now I run the same regression with the logged wage as the dependent variable.

    reg lnwage grade tenure south

When the dependent variable but not an independent variable is logged, a one-unit change in the independent variable is associated with a percent change in the dependent variable of 100 times the coefficient.

predicted lnwage = 0.666 + 0.085 GRADE + 0.026 TENURE - 0.150 SOUTH

In this data, tenure is measured in years, so a one-year increase in tenure is associated with an increase in wage of 100 × 0.026%, or about 2.6%.

If both the dependent variable and an independent variable are logged, the coefficient is an elasticity: a percentage change in X is associated with a percentage change in Y.

    reg lnwage grade lntenure south

predicted lnwage = 0.659 + 0.084 GRADE + 0.136 LNTENURE - 0.151 SOUTH

A one percent increase in tenure is estimated to result in about a 0.136% increase in wage.
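
The three rules of thumb for reading a logged coefficient are small-change approximations. The sketch below shows where each one comes from and how close it is for the tenure coefficients reported here. The symbols are assumptions introduced for illustration and do not appear in the original: y stands for wage, x for tenure, and β for the coefficient on tenure (or logged tenure), with the other regressors held fixed.

```latex
% Level-log: y = a + \beta \ln x. Raise x by 1% (x -> 1.01x):
\Delta y = \beta \ln(1.01) \approx \frac{\beta}{100},
\qquad 0.774 \times \ln(1.01) \approx 0.0077

% Log-level: \ln y = a + \beta x. Raise x by one unit:
\%\Delta y = 100\,(e^{\beta} - 1) \approx 100\,\beta,
\qquad 100\,(e^{0.026} - 1) \approx 2.63

% Log-log: \ln y = a + \beta \ln x. Raise x by 1%:
\%\Delta y = 100\,(1.01^{\beta} - 1) \approx \beta,
\qquad 100\,(1.01^{0.136} - 1) \approx 0.135
```

For coefficients this small, the exact and approximate figures agree to about two significant digits. In the log-level case the shortcut 100β drifts further from 100(e^β - 1) as β grows, so the exact form is safer for large coefficients.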