Stat Trek Teach yourself statistics

*	AP and Advanced Placement Program are registered trademarks of the College Board, which was not involved in the production of, and does not endorse this web site.

AP Statistics: Transformations to Achieve Linearity

When a residual plot reveals a data set to be nonlinear, it is often possible to "transform" the raw data to make it linear. This allows us to use linear regression techniques appropriately with nonlinear data.

What is a Transformation to Achieve Linearity?

Transforming a variable involves using a mathematical operation to change its measurement scale. Broadly speaking, there are two kinds of transformations.

Linear transformation. A linear transformation preserves linear relationships between variables. Therefore, the correlation between x and y would be unchanged after a linear transformation. Examples of a linear transformation to variable x would be multiplying x by a constant, dividing x by a constant, or adding a constant to x.
Nonlinear transformation. A nonlinear transformation changes (increases or decreases) linear relationships between variables and, thus, changes the correlation between variables. Examples of a nonlinear transformation of variable x would be taking the square root of x or the reciprocal of x.

In regression, a transformation to achieve linearity is a special kind of nonlinear transformation. It is a nonlinear transformation that increases the linear relationship between two variables.
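The distinction can be checked numerically. The following is a minimal sketch, not part of the original lesson: the helper name pearson_r and the constants 3 and 10 are illustrative assumptions, and the data are the x and y values from the example later in this lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from the transformation example in this lesson.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]

r_raw = pearson_r(x, y)

# Linear transformation of x (multiply by 3, add 10; arbitrary constants).
r_linear = pearson_r([3 * v + 10 for v in x], y)

# Nonlinear transformation of x (square root).
r_sqrt_x = pearson_r([math.sqrt(v) for v in x], y)

# Nonlinear transformation of y (square root).
r_sqrt_y = pearson_r(x, [math.sqrt(v) for v in y])

print(f"raw r          = {r_raw:.3f}")
print(f"linear on x    = {r_linear:.3f}")
print(f"sqrt(x) with y = {r_sqrt_x:.3f}")
print(f"x with sqrt(y) = {r_sqrt_y:.3f}")
```

With these data, the linear transformation leaves the correlation unchanged, the square root of x lowers it, and the square root of y raises it. Only the last one would count as a transformation to achieve linearity here.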

Methods of Transforming Variables to Achieve Linearity

There are many ways to transform variables to achieve linearity for regression analysis. Some common methods are summarized below.

Method	Transformation(s)	Regression equation	Predicted value (ŷ)
Standard linear regression	None	y = b₀ + b₁x	ŷ = b₀ + b₁x
Exponential model	Dependent variable = log(y)	log(y) = b₀ + b₁x	ŷ = 10^{b₀ + b₁x}
Quadratic model	Dependent variable = sqrt(y)	sqrt(y) = b₀ + b₁x	ŷ = ( b₀ + b₁x )²
Reciprocal model	Dependent variable = 1/y	1/y = b₀ + b₁x	ŷ = 1 / ( b₀ + b₁x )
Logarithmic model	Independent variable = log(x)	y = b₀ + b₁log(x)	ŷ = b₀ + b₁log(x)
Power model	Dependent variable = log(y); Independent variable = log(x)	log(y) = b₀ + b₁log(x)	ŷ = 10^{b₀ + b₁log(x)}

Each row after the first shows a different nonlinear transformation method; the first row shows standard linear regression for comparison. The second column shows the specific transformation applied to dependent and/or independent variables. The third column shows the regression equation used in the analysis. And the last column shows the "back transformation" equation used to restore the dependent variable to its original, non-transformed measurement scale.
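The back-transformation column of the table can be written as one small function. This is an illustrative sketch: the function name predict and the model labels are assumptions made for this example, and log is taken to be the base-10 logarithm, matching the 10^{...} form in the table.

```python
import math

def predict(model, b0, b1, x):
    """Return the predicted value (y-hat) on the original scale of y.

    b0 and b1 are the intercept and slope from the regression on the
    transformed variables; log means base 10, as in the table.
    """
    if model == "linear":
        return b0 + b1 * x
    if model == "exponential":      # regression was log(y) on x
        return 10 ** (b0 + b1 * x)
    if model == "quadratic":        # regression was sqrt(y) on x
        return (b0 + b1 * x) ** 2
    if model == "reciprocal":       # regression was 1/y on x
        return 1 / (b0 + b1 * x)
    if model == "logarithmic":      # regression was y on log(x)
        return b0 + b1 * math.log10(x)
    if model == "power":            # regression was log(y) on log(x)
        return 10 ** (b0 + b1 * math.log10(x))
    raise ValueError(f"unknown model: {model}")

# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
print(predict("quadratic", 1, 2, 3))    # (1 + 2*3) squared
print(predict("exponential", 0, 1, 2))  # 10 raised to (0 + 1*2)
```

Note that the logarithmic model needs no back transformation, because only x was transformed and y stayed on its original scale.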

In practice, these methods need to be tested on the data to which they are applied to be sure that they increase rather than decrease the linearity of the relationship. Testing the effect of a transformation method involves looking at residual plots and correlation coefficients, as described in the following sections.

Note: The exponential model, the logarithmic model, and the power model require the ability to work with logarithms. Use a graphing calculator to obtain the log of a number or to transform back from the logarithm to the original number. If you need it, the Stat Trek glossary has a brief refresher on logarithms.
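For readers working in software rather than on a calculator, the same round trip can be sketched in a few lines. This assumes base-10 logarithms, consistent with the 10^{...} back transformations in the table; the value 1000 is arbitrary.

```python
import math

value = 1000
log_value = math.log10(value)   # take the base-10 log of the number
restored = 10 ** log_value      # transform back from the log to the original number

print(log_value, restored)
```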

How to Perform a Transformation to Achieve Linearity

Transforming a data set to achieve linearity is a multi-step, trial-and-error process.

1. Choose a transformation method (see above table).
2. Transform the independent variable, dependent variable, or both.
3. Plot the independent variable against the dependent variable, using the transformed data.
   - If the scatterplot is linear, proceed to the next step.
   - If the plot is not linear, return to Step 1 and try a different approach. Choose a different transformation method and/or transform a different variable.
4. Conduct a regression analysis, using the transformed variables.
5. Create a residual plot, based on regression results.
   - If the residual plot shows a random pattern, the transformation was successful. Congratulations!
   - If the plot pattern is not random, return to Step 1 and try a different approach.

The best transformation method (exponential model, quadratic model, reciprocal model, etc.) will depend on the nature of the original data. The only way to determine which method is best is to try each and compare the results (i.e., residual plots, correlation coefficients).
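The trial-and-error comparison can be partly automated. The sketch below is an assumption-laden illustration, not a prescribed procedure: the names METHODS, r_squared, and compare_methods are invented for this example, all x and y values are assumed to be positive (logs, square roots, and reciprocals require it), and the data are taken from the example in this lesson. It screens methods by the coefficient of determination on the transformed scale; a residual plot is still needed to confirm the choice.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

def identity(v):
    return v

def reciprocal(v):
    return 1 / v

# Method name -> (transformation of x, transformation of y).
METHODS = {
    "linear": (identity, identity),
    "exponential": (identity, math.log10),
    "quadratic": (identity, math.sqrt),
    "reciprocal": (identity, reciprocal),
    "logarithmic": (math.log10, identity),
    "power": (math.log10, math.log10),
}

def compare_methods(xs, ys):
    """Return {method: r-squared computed on the transformed variables}."""
    results = {}
    for name, (fx, fy) in METHODS.items():
        tx = [fx(x) for x in xs]
        ty = [fy(y) for y in ys]
        results[name] = r_squared(tx, ty)
    return results

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]

scores = compare_methods(x, y)
for name, r2 in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:12s} r-squared = {r2:.2f}")
```

A high r-squared alone does not establish linearity, so the top candidates from a screen like this should still go through Steps 3 to 5 above.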

A Transformation Example

The table below shows data for the independent and dependent variables, x and y, respectively. When we apply a linear regression to the raw data, the residual plot shows a non-random pattern (a U-shaped curve), which suggests that the data are nonlinear.

x	1	2	3	4	5	6	7	8	9
y	2	1	6	14	15	30	40	74	75

Suppose we repeat the analysis, using a quadratic model to transform the dependent variable. For a quadratic model, we use the square root of y, rather than y, as the dependent variable. The table below shows the data we analyzed.

x	1	2	3	4	5	6	7	8	9
sqrt(y)	1.41	1.00	2.45	3.74	3.87	5.48	6.32	8.60	8.66

The residual plot for the transformed data suggests that the transformation to achieve linearity was successful. The pattern of residuals is random, suggesting that the relationship between the independent variable (x) and the transformed dependent variable (square root of y) is linear. And the coefficient of determination was 0.96 with the transformed data versus only 0.88 with the raw data. The transformed data resulted in a better model.
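The numbers in this example can be reproduced with a short script. This is a sketch using ordinary least squares written out by hand; the helper names fit_line and r_squared are assumptions for illustration, and the residuals are printed rather than plotted.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def r_squared(ys, residuals):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]
sqrt_y = [math.sqrt(v) for v in y]   # quadratic model: transform y only

# Regression on the raw data.
b0_raw, b1_raw = fit_line(x, y)
raw_residuals = [yi - (b0_raw + b1_raw * xi) for xi, yi in zip(x, y)]
r2_raw = r_squared(y, raw_residuals)

# Regression on the transformed data.
b0_sqrt, b1_sqrt = fit_line(x, sqrt_y)
sqrt_residuals = [si - (b0_sqrt + b1_sqrt * xi) for xi, si in zip(x, sqrt_y)]
r2_sqrt = r_squared(sqrt_y, sqrt_residuals)

# Back transformation: predicted y on the original scale.
y_hat = [(b0_sqrt + b1_sqrt * xi) ** 2 for xi in x]

print("raw residuals: ", [round(e, 2) for e in raw_residuals])
print("sqrt residuals:", [round(e, 2) for e in sqrt_residuals])
print(f"r-squared raw = {r2_raw:.2f}, transformed = {r2_sqrt:.2f}")
```

The raw residuals are positive at both ends of the x range and negative in the middle, which is the U-shaped pattern described above. The residuals from the square-root fit change sign more irregularly.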

Test Your Understanding of This Lesson

Problem

In the context of regression analysis, which of the following statements is true?

I. A linear transformation increases the linear relationship between variables.
II. A logarithmic model is the most effective transformation method.
III. A residual plot reveals departures from linearity.

(A) I only
(B) II only
(C) III only
(D) I and II only
(E) I, II, and III

Solution

The correct answer is (C). A linear transformation neither increases nor decreases the linear relationship between variables; it preserves the relationship. A nonlinear transformation is used to increase the linear relationship between variables. The most effective transformation method depends on the data being transformed. In some cases, a logarithmic model may be more effective than other methods; but in other cases it may be less effective. Non-random patterns in a residual plot suggest a departure from linearity in the data being plotted.