
11.3: Calculating and Interpreting the Linear Correlation Coefficient

JoVE Core
Statistics

Consider a data set of carbon dioxide levels and annual temperatures recorded over a specific period. A scatter plot of the data points suggests a linear pattern between the two variables.

To measure the strength of this straight-line pattern, the linear correlation coefficient, r, is calculated.

First, x², y², and the product xy are computed for each data point. Each of these columns is then summed, along with the x and y columns. The number of data points is n = 7.

From these values, the coefficient of correlation is calculated.
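The step described above can be sketched in Python. This is a minimal sketch, not the lesson's own tool. It assumes the standard computational formula, which builds r from the column sums Σx, Σy, Σx², Σy², and Σxy. The function name `pearson_r` and the sample data are hypothetical; the lesson's carbon dioxide and temperature values are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r from the column sums (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    # Column sums: x, y, x^2, y^2 and xy
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all x values or all y values are equal")
    return numerator / denominator


# Hypothetical example data (not the lesson's CO2/temperature values)
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 5, 4, 5, 7, 8]
print(round(pearson_r(xs, ys), 3))  # 0.926
```

Working from running sums lets you fill in the x², y², and xy columns of a hand-calculation table in a single pass over the data.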

A table of critical values shows whether this value of r indicates a significant linear correlation.

At a significance level of 0.05 with n = 7, the critical value is 0.754.
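The table value can also be reproduced numerically. This is a sketch, not how the table was built. It assumes the null model behind such tables: no linear correlation in the population and bivariate normal data. Under that model, the density of r is proportional to (1 − r²)^((n − 4)/2). The sketch integrates that density with Simpson's rule and then bisects for the two-tailed cutoff. The helpers `critical_r` and `_simpson` are hypothetical. In practice you would read the table or use a statistics library's t-distribution quantile with n − 2 degrees of freedom.

```python
import math


def _simpson(f, a, b, steps=2000):
    """Composite Simpson's rule for the integral of f over [a, b] (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def critical_r(n, alpha=0.05):
    """Two-tailed critical value of r for n data points (hypothetical helper).

    Assumes no linear correlation and bivariate normal data, so the density of r
    is proportional to (1 - r**2) ** ((n - 4) / 2). Substituting r = sin(theta)
    turns the integrand into cos(theta) ** (n - 3), which is smooth for n >= 3.
    """
    if n < 3:
        raise ValueError("need at least 3 data points")

    def g(theta):
        return math.cos(theta) ** (n - 3)

    # Mass for r in [0, 1] (theta in [0, pi/2]); by symmetry, half the total.
    half_mass = _simpson(g, 0.0, math.pi / 2)

    def two_tailed_p(r0):
        # P(|r| >= r0) under the null model
        return _simpson(g, math.asin(r0), math.pi / 2) / half_mass

    # Bisection: the tail probability decreases as r0 increases.
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(critical_r(7), 3))  # 0.754
```

The substitution r = sin θ removes the endpoint behavior of (1 − r²)^k near r = ±1. A plain Simpson's rule therefore stays accurate even for very small samples.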

Since the absolute value of r is greater than the critical value, there is sufficient evidence to conclude that there is a linear correlation between the two variables.

The r² value of 0.762 indicates that 76.2% of the variation in annual temperature can be explained by the linear relationship between carbon dioxide levels and annual temperature.
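Taking the square root of the reported r² gives the magnitude of r. The sign of r is not stated in this example, and 0.762 is itself a rounded value. The result agrees with the comparison against the critical value. The remaining fraction, 1 − r², is the share of the variation left unexplained:

```latex
|r| = \sqrt{0.762} \approx 0.873 > 0.754, \qquad 1 - r^2 = 1 - 0.762 = 0.238 \;(23.8\%)
```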

The correlation coefficient, r, developed by Karl Pearson in the early 1900s, is a numerical measure of the strength and direction of the linear association between the independent variable, x, and the dependent variable, y. It is also known as the Pearson product-moment correlation coefficient and can be calculated using the following equation:

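$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where n is the number of data points.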
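The computational formula for r, written with the sums of x, y, x², y², and xy, is algebraically equivalent to a form built from deviations about the means x̄ and ȳ. The identities below follow from expanding the deviation sums. Dividing the formula's numerator and denominator by n then gives the deviation form. This form also explains the name "product-moment": the numerator sums products of paired deviations.

```latex
n\sum (x - \bar{x})(y - \bar{y}) = n\sum xy - \left(\sum x\right)\left(\sum y\right), \qquad
n\sum (x - \bar{x})^2 = n\sum x^2 - \left(\sum x\right)^2

r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}},
\qquad \bar{x} = \frac{\sum x}{n}, \quad \bar{y} = \frac{\sum y}{n}
```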

A table of 95% critical values of the sample correlation coefficient can be used to judge whether the computed value of r is significant. Compare r with the critical value for the sample size (some tables list it by degrees of freedom, n − 2). If r is not between the positive and negative critical values, the correlation coefficient is significant at the 5% level. If r is significant, you may want to use the regression line for prediction.
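The decision rule described above fits in a few lines. This is a minimal sketch. The function name `is_significant` is hypothetical, and so is its choice to treat a value exactly equal to the critical value as not significant. Written as written, the rule catches strong negative correlations as well as positive ones.

```python
def is_significant(r, critical_value):
    """Return True when r is NOT between -critical_value and +critical_value.

    Hypothetical helper; a value exactly on the boundary is treated as not significant.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if not 0.0 < critical_value < 1.0:
        raise ValueError("critical_value must lie between 0 and 1")
    return not (-critical_value <= r <= critical_value)


# 0.754 is the critical value for n = 7 at the 0.05 level; the r values are hypothetical.
print(is_significant(0.90, 0.754))   # True
print(is_significant(-0.80, 0.754))  # True: a strong negative correlation also counts
print(is_significant(0.50, 0.754))   # False
```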

The Coefficient of Determination

The quantity r², called the coefficient of determination, is the square of the correlation coefficient. It is usually stated as a percentage rather than in decimal form, and it has an interpretation in the context of the data:

r², when expressed as a percentage, represents the percentage of the variation in the dependent (predicted) variable y that can be explained by variation in the independent (explanatory) variable x using the regression (best-fit) line.

1 − r², when expressed as a percentage, represents the percentage of the variation in y that is NOT explained by variation in x using the regression line. This can be seen as the scattering of the observed data points about the regression line.
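The two interpretations above can be checked numerically. For a least-squares line y = a + bx, the fraction of the total variation in y that is left in the scatter about the line is 1 − r², so r² is the explained fraction. The sketch below fits such a line and reports both fractions. The function name `determination_split` and the data are hypothetical. For these data, r² = 6/7.

```python
def determination_split(xs, ys):
    """Fit the least-squares line and split the variation in y (hypothetical helper).

    Returns (explained fraction, unexplained fraction); for simple linear
    least squares these equal r**2 and 1 - r**2.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the line is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope of the best-fit line
    a = mean_y - b * mean_x    # intercept of the best-fit line
    sst = sum((y - mean_y) ** 2 for y in ys)                    # total variation in y
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # scatter about the line
    unexplained = sse / sst
    return 1 - unexplained, unexplained


# Hypothetical example data
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 5, 4, 5, 7, 8]
explained, unexplained = determination_split(xs, ys)
print(f"{explained:.1%} explained, {unexplained:.1%} not explained")
# 85.7% explained, 14.3% not explained
```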

This text is adapted from OpenStax, Introductory Statistics, Section 12.3: The Regression Equation.