The Correlation Coefficient (r)

The sample correlation coefficient (r) is a measure of how closely the points in a scatter plot cluster around a linear regression line based on those points, as in the example above for accumulated savings over time. Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward).
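
As a quick check of the two extremes, the sketch below uses made-up data in which y is an exact linear function of x. The variable names and values are assumptions chosen only for illustration.

```r
# Hypothetical data: y is an exact linear function of x
x <- 1:10
y_up   <- 3 + 2 * x   # points fall exactly on an upward-sloping line
y_down <- 3 - 2 * x   # points fall exactly on a downward-sloping line

r_up   <- cor(x, y_up)    # perfectly linear positive correlation
r_down <- cor(x, y_down)  # perfectly linear negative correlation

print(r_up)
print(r_down)
```

Any scatter around the line would pull these values toward 0.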

A correlation coefficient close to 0 suggests little, if any, correlation. The scatter plot suggests that measures of IQ do not change with increasing age, i.e., there is no evidence that IQ is associated with age.

[Figure: scatter plot of IQ versus age]

Calculation of the Correlation Coefficient

The equations below show the calculations used to compute "r". However, you do not need to remember these equations. We will use R to do these calculations for us. Nevertheless, the equations give a sense of how "r" is computed.

$$r = \frac{\mathrm{Cov}(X,Y)}{\sqrt{s_x^2 \, s_y^2}}$$

where Cov(X,Y) is the covariance, i.e., how far each observed (X,Y) pair is from the mean of X and the mean of Y, simultaneously, and $s_x^2$ and $s_y^2$ are the sample variances for X and Y.

Cov(X,Y) is computed as:

$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$
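
To see how the pieces fit together, the following sketch computes the covariance and r step by step and compares them with R's built-in cov() and cor(). The data values are hypothetical, and the sketch assumes the usual sample (n - 1) definitions of covariance and variance, which is what cov() and var() use.

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 5, 7, 9, 12)
y <- c(1, 3, 4, 8, 9, 10)
n <- length(x)

# Sample covariance: sum of the cross-products of the deviations
# from each mean, divided by n - 1
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# r = Cov(X,Y) divided by the square root of the product
# of the two sample variances
r_manual <- cov_xy / sqrt(var(x) * var(y))

# Built-in equivalent for comparison
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

The two values should agree up to floating-point rounding.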

You don't need to memorize or use these equations for hand calculations. Instead, we will use R to calculate correlation coefficients. For example, we could use the following command to compute the correlation coefficient for AGE and TOTCHOL in a subset of the Framingham Heart Study:

> cor(AGE,TOTCHOL)
[1] 0.2917043

Describing Correlation Coefficients

The table below provides some guidelines for how to describe the strength of correlation coefficients, but these are just guidelines for description. Also, keep in mind that even weak correlations can be statistically significant, as you will learn shortly.

Correlation Coefficient (r)    Description (rough guideline)
+1.0                           Perfect positive association
+0.8 to +1.0                   Very strong + association
+0.6 to +0.8                   Strong + association
+0.4 to +0.6                   Moderate + association
+0.2 to +0.4                   Weak + association
 0.0 to +0.2                   Very weak + or no association
 0.0 to -0.2                   Very weak - or no association
-0.2 to -0.4                   Weak - association
-0.4 to -0.6                   Moderate - association
-0.6 to -0.8                   Strong - association
-0.8 to -1.0                   Very strong - association
-1.0                           Perfect negative association
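
The guideline categories can be applied mechanically, as in the sketch below. Note that describe_r() is a hypothetical helper written for this illustration, not a base R function. Because the table's ranges share their endpoints, the sketch assumes that a boundary value such as 0.4 falls into the stronger category.

```r
# describe_r() is a hypothetical helper, not part of base R.
# Assumption: a boundary value (e.g., 0.4) is assigned to the stronger category.
describe_r <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))

  # Classify the magnitude of r into the guideline bands
  strength <- as.character(cut(abs(r),
    breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
    labels = c("Very weak", "Weak", "Moderate", "Strong", "Very strong"),
    right = FALSE, include.lowest = TRUE))
  strength[abs(r) == 1] <- "Perfect"

  # The sign of r gives the direction of the association
  direction <- ifelse(r < 0, "negative", "positive")

  out <- paste(strength, direction, "association")
  out[r == 0] <- "No association"
  out
}

# The two coefficients computed in this section, plus one made-up negative value
labels <- describe_r(c(0.2917043, 0.5653241, -0.7))
print(labels)
```

These labels are only descriptive; they say nothing about statistical significance.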

The four images below give an idea of how some correlation coefficients might look on a scatter plot.

[Figure: four scatter plots illustrating different correlation coefficients]

The scatter plot below illustrates the relationship between systolic blood pressure and age in a large number of subjects. It suggests a weak (r=0.36), but statistically significant (p

Beware of Non-Linear Relationships

Many relationships between measurement variables are reasonably linear, but others are not. For example, the image below indicates that the risk of death is not linearly correlated with body mass index. Instead, this type of relationship is often described as "U-shaped" or "J-shaped," because the value of the Y-variable initially decreases with increases in X, but with further increases in X, the Y-variable increases substantially. The relationship between alcohol consumption and mortality is also "J-shaped."

[Figure: risk of death versus body mass index]

Source: Calle EE, et al.: N Engl J Med 1999; 341:1097-1105
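
A small made-up example shows why r can miss a relationship of this kind. In the sketch below, y is completely determined by x, but the relationship is U-shaped, so the linear correlation is essentially zero. The data are hypothetical and are not taken from the study cited above.

```r
# Hypothetical U-shaped relationship: y depends entirely on x, but not linearly
x <- seq(-3, 3, by = 0.5)
y <- x^2

# The downward and upward arms of the "U" cancel out in the covariance
r_u <- cor(x, y)
print(r_u)

# A scatter plot makes the curved pattern obvious even though r does not
plot(x, y)
```

A correlation coefficient near 0 therefore means "no linear association," not necessarily "no association."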

A simple way to evaluate whether a relationship is reasonably linear is to examine a scatter plot. To illustrate, look at the scatter plot below of height (in inches) and body weight (in pounds) using data from the Weymouth Health Survey in 2004. R was used to create the scatter plot and compute the correlation coefficient.

> # wey is the data frame containing the Weymouth Health Survey data
> attach(wey)
> plot(hgt_inch,weight)
> cor(hgt_inch,weight)
[1] 0.5653241

[Figure: scatter plot of height (inches) versus body weight (pounds), Weymouth Health Survey]

There is quite a lot of scatter, and the large number of data points makes it difficult to fully evaluate the correlation, but the trend is reasonably linear. The correlation coefficient is about +0.57.

Beware of Outliers

Note also in the plot above that there are two individuals with apparent heights of 88 and 99 inches. A height of 88 inches (7 feet 4 inches) is plausible, but unlikely, and a height of 99 inches is certainly a coding error. Obvious coding errors should be excluded from the analysis, since they can have an inordinate effect on the results. It's always a good idea to look at the raw data in order to identify any gross errors in coding.
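
The effect of a single coding error can be illustrated with a small made-up data set. The heights and weights below are hypothetical (they are not the Weymouth data), and the cutoff of 90 inches is an assumption chosen only for this example.

```r
# Hypothetical heights (inches) and weights (pounds), for illustration only
hgt <- c(60, 62, 64, 66, 68, 70, 72, 74, 99)   # 99 is a deliberate coding error
wgt <- c(120, 130, 135, 150, 155, 170, 175, 190, 130)

# Looking at the raw data first: the range exposes the implausible value
print(range(hgt))

# Correlation with the coding error included
r_all <- cor(hgt, wgt)

# Exclude the obvious coding error (the 90-inch cutoff is an assumption)
keep <- hgt < 90
r_clean <- cor(hgt[keep], wgt[keep])

print(c(with_error = r_all, error_removed = r_clean))
```

In this sketch one bad value out of nine is enough to mask an otherwise strong linear trend. In a real analysis the decision to exclude a value should rest on evidence that it is an error, not merely on its being extreme.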