Step-by-Step Instructions
Gather Your Paired Data
First, identify and list your paired data points (x, y). Ensure that each pair corresponds to the same subject or observation.
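One possible way to hold paired data in Python is a list of (x, y) tuples, which keeps the two values of each observation together. The sketch below uses the five pairs from the worked example in this guide; the variable names are illustrative, not a required convention.

```python
# Paired observations: each tuple holds one subject's (x, y) values.
# These are the values used in this guide's worked example.
pairs = [(1, 2), (2, 3), (3, 5), (4, 7), (5, 8)]

# Split into separate x and y sequences while preserving the pairing order.
xs, ys = zip(*pairs)

# Every x needs a matching y, and a correlation needs at least two pairs.
assert len(xs) == len(ys) >= 2
```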
Calculate the Means of X and Y
Calculate the mean of your x values and the mean of your y values. These will be used to find the deviations from the means.
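As a minimal sketch, the means can be computed with `fmean` from Python's standard `statistics` module. The data are the values from the worked example in this guide, and the names `mean_x` and `mean_y` are chosen here for illustration.

```python
from statistics import fmean

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 8]

# fmean returns the arithmetic mean as a float: sum(values) / len(values).
mean_x = fmean(xs)
mean_y = fmean(ys)

print(mean_x, mean_y)  # prints: 3.0 5.0
```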
Calculate Deviations and Their Products
For each data point, calculate the deviation from the mean for both x and y, and then calculate the product of these deviations. Sum these products.
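The step above can be sketched as a single list comprehension. This is an illustration using the worked-example data, with hypothetical variable names.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 8]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# For each pair, multiply the x deviation by the y deviation.
products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]

# The sum of these products becomes the numerator of the formula.
sum_products = sum(products)
```

For this data, `products` is `[6.0, 2.0, 0.0, 2.0, 6.0]` and `sum_products` is `16.0`.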
Calculate Squared Deviations
Calculate the squared deviation for each x and y value from their respective means. Sum these squared deviations separately for x and y.
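A short sketch of this step, again assuming the worked-example data and illustrative variable names:

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 8]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Square each deviation, then sum separately for x and for y.
sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
```

Here `sum_sq_x` is `10.0` and `sum_sq_y` is `26.0`; their product goes under the square root in the denominator.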
Apply the Pearson Correlation Formula
Plug the summed products of deviations and the summed squared deviations into the Pearson correlation coefficient formula to find the correlation coefficient (r).
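Once the three sums are known, the final step is one division. The helper below, `pearson_from_sums`, is a hypothetical name used for this sketch; the zero-denominator check is an added safeguard, since r is undefined when either variable has no spread.

```python
import math


def pearson_from_sums(sum_products, sum_sq_x, sum_sq_y):
    """Combine the three sums into the Pearson correlation coefficient."""
    denominator = math.sqrt(sum_sq_x * sum_sq_y)
    if denominator == 0:
        # All x values (or all y values) are identical, so r is undefined.
        raise ValueError("r is undefined when a variable has zero variance")
    return sum_products / denominator


# Sums taken from the worked example in this guide.
r = pearson_from_sums(16, 10, 26)
```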
Interpret Your Results
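Use the value of r to determine the strength and direction of the linear relationship between your variables. The sign gives the direction: a positive r means y tends to increase as x increases, and a negative r means y tends to decrease as x increases. The magnitude gives the strength: values close to 1 or -1 indicate a strong linear relationship, while values close to 0 indicate a weak one.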
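The sketch below turns a value of r into a rough verbal description. The cutoffs of 0.7 and 0.3 are assumptions made for this example only; this guide does not define numeric thresholds, and conventions vary by field.

```python
def describe_r(r, strong=0.7, weak=0.3):
    """Return a rough description of r.

    The `strong` and `weak` cutoffs are illustrative assumptions,
    not standard values.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"
    # The sign of r gives the direction.
    direction = "positive" if r > 0 else "negative"
    # The magnitude of r gives the strength.
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= weak:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} linear relationship"
```

For example, `describe_r(-0.1)` returns `"weak negative linear relationship"` under these assumed cutoffs.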
Introduction to Pearson Correlation
The Pearson correlation coefficient is a measure of the linear relationship between two variables. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.
Understanding the Formula
The Pearson correlation coefficient (r) is calculated using the following formula:
\[ r = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum{(x_i - \bar{x})^2} \cdot \sum{(y_i - \bar{y})^2}}} \]
where \(x_i\) and \(y_i\) are individual data points, and \(\bar{x}\) and \(\bar{y}\) are the means of the x and y values.
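The formula translates fairly directly into code. The function below is a minimal sketch rather than a reference implementation; the name `pearson_r` and the input checks are assumptions added for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are required")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Numerator: sum of products of paired deviations.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Denominator: square root of the product of the summed squared deviations.
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = math.sqrt(sum_sq_x * sum_sq_y)
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")

    return numerator / denominator
```

A perfectly linear increasing dataset such as `pearson_r([1, 2, 3], [2, 4, 6])` gives 1.0, and reversing the y values gives -1.0.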
Worked Example
Let's calculate the Pearson correlation coefficient for the following paired data:
| x | y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
| 4 | 7 |
| 5 | 8 |
First, calculate the means of x and y:
\[ \bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3 \]
\[ \bar{y} = \frac{2 + 3 + 5 + 7 + 8}{5} = 5 \]
Then, calculate the deviations from the means and their products:
| x | y | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i - \bar{x})(y_i - \bar{y})\) |
|---|---|---|---|---|
| 1 | 2 | -2 | -3 | 6 |
| 2 | 3 | -1 | -2 | 2 |
| 3 | 5 | 0 | 0 | 0 |
| 4 | 7 | 1 | 2 | 2 |
| 5 | 8 | 2 | 3 | 6 |
\[ \sum{(x_i - \bar{x})(y_i - \bar{y})} = 6 + 2 + 0 + 2 + 6 = 16 \]
Next, calculate the squared deviations:
| x | y | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i - \bar{x})^2\) | \((y_i - \bar{y})^2\) |
|---|---|---|---|---|---|
| 1 | 2 | -2 | -3 | 4 | 9 |
| 2 | 3 | -1 | -2 | 1 | 4 |
| 3 | 5 | 0 | 0 | 0 | 0 |
| 4 | 7 | 1 | 2 | 1 | 4 |
| 5 | 8 | 2 | 3 | 4 | 9 |
\[ \sum{(x_i - \bar{x})^2} = 4 + 1 + 0 + 1 + 4 = 10 \]
\[ \sum{(y_i - \bar{y})^2} = 9 + 4 + 0 + 4 + 9 = 26 \]
Finally, calculate the Pearson correlation coefficient:
\[ r = \frac{16}{\sqrt{10 \cdot 26}} = \frac{16}{\sqrt{260}} \approx \frac{16}{16.125} \approx 0.992 \]
Because r is very close to 1, the data show a strong positive linear relationship.
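As a cross-check, the short script below rebuilds the columns of the two tables and evaluates the formula without rounding the square root along the way. It is a sketch with illustrative variable names.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 7, 8]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# One row per pair: x, y, dx, dy, dx*dy, dx^2, dy^2.
rows = []
for x, y in zip(xs, ys):
    dx, dy = x - mean_x, y - mean_y
    rows.append((x, y, dx, dy, dx * dy, dx ** 2, dy ** 2))

# Column totals used by the formula.
sum_products = sum(row[4] for row in rows)
sum_sq_x = sum(row[5] for row in rows)
sum_sq_y = sum(row[6] for row in rows)

r = sum_products / math.sqrt(sum_sq_x * sum_sq_y)

for row in rows:
    print(*row, sep="\t")
print(f"sums: {sum_products} {sum_sq_x} {sum_sq_y}  r = {r:.4f}")
```

The column totals come out as 16, 10, and 26, and the unrounded result is r ≈ 0.9923.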
Common Mistakes to Avoid
- Forgetting to calculate the means of the datasets before proceeding with the formula.
- Incorrectly calculating the deviations from the means or their products.
- Not squaring the deviations when calculating the denominator of the formula.
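The last mistake is easy to detect, because deviations from the mean always sum to zero, so leaving out the square makes the denominator collapse. A small illustration, using the x values from the worked example:

```python
import math

xs = [1, 2, 3, 4, 5]
mean_x = sum(xs) / len(xs)

# Without squaring, positive and negative deviations cancel out.
sum_dev = sum(x - mean_x for x in xs)

# Squaring makes every term non-negative, so the spread is preserved.
sum_sq_dev = sum((x - mean_x) ** 2 for x in xs)

print(sum_dev, sum_sq_dev)  # prints: 0.0 10.0
```

If a denominator term comes out as zero (or very nearly zero) for data that clearly vary, a missing square is a likely cause.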
When to Use a Calculator
For large datasets, using a calculator or statistical software is highly recommended to avoid manual calculation errors and to speed up the process. Most graphing calculators and statistical software packages have built-in functions to calculate the Pearson correlation coefficient.