 Conceptualizing Correlation and Regression Equations

Activity Guide

 

Part 1: Making Sense of the Dynamic Environment and the Calculated Values

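Note: The formula we are exploring for the Correlation Coefficient is:

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} $$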

  • Open the Fathom file called CorrelParts.ftm

  • Examine Collection 1. The first two columns are the numerical values of the data points shown in the scatterplot. If you change any of the numerical values in the table, the graph will update. Likewise, if you move one of the data points in the scatterplot, the numerical values in the table will change accordingly. The remaining columns in Collection 1 are calculated from the X and Y values and supply the quantities needed for the formula for r above.

  • The next two columns represent the deviations of each X and Y value from their respective means. To help make sense of these deviations graphically, plot a horizontal line (Plot Function) to indicate the mean of Y and a vertical line (Plot Value) to indicate the mean of X.

To Plot a Function:
  • Select the graph. Under the Graph menu, select Plot Function. The Expression for Function pop-up box will appear.
  • Type the right-hand side of the function. In this case we are plotting the function y=mean(y), so you need to type in mean(y).
  • Click on OK. The function should appear in the graph window.

To Plot a Value:
  • Select the graph. Under the Graph menu, select Plot Value. The Expression for Value pop-up box will appear.
  • Type the expression for the relation. In this case we are plotting the relation x=mean(x). Thus you need to type in mean(x).
  • Click on OK. The vertical line should appear in the graph window.
  • What data point is represented at the intersection of the vertical and horizontal lines?
    • What is the significance of this data point?
    • Is it possible that all points can be on one side of either the line x = mean(x) or the line y = mean(y)? Why or why not?
    • Can nine of the 10 data points be on one side of either the line x = mean(x) or the line y = mean(y)? Why or why not?
    • Can the data be changed in such a way that nine of the data points lie in the new “third quadrant” with the last point in the “first quadrant”? What would this result say about the mean?

  • Double-click the Collection 1 icon to display the calculated measures. These measures are based on the calculated values in Collection 1 and correspond to parts of the formula for r. Make sense of what these measures represent before proceeding.
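
The calculated columns and measures can also be reproduced outside Fathom. The sketch below is a minimal Python illustration, not the contents of CorrelParts.ftm. It assumes r is computed as the sum of the deviation products divided by the square root of the product of the two sums of squared deviations. The ten data points are hypothetical, the variable names only mirror the column names used in this activity, and the product column is an assumption about how the file builds the numerator of r.

```python
from math import sqrt

# Hypothetical data: ten (x, y) points chosen only for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Per-point columns, named after the columns in Collection 1.
x_deviations = [x - x_mean for x in xs]
y_deviations = [y - y_mean for y in ys]
x_dev_squared = [d * d for d in x_deviations]
y_dev_squared = [d * d for d in y_deviations]
# Assumed column: product of the two deviations for each point.
dev_products = [dx * dy for dx, dy in zip(x_deviations, y_deviations)]

# Measures: one sum per column.
sum_products = sum(dev_products)
sum_x_dev_squared = sum(x_dev_squared)
sum_y_dev_squared = sum(y_dev_squared)

r = sum_products / sqrt(sum_x_dev_squared * sum_y_dev_squared)

print("sum of x deviations:", sum(x_deviations))
print("sum of y deviations:", sum(y_deviations))
print("r =", round(r, 4))
```

Because the deviations in each column sum to zero, the points cannot all lie strictly on one side of a mean line, which bears on the questions about the intersection point above.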

Part 2: Making Sense of Correlation

Have students work in pairs to complete the following tasks and answer the questions.

  • Move the points on the graph so they are approximately on a line with positive slope. 

    • What do you notice about the magnitude and sign of the Xdeviations and Ydeviations?

    • What do you notice about the magnitude and sign of the XdevSquared and YdevSquared?

    • How are these values influencing the value of r?

    • Click on the Graph. Then under the Graph menu, choose Least-Squares Line. Does the equation show a positive slope? 

  • Move the points so they are approximately on a line with negative slope. 

    • What do you notice about the magnitude and sign of the Xdeviations and Ydeviations?

    • What do you notice about the magnitude and sign of the XdevSquared and YdevSquared?

    • How are these values influencing the value of r?

  • Move the points so they appear to have no association.

    • What do you notice about the magnitude and sign of the Xdeviations and Ydeviations?

    • What do you notice about the magnitude and sign of the XdevSquared and YdevSquared?

    • How are these values influencing the value of r?

  • Place the points in a positive linear trend. Drag one of the points on the graph so that it is clearly an outlier. Observe the effects on the regression line and the value of r. 

    • Based on the formula for r, describe why the value of r is affected so greatly by an outlier.

    • Pick up the outlier point and drag it to different locations on the graph. Find three different locations of an outlier that cause the regression line to drastically change. Where did you have to place the outlier for this effect? Why does this make sense?

    • Make sense of the effects of an outlier. Reason from the formula for r and slope as well as the calculated measures.

  • In the lower left corner of the coordinate plane, place 9 of your points in a “cloud” that appears to have no trend. Then move one point to the upper right corner.

    • Is this scatterplot linear?

    • What is the value of r? Reason from the formula for r and the displayed measures to make sense of this value.

    • Find two other sets of data points that give a high r value but show no linear trend.

    • Does a high r value necessarily mean that the data are generally linear?

    • Does a low r value necessarily mean that the data are NOT generally linear?

  • Place your points in a nearly horizontal line on the graph. What is the value of r? Why?

  • Place your points in a nearly vertical line on the graph. What is the value of r? Why?

  • Based on how the formula for r is computed, why do you think the values of r are constrained between –1 and 1?
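
The "cloud plus one far point" arrangement can be checked numerically. The following is a rough sketch with hypothetical coordinates (a 3-by-3 grid for the cloud and a single point at (20, 20)), and the helper name `correlation` is introduced here only for illustration. It assumes r is the sum of deviation products divided by the square root of the product of the two sums of squared deviations.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r from sums of deviation products and squared deviations."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sum_products = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sum_x_sq = sum((x - x_mean) ** 2 for x in xs)
    sum_y_sq = sum((y - y_mean) ** 2 for y in ys)
    return sum_products / sqrt(sum_x_sq * sum_y_sq)


# Hypothetical "cloud": a 3-by-3 grid of nine points with no trend.
cloud_x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
cloud_y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
r_cloud = correlation(cloud_x, cloud_y)

# Add one hypothetical point far away in the upper right corner.
far_x = cloud_x + [20]
far_y = cloud_y + [20]
r_with_far_point = correlation(far_x, far_y)

# Count the points that fall below both means (the new "third quadrant").
x_mean = sum(far_x) / len(far_x)
y_mean = sum(far_y) / len(far_y)
below_both_means = sum(
    1 for x, y in zip(far_x, far_y) if x < x_mean and y < y_mean
)

print("r for the cloud alone:", r_cloud)
print("r with the far point:", round(r_with_far_point, 3))
print("points below both means:", below_both_means)
```

In this sketch the single far point supplies most of the deviation-product sum and most of both squared-deviation sums, so r lands near 1 even though the nine cloud points show no trend.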

Part 3: Conceptualizing the Regression Line

  • Throughout the investigation in Part 2, what did you notice about the relationship between the data point and the regression line?

  • How is the correlation coefficient r related to the slope of the regression equation? Is the value of r the same as the slope of the line? Does an r value of 1 imply a y=x relationship? Why or why not?

  • Recall that the formula for the slope of a regression line can be expressed as $b = r \cdot \frac{s_y}{s_x}$, where $s_x$ and $s_y$ are the standard deviations of the X values and Y values. Thus, by calculating the means and standard deviations for the X values and Y values in a data set, as well as r, one can derive the line of best fit using algebraic techniques. Test this procedure with 10 data points.
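
One way to test the procedure is sketched below in Python. The ten data points are hypothetical, and the sketch assumes the standard relationships: slope equals r times the ratio of the Y and X standard deviations, and the regression line passes through the point of means, so the intercept is the Y mean minus the slope times the X mean. The result is compared with the slope computed directly from the deviation sums.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: ten points scattered around a line of slope close to 2.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

x_mean, y_mean = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

# Correlation coefficient from the deviation sums.
sum_products = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sum_x_sq = sum((x - x_mean) ** 2 for x in xs)
sum_y_sq = sum((y - y_mean) ** 2 for y in ys)
r = sum_products / sqrt(sum_x_sq * sum_y_sq)

# Slope and intercept built from r, the standard deviations, and the means.
slope_from_r = r * s_y / s_x
intercept_from_r = y_mean - slope_from_r * x_mean

# Direct least-squares slope for comparison.
slope_least_squares = sum_products / sum_x_sq

print("r =", round(r, 4))
print("slope from r:", round(slope_from_r, 4))
print("least-squares slope:", round(slope_least_squares, 4))
print("intercept:", round(intercept_from_r, 4))
```

For these points r is close to 1 while the slope is close to 2, which illustrates that r and the slope are different quantities.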

Part 4: Suggested Data Exploration

A biology student noticed that crickets seemed to chirp faster in the summer than in the spring or fall. Her grandmother had always told her that she could determine the temperature by listening to the crickets. Over the next season she counted the chirps per minute of a cricket and recorded the temperature. Her data are provided in the table below.

  • Input these values into Fathom as the 10 data points.

Chirps (per minute)    Temperature (Fahrenheit)
67                     54
75                     55
83                     58
91                     58
99                     60
119                    67
134                    69
140                    70
149                    74
164                    77

  • Find a mathematical model that the student can use to estimate the temperature by listening to the crickets.

  • Interpret the r-value for this data set.

  • Interpret the slope and y-intercept in terms of the phenomenon.

  • Explain how this model could be used to estimate the temperature quickly by counting chirps for only 15 seconds.

  • If you wanted to describe mathematically the relationship between temperature and cricket chirps, which variable is more appropriate to consider as the dependent variable?  Is this the same variable that you treated as the dependent variable?  If not, find a new model.  Interpret the slope and y-intercept. 
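
As a check on work done in Fathom, the least-squares fits for the cricket data can be sketched in Python. This is one possible approach rather than the intended solution: the helper `fit_line` is a hypothetical name introduced here, and the 15-second version simply assumes that a 15-second count is one quarter of the per-minute count.

```python
from math import sqrt

# Data from the table above: chirps per minute and temperature in Fahrenheit.
chirps = [67, 75, 83, 91, 99, 119, 134, 140, 149, 164]
temps = [54, 55, 58, 58, 60, 67, 69, 70, 74, 77]


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line predicting ys from xs."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sum_products = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sum_x_sq = sum((x - x_mean) ** 2 for x in xs)
    sum_y_sq = sum((y - y_mean) ** 2 for y in ys)
    slope = sum_products / sum_x_sq
    intercept = y_mean - slope * x_mean
    r = sum_products / sqrt(sum_x_sq * sum_y_sq)
    return slope, intercept, r


# Model 1: estimate temperature from chirps per minute.
slope, intercept, r = fit_line(chirps, temps)

# Same model restated for a 15-second count (chirps per minute = 4 * count).
slope_per_15s = 4 * slope

# Model 2: treat chirp rate as the dependent variable instead.
rev_slope, rev_intercept, rev_r = fit_line(temps, chirps)

print(f"temperature = {intercept:.2f} + {slope:.4f} * chirps_per_minute (r = {r:.3f})")
print(f"temperature = {intercept:.2f} + {slope_per_15s:.3f} * chirps_in_15_seconds")
print(f"chirps_per_minute = {rev_intercept:.2f} + {rev_slope:.3f} * temperature")
```

With these data the slope per 15-second count comes out close to 1 and the intercept close to 37, which suggests a rule of thumb along the lines of "chirps in 15 seconds plus about 37". Swapping the variables changes the slope and intercept but not r.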
               


Last modified on June 14, 2002.