SDSU

Math 121 - Calculus for Biology I
Spring Semester, 2009
Least Squares Examples

 © 2001, All Rights Reserved, SDSU & Joseph M. Mahaffy
San Diego State University -- This page last updated 02-Feb-09

 

 


 

  1. Sum of Squares Error
  2. Least Squares Best Fit
  3. Juvenile Height with Sum of Squares Error

 

Example 1:

with y increasing with increasing x.

with y decreasing with increasing x.

 

 

 

Solution:

 

$J_A = e_1^2 + e_2^2 + e_3^2 = 10.89$

$J_B = e_1^2 + e_2^2 + e_3^2 = 10.89$
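Both values use the same definition of the sum of squares error. For a model $y = f(x)$ compared with the three data points $(x_i, y_i)$, each error is the vertical distance between a data point and the model, and $J$ adds up the squares of these errors (the symbol $f$ is used here only to stand for either researcher's model):

$$e_i = y_i - f(x_i), \qquad i = 1, 2, 3,$$
$$J = e_1^2 + e_2^2 + e_3^2 = \sum_{i=1}^{3} \bigl(y_i - f(x_i)\bigr)^2.$$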

 

 

 

 

Example 2:

Solution:

The average of the x data values:

The slope a of the best fit line is calculated as follows:

 

The intercept b of the best fit line can then be calculated.
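For reference, a common way to write these steps for a line $y = ax + b$ fitted to $n$ data points $(x_i, y_i)$ is shown below; this is the standard textbook form and may be arranged differently from the course's own formulas, though the resulting line is the same:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$
$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - a\,\bar{x}.$$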

The equation of the best fit line is:

The sum of squares error for this model is 8.167, which is lower than the sum of squares error of 10.89 for either Model A or Model B.

Since the best fit model has y increasing with x, Researcher A actually has a more appropriate model than Researcher B, even though both models have the same sum of squares error. However, more data points are needed to develop a more accurate model of the data.

You can also use Excel to find the best fit line.
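The same line can also be computed directly in a few lines of code. The sketch below is a minimal Python version, assuming the standard closed-form least squares formulas (slope from the centered sums, intercept from $b = \bar{y} - a\bar{x}$); `best_fit_line` is a hypothetical helper name, not a library function. The usage applies it to the juvenile height data of Example 3, Part a.

```python
def best_fit_line(xs, ys):
    """Least squares line y = a*x + b via the standard closed-form formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n  # average of the x data values
    y_bar = sum(ys) / n  # average of the y data values
    # Slope: a = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = sxy / sxx
    # Intercept: the best fit line passes through the point (x_bar, y_bar)
    b = y_bar - a * x_bar
    return a, b


# Juvenile height data from Example 3 (t in weeks, L in cm)
t = [0, 1, 2, 3, 5, 7, 9]
L = [2.4, 3.1, 3.7, 4.1, 5.2, 4.9, 6.9]
a, b = best_fit_line(t, L)
print(f"L = {a:.3f}t + {b:.3f}")  # L = 0.437t + 2.644
```

If NumPy is available, `numpy.polyfit(t, L, 1)` should return the same slope and intercept (coefficients listed highest power first).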

 

 

 

 

 

 

Example 3: Data sets often contain points that are clearly erroneous, whether from problems with the experiment (such as contamination) or simply from a poorly recorded value. If these points are included in the fit, they can produce misleading models.

Our example on juvenile height showed that growth rates are given by the slope of a line.

a. Consider the following data set:

t (weeks)   0     1     2     3     5     7     9
L (cm)      2.4   3.1   3.7   4.1   5.2   4.9   6.9

 

 

The least squares best fit to this data set is given by

L = 0.437t + 2.644

Determine the growth rate for this model and find the sum of squares error. Graph the data and the least squares best fit line.

b. Which point is most likely erroneous? When this point is removed, the new least squares best fit model is

L = 0.492t + 2.594

Determine the growth rate for this model and find its sum of squares error. Taking the growth rate from the model in Part b as the actual value, what is the percent error between the two computed growth rates?

Solution:

a. The growth rate is represented by the slope of the best fit line, or 0.437 cm/week. The sum of squares error is calculated as follows:

$J(a, b) = e_1^2 + e_2^2 + e_3^2 + e_4^2 + e_5^2 + e_6^2 + e_7^2$, where:

$e_1^2 = (2.4 - 2.644)^2 = 0.0595$

$e_2^2 = [3.1 - (0.437 + 2.644)]^2 = 0.0004$

$e_3^2 = [3.7 - (0.874 + 2.644)]^2 = 0.0331$

$e_4^2 = [4.1 - (1.311 + 2.644)]^2 = 0.0210$

$e_5^2 = [5.2 - (2.185 + 2.644)]^2 = 0.1376$

$e_6^2 = [4.9 - (3.059 + 2.644)]^2 = 0.6448$

$e_7^2 = [6.9 - (3.933 + 2.644)]^2 = 0.1043$

So the sum of squares error is $J = 1.0008$.
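The arithmetic in Part a is easy to check by machine. This is a minimal Python sketch: `squared_errors` and `sum_of_squares_error` are illustrative helper names, and the coefficients are the rounded values 0.437 and 2.644 from the model, which is what the hand calculation uses.

```python
def squared_errors(xs, ys, a, b):
    """Squared vertical errors e_i^2 = (y_i - (a*x_i + b))^2, one per data point."""
    return [(y - (a * x + b)) ** 2 for x, y in zip(xs, ys)]


def sum_of_squares_error(xs, ys, a, b):
    """Sum of squares error J(a, b) for the line y = a*x + b."""
    return sum(squared_errors(xs, ys, a, b))


# Juvenile height data from Example 3 (t in weeks, L in cm)
t = [0, 1, 2, 3, 5, 7, 9]
L = [2.4, 3.1, 3.7, 4.1, 5.2, 4.9, 6.9]

# Rounded best fit coefficients from Part a: L = 0.437t + 2.644
for ti, e2 in zip(t, squared_errors(t, L, 0.437, 2.644)):
    print(f"t = {ti}: e^2 = {e2:.4f}")
print(f"J = {sum_of_squares_error(t, L, 0.437, 2.644):.4f}")  # J = 1.0008
```

The largest printed entry (at t = 7) is the point singled out in Part b.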

b. From the squared errors calculated above, the point with the largest error is (7, 4.9), the second-to-last point in the data table. Eliminating this point from the data set yields a new best fit line and a smaller sum of squares error, as shown below.

L = 0.492t + 2.594

J(a, b) = 0.0376 + 0.0002 + 0.0149 + 0.0009 + 0.0213 + 0.0149 = 0.0898,

which is only 9% of the sum of squares error from Part a.
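Part b can also be sketched end to end: fit the full data set, remove the single point with the largest squared error, refit, and compare the slopes. This is a minimal Python sketch under stated assumptions: both fits use the standard closed-form least squares formulas, the helper names are hypothetical, and removing only the largest-error point is the simple rule used in this example, not a general outlier test. The slopes are kept unrounded here, so the numbers can differ from the hand calculation in the last digit.

```python
def best_fit_line(xs, ys):
    """Least squares line y = a*x + b (standard closed-form formulas)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def drop_largest_error(xs, ys, a, b):
    """Remove the point with the largest squared error from y = a*x + b; also return that point."""
    errs = [(y - (a * x + b)) ** 2 for x, y in zip(xs, ys)]
    worst = errs.index(max(errs))
    xs_kept = xs[:worst] + xs[worst + 1:]
    ys_kept = ys[:worst] + ys[worst + 1:]
    return xs_kept, ys_kept, (xs[worst], ys[worst])


def percent_error(experimental, theoretical):
    """|experimental - theoretical| / |theoretical|, as a percentage."""
    return abs(experimental - theoretical) / abs(theoretical) * 100


t = [0, 1, 2, 3, 5, 7, 9]
L = [2.4, 3.1, 3.7, 4.1, 5.2, 4.9, 6.9]

a_old, b_old = best_fit_line(t, L)                          # Part a fit
t_new, L_new, removed = drop_largest_error(t, L, a_old, b_old)
a_new, b_new = best_fit_line(t_new, L_new)                  # Part b fit
J_new = sum((y - (a_new * x + b_new)) ** 2 for x, y in zip(t_new, L_new))

print("removed point:", removed)                            # removed point: (7, 4.9)
print(f"new slope {a_new:.3f} cm/week, J = {J_new:.4f}")    # new slope 0.492 cm/week, J = 0.0898
print(f"percent error: {percent_error(a_old, a_new):.1f}%")  # percent error: 11.2%
```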

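Percent error is calculated as follows:

$$\text{Percent error} = \frac{|\text{experimental value} - \text{theoretical value}|}{|\text{theoretical value}|} \times 100\%$$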

If the new best fit growth rate (0.492 cm/week) is taken as the theoretical value and the old best fit growth rate (0.437 cm/week) as the experimental value, the percent error is $\frac{|0.437 - 0.492|}{0.492} \times 100\% \approx 11.2\%$.