
Samantha H.
asked 12/08/20Need help putting the equation in the form of ŷ = a + bx
"The maximum discount value of the Entertainment® card for the "Fine Dining" section for various pages is given in the table below"
Page numberMaximum value ($) | |
4 | 17 |
14 | 18 |
25 | 15 |
32 | 18 |
43 | 20 |
57 | 15 |
72 | 15 |
85 | 16 |
90 | 16 |
Let page number be the independent variable and maximum value be the dependent variable.
- I need help calculating the least-squares line, I keep getting it wrong so if you can show me how to solve step by step that would be great (I must put the equation in the form of: ŷ = a + bx and need to round my answers to three decimal places)
ŷ = ? + ? x
1 Expert Answer

Patrick B. answered 12/08/20
Math and computer tutor/teacher
First off, you need, at most, a SEVEN (7) column table***
the first column is X
the 2nd column is Y
the 3rd column is X^2
the 4th column is XY
the 5th column is the Line Estimate = M*x+B
the 6th column is the ERROR = y - Line Estimate
the 7th column is Y^2 <--- ***this is for the calculation of
the R-coefficient; more on that
later; if you are not interested
in the R-coefficient, then this
column is not needed, which means
you only need 6 columns
Next, you put the stats for the independent variable
(the page number) in column X and the price of each
corresponding in column Y...
so the ordered pairs (x,y) in each row are (4,17),(14,18),
(25,15), (32,18), ... (90,16); so yes, there are N=9 rows of data
Now, Here are the formulas that I TEACH !!!
That is, when I TEACH STATISTICS, THESE are the formulae I use
for linear , least squares regression, so as to calculate the
SLOPE and the INTERCEPT of the trend line...
Denominator D = (# of rows) * (total of X^2) - (total of X)^2
note here the location and setting of the parenthesis....
in the first term (total of X^2) is the total of the 3rd
column labeled X^2... while in the second term of this
formula (total of X)^2 is the total of the first column
labeled X, AND THEN SQUARES THAT TOTAL !!! Do not confuse
these two values!!!
As you can see in the following formulae, this same value appears
in the denominator of both of the formulae for slope M and intercept B.
slope M = [(# of rows) * (total of XY) - (total of X)*(total of Y)] / D
intercept B = [ (total of Y)*(total of X^2) - (total of X)*(total of XY)]/ D
Once found, you WRITE the equation of the trend line y = mx+b, as you have asked.
This in turn, gets you the LINE ESTIMATE, that is where the LINE
is for each value of X.
The ERROR then is the vertical distance from the data point to the line for each x,
hence Y - Line Estimate = Y - (mx+b)
Yes, the total of the errors should be some value CLOSE to ZER0, at least within
order 10^(-N) for some HIGH degree of accuracy N digits; in other words, the total
of Y should BALANCE against the error and the line estimates
I have uploaded a spreadsheet that I REGULARLY use to perform all
of these calculations, and it includes a picture/graph as well as
the aforementioned R-coefficient which gives you a RATING of how
well the data fits a linear model. The more that R-coefficient
is closer to 1 or -1, the better. On the other hand,
if the R-coefficient is closer to 1/2 or -1/2, it is NOT GOOD LINEAR FIT!
The formula for the R-coefficient is MUCH more unfriendly, and for now
deferred. You can google it, look it up online or in wikipedia, and/or
even check it out from the formula in the spreadsheet. (Not for the weak at heart)
Here are the results, as shown in the spreadsheet for this data set:
N = 9 <--- there are 9 data points
D = 69128
M = -0.021785673 <--- the slope is practically zero,
so the trend line is almost horizontal
B = 17.68817267
the equation of the best fit trend line is then y = -0.021785673 x + 17.68817267
the R-Coefficient is -0.389736518, so the linear model is NOT a good fit
for this data. This is consistent with the picture, as the price fluctuates
between 15 and 18 with a spike in the middle at 20. Its more of a polynomial
or perhaps trigonometric/sinusoidal curve.
Finally, I would recommend that you study how these formulae
work with a SMALLER data set and simpler stats, so you can
see a little better how they work. For that reason, I will
conclude this mini-lesson with a VERY SIMPLE example
that runs just a few data points through these formulae, and then let you explore
your particular posted problem via the spreadsheet...sometimes LESS is MORE
Given the data set of N=4 data points:
(2,5); (5,11); (7,15); (10,21) <-- y = 2x+1
line estimate and error columns are deferred until we get
the slope and the intercept...
X Y X^2 XY
-----------------------------
2 5 4 10
5 11 25 55
7 15 49 105
10 21 100 210
-------------------------------
24 52 178 380 <---- totals
plugs these totals into the formulae above....
D = 4*178 - 24^2 = 136
M = [ 4 * 380 - 24 * 52 ] / 136
= [ 1520 - 1248]/126
= 272/136= 2
B = [ 52 * 178 - 24 * 380]/136
= 136/136 = 1
so yeah, y = 2x+1
extending the table for the line estimates and errors...
X Y X^2 XY line est = mx+b error=y - line est.
---------------------------------------------------------------------
2 5 4 10 5 0
5 11 25 55 11 0
7 15 49 105 15 0
10 21 100 210 21 0
---------------------------------------------------------------------
24 52 178 380 52 0
Samantha H.
Thank you for showing your work!!12/09/20
Still looking for help? Get the right answer, fast.
Get a free answer to a quick problem.
Most questions answered within 4 hours.
OR
Choose an expert and meet online. No packages or subscriptions, pay only for the time you need.
Patrick B.
y = -0.021785673 x + 17.68817267 is the SHORT Answer I have uploaded the spreadsheet in the RESOURCES section under this link12/08/20