Method of least squares

Question

A magazine reported on a study of the reliability of a commercial kit to test for arsenic in groundwater. The field kit was used to test a sample of 20 groundwater wells in a country. In addition to the arsenic level (micrograms per liter) the latitude (degrees) longitude (degrees) and depth (feet) of each well was measured Data table:
Well_ID   Latitude   Longitude   Depth     Arsenic1             23.7714   90.6221       30          782             23.7764   90.6492       60          653             23.7832   90.6321      125        1924             23.7944   90.6386        30          215             23.7644   90.6224      100         1876             23.7916   90.6228      100         1877             23.7674   90.6225        45         1238             23.7694   90.6176        90         2859             23.7732   90.6147        75         18210           23.7894   90.6482       125         29211           23.7852   90.6215         75         18212           23.7856   90.6207         45           7613           23.7745   90.6496         90           5514           23.7479   90.6387         60         17915           23.7734   90.6487         90         18516           23.7681   90.6463         90         28517           23.7754   90.6452         90         18518           23.7881   90.6489        125        19219           23.7872   90.6218          30          5520           23.7818   90.6243        125         192
 
 
b.Fit the model to the data using the method of least squares. Let x1 be the latitudes, x2 be the longitudes, x3 be the depthsAnswer. y= (?) +(?) x1 (?) x2+(?) x3  Roundthe constant term and the coefficients for x1 and x2. Round the coefficient for x3 to the nearest hundredth as needed

Kenneth G. · Accepted Answer

I am not sure what level you are from the question as asked.  The methods presented so far have been sort of down-in-the-ditches, concentrating on getting the results rather than understanding the problem.  
 
The problem states "Fit the data using the method of least squares".  It does not say to fit the model using Excel, RStudio, SPSS, or any other software product.  Be also careful because Excel computations for this kind of problem have been called into question.  
 
If your goal is to just "get the answer", then definitely use software to get the answer.  But if you want to understand least squares, then you should work it mostly by hand with only a little help from a calculator.
 
It's important to understand the framework of Statistics.  Statisticians have spent many years studying the various assumptions, and yet it's easy for a careless researcher to misuse Multiple Regression to prove hypotheses which are not valid.
 
Each Multiple Regression assumes a particular mathematical model.  If the model is not as expected, the results will not be reliable and things will go very wrong. A regression model can be linear or non-linear, and can have a number of odd twists if certain assumptions are not satisfied, or if the wrong independent variables are in the model, or if the independent variables are correlated, or if the errors are correlated.   
 
I guess that the person who wrote the problem is assuming a linear model, as specified in the assumptions described here: http://www.acastat.com/Statbook/molsassumptions.htm.  One needs to continually validate assumptions.
 
If the assumptions for a Multiple Regression with a linear model are satisfied, then the general equation for Multiple Regression with a dependent variable Y and a matrix of coefficients of the dependent to independent variables X is Y = Xβ + u, where u is a vector of the error values for the Y observations.  
 
The general solution to a Multiple Regression problem in the linear case where β-hat is the estimate of β and XT  is the transpose of the matrix X is 
 
                                        β-hat = (XTX)-1XTY   
 
This general solution is obtained by taking the derivative of the sum of squared residual values, and setting it to zero to find the minimum, which is why the technique is called "least squares".  (This technique uses the Euclidean distance as a measure of the deviations, and one should note that if a different metric for distance were used the results could be different.)
 
In this case, Y is the 20x1 vector of dependent variable values containing 78, 65, ... , 192, and X is the 20x3 matrix of the dependent variables latitude, longitude, and depth.   The vector β is the 3x1 matrix of unknown coefficients β1 β2 and β3.  
  
Matrix operations are very straightforward, and make the formulas you need to solve these problems much easier to remember.
 
Since the problem does not state any null or alternate hypothesis, it makes no sense to talk about significance in the solution to this problem.  The problem states that the purpose of taking the observations is to test the reliability of the new commercial kit, but does not specify what the criteria are for evaluating that reliability.  See http://www.uni.edu/chfasoa/reliabilityandvalidity.htm  or http://en.wikipedia.org/wiki/Reliability or http://web.utk.edu/~leon/rel/overview/reliability.html for a definition of the term "reliability".  It appears that "reliability" is a very vague, nearly undefined, term.   So the purpose of the curve fitting is not clear from the problem.

Roman C. · Answer

Solution using R.
 
1. Copy and paste this data in Notepad and save. Let's say you saved it as "WellData.txt".
 
2. Launch RStudio.
 
3. On the menu bar, select Tools > Import Dataset > From Text File...
 
4. Select WellData.txt and click Open.
 
5. In the RStudio Command prompt type model <- lm(WellData$Arsenic ~ WellData$Latitude + WellData$Longitude + WellData$Depth, data=WellData) and press enter. This creates a regression object and stores it in the variable model.
 
6. Now type summary(model) to see the regression summary. It should look like this
 
Call:lm(formula = WellData$Arsenic ~ WellData$Latitude + WellData$Longitude +  WellData$Depth, data = WellData)Residuals:        Min           1Q  Median          3Q          Max -111.853   -21.778    0.697    14.938    101.219 Coefficients:                                     Estimate     Std. Error   t value    Pr(>|t|) (Intercept)                149622.704    95674.027    1.564    0.13741 WellData$Latitude    -2074.360      1114.034  -1.862    0.08107 . WellData$Longitude  -1106.621     1006.227   -1.100    0.28770 WellData$Depth                  1.953           0.401    4.871    0.00017 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1Residual standard error: 54.13 on 16 degrees of freedomMultiple R-squared: 0.6136, Adjusted R-squared: 0.5412 F-statistic: 8.47 on 3 and 16 DF, p-value: 0.001343
 
-----------------------------------------------------------------
From this we have y = 149622.70 - 2074.36x1 - 1106.62x2 + 1.95x3.

Russell B. · Answer

Mik,
The method of least squares applied to this situation is "Multiple regression". You can use the "Data analysis" tool pack of Excel to find the equation of the regression hyperplane (y= (?) +(?) x1 (?) x2+(?) x3). Make sure that "Data Analysis" has been activated in the Excel of your computer. If it is activated, you should be able to see its menu on the far right of Excel menus when you click on the "Data" tab of Excel. If it has not been activated and depending on the version of Excel in your computer (2003/2007/2010), you can activate it by clicking on Microsoft icon (top/left) (or on the "File" menu in the 2010 version), and then successively click on "Options"--"Add-ins"--"Go". At this point the data analysis dialogue box will appear and you must check the "Data analysis tool-pack" option. It may then take a few minutes for your computer to add the data analysis tool pack.
 
Next, enter the data in five different columns. The first column representing the "Well ID" is not important and does not play any role. Suppose this is Column A of Excel. Also suppose the Columns B, C, D, and E respectively contain the X1 (latitude), X2 (longitude), X3 (depth), and Y (arsenic) values; with all 4 top cells being the labels; so that, the numeric data are in cell B2 through B21 for the latitudes, cells C2 through C21 for the longitudes, cells D2 through D21 for the depths, and cells E2 through E21 for the arsenic levels.
 
Next, click on the "Data" menu, and then click on the "Data Analysis" tab. When the Data analysis dialogue box opens, choose "Regression" in the available list. In the regression dialogue box that opens, enter columns B, C, and D as the X-values. This can be done by clicking into cell B1, dragging the mouse to the right until C1 and D1 are also included, and then dragging all the way down until all three columns representing X1 (latitude), X2 (longitude), X3 (depth) values have been included. If done correctly, the box containing the three X values should indicate something like: "Sheet 1!$B$1:$D$21". If you have a hard time dragging in with the mouse, you can simply type in "B1:D21" into the box for X (independent variables). Next, type in "E1:E21" into the box for Y (dependent variable), or use the dragging with the mouse method. Finally, Click on "OK".
 
Excel will now generate a new sheet containing all the necessary information for the multiple regression. The information that you would need to answer the questions of your problem are located at the bottom of that sheet under the "Regression Output". The value indicated as the "Intercept" is the first "?" of the equation; and the other three values indicated as the coefficients b1, b2, and b3 corresponding to the three independent variables X1 , X2, and X3 would replace the other three "?" marks in the equation.

Alexander S. · Answer

The regression equation is given by
 
y = -2074 x1 - 1107 x2 + 1.95 x3 + 149623
 
This is the output using Microsoft Excel multiple regression analysis:
 
SUMMARY OUTPUT  Regression Statistics Multiple R 0.783344535 R Square 0.61362866 Adjusted R Square 0.541184034 Standard Error 54.12807997 Observations 20   CoefficientsIntercept 149622.7039LATITUDE -2074.359732LONGITUDE -1106.621483DEPTH 1.95344088

Method of least squares

4 Answers By Expert Tutors

Still looking for help? Get the right answer, fast.

OR

RELATED TOPICS

RELATED QUESTIONS

When selecting two people, find the probability that one works in an office and the other one drives a car?

The slope and the y-intercept

Probability & Statistics

Probability & Statistics

RECOMMENDED TUTORS

IXL

Rosetta Stone

Education.com

TPT

Vocabulary.com

ABCya

SpanishDictionary.com

Inglés.com

Emmersion

Method of least squares

4 Answers By Expert Tutors

Still looking for help? Get the right answer, fast.

OR

RELATED TOPICS

RELATED QUESTIONS

When selecting two people, find the probability that one works in an office and the other one drives a car?

The slope and the y-intercept

Probability & Statistics

Probability & Statistics

RECOMMENDED TUTORS

find an online tutor