Ashley P.

asked • 12/08/20

Multiple Linear Regression - Python

I'm currently working on a project to see which data imputation method works best with a dataset I have.

I have the complete dataset.


Dependent variable: Yield of the crop

Independent variables: Year, Season, Production per hectare



So I'm planning to apply data imputation methods such as Multiple Linear Regression, KNN, and Polynomial Interpolation.


My method is to randomly remove some Yield values (the test set), impute them with each of the above techniques after training on the rest of the dataset, and compare the imputed values with the original Yield values.

Then I plan to select the data imputation method which works best for this dataset.
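The mask-and-score procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, the column names (Year, Season, ProdPerHa, Yield) and the 0/1 encoding of Season are hypothetical, and a plain mean fill is added as a baseline that is not one of the methods named in the question. Polynomial interpolation could be scored the same way by adding its predictions to the dictionary.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the crop data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Year": rng.integers(2000, 2020, n),
    "Season": rng.integers(0, 2, n),        # season assumed encoded as 0/1
    "ProdPerHa": rng.uniform(1.0, 5.0, n),
})
df["Yield"] = (1500 + 500 * df["ProdPerHa"] + 200 * df["Season"]
               + rng.normal(0, 100, n))
features = ["Year", "Season", "ProdPerHa"]

# Hide 20% of the Yield values at random, keeping the truth for scoring.
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=n // 5, replace=False)] = True
truth = df.loc[mask, "Yield"].to_numpy()
holed = df.copy()
holed.loc[mask, "Yield"] = np.nan

def rmse(pred):
    """Root mean squared error against the hidden true Yield values."""
    return float(np.sqrt(np.mean((np.asarray(pred) - truth) ** 2)))

# 1) Regression imputation: fit on rows with a known Yield, predict the rest.
reg = LinearRegression().fit(holed.loc[~mask, features],
                             holed.loc[~mask, "Yield"])
reg_pred = reg.predict(holed.loc[mask, features])

# 2) KNN imputation over all columns (features are left unscaled here,
#    so Year dominates the distance; scaling would be worth trying).
knn_filled = KNNImputer(n_neighbors=5).fit_transform(holed[features + ["Yield"]])
knn_pred = knn_filled[mask, -1]

# 3) Baseline: fill every gap with the mean of the observed Yield values.
mean_pred = np.full(mask.sum(), holed["Yield"].mean())

scores = {
    "linear_regression": rmse(reg_pred),
    "knn": rmse(knn_pred),
    "mean_baseline": rmse(mean_pred),
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.1f}")
```

Repeating this with several random masks (different seeds) and averaging the scores would give a steadier comparison than a single split.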


Assume this procedure is done in Python (Google Colab environment).


So far I've coded up to the part where I've trained the model using an 80:20 train:test split.


I've computed the linear regression coefficients, and the Yield values predicted by the model have already been inserted into my test dataset.


Since I need graphical and statistical evidence of the efficiency and accuracy of each model, how am I supposed to impute Yield values for the whole dataset and compare them with the original Yield values?


Do I have to manually write out the equation of the linear model, substitute the independent variables, compute the Yield values from it, and then compare them with the original Yield values?


Is there any code that automatically adds a column with the Yield values derived from the linear regression model for the whole dataset? Any method that gives estimated values for all the Yield values in the dataset would do.
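For what it's worth, a fitted scikit-learn LinearRegression can produce the estimates for every row through its predict method, so the equation does not have to be written out by hand. The sketch below assumes the model is a scikit-learn LinearRegression (the question does not say which library was used), uses synthetic data, and invents the column names Year, Season, ProdPerHa, Yield, Yield_pred and Residual for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the crop data; all column names are hypothetical.
rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({
    "Year": rng.integers(2000, 2020, n),
    "Season": rng.integers(0, 2, n),        # season assumed encoded as 0/1
    "ProdPerHa": rng.uniform(1.0, 5.0, n),
})
df["Yield"] = (1500 + 500 * df["ProdPerHa"] + 200 * df["Season"]
               + rng.normal(0, 100, n))
features = ["Year", "Season", "ProdPerHa"]

# 80:20 split, as described in the question.
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Yield"], test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# predict() applies the fitted intercept and coefficients to every row,
# so the whole dataset gets an estimated Yield in one call.
df["Yield_pred"] = model.predict(df[features])
df["Residual"] = df["Yield"] - df["Yield_pred"]

# The same numbers, computed from the equation by hand for comparison.
manual = model.intercept_ + df[features].to_numpy() @ model.coef_

# Statistics over the whole dataset and over the held-out 20% only.
rmse_all = float(np.sqrt(mean_squared_error(df["Yield"], df["Yield_pred"])))
r2_all = r2_score(df["Yield"], df["Yield_pred"])
rmse_test = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

print(df[["Yield", "Yield_pred", "Residual"]].head())
print(f"RMSE (all rows) = {rmse_all:.1f}, R^2 (all rows) = {r2_all:.3f}")
print(f"RMSE (test rows only) = {rmse_test:.1f}")
```

Metrics over all rows include the rows the model was trained on, so they tend to look better than the test-only figures; the test-only numbers are the fairer basis for comparing imputation methods. A scatter plot of Yield against Yield_pred would be one simple form of graphical evidence.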


1 Expert Answer

Patrick B. answered • 12/08/20


Math and computer tutor/teacher

Ashley P.

The R-coefficient is 0.89 and the RMSE is 160. Predicted values range between 2000 and 4000. Would that be a fairly good model?
12/08/20
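One rough way to put the figures quoted above in context is to express the RMSE relative to the scale of the predictions. The short calculation below uses only the numbers from the comment; it assumes "R-coefficient" means the correlation coefficient r (if it already denotes R squared, the last step does not apply).

```python
rmse = 160.0
pred_min, pred_max = 2000.0, 4000.0

# RMSE as a fraction of the prediction range (a "normalized RMSE").
nrmse_range = rmse / (pred_max - pred_min)

# RMSE as a fraction of the midpoint of the range.
midpoint = (pred_min + pred_max) / 2
rel_to_midpoint = rmse / midpoint

# Assumption: 0.89 is the correlation r, so r squared is the share of
# variance in Yield accounted for by the predictions.
r = 0.89
r_squared = r ** 2

print(f"RMSE / range    = {nrmse_range:.3f}")      # 0.080
print(f"RMSE / midpoint = {rel_to_midpoint:.3f}")  # 0.053
print(f"r squared       = {r_squared:.4f}")        # 0.7921
```

So the typical error is about 8% of the prediction range, or roughly 5% of a mid-range Yield value. Whether that is good enough depends on how the other imputation methods score on the same held-out values.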

Patrick B.

Should be fine; imputing the outliers will make it even better!
12/08/20

Ashley P.

Thank you very much for the response. What I was actually doing was splitting the dataset into training and test sets, training a model on the training set, imputing values in the test set in several ways, and then choosing the best data imputation method for this dataset. Any ideas for a better model? Thank you!
12/09/20
