
William L. answered 05/18/21
Data Scientist and Evaluation Specialist
There are a couple of different approaches to estimating the missing scores; which one you use is largely a matter of preference.
The first option is to fit a regression using the test that is missing scores as your outcome variable and the other two tests as predictors (missing_test ~ test1 + test2). This will fit a linear model (in R, for example, with lm or glm) of what the missing score should be, based on how your other students performed on their other two tests. You can then use the fitted model to predict the missing scores. The risk is that students whose performance differs significantly between exams will increase the uncertainty in the predicted scores.
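As a rough illustration of the regression approach, here is a minimal Python sketch that uses only the standard library. The scores are made-up example data, and the names `scores`, `fit_linear`, `predict`, and `solve_linear_system` are hypothetical helpers written for this example, not part of any library. It fits test3 ~ test1 + test2 by ordinary least squares on the students with complete scores, then predicts the missing one.

```python
# Hypothetical scores as (test1, test2, test3); None marks a missing test3.
scores = [
    (72, 78, 80),
    (85, 88, 91),
    (90, 79, 84),
    (61, 70, 68),
    (88, 94, 95),
    (77, 81, None),
]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], converted to floats.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit the model")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(rows):
    """Least-squares fit of y ~ intercept + x1 + x2 from (x1, x2, y) rows."""
    design = [(1.0, x1, x2) for x1, x2, _ in rows]
    targets = [y for _, _, y in rows]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(coefs, x1, x2):
    """Predicted outcome for one student given the fitted coefficients."""
    return coefs[0] + coefs[1] * x1 + coefs[2] * x2


# Fit on students with all three scores, then fill in the gaps.
complete = [row for row in scores if row[2] is not None]
coefs = fit_linear(complete)
imputed = [
    (t1, t2, t3 if t3 is not None else round(predict(coefs, t1, t2), 1))
    for t1, t2, t3 in scores
]
print(imputed[-1])
```

With only a handful of students the fitted coefficients will be unstable, so treat the predicted value as a rough estimate rather than a recovered score.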
The second option addresses that variance problem by assuming that students fall into bands of performance. You can perform a K-Nearest Neighbors (KNN) imputation. A KNN imputation assumes that your students will perform similarly to students who scored similarly to them on the other tests, and it assigns each missing score based on those academic peers as opposed to the entire class.
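The KNN idea can be sketched in a few lines of plain Python. This is an illustrative version under simple assumptions: Euclidean distance on the two known tests, an unweighted mean of the neighbors' scores, and a hypothetical helper `knn_impute` with made-up data. The choice of k is yours to tune.

```python
import math


def knn_impute(rows, k=3):
    """Fill a missing third score with the mean of the k nearest peers.

    rows are (test1, test2, test3) tuples; None marks a missing test3.
    Peers are students with a known test3, ranked by Euclidean distance
    on (test1, test2).
    """
    complete = [row for row in rows if row[2] is not None]
    if len(complete) < k:
        raise ValueError("not enough complete rows for the requested k")
    result = []
    for t1, t2, t3 in rows:
        if t3 is None:
            neighbors = sorted(
                complete,
                key=lambda peer: math.dist((t1, t2), (peer[0], peer[1])),
            )[:k]
            t3 = sum(peer[2] for peer in neighbors) / k
        result.append((t1, t2, t3))
    return result


# Hypothetical class: two low scorers, two high scorers, one missing score.
class_scores = [
    (50, 50, 55),
    (52, 48, 57),
    (90, 92, 95),
    (88, 91, 93),
    (51, 49, None),
]
filled = knn_impute(class_scores, k=2)
print(filled[-1])
```

Here the student with the missing score is matched to the two low-scoring peers and ignores the high scorers, which is the "academic peers" behavior described above. If you would rather not write this yourself, scikit-learn provides a `KNNImputer` class in `sklearn.impute` that implements a similar idea.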
It is important to note that whenever you estimate or substitute missing values (impute), your estimates are limited by the quality and quantity of your data. It is therefore essential that you look at your data beforehand and make yourself aware of ways it may be biased by extreme scores or by variance in scores.
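One quick way to look at the data beforehand is to summarize each test and flag unusually extreme scores. The sketch below uses Python's standard `statistics` module. The helper name `flag_extreme` and the cutoff of 2 standard deviations are assumptions chosen for illustration, not a fixed rule.

```python
import statistics


def flag_extreme(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical scores for one test, with one very low score.
test1 = [70, 72, 68, 71, 69, 73, 67, 70, 20]
print("mean:", statistics.mean(test1))
print("stdev:", statistics.stdev(test1))
print("extreme:", flag_extreme(test1))
```

A flagged score is not necessarily an error, but it is worth knowing about before you impute, since a single extreme value can pull a regression line or distort which students count as nearest neighbors.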