You are asked to implement a data-fitting classifier that predicts the label of an input vector. In the context of hand-written digit classification, the label is an integer between 0 and 9, and the input vector is a vectorized 28 × 28 image whose elements are floating-point numbers between 0 and 1 (the grayscale value of each pixel). Your task is to solve this classification problem with the familiar mathematical tool of least squares: you will use a least-squares fit to classify a subset of the MNIST hand-written digit dataset.
The training dataset for this project has 1000 images, stored as a 1000 × 784 matrix in train_data.txt (you may refer to the constructor of the class Classifier to see how to read a matrix from a text file). The testing dataset has 200 images, stored as a 200 × 784 matrix in test_data.txt. The labels for the training set are stored as a 1D array (or a 1000 × 1 matrix) in train_labels.txt, and the labels for the testing set are in test_labels.txt.
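If you prefer to load the matrices directly with NumPy rather than through the starter code, the following is a minimal loading sketch; it assumes the .txt files contain plain whitespace-separated numbers (the Classifier constructor in the starter code defines the intended reading routine).

import numpy as np

# Minimal loading sketch; assumes plain whitespace-separated numeric text files.
train_data = np.loadtxt("train_data.txt")       # expected shape: (1000, 784)
train_labels = np.loadtxt("train_labels.txt")   # expected shape: (1000,)
test_data = np.loadtxt("test_data.txt")         # expected shape: (200, 784)
test_labels = np.loadtxt("test_labels.txt")     # expected shape: (200,)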
Here is a brief, high-level description of the mathematical model you may consider implementing. You will solve a least-squares problem 𝑋𝑊 = 𝑌 to fit the training data, where 𝑋 is a 1000 × 785 data matrix (note the extra bias column!), 𝑊 is a 785 × 10 model matrix, and 𝑌 is a 1000 × 10 right-hand-side matrix of target values composed of 1 and −1 (or any pair of values that distinguishes membership in a class from non-membership). For ease of implementation, you may solve the problem as ten separate least-squares problems, one for each column 𝜃𝑖 (a 785 × 1 vector) of 𝑊. After computing 𝑊, you can predict the class of a new image 𝑥 by a single vector-matrix multiplication 𝑣 = 𝑥𝑊 and then picking the index of the maximum entry, 𝑝 = argmax𝑖{𝑣𝑖}. Test your least-squares classifier on the test dataset and report its accuracy (by counting how many test images are predicted correctly).
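For illustration, here is a minimal NumPy sketch of the fitting and prediction steps just described. It assumes 𝑋 already includes the bias column (shape 1000 × 785) and that the labels are integers 0–9; the function names fit_least_squares and predict are only illustrative, not part of the starter code.

import numpy as np

def fit_least_squares(X, labels, classes=10):
    # Build the target matrix Y with Y[n, c] = +1 if image n has label c, else -1.
    Y = -np.ones((X.shape[0], classes))
    Y[np.arange(X.shape[0]), labels.astype(int)] = 1.0
    # Solve the ten least-squares problems X theta_i = Y[:, i]; passing the full
    # matrix Y solves all ten columns in a single call.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W                                    # shape: (785, classes)

def predict(X, W):
    # v = x W for each row x; the predicted class is the index of the largest entry.
    return np.argmax(X @ W, axis=1)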
[Hint: you may suffer from a singular 𝑋 due to the large blank regions in each image. To make the problem solvable by your least-squares solver, you can add a small random number to each element of 𝑋. For example, adding noise drawn from the range [0, 0.001] to each element of 𝑋 perturbs its features just enough to remove the singularity.]
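A perturbation along the lines of the hint might look like the short sketch below (the range [0, 0.001] is the value suggested above).

# Add small uniform noise in [0, 0.001] to every element of X so the
# least-squares system is no longer singular.
X = X + np.random.uniform(0.0, 0.001, size=X.shape)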
import numpy as np

def classifier(x, y, classes=10):
    # x: train_data, shape (num_samples, 784)
    # y: train_labels, shape (num_samples,)
    # classes: number of classes (10 digits)
    x = add_bias_column(x)                     # append the bias column -> (num_samples, 785)
    theta = np.zeros([x.shape[1], classes])    # the model matrix W, shape (785, classes)
    ### your code starts here ###
    ### your code ends here ###
    return theta
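As a usage sketch only, assuming you have filled in the body of classifier and loaded the data as above: the add_bias_column helper shown here is a hypothetical version that appends a column of ones (the actual helper in the starter code may place the bias column differently), followed by the accuracy count on the test set.

def add_bias_column(x):
    # Hypothetical helper: append a column of ones as the bias feature.
    return np.hstack([x, np.ones((x.shape[0], 1))])

theta = classifier(train_data, train_labels)
# Predict by v = x W on the bias-augmented test data, then take the argmax per row.
predictions = np.argmax(add_bias_column(test_data) @ theta, axis=1)
# Report accuracy as the fraction of correctly predicted test images.
accuracy = np.mean(predictions == test_labels)
print(f"Least-squares test accuracy: {accuracy:.3f}")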