Sources:
[Title] 10.2: Neural Networks: Perceptron Part 1
[Link with timestamp] https://youtu.be/ntKn5TPHHAk?t=1639
[Title] 3.5: Mathematics of Gradient Descent
[Link with timestamp] https://youtu.be/jc2IthslyzM?t=700
A few days ago I wanted to relearn all I had forgotten about neural networks.
I first read the book "Make Your Own Neural Network" and watched 3Blue1Brown's calculus series as well as their neural network and gradient descent videos.
I tried to soak up as much of the math as possible before diving into Shiffman's neural network series.
Then I watched the "Perceptron Part 1" video and got a little confused at the "ERROR x X" part, but kept going.
After that, I looked into the Mathematics of Gradient Descent video and followed along with both the code and the math.
First of all, I'm a little confused about why he starts the whole derivation from the squared error.
I feel like he shouldn't have, because in the end we want to be able to get both negative and positive adjustments, right?
He then drops the "2" in front of the error, so I guess it still works out in the final result.
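Just so it's clear which step I mean, here's the derivation from that video the way I wrote it down (my own notation, so I may have transcribed something wrong):

$\hat{y} = m x + b$, error $E = y - \hat{y}$, loss $J = E^2$

$\frac{\partial J}{\partial m} = 2E \cdot \frac{\partial E}{\partial m} = 2E \cdot (-x) = -2 E x$

Gradient descent step: $\Delta m = -\eta \, \frac{\partial J}{\partial m} = 2 \eta E x$, and after he drops the constant 2 (folding it into the learning rate $\eta$) it becomes $\Delta m = \eta \, E \, x$, i.e. learning rate × error × x.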
I'm really struggling to apply the math described in "3.5: Mathematics of Gradient Descent" to "10.2: Neural Networks: Perceptron Part 1" (the timestamp in particular).
I could follow the linear regression example because it was simpler, but when I tried to work through the same reasoning for the perceptron code, I had a hard time coming up with a mathematical equation that explains why the delta weight is "ERROR x X" (it also includes the learning rate, but I'm leaving that out to focus on the part I don't understand).
I've sketched the update I mean below.
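For reference, this is roughly the weight update I'm talking about. It's my own reconstruction in Java rather than the exact Processing code from the video, and the learning rate lr appears only for completeness:

```java
import java.util.Random;

// Rough reconstruction of the perceptron from the video, not the exact Processing code.
public class Perceptron {
    private final double[] weights;
    private final double lr = 0.01; // learning rate (included only for completeness)

    public Perceptron(int n) {
        weights = new double[n];
        Random rng = new Random();
        for (int i = 0; i < n; i++) {
            weights[i] = rng.nextDouble() * 2 - 1; // random initial weights in [-1, 1]
        }
    }

    // Weighted sum of the inputs passed through a sign activation: returns +1 or -1.
    public int guess(double[] inputs) {
        double sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sum >= 0 ? 1 : -1;
    }

    // The step I can't derive: each weight is nudged by ERROR x X (times the learning rate).
    public void train(double[] inputs, int target) {
        int error = target - guess(inputs);        // ERROR
        for (int i = 0; i < weights.length; i++) {
            weights[i] += error * inputs[i] * lr;  // delta weight = ERROR x X x lr
        }
    }

    public static void main(String[] args) {
        // Toy example: learn to answer +1 when the point is above the line y = x, else -1.
        Perceptron p = new Perceptron(2);
        Random rng = new Random();
        for (int i = 0; i < 10000; i++) {
            double x = rng.nextDouble() * 2 - 1;
            double y = rng.nextDouble() * 2 - 1;
            int target = y > x ? 1 : -1;
            p.train(new double[] {x, y}, target);
        }
        System.out.println(p.guess(new double[] {0.3, 0.9})); // expected +1 (above y = x)
        System.out.println(p.guess(new double[] {0.9, 0.3})); // expected -1 (below y = x)
    }
}
```

My question is how the gradient descent derivation from the other video leads to that `error * inputs[i]` line.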