The Backpropagation Algorithm

Once the network has produced a prediction, the weights and biases have to be adjusted according to the difference between the prediction and the true label, so that the model predicts better in the future. This adjustment is carried out by backpropagation (compute the error → compute the gradients → update the parameters).

Structure of the Neural Network

Suppose the network has $L$ layers and the activation function is $\sigma$. For layer $l$, $z^l = w^l a^{l-1} + b^l$ denotes the pre-activation (weighted input) and $a^l = \sigma(z^l)$ denotes the activation.

The loss function is the quadratic cost
$$C = \frac{1}{2}\sum_j \left(y_j - a^L_j\right)^2.$$

Its partial derivative with respect to the output activation $a^L_j$ is
$$\frac{\partial C}{\partial a^L_j} = a^L_j - y_j.$$
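To make the notation concrete, the following small sketch runs a forward pass of a fully connected network; the layer sizes, random data, and variable names here are illustrative assumptions, not taken from the text above.

import numpy as np

def sigmoid(z):
    # element-wise logistic activation
    return 1.0 / (1.0 + np.exp(-z))

# assumed example architecture: 3 inputs -> 4 hidden -> 2 outputs
sizes = [3, 4, 2]
rng = np.random.default_rng(0)
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]

a = rng.standard_normal((sizes[0], 1))   # input activation a^1
zs, activations = [], [a]
for b, w in zip(biases, weights):
    z = w @ a + b                        # z^l = w^l a^{l-1} + b^l
    zs.append(z)
    a = sigmoid(z)                       # a^l = sigma(z^l)
    activations.append(a)
print(activations[-1].shape)             # (2, 1): the network output a^L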

The Fundamental Equations

To update the parameters we need the gradients
$$\frac{\partial C}{\partial w^l_{jk}} \qquad\text{and}\qquad \frac{\partial C}{\partial b^l_j}.$$

Both depend on the activation function through $a^l = \sigma(z^l)$, so to simplify the computation we first define an intermediate quantity, the error of neuron $j$ in layer $l$:
$$\delta^l_j \equiv \frac{\partial C}{\partial z^l_j}.$$

For the output layer, the chain rule gives
$$\delta^L_j = \frac{\partial C}{\partial z^L_j} = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j).$$

Generalizing to vector form, we get
$$\delta^L = \nabla_a C \odot \sigma'(z^L),$$
where $\odot$ is the element-wise (Hadamard) product and $\nabla_a C$ is the vector of partial derivatives $\partial C/\partial a^L_j$.
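For the quadratic cost defined above, $\nabla_a C = a^L - y$, so the output error takes the concrete form that appears later in the code (cost_derivative multiplied element-wise by sigmoid_prime):
$$\delta^L = \left(a^L - y\right)\odot\sigma'(z^L).$$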

For a hidden layer $l$, we express $\delta^l_j$ in terms of the errors of the next layer:
$$\delta^l_j = \frac{\partial C}{\partial z^l_j} = \sum_k \frac{\partial C}{\partial z^{l+1}_k}\,\frac{\partial z^{l+1}_k}{\partial z^l_j},$$
where the sum runs over all neurons $k$ of layer $l+1$, because $z^l_j$ influences the cost only through them (the part shown by the red lines in the figure).
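Concretely, since $z^{l+1}_k = \sum_j w^{l+1}_{kj}\,\sigma(z^l_j) + b^{l+1}_k$, the inner derivative is
$$\frac{\partial z^{l+1}_k}{\partial z^l_j} = w^{l+1}_{kj}\,\sigma'(z^l_j),
\qquad\text{so}\qquad
\delta^l_j = \sum_k \delta^{l+1}_k\, w^{l+1}_{kj}\,\sigma'(z^l_j).$$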

Writing this in vector form gives the backpropagation equation for the hidden layers:
$$\delta^l = \left((w^{l+1})^{T}\delta^{l+1}\right)\odot\sigma'(z^l).$$

For any layer $l$, the gradient with respect to the biases follows directly from the definition of $\delta^l_j$, since $z^l_j = \sum_k w^l_{jk}\,a^{l-1}_k + b^l_j$ and therefore $\partial z^l_j/\partial b^l_j = 1$:
$$\frac{\partial C}{\partial b^l_j} = \frac{\partial C}{\partial z^l_j}\,\frac{\partial z^l_j}{\partial b^l_j} = \delta^l_j.$$

Next we compute the gradient with respect to the weights, using $\partial z^l_j/\partial w^l_{jk} = a^{l-1}_k$:
$$\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial z^l_j}\,\frac{\partial z^l_j}{\partial w^l_{jk}} = a^{l-1}_k\,\delta^l_j.$$
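As a quick sanity check on these two formulas, the sketch below compares the backprop gradient of a tiny single-layer network against a finite-difference estimate; the layer size, sigmoid helper, and random data are assumptions made only for this example.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w = rng.standard_normal((2, 3))        # single layer: 3 inputs -> 2 outputs
b = rng.standard_normal((2, 1))
x = rng.standard_normal((3, 1))
y = rng.standard_normal((2, 1))

def cost(w, b):
    a = sigmoid(w @ x + b)             # forward pass
    return 0.5 * np.sum((a - y) ** 2)  # quadratic cost

# backprop gradients: delta = (a - y) * sigma'(z)
z = w @ x + b
a = sigmoid(z)
delta = (a - y) * sigmoid(z) * (1 - sigmoid(z))
grad_b = delta                         # dC/db_j    = delta_j
grad_w = delta @ x.T                   # dC/dw_{jk} = a^{l-1}_k delta_j

# central-difference estimate of dC/dw[0, 0]
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
numeric = (cost(w_plus, b) - cost(w_minus, b)) / (2 * eps)
print(numeric, grad_w[0, 0])           # the two values agree to many decimals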

Algorithm Steps

  1. Input: feed in the training example $x$ and set the input-layer activation $a^1 = x$.
  2. Feedforward: for each $l = 2, 3, \dots, L$ compute $z^l = w^l a^{l-1} + b^l$ and $a^l = \sigma(z^l)$.
  3. Backpropagate the error: compute $\delta^L = \nabla_a C \odot \sigma'(z^L)$, then $\delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)$ for $l = L-1, \dots, 2$.
  4. Gradient descent: update the parameters using $\partial C/\partial b^l_j = \delta^l_j$ and $\partial C/\partial w^l_{jk} = a^{l-1}_k\,\delta^l_j$ (a sketch of this update step follows the list).
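Step 4, combined with the backprop routine below, might look roughly like the following mini-batch update. This is a sketch in the style of the reference implementation, assuming a surrounding Network class with self.weights, self.biases, and a backprop method, and eta as the learning rate; it is not a verbatim quote.

import numpy as np

def update_mini_batch(self, mini_batch, eta):
    """Apply one gradient-descent step using the examples in mini_batch."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        # accumulate the gradient of the cost over the mini-batch
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # move the parameters against the averaged gradient
    self.weights = [w - (eta / len(mini_batch)) * nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b - (eta / len(mini_batch)) * nb
                   for b, nb in zip(self.biases, nabla_b)]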

Code Implementation

def backprop(self, x, y):
    """Return a tuple ``(nabla_b, nabla_w)`` representing the
    gradient for the cost function C_x.  ``nabla_b`` and
    ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
    to ``self.biases`` and ``self.weights``."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # Forward propagation: store every z vector and activation, layer by layer
    activation = x
    activations = [x]  # list to store all the activations, layer by layer
    zs = []  # list to store all the z vectors, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Backward propagation: output error delta^L = dC/da^L * sigma'(z^L)
    delta = self.cost_derivative(activations[-1], y) * \
        sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # Note that the variable l in the loop below is used a little
    # differently to the notation in Chapter 2 of the book.  Here,
    # l = 1 means the last layer of neurons, l = 2 is the
    # second-last layer, and so on.  It's a renumbering of the
    # scheme in the book, used here to take advantage of the fact
    # that Python can use negative indices in lists.
    for l in range(2, self.num_layers):
        z = zs[-l]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())

    return (nabla_b, nabla_w)
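The method above relies on the helpers sigmoid, sigmoid_prime, and cost_derivative defined elsewhere in the reference implementation; for the quadratic cost they look roughly as follows.

import numpy as np

def sigmoid(z):
    """Element-wise logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z) * (1 - sigmoid(z))

# In the reference code this is a method of the Network class.
def cost_derivative(self, output_activations, y):
    """dC/da^L for the quadratic cost C = 0.5 * ||a^L - y||^2."""
    return (output_activations - y)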

References

  1. mnielsen/neural-networks-and-deep-learning