DeepLearning: Linear Class Explained

import numpy as np

class Linear(Node):
    """
    Represents a node that performs a linear transform.
    """
    def __init__(self, X, W, b):
        # The base class (Node) constructor. Weights and bias
        # are treated like inbound nodes.
        Node.__init__(self, [X, W, b])

    def forward(self):
        """
        Performs the math behind a linear transform.
        """
        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value
        self.value = np.dot(X, W) + b

    def backward(self):
        """
        Calculates the gradient based on the output values.
        """
        # Initialize a partial for each of the inbound_nodes.
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        # Cycle through the outputs. The gradient will change depending
        # on each output, so the gradients are summed over all outputs.
        for n in self.outbound_nodes:
            # Get the partial of the cost with respect to this node.
            grad_cost = n.gradients[self]
            # Set the partial of the loss with respect to this node's inputs.
            self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
            # Set the partial of the loss with respect to this node's weights.
            self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
            # Set the partial of the loss with respect to this node's bias.
            self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)
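
To see what forward() and backward() actually produce, here is a minimal sketch that wires the class up by hand. The Node and Input stubs and the FakeCost helper are simplified stand-ins invented for this example (the real framework's versions may differ), they would have to be defined before the Linear class above, and the shapes are arbitrary.

import numpy as np

class Node:
    # Simplified stand-in for the framework's base class (assumption).
    def __init__(self, inbound_nodes=None):
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []
        self.outbound_nodes = []
        self.value = None
        self.gradients = {}
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)

class Input(Node):
    # A node that just holds a value (features, weights, or bias).
    def __init__(self):
        Node.__init__(self, [])

# 4 samples, 3 features, 2 output units (arbitrary shapes).
X_node, W_node, b_node = Input(), Input(), Input()
linear = Linear(X_node, W_node, b_node)
X_node.value = np.random.randn(4, 3)
W_node.value = np.random.randn(3, 2)
b_node.value = np.zeros(2)

linear.forward()                        # linear.value now has shape (4, 2)

# Pretend a downstream cost node sent back dC/dZ of the same shape.
class FakeCost:
    pass
fake = FakeCost()
fake.gradients = {linear: np.ones((4, 2))}
linear.outbound_nodes.append(fake)
linear.backward()

print(linear.gradients[X_node].shape)   # (4, 3), same shape as X
print(linear.gradients[W_node].shape)   # (3, 2), same shape as W
print(linear.gradients[b_node].shape)   # (2,), same shape as b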

1. The partial of the loss with respect to the inputs
self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
The gradient of the cost with respect to a given node equals the product of the gradients of the nodes in front of it, working backwards from the cost (the chain rule).

Explanation 1:

np.dot(grad_cost, self.inbound_nodes[1].value.T)

For the Linear node there are three inputs (inputs, weights, and bias), which correspond to self.inbound_nodes[0], self.inbound_nodes[1], and self.inbound_nodes[2] respectively.

So, each node will pass on the cost gradient to its inbound nodes and each node will get the cost gradient from its outbound nodes. Then, for each node we'll need to calculate a gradient that's the cost gradient times the gradient of that node with respect to its inputs.

So the derivative of the Linear node's output with respect to inputs is weights, which is why the gradient is grad_cost multiplied by weights (np.dot(grad_cost, W.T) above). Here grad_cost is the rate of change passed in from Linear's outbound node.
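
A quick shape check makes the choice of np.dot(grad_cost, W.T) concrete. This is a minimal sketch with hypothetical shapes: 4 samples, 3 input features, 2 output units.

import numpy as np

X = np.random.randn(4, 3)           # inputs
W = np.random.randn(3, 2)           # weights
grad_cost = np.random.randn(4, 2)   # dC/dZ coming back from the outbound node

# dC/dX = np.dot(dC/dZ, W.T), which must have the same shape as X.
grad_X = np.dot(grad_cost, W.T)
print(grad_X.shape)                 # (4, 3)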

np.dot(self.inbound_nodes[0].value.T, grad_cost)

By the same reasoning, the derivative with respect to weights is inputs, so the gradient is grad_cost multiplied by inputs (transposed so the shapes line up, as in the line above).
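
The weight gradient can also be checked numerically. The sketch below assumes a toy scalar cost C = np.sum(np.dot(X, W) + b), whose dC/dZ is a matrix of ones, and compares the analytic gradient np.dot(X.T, grad_cost) against a finite-difference estimate for one entry of W.

import numpy as np

X = np.random.randn(4, 3)
W = np.random.randn(3, 2)
b = np.zeros(2)

def cost(W_):
    # Toy scalar cost: just sum every output of the linear transform.
    return np.sum(np.dot(X, W_) + b)

grad_cost = np.ones((4, 2))          # dC/dZ for this toy cost
grad_W = np.dot(X.T, grad_cost)      # analytic gradient, same formula as backward()

# Finite-difference check on W[0, 0].
eps = 1e-6
W_shift = W.copy()
W_shift[0, 0] += eps
numeric = (cost(W_shift) - cost(W)) / eps
print(np.isclose(numeric, grad_W[0, 0]))   # True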

np.sum(grad_cost, axis=0, keepdims=False)

As for the bias, the derivative of the Linear output with respect to bias is always 1, so the gradient is simply 1 * grad_cost, summed over the batch dimension.
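
Because the same bias vector is added to every sample in the batch, its gradient is grad_cost summed over the batch axis. A small sketch with made-up values:

import numpy as np

# dC/dZ for a batch of 4 samples and 2 output units (made-up values).
grad_cost = np.array([[0.1, 0.2],
                      [0.3, 0.4],
                      [0.5, 0.6],
                      [0.7, 0.8]])

grad_b = np.sum(grad_cost, axis=0)   # sum over the batch axis
print(grad_b)                        # [1.6 2. ]
print(grad_b.shape)                  # (2,), same shape as b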

Explanation 2: why +=

Because each node passes its output on to every one of its outbound nodes, each of those outbound nodes sends back its own share of the error during backpropagation. To get the gradient at a node, all of those shares have to be added together, hence the +=.
This also explains the loop for n in self.outbound_nodes: its purpose is to iterate over every outbound node of the current node.
If a node has multiple outgoing nodes, you just sum up the gradients from each node.
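
A minimal sketch of that accumulation, assuming a node whose output feeds two downstream branches (the gradient values are made up):

import numpy as np

grad_from_branch_1 = np.array([[1.0, 2.0]])
grad_from_branch_2 = np.array([[0.5, -1.0]])

# Accumulate with +=, exactly like the loop over self.outbound_nodes above.
total_grad = np.zeros((1, 2))
for grad_cost in (grad_from_branch_1, grad_from_branch_2):
    total_grad += grad_cost
print(total_grad)                    # [[1.5 1. ]]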

Note 1:
Keep in mind that backpropagation and gradient descent are two distinct steps: backpropagation finds the gradients, which give the direction of change, and gradient descent then uses them to minimize the error.

To find the gradient, you just multiply the gradients for all nodes in front of it going backwards from the cost. This is the idea behind backpropagation. The gradients are passed backwards through the network and used with gradient descent to update the weights and biases.
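
A minimal sketch of that second step, assuming backpropagation has already produced grad_W and grad_b for a layer (the learning rate and shapes are arbitrary):

import numpy as np

learning_rate = 0.01
W = np.random.randn(3, 2)
b = np.zeros(2)
grad_W = np.random.randn(3, 2)   # would come from Linear.backward()
grad_b = np.random.randn(2)

# One gradient-descent update: step against the gradient to reduce the cost.
W -= learning_rate * grad_W
b -= learning_rate * grad_b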

The bottom line:

Backpropagation only computes the derivatives; gradient descent is the overall process that uses them to update the weights and biases.