Contents
- 1. Basic Concepts of the Linear Model
- 2. Gradient Descent
- 3. Backpropagation
- 4. Implementing a Linear Model with PyTorch
1. Basic Concepts of the Linear Model
Linear model: $\hat{y} = x \cdot \omega + b$
Simplified version, folding $b$ into the weight matrix $\omega$: $\hat{y} = x \cdot \omega$
Loss for a single sample: $loss = (\hat{y} - y)^2 = (x \cdot \omega - y)^2$
Loss over the whole training set (Mean Square Error, MSE): $cost = \frac{1}{N} \sum_{n=1}^{N} (\hat{y}_n - y_n)^2$
Code implementation and plot:
import numpy as np
import matplotlib.pyplot as plt
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
def forward(x):  # model: y_hat = x * w
    return x * w
def loss(x, y):  # squared error for a single sample
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)
w_list = []
mse_list = []
for w in np.arange(0.0, 4.1, 0.1):
    print('w=', w)
    l_sum = 0
    for x_val, y_val in zip(x_data, y_data):
        y_pred_val = forward(x_val)
        loss_val = loss(x_val, y_val)
        l_sum += loss_val
        print('\t', x_val, y_val, y_pred_val, loss_val)
    print('MSE=', l_sum / 3)
    w_list.append(w)
    mse_list.append(l_sum / 3)
plt.plot(w_list, mse_list)
plt.ylabel('loss')
plt.xlabel('w')
plt.show()
2. Gradient Descent
MSE loss: $cost = \frac{1}{N} \sum_{n=1}^{N} (\hat{y}_n - y_n)^2$
To minimize this loss, we look for the $\omega^*$ that minimizes $cost$: $\omega^* = \underset{\omega}{\arg\min}\ cost(\omega)$
Gradient:
$$
\begin{aligned}
\frac{\partial cost}{\partial \omega}
&= \frac{\partial}{\partial \omega} \frac{1}{N} \sum_{n=1}^{N} (x_n \cdot \omega - y_n)^2 \\
&= \frac{1}{N} \sum_{n=1}^{N} \frac{\partial}{\partial \omega} (x_n \cdot \omega - y_n)^2 \\
&= \frac{1}{N} \sum_{n=1}^{N} 2\,(x_n \cdot \omega - y_n)\, \frac{\partial (x_n \cdot \omega - y_n)}{\partial \omega} \\
&= \frac{1}{N} \sum_{n=1}^{N} 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)
\end{aligned}
$$
Update rule, where $\alpha$ is the learning rate:
$$
\begin{aligned}
\omega &= \omega - \alpha \frac{\partial cost}{\partial \omega} \\
&= \omega - \alpha\, \frac{1}{N} \sum_{n=1}^{N} 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)
\end{aligned}
$$
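To make the update rule concrete, here is one hand-computed step on the toy data used throughout ($x = [1, 2, 3]$, $y = [2, 4, 6]$), starting from $\omega = 1.0$ with learning rate $\alpha = 0.01$, the same values that appear in the code below:
$$
\frac{\partial cost}{\partial \omega} = \frac{1}{3}\Big[2 \cdot 1 \cdot (1 - 2) + 2 \cdot 2 \cdot (2 - 4) + 2 \cdot 3 \cdot (3 - 6)\Big] = \frac{-2 - 8 - 18}{3} = -\frac{28}{3} \approx -9.33
$$
$$
\omega \leftarrow 1.0 - 0.01 \times (-9.33) \approx 1.093
$$
A single step already moves $\omega$ toward the optimum $\omega = 2$, and it matches the first epoch printed by the code below.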
Code implementation:
import matplotlib.pyplot as plt
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0
def forward(x):
    return x * w
def cost(xs, ys):
    cost = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        cost += (y_pred - y) ** 2
    return cost / len(xs)
def gradient(xs, ys):  # gradient of the MSE cost with respect to w
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)
print('Predict (before training)', 4, forward(4))
Epoch_list = []
loss_list = []
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val
    Epoch_list.append(epoch)
    loss_list.append(cost_val)
    print('Epoch:', epoch, 'w=', w, 'loss=', cost_val)
print('Predict (after training)', 4, forward(4))
plt.plot(Epoch_list, loss_list)
plt.xlabel('Epoch')
plt.ylabel('cost')
plt.show()
Stochastic Gradient Descent (SGD) usually refers, in practice, to mini-batch stochastic gradient descent: instead of computing the gradient over the whole training set before each update, the weight is updated from the gradient of a single sample (or a small batch) at a time.
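For comparison with the batch gradient descent code above, below is a minimal sketch of the per-sample variant on the same toy data (it mirrors the names used above; only the gradient and the update are now computed one sample at a time):
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0

def forward(x):
    return x * w

def loss(x, y):  # loss of a single sample
    return (forward(x) - y) ** 2

def gradient(x, y):  # gradient of the single-sample loss with respect to w
    return 2 * x * (x * w - y)

print('Predict (before training)', 4, forward(4))
for epoch in range(100):
    for x, y in zip(x_data, y_data):  # update w once per sample instead of once per epoch
        w -= 0.01 * gradient(x, y)
        l = loss(x, y)
    print('Epoch:', epoch, 'w=', w, 'loss=', l)
print('Predict (after training)', 4, forward(4))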
3. Backpropagation
First, consider the computational graph. For a two-layer linear neural network model:
$$
\begin{aligned}
\hat{y} &= W_2(W_1 \cdot X + b_1) + b_2 \\
&= W_2 \cdot W_1 \cdot X + (W_2 b_1 + b_2) \\
&= W \cdot X + b
\end{aligned}
$$
For this model, a computational graph can be constructed (the figure is not reproduced here).
The equation above shows that a linear neural network with any number of layers can be collapsed into an equivalent single-layer network, which would make designs with multiple hidden layers pointless. Therefore a nonlinear function is applied at the end of each layer. Writing $\sigma$ for such a nonlinear (activation) function, the improved model becomes:
$$
\hat{y} = \sigma\big(W_2 \cdot \sigma(W_1 \cdot X + b_1) + b_2\big)
$$
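To see the collapse concretely, here is a small illustrative sketch (an addition, not part of the original post) using torch.nn.Linear: without an activation, two stacked linear layers are exactly one linear layer with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$; inserting a nonlinearity breaks this equivalence.
import torch

torch.manual_seed(0)
x = torch.randn(5, 3)  # 5 samples with 3 features each

linear1 = torch.nn.Linear(3, 4)
linear2 = torch.nn.Linear(4, 2)

# Two stacked linear layers with no activation in between ...
y_stacked = linear2(linear1(x))

# ... are equivalent to a single linear layer with W = W2 @ W1 and b = W2 @ b1 + b2
W = linear2.weight @ linear1.weight
b = linear2.weight @ linear1.bias + linear2.bias
y_single = x @ W.T + b
print(torch.allclose(y_stacked, y_single, atol=1e-6))  # True

# With a nonlinearity after the first layer, the model can no longer be collapsed
y_nonlinear = linear2(torch.sigmoid(linear1(x)))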
The course also covers the chain rule at this point, but it is simple, so I skip it here.
With the computational graph in place, we can look at the propagation algorithm. The procedure is to first run a forward (feed-forward) pass and then a backward pass. Frameworks such as PyTorch compute gradients automatically, but the two figures in the course (not reproduced here) explain the underlying principle.
Below is a worked numerical example:
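Since the original figure is not included, here is a stand-in hand computation of one forward and backward pass for the single-sample loss $loss = (x \cdot \omega - y)^2$, using $x = 2$, $y = 4$, $\omega = 1$ (values taken from the training data and the initial weight in the code below; the numbers in the course figure may differ):
$$
\begin{aligned}
\text{forward:}\quad & \hat{y} = x \cdot \omega = 2, \qquad r = \hat{y} - y = -2, \qquad loss = r^2 = 4 \\
\text{backward:}\quad & \frac{\partial loss}{\partial r} = 2r = -4, \qquad \frac{\partial r}{\partial \hat{y}} = 1, \qquad \frac{\partial \hat{y}}{\partial \omega} = x = 2 \\
& \frac{\partial loss}{\partial \omega} = \frac{\partial loss}{\partial r} \cdot \frac{\partial r}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial \omega} = (-4) \cdot 1 \cdot 2 = -8
\end{aligned}
$$
This agrees with the closed-form gradient $2 \cdot x \cdot (x \cdot \omega - y) = 2 \cdot 2 \cdot (2 - 4) = -8$ derived in the previous section.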
Now let's implement the propagation algorithm with PyTorch.
Note that the basic unit in PyTorch is the tensor (Tensor). Tensors are used to build the computational graph dynamically; each tensor holds its data (data) and, for tensors that require gradients (typically the weights), the gradient of the loss with respect to it (grad).
import torch
import matplotlib.pyplot as plt
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
# create a tensor and mark it as requiring gradient computation
w = torch.Tensor([1.0])
w.requires_grad = True
# the forward computation builds the computational graph
def forward(x):
    return x * w  # note: x is promoted to a tensor in this multiplication
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2
print('predict (before training)', 4, forward(4).item())
epoch_list = []
l_list = []
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        l = loss(x, y)
        l.backward()
        epoch_list.append(epoch)
        l_list.append(l.item())
        print('\tgrad:', x, y, w.grad.item())
        w.data = w.data - 0.01 * w.grad.data
        w.grad.data.zero_()  # zero the gradient; otherwise it accumulates across backward calls
    print('progress:', epoch, l.item())
print('predict (after training)', 4, forward(4).item())
plt.plot(epoch_list, l_list)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()
4. Implementing a Linear Model with PyTorch
import torch
x_data = torch.Tensor([[1.0], [2.0], [3.0]])
y_data = torch.Tensor([[2.0], [4.0], [6.0]])
class LinearModel(torch.nn.Module):
    def __init__(self):
        super(LinearModel, self).__init__()  # call the parent class constructor
        self.linear = torch.nn.Linear(1, 1)  # in/out feature dimensions of the weight and bias
    def forward(self, x):
        y_pred = self.linear(x)  # calling the module like a function invokes its __call__() method
        return y_pred
model = LinearModel()
criterion = torch.nn.MSELoss(reduction='sum')  # loss function: sum of squared errors
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimizer: gradient descent
for epoch in range(1000):
    y_pred = model(x_data)  # forward pass; calling the model invokes __call__()
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    optimizer.zero_grad()  # zero the gradients
    loss.backward()
    optimizer.step()  # update the parameters
print('w=', model.linear.weight.item())
print('b=', model.linear.bias.item())
x_test = torch.Tensor([[4.0]])
y_test = model(x_test)
print('y_pred=', y_test.item())
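As an optional addition (not part of the original code), the trained model can be used for inference without tracking gradients via the standard torch.no_grad() pattern:
# optional: run inference without building a computational graph
model.eval()
with torch.no_grad():
    x_new = torch.Tensor([[5.0], [6.0]])
    print(model(x_new))  # should be close to [[10.0], [12.0]] after training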