PyTorch Tensors and Loss Functions

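Table of Contents

Tensors
- Tensor types
- Tensor examples
- Creating tensors from probability distributions
  - Creating tensors from a normal distribution (torch.normal)
  - Normal distribution example
  - Creating tensors from the standard normal distribution
  - Standard normal distribution example
  - Creating tensors from a uniform distribution
  - Uniform distribution example
Activation functions
- Common activation functions
Loss functions (PyTorch API)
- L1 norm loss
- Mean squared error loss
- Cross-entropy loss
- Cosine similarity loss
  - Cosine similarity of two vectors
  - Cosine similarity of two matrices (row by row)
  - Cosine similarity of two batches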

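Tensors

Tensor types

A tensor is a multidimensional array, and each of its directions is called a mode. The order of a tensor is its number of dimensions: a first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three or higher are collectively called higher-order tensors.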
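In PyTorch the order of a tensor is reported by its `ndim` attribute, and `shape` gives the size along each mode. The sketch below is only an illustration. It also includes a scalar, which PyTorch represents as a zero-order tensor. The variable names are arbitrary.

```python
import torch

scalar = torch.tensor(3.0)               # order 0: a scalar
vector = torch.tensor([1.0, 2.0, 3.0])   # order 1: a vector
matrix = torch.ones(2, 3)                # order 2: a matrix
cube = torch.zeros(4, 2, 3)              # order 3: e.g. a stack of four 2 x 3 matrices

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim is the order; shape lists the size along each mode
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```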

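Tensor is PyTorch's basic data structure and is used in the form torch.Tensor. Its main attributes are listed below. The first four relate to the data itself and the last four to gradient computation:
- data: the wrapped tensor
- dtype: the tensor's data type
- shape: the tensor's shape (its size along each dimension)
- device: the device that holds the tensor (CPU or GPU), which is key to accelerating computation
- grad: the gradient of data
- grad_fn: the Function that created the tensor, which is key to automatic differentiation
- requires_grad: whether a gradient needs to be computed for the tensor
- is_leaf: whether the tensor is a leaf node of the computation graph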
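A small autograd run shows how a leaf tensor differs from a non-leaf tensor, and what grad and grad_fn contain for each. This minimal sketch repeats the computation used in the tensor examples later in this article. It also calls retain_grad(), which the article does not use elsewhere. That call keeps the gradient of a non-leaf tensor, which autograd would otherwise discard.

```python
import torch

x = torch.tensor([[1., 2], [3, 4]], requires_grad=True)  # created directly by the user: a leaf
y = x + 2                  # produced by an operation: not a leaf
y.retain_grad()            # keep y.grad; autograd discards non-leaf gradients by default
out = (y * y * 3).mean()
out.backward()

print(x.is_leaf, y.is_leaf)  # True False
print(x.grad_fn)             # None: a leaf has no creating Function
print(y.grad_fn)             # an AddBackward0 node
# d(out)/dy = 6*y/4 = 1.5*y and dy/dx = 1, so both gradients hold the same values
print(x.grad)
print(torch.equal(y.grad, x.grad))  # True
```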

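torch.dtype is the object that represents the data type of a torch.Tensor. PyTorch supports the following 9 commonly used data types. Newer versions add others, such as torch.bfloat16 and complex types.

| Data type | dtype | CPU tensor type | GPU tensor type |
|---|---|---|---|
| 32-bit floating point | `torch.float32` or `torch.float` | `torch.FloatTensor` | `torch.cuda.FloatTensor` |
| 64-bit floating point | `torch.float64` or `torch.double` | `torch.DoubleTensor` | `torch.cuda.DoubleTensor` |
| 16-bit floating point | `torch.float16` or `torch.half` | `torch.HalfTensor` | `torch.cuda.HalfTensor` |
| 8-bit unsigned integer | `torch.uint8` | `torch.ByteTensor` | `torch.cuda.ByteTensor` |
| 8-bit signed integer | `torch.int8` | `torch.CharTensor` | `torch.cuda.CharTensor` |
| 16-bit signed integer | `torch.int16` or `torch.short` | `torch.ShortTensor` | `torch.cuda.ShortTensor` |
| 32-bit signed integer | `torch.int32` or `torch.int` | `torch.IntTensor` | `torch.cuda.IntTensor` |
| 64-bit signed integer | `torch.int64` or `torch.long` | `torch.LongTensor` | `torch.cuda.LongTensor` |
| Boolean | `torch.bool` | `torch.BoolTensor` | `torch.cuda.BoolTensor` |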

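- Floating-point tensors default to torch.float32.
- Integer tensors default to torch.int64.
- The Boolean type stores True/False values.
- GPU tensor types require a CUDA-enabled environment.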
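You can check the default dtypes directly by letting torch.tensor infer them. This sketch assumes the global default float dtype has not been changed with torch.set_default_dtype. It also shows that Tensor.type() still reports the legacy CPU type names from the table.

```python
import torch

f = torch.tensor([1.0, 2.0])      # Python floats -> default float dtype
i = torch.tensor([1, 2])          # Python ints -> torch.int64
b = torch.tensor([True, False])   # Python bools -> torch.bool
print(f.dtype, i.dtype, b.dtype)  # torch.float32 torch.int64 torch.bool
print(torch.get_default_dtype())  # torch.float32

# Tensor.type() with no argument returns the legacy type name
print(f.type(), i.type())         # torch.FloatTensor torch.LongTensor

# An explicit dtype overrides inference
h = torch.tensor([1, 2], dtype=torch.float16)
print(h.dtype)                    # torch.float16
```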

Tensor examples

import torch
import numpy as np
# 1. Create tensors
x = torch.tensor([[1, 2], [3, 4.]])  # the float literal 4. makes the whole tensor float32
print("Tensor x:\n", x)
y = torch.tensor(np.ones((3, 3)))   # np.ones returns float64, so y keeps dtype float64
print("Tensor y:\n", y)

Tensor x:
 tensor([[1., 2.],
        [3., 4.]])
Tensor y:
 tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

# 2. Inspect tensor attributes
print("\nTensor attributes:")
print("data:", x.data)        # the wrapped tensor
print("dtype:", x.dtype)      # data type: torch.float32
print("shape:", x.shape)      # shape: torch.Size([2, 2])
print("device:", x.device)    # device: cpu
print("requires_grad:", x.requires_grad)  # whether gradients are tracked: False
print("is_leaf:", x.is_leaf)  # whether x is a leaf node: True

Tensor attributes:
data: tensor([[1., 2.],
        [3., 4.]])
dtype: torch.float32
shape: torch.Size([2, 2])
device: cpu
requires_grad: False
is_leaf: True

# 3. Set requires_grad=True so autograd tracks operations on x
x = torch.tensor([[1., 2], [3, 4]], device='cpu', requires_grad=True)
print("\nx with requires_grad=True:", x)

x with requires_grad=True: tensor([[1., 2.],
        [3., 4.]], requires_grad=True)

# 4. Perform some operations
y = x + 2
z = y * y * 3
out = z.mean()

print("\nComputation:")
print("y = x + 2:\n", y)
print("z = y * y * 3:\n", z)
print("out = z.mean():", out)

Computation:
y = x + 2:
 tensor([[3., 4.],
        [5., 6.]], grad_fn=<AddBackward0>)
z = y * y * 3:
 tensor([[ 27.,  48.],
        [ 75., 108.]], grad_fn=<MulBackward0>)
out = z.mean(): tensor(64.5000, grad_fn=<MeanBackward0>)

# 5. Backpropagate to compute gradients
out.backward()
print("\nGradients:")
print("x.grad:\n", x.grad)  # d(out)/dx = 6*(x+2)/4 = 1.5*(x+2)

Gradients:
x.grad:
 tensor([[4.5000, 6.0000],
        [7.5000, 9.0000]])

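# 6. Inspect grad_fn (the Function that created each tensor; object addresses differ per run)
print("\nGradient functions:")
print("y.grad_fn:", y.grad_fn)  # <AddBackward0>
print("z.grad_fn:", z.grad_fn)  # <MulBackward0>
print("out.grad_fn:", out.grad_fn)  # <MeanBackward0>

Gradient functions: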
y.grad_fn: <AddBackward0 object at 0x0000025AD0B28670>
z.grad_fn: <MulBackward0 object at 0x0000025AD0B919A0>
out.grad_fn: <MeanBackward0 object at 0x0000025AD0B28670>

# 7. Device management
if torch.cuda.is_available():
    device = torch.device("cuda")
    x_cuda = x.to(device)
    print("\nGPU Tensor:")
    print("x_cuda device:", x_cuda.device)
else:
    print("\nCUDA is not available")

GPU Tensor:
x_cuda device: cuda:0

# 8. Data type conversion
x_int = x.int()  # equivalent to x.to(torch.int32)
print("\nData type conversion:")
print("x_int dtype:", x_int.dtype)  # torch.int32

Data type conversion:
x_int dtype: torch.int32
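The conversion above involves no fractional parts, so it hides one detail: casting floats to an integer dtype truncates toward zero. This sketch shows that, along with the equivalent `.to(dtype)` form and the other shorthand methods. The input values are made up for illustration.

```python
import torch

x = torch.tensor([[1.7, -1.7], [2.5, -2.5]])

# .int(), .long(), .float(), .double() are shorthands for .to(dtype)
x_int = x.to(torch.int32)
print(x_int)                 # fractional parts are truncated toward zero
print(x.long().dtype)        # torch.int64
print(x_int.float().dtype)   # torch.float32

# A tensor that requires grad can be converted; the integer result does not track gradients
w = torch.ones(2, requires_grad=True)
w_int = w.int()
print(w_int.requires_grad)   # False
```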

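Creating tensors from probability distributions

Creating tensors from a normal distribution (torch.normal)

torch.normal() creates a tensor of random numbers drawn from separate normal distributions whose means and standard deviations are given as arguments.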

torch.normal(mean, std, size=None, out=None)

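mean (Tensor/float): mean of the normal distribution (scalar or tensor)
std (Tensor/float): standard deviation of the normal distribution (scalar or tensor)
size (tuple): shape of the output tensor (required only when both mean and std are scalars)
out (Tensor): optional output tensor

Four calling modes are supported:
- Mode 1: mean and std are both scalars
- Mode 2: mean is a tensor, std is a scalar
- Mode 3: mean is a scalar, std is a tensor
- Mode 4: mean and std are both tensors (give them the same shape)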

Normal distribution example

import torch

# Mode 1: scalar mean and scalar std (size is required); values differ on every run
normal_tensor1 = torch.normal(mean=0.0, std=1.0, size=(2,2))
print("Scalar parameters:\n", normal_tensor1)

# Mode 2: tensor mean + scalar std (one sample per mean entry)
mean_tensor = torch.arange(1, 5, dtype=torch.float)
normal_tensor2 = torch.normal(mean=mean_tensor, std=1.0)
print("\nTensor mean:\n", normal_tensor2)

# Mode 4: tensor mean + tensor std
std_tensor = torch.linspace(0.1, 0.4, steps=4)
normal_tensor3 = torch.normal(mean=mean_tensor, std=std_tensor)
print("\nTensor mean and std:\n", normal_tensor3)

Scalar parameters:
 tensor([[-1.5585,  0.2315],
        [-1.5771, -0.0783]])

Tensor mean:
 tensor([0.9710, 1.2523, 3.6285, 4.2808])

Tensor mean and std:
 tensor([1.0566, 2.1025, 3.1653, 3.3020])
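The example above covers modes 1, 2 and 4. This sketch fills in mode 3 (scalar mean, tensor std). It also uses torch.manual_seed, which is not part of the original example, to make the draw reproducible. The seed value 0 is arbitrary.

```python
import torch

torch.manual_seed(0)  # fix the random number generator so the draw is reproducible

# Mode 3: scalar mean + tensor std (one sample per std entry)
std_tensor = torch.linspace(0.1, 0.4, steps=4)
normal_tensor = torch.normal(mean=0.0, std=std_tensor)
print(normal_tensor.shape)  # torch.Size([4])

# Re-seeding reproduces exactly the same numbers
torch.manual_seed(0)
again = torch.normal(mean=0.0, std=std_tensor)
print(torch.equal(normal_tensor, again))  # True
```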

Creating tensors from the standard normal distribution

torch.randn

torch.randn(*size, out=None, dtype=None, 
           layout=torch.strided, device=None, 
           requires_grad=False)

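size (tuple): sequence of integers that defines the tensor's shape
dtype (torch.dtype): data type of the result (e.g. torch.float32)
device (torch.device): target device ('cpu' or 'cuda')
requires_grad (bool): whether to enable gradient tracking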
torch.randn_like

torch.randn_like(input, dtype=None, layout=None, 
                device=None, requires_grad=False)

input (Tensor): reference tensor; its shape is copied, and by default so are its dtype and device

Standard normal distribution example

import torch

# Basic usage: a 3 x 4 standard normal tensor in float64
randn_tensor = torch.randn(3, 4, dtype=torch.float64)
print("Standard normal tensor:\n", randn_tensor)

# Create a tensor with the same shape as an existing one
base_tensor = torch.empty(2, 3)
randn_like_tensor = torch.randn_like(base_tensor)
print("\nSame-shape tensor:\n", randn_like_tensor)

# Create a tensor directly on the GPU (requires CUDA)
if torch.cuda.is_available():
    gpu_tensor = torch.randn(3, 3, device='cuda')
    print("\nGPU tensor:", gpu_tensor.device)

Standard normal tensor:
 tensor([[-0.3266, -0.9314,  0.1892, -0.3418],
        [ 0.4397, -1.2986, -0.7380, -0.6443],
        [ 0.7485,  0.4076, -0.6021, -0.9000]], dtype=torch.float64)

Same-shape tensor:
 tensor([[-0.8994,  0.5934, -1.3246],
        [-0.1019,  0.8172, -1.3164]])

GPU tensor: cuda:0
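A quick sanity check confirms that torch.randn really samples N(0, 1): over a large draw, the sample mean should be near 0 and the standard deviation near 1. This sketch uses an arbitrary seed and sample size. It also confirms that randn_like copies dtype as well as shape.

```python
import torch

torch.manual_seed(0)
samples = torch.randn(100_000)  # many draws from N(0, 1)

# Sample statistics should be close to the distribution's parameters
print(f"mean={samples.mean().item():.3f}, std={samples.std().item():.3f}")

# randn_like copies shape, dtype and device from the reference tensor
ref = torch.zeros(2, 3, dtype=torch.float64)
like = torch.randn_like(ref)
print(like.shape, like.dtype)  # torch.Size([2, 3]) torch.float64
```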

Creating tensors from a uniform distribution

torch.rand: samples from the uniform distribution on [0, 1)

torch.rand(*size, out=None, dtype=None, 
          layout=torch.strided, device=None,
          requires_grad=False) → Tensor

torch.rand_like

torch.rand_like(input, dtype=None, layout=None, 
               device=None, requires_grad=False)

Uniform distribution example

import torch

# Basic uniform distribution on [0, 1)
uniform_tensor = torch.rand(2, 2)
print("Uniform tensor:\n", uniform_tensor)

# Uniform distribution on [a, b) via a linear transform of torch.rand
a, b = 5, 10
scaled_tensor = a + (b - a) * torch.rand(3, 3)
print("\n[5,10) tensor:\n", scaled_tensor)

# Integer uniform distribution: torch.randint samples integers from [low, high)
int_tensor = torch.randint(low=0, high=10, size=(4,))
print("\nInteger uniform distribution:\n", int_tensor)

Uniform tensor:
 tensor([[0.4809, 0.6847],
        [0.9278, 0.9965]])

[5,10) tensor:
 tensor([[8.6137, 5.9940, 7.2302],
        [5.1680, 7.0532, 5.9403],
        [8.3315, 6.1549, 8.5181]])

Integer uniform distribution:
 tensor([8, 5, 9, 6])
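Another way to get a uniform tensor on an arbitrary interval is the in-place method Tensor.uniform_(from, to), which fills an existing tensor from U(from, to). The example above uses a linear transform of torch.rand instead. This sketch shows the alternative and checks the bounds of both it and torch.randint, whose upper bound is exclusive. The sizes are arbitrary.

```python
import torch

a, b = 5.0, 10.0

# Fill a tensor in place from the uniform distribution on [a, b)
t = torch.empty(3, 3).uniform_(a, b)
print(a <= t.min().item(), t.max().item() <= b)  # True True

# torch.randint's upper bound is exclusive: values lie in [low, high)
ints = torch.randint(low=0, high=10, size=(1000,))
print(ints.min().item() >= 0, ints.max().item() <= 9)  # True True
print(ints.dtype)  # torch.int64
```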

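Activation functions

An activation function is the function applied at each neuron of a neural network. It maps the neuron's input to its output and introduces nonlinearity into the network.

Common activation functions

See Deep Learning Systematic Study Series [5]: Deep Learning Fundamentals.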
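The details are left to the article referenced above. The sketch below only shows how three widely used activation functions, ReLU, sigmoid and tanh, map a neuron's input to its output in PyTorch. This choice of three functions is illustrative and is not the referenced article's list. The input values are made up.

```python
import torch

z = torch.tensor([-2.0, 0.0, 2.0])  # example pre-activation values of three neurons

relu_out = torch.relu(z)        # max(0, z): negatives become 0
sigmoid_out = torch.sigmoid(z)  # 1 / (1 + exp(-z)): output in (0, 1)
tanh_out = torch.tanh(z)        # output in (-1, 1); tanh is an odd function

print(relu_out)     # tensor([0., 0., 2.])
print(sigmoid_out)
print(tanh_out)
```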

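Loss functions (PyTorch API)

In supervised learning, a loss function measures the discrepancy between a sample's true value and the model's prediction, and its value is commonly used to gauge the model's performance. All supervised learning algorithms rely on a loss function, and algorithms for different application scenarios use different ones. Even in the same scenario, different loss functions can score the same sample differently.
Whether the loss function is chosen appropriately directly determines how well a supervised learning algorithm predicts.
In PyTorch, loss functions are provided by the torch.nn package.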

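L1 norm loss

The L1 norm loss (L1Loss) is the mean of the absolute errors between the predictions and the true values. It computes the absolute difference between the model output and the target, and can return either a tensor with the same shape as the input or a scalar.
$loss(x,y)=\frac{1}{N}\sum_{i=1}^{N}|x_i-y_i|$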

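torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')

size_average (deprecated; use reduction instead): when True, the returned loss is the average; when False, it is the sum of the per-sample losses.
reduce (deprecated; use reduction instead): whether to return a scalar; defaults to True.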

import torch
import torch.nn as nn
loss = nn.L1Loss(reduction='mean')  # average the absolute errors
input = torch.tensor([1.0, 2.0, 3.0, 4.0])
target = torch.tensor([4.0, 5.0, 6.0, 7.0])
output = loss(input, target)
print(output)  # tensor(3.)  every |x_i - y_i| is 3

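Both inputs must have the same type. reduction is a parameter of the loss function with three possible values: 'none' returns the per-element losses (same shape as the input), 'sum' returns their sum, and 'mean' returns their mean.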
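The three reduction modes are easier to tell apart when the errors are not all equal. In this sketch, the value 3.5 is made up so that one error differs from the others. The sketch also recomputes the 'mean' result directly from the L1 formula.

```python
import torch
import torch.nn as nn

pred = torch.tensor([1.0, 2.0, 3.5, 4.0])
target = torch.tensor([4.0, 5.0, 6.0, 7.0])

none_loss = nn.L1Loss(reduction='none')(pred, target)  # per-element |x_i - y_i|
sum_loss = nn.L1Loss(reduction='sum')(pred, target)    # sum of the elements
mean_loss = nn.L1Loss(reduction='mean')(pred, target)  # sum divided by N = 4

print(none_loss.tolist())  # [3.0, 3.0, 2.5, 3.0]
print(sum_loss.item())     # 11.5
print(mean_loss.item())    # 2.875

# The 'mean' value matches the formula computed by hand
manual = (pred - target).abs().mean()
print(torch.isclose(manual, mean_loss).item())  # True
```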

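Mean squared error loss

The mean squared error loss (MSELoss) is the mean of the squared differences between the predictions and the true values. It computes the square of the difference between the model output and the target, and can return either a tensor with the same shape as the input or a scalar.
$loss(x,y)=\frac{1}{N}\sum_{i=1}^{N}(x_i-y_i)^2$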

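torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

reduce (deprecated; use reduction instead): whether to return a scalar; defaults to True.
size_average (deprecated; use reduction instead): only takes effect when reduce=True. When True, the returned loss is the average; when False, it is the sum of the per-sample losses.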

import torch
import torch.nn as nn
loss=nn.MSELoss(reduction='mean')
input=torch.tensor([1.0,2.0,3.0,4.0])
target=torch.tensor([4.0,5.0,6.0,7.0])
output=loss(input,target)
print(output) # tensor(9.)
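In the example above, every error is 3, so it does not show that MSE averages squared errors rather than squaring an averaged error. This sketch uses one made-up unequal error (5.0 vs 7.0) and checks nn.MSELoss against the formula.

```python
import torch
import torch.nn as nn

pred = torch.tensor([1.0, 2.0, 3.0, 5.0])
target = torch.tensor([4.0, 5.0, 6.0, 7.0])

mse = nn.MSELoss()(pred, target)          # reduction='mean' by default
manual = ((pred - target) ** 2).mean()    # (9 + 9 + 9 + 4) / 4
print(mse.item(), manual.item())          # 7.75 7.75

# reduction='none' keeps the per-element squared errors
print(nn.MSELoss(reduction='none')(pred, target).tolist())  # [9.0, 9.0, 9.0, 4.0]
```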

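Cross-entropy loss

The cross-entropy loss (CrossEntropyLoss) combines nn.LogSoftmax() and nn.NLLLoss() in a single class and is very useful when training classifiers. Its input is the raw, unnormalized scores (logits) for each class.
Cross-entropy measures how close the actual output is to the expected output. In other words, it quantifies the difference between the network's output and the label, and that difference is backpropagated to update the network parameters. Cross-entropy describes the distance between the actual output probability distribution and the expected one: the smaller the cross-entropy, the closer the two distributions. Let the probability distribution p be the expected output and q the actual output; then: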
$H(p,q)=-\sum_x p(x)\log q(x)$

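torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

weight (Tensor): a 1-D tensor of n elements giving the weight of each of the n classes. It is very useful when the training samples are highly imbalanced. Default None.
size_average (deprecated; use reduction instead): only takes effect when reduce=True. When True, the returned loss is the average; when False, it is the sum of the per-sample losses.
ignore_index: a target value to ignore. Samples with this target contribute no loss, and when the loss is averaged they are excluded from the average.
reduce (deprecated; use reduction instead): whether to return a scalar; defaults to True.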

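import torch
import torch.nn as nn
criterion = nn.CrossEntropyLoss(reduction='mean')
input = torch.tensor([[-0.011, -0.022, -0.033, -0.044]])  # raw logits, shape (batch=1, classes=4)
target = torch.tensor([0])                                 # class index of the sample
output = criterion(input, target)
print(output)  # tensor(1.3699)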
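The claim that CrossEntropyLoss combines LogSoftmax and NLLLoss can be checked numerically, together with the formula H(p, q) when p is the one-hot label distribution. This sketch reuses the logits from the example above. The two-step and manual variants are only for verification; in practice nn.CrossEntropyLoss is the one to use.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[-0.011, -0.022, -0.033, -0.044]])  # shape (batch=1, classes=4)
target = torch.tensor([0])                                   # class index per sample

ce = nn.CrossEntropyLoss()(logits, target)

# CrossEntropyLoss = LogSoftmax followed by NLLLoss
two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)

# H(p, q) = -sum_x p(x) log q(x), with p one-hot and q the softmax output
p = F.one_hot(target, num_classes=4).float()
q = F.softmax(logits, dim=1)
manual = -(p * q.log()).sum(dim=1).mean()

print(ce.item(), two_step.item(), manual.item())  # all three agree
```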

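Cosine similarity loss

The cosine similarity loss (implemented in PyTorch as torch.nn.CosineEmbeddingLoss) measures how similar two vectors x_1 and x_2 are, given a label y. For similar pairs (y = 1), training maximizes their cosine similarity; for dissimilar pairs (y = -1), it pushes the similarity below margin: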
$\mathrm{loss}(x,y)=\begin{cases}1-\cos(x_1,x_2), & y=1\\ \max(0,\ \cos(x_1,x_2)-\mathrm{margin}), & y=-1\end{cases}$
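torch.nn.functional.cosine_similarity is the PyTorch function that computes the cosine similarity between two tensors. Cosine similarity measures how closely two vectors point in the same direction. It ranges over [-1, 1], and larger values mean more similar directions.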

torch.nn.functional.cosine_similarity(x1, x2, dim=1, eps=1e-8)

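| Parameter | Type | Description |
|---|---|---|
| `x1` | `Tensor` | First input tensor |
| `x2` | `Tensor` | Second input tensor |
| `dim` | `int` | Dimension along which the similarity is computed; the default `dim=1` compares the feature vectors of each sample |
| `eps` | `float` | Small value that prevents division by zero when a vector's L2 norm is 0; default `1e-8` |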

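Common uses

- Text/image similarity (e.g. contrastive learning, retrieval tasks)
- Loss design (e.g. minimizing 1 - cosine_similarity to reduce the difference in direction)
- Feature matching (e.g. similarity between two embedding vectors)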
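The piecewise cosine loss formula at the top of this section can be reproduced from F.cosine_similarity. This sketch compares that reconstruction against nn.CosineEmbeddingLoss with reduction='none'. The vectors, the labels y, and margin=0.0 are made-up illustration values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x1 = torch.tensor([[1.0, 2.0, 3.0], [1.0, 0.0, 0.0]])
x2 = torch.tensor([[4.0, 5.0, 6.0], [0.0, 1.0, 0.0]])
y = torch.tensor([1.0, -1.0])  # 1: the pair should be similar; -1: it should be dissimilar
margin = 0.0

loss_fn = nn.CosineEmbeddingLoss(margin=margin, reduction='none')
loss = loss_fn(x1, x2, y)

# Rebuild the piecewise formula: 1 - cos for y = 1, max(0, cos - margin) for y = -1
cos = F.cosine_similarity(x1, x2, dim=1)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0))

print(loss)
print(manual)
```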

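Cosine similarity of two vectors

Input requirements: x1 and x2 must have the same shape (or shapes that broadcast to a common shape). With the default dim=1, a 1-D tensor (a vector) has no dimension 1. You can either call unsqueeze(0) to turn each vector into a 2-D row matrix, or pass dim=0. The unsqueeze(0) approach looks like this: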

import torch
import torch.nn.functional as F

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# With the default dim=1, unsqueeze(0) turns each vector into a (1, 3) matrix
similarity = F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0), dim=1)
print(similarity)  # Output: tensor([0.9746])
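The dim=0 alternative avoids the reshape and returns a 0-d tensor. This sketch also checks the result against the definition of cosine similarity: the dot product divided by the product of the L2 norms.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# For 1-D inputs, dim=0 works without unsqueeze and returns a 0-d tensor
sim = F.cosine_similarity(a, b, dim=0)
print(sim)  # tensor(0.9746)

# Same value from the definition: 32 / (sqrt(14) * sqrt(77))
manual = torch.dot(a, b) / (a.norm() * b.norm())
print(torch.isclose(sim, manual))  # tensor(True)
```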

Cosine similarity of two matrices (row by row)

import torch
import torch.nn.functional as F
x1 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
x2 = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

similarity = F.cosine_similarity(x1, x2, dim=1)
print(similarity)  # Output: tensor([0.9734, 0.9972])

Cosine similarity of two batches

import torch
import torch.nn.functional as F
batch_a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
batch_b = torch.tensor([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

similarity = F.cosine_similarity(batch_a, batch_b, dim=1)
print(similarity)  # Output: tensor([0.9746, 0.9982])
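F.cosine_similarity with dim=1 compares only the rows at matching positions (row i with row i). Retrieval-style uses often need every pair instead. This sketch is one common way to get that, not something from the examples above. It L2-normalizes each row with F.normalize and takes a single matrix product, then checks that the diagonal matches the row-by-row result.

```python
import torch
import torch.nn.functional as F

batch_a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
batch_b = torch.tensor([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

# L2-normalize each row, then one matrix product gives every pairwise cosine
a_n = F.normalize(batch_a, dim=1)
b_n = F.normalize(batch_b, dim=1)
pairwise = a_n @ b_n.T          # pairwise[i, j] = cos(batch_a[i], batch_b[j])
print(pairwise)

# The diagonal matches the row-by-row result of F.cosine_similarity
rowwise = F.cosine_similarity(batch_a, batch_b, dim=1)
print(torch.allclose(pairwise.diagonal(), rowwise, atol=1e-6))  # True
```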