PyTorch Learning -6- Loss Functions


Yiwei Zhang · 2023-07-18 · via 又见苍岚

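A model can only learn if it knows where its current predictions go wrong. The loss function supplies that signal, giving optimization both a direction and a magnitude. This article introduces the loss functions in PyTorch.

It follows 深入浅出PyTorch and is meant to fill in the fundamentals systematically.

Contents of this section

Common loss functions in deep learning and how they are defined
How to call loss functions in PyTorch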

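Binary cross-entropy loss

`torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')`

Purpose: computes the cross entropy for binary classification, where each label is 0 or 1. The input must already be a probability, so it is usually the output of a sigmoid layer, or of a softmax.

Main parameters:

weight: a manual rescaling weight applied to the loss of each batch element.

size_average: bool, deprecated. When True the returned loss is the average over samples; when False it is the sum. It has been superseded by reduction and will be removed in a future version; use reduction instead.

reduce: bool, deprecated in favor of reduction. When True the loss is returned as a scalar.

Core implementation:

def forward(self, input: Tensor, target: Tensor) -> Tensor:
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)

So the first argument is the input $d$, the second is the target $t$, and the loss is:
$$
loss=t\log \frac{1}{d}+(1-t)\log\frac{1}{1-d}
$$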

import numpy as np
import torch
from torch import nn

loss = nn.BCELoss()
m = nn.Sigmoid()
data = torch.tensor([0.0], requires_grad=True)  # logit 0, so sigmoid gives 0.5
target = torch.ones(1)
l = loss(m(data), target)
print(l)
print(np.log(2))  # -log(0.5) = log(2), the expected loss
-->
tensor(0.6931, grad_fn=<BinaryCrossEntropyBackward0>)
0.6931471805599453
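The value above can also be checked directly against the formula. The following is a minimal sketch in plain Python (no PyTorch); the helper names `sigmoid` and `bce` are hypothetical and only mirror the formula for a single element.

```python
import math


def sigmoid(z):
    """Logistic function, mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(d, t):
    """Binary cross entropy for one probability d and one label t."""
    return t * math.log(1.0 / d) + (1.0 - t) * math.log(1.0 / (1.0 - d))


# Same setting as the example above: logit 0, target 1
loss_value = bce(sigmoid(0.0), 1.0)
print(loss_value, math.log(2))
```

Both printed numbers should agree with the `0.6931` reported by `nn.BCELoss`.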

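Cross-entropy loss

`torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')`

Purpose: computes the cross-entropy loss.

Main parameters:

weight: a weight assigned to the loss of each class.

size_average: bool, deprecated. When True the returned loss is the average over samples; when False it is the sum.

ignore_index: a target value whose loss is ignored.

reduce: bool, deprecated. When True the loss is returned as a scalar.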

import torch
import torch.nn as nn

x_input = torch.randn(3, 5)  # random input: 3 samples, 5 classes
print('x_input:\n', x_input)
y_target = torch.tensor([4, 2, 0])  # target class index for each sample
# print('y_target\n', y_target)
# Apply softmax; every row now sums to 1
softmax_func = nn.Softmax(dim=1)
soft_output = softmax_func(x_input)
print('soft_output:\n', soft_output)
# Take the log of the softmax output
log_output = torch.log(soft_output)
print('log_output:\n', log_output)
# nn.LogSoftmax gives the same result as softmax followed by log
logsoftmax_func = nn.LogSoftmax(dim=1)
logsoftmax_output = logsoftmax_func(x_input)
print('logsoftmax_output:\n', logsoftmax_output)
# NLLLoss defaults to reduction='mean'; 'none' keeps the per-sample losses
nllloss_func = nn.NLLLoss(reduction="none")
nlloss_output = nllloss_func(logsoftmax_output, y_target)
print('nlloss_output:\n', nlloss_output)
# nn.CrossEntropyLoss on the raw input should match LogSoftmax + NLLLoss
crossentropyloss = nn.CrossEntropyLoss(reduction="none")
crossentropyloss_output = crossentropyloss(x_input, y_target)
print('crossentropyloss_output:\n', crossentropyloss_output)

Output:

x_input:
 tensor([[-1.7327, -0.1885, -0.7649,  0.8701,  0.4981],
        [-2.1903,  0.5137, -0.3262,  0.1239,  0.0126],
        [ 0.8400,  1.4696, -0.2860, -2.8149, -0.3208]])
soft_output:
 tensor([[0.0321, 0.1505, 0.0846, 0.4338, 0.2990],
        [0.0241, 0.3595, 0.1552, 0.2435, 0.2178],
        [0.2825, 0.5302, 0.0916, 0.0073, 0.0885]])
log_output:
 tensor([[-3.4380, -1.8939, -2.4702, -0.8352, -1.2072],
        [-3.7271, -1.0231, -1.8630, -1.4128, -1.5242],
        [-1.2643, -0.6346, -2.3902, -4.9191, -2.4250]])
logsoftmax_output:
 tensor([[-3.4380, -1.8939, -2.4702, -0.8352, -1.2072],
        [-3.7271, -1.0231, -1.8630, -1.4128, -1.5242],
        [-1.2643, -0.6346, -2.3902, -4.9191, -2.4250]])
nlloss_output:
 tensor([1.2072, 1.8630, 1.2643])
crossentropyloss_output:
 tensor([1.2072, 1.8630, 1.2643])
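To see where these three numbers come from, the per-sample loss can be recomputed by hand as `-log(softmax(x)[target])`. This is a sketch in plain Python using the rounded `x_input` values printed above, so it only agrees with the output to about three decimal places; the helper `cross_entropy` is a hypothetical name.

```python
import math


def cross_entropy(logits, target):
    """Cross entropy of one sample: log-sum-exp of the logits minus the target logit."""
    log_sum_exp = math.log(sum(math.exp(v) for v in logits))
    return log_sum_exp - logits[target]


# Rows of x_input and the targets [4, 2, 0], copied from the run above
x_input = [
    [-1.7327, -0.1885, -0.7649, 0.8701, 0.4981],
    [-2.1903, 0.5137, -0.3262, 0.1239, 0.0126],
    [0.8400, 1.4696, -0.2860, -2.8149, -0.3208],
]
y_target = [4, 2, 0]
losses = [cross_entropy(row, t) for row, t in zip(x_input, y_target)]
print([round(v, 4) for v in losses])
```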

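L1 loss

`torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')`

Purpose: computes the absolute value of the difference between the output y and the true label target.

The reduction parameter selects the computation mode. There are three options: none computes the loss element by element; sum adds up all elements and returns a scalar; mean averages over all elements and returns a scalar. With none, the result has the same shape as the input. The default is mean.

The formula is: $L_{n}=\left|x_{n}-y_{n}\right|$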

import torch
import torch.nn as nn

data = torch.randn([2, 4], requires_grad=True)
target = torch.empty([2, 4]).random_(2)  # random 0/1 targets
print(data)
print(target)
loss = nn.L1Loss(reduction="none")  # keep the per-element losses
res = loss(data, target)
print(res)

Output:

tensor([[ 0.7438, -0.7181,  1.7000,  0.2125],
        [-0.8243,  1.0593, -1.5408, -0.9641]], requires_grad=True)
tensor([[0., 1., 1., 1.],
        [1., 0., 1., 1.]])
tensor([[0.7438, 1.7181, 0.7000, 0.7875],
        [1.8243, 1.0593, 2.5408, 1.9641]], grad_fn=<AbsBackward0>)
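The three reduction modes differ only in how the per-element losses are combined. The sketch below redoes the computation in plain Python on the values printed above (no PyTorch); the variable names are illustrative.

```python
data = [[0.7438, -0.7181, 1.7000, 0.2125],
        [-0.8243, 1.0593, -1.5408, -0.9641]]
target = [[0.0, 1.0, 1.0, 1.0],
          [1.0, 0.0, 1.0, 1.0]]

# reduction='none': one loss per element, same shape as the input
none = [[abs(x - y) for x, y in zip(dr, tr)] for dr, tr in zip(data, target)]
flat = [v for row in none for v in row]
total = sum(flat)          # reduction='sum': a single scalar
mean = total / len(flat)   # reduction='mean': the sum divided by the element count
print(none)
print(total, mean)
```

The `none` result reproduces the tensor printed above, and `mean` is what `nn.L1Loss()` would return with its default setting on the same data.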

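MSE loss

`torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`

Purpose: computes the squared difference between the output y and the true label target.

As with L1Loss, the reduction parameter selects the computation mode. There are three options: none computes the loss element by element; sum adds up all elements and returns a scalar; mean averages over all elements and returns a scalar. The default is mean.

Formula: $l_{n}=\left(x_{n}-y_{n}\right)^{2}$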

import torch
import torch.nn as nn

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
print('MSELoss result:', output)
-->
MSELoss result: tensor(1.6968, grad_fn=<MseLossBackward>)

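Smooth L1 loss

`torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)`

Purpose: a smoothed version of the L1 loss that reduces the influence of outliers.

The reduction parameter selects the computation mode. There are three options: none computes the loss element by element; sum adds up all elements and returns a scalar; mean averages over all elements and returns a scalar. The default is mean.

Note: the reduction parameter also appears in all of the loss functions that follow, so it is not explained again.

The formula (for the default beta=1.0) is:

$$ \operatorname{loss}(x, y)=\frac{1}{n} \sum_{i=1}^{n} z_{i} $$

where

$$ z_{i}=\left\{\begin{array}{ll}0.5\left(x_{i}-y_{i}\right)^{2}, & \text { if }\left|x_{i}-y_{i}\right|<1 \\ \left|x_{i}-y_{i}\right|-0.5, & \text { otherwise }\end{array}\right. $$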

import torch
import torch.nn as nn

loss = nn.SmoothL1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
print('SmoothL1Loss result:', output)
-->
SmoothL1Loss result: tensor(0.7808, grad_fn=<SmoothL1LossBackward>)

Smooth L1 vs. L1

Here we plot the two loss curves to compare Smooth L1 with L1.

import matplotlib.pyplot as plt
import torch
import torch.nn as nn

inputs = torch.linspace(-10, 10, steps=5000)
target = torch.zeros_like(inputs)
loss_f_smooth = nn.SmoothL1Loss(reduction='none')
loss_smooth = loss_f_smooth(inputs, target)
loss_f_l1 = nn.L1Loss(reduction='none')
loss_l1 = loss_f_l1(inputs, target)
plt.plot(inputs.numpy(), loss_smooth.numpy(), label='Smooth L1 Loss')
plt.plot(inputs.numpy(), loss_l1.numpy(), label='L1 loss')
plt.xlabel('x_i - y_i')
plt.ylabel('loss value')
plt.legend()
plt.grid()
plt.show()

The plot shows that Smooth L1 passes smoothly through 0, where L1 has a sharp corner.
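The same comparison can be made numerically. Below is a minimal sketch of the two per-element functions in plain Python; `l1` and `smooth_l1` are hypothetical helper names, and the `beta` handling follows the piecewise definition with the threshold generalized from 1 to `beta`.

```python
def l1(diff):
    """Per-element L1 loss for a difference x_i - y_i."""
    return abs(diff)


def smooth_l1(diff, beta=1.0):
    """Quadratic for |diff| < beta, linear otherwise."""
    a = abs(diff)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta


# Near 0 the smooth version is much flatter; far from 0 it trails L1 by 0.5 * beta
for diff in (0.0, 0.1, 0.5, 1.0, 2.0, 10.0):
    print(diff, l1(diff), smooth_l1(diff))
```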

Negative log-likelihood loss for a Poisson-distributed target

`torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')`

Purpose: the negative log-likelihood loss for a Poisson distribution. It applies when the network outputs the Poisson parameter $\lambda$. Because the output is $\lambda$ rather than a probability, it has to be converted into a probability first.

Main parameters:

log_input: whether the input is in log form; this determines which formula is used.

full: whether to compute the full loss, that is, whether to keep the $\log(y_{n}!)$ term. Default False. When the term is kept, it is approximated as follows:

- when $ \mathrm{y}_{\mathrm{n}} \leq 1 $, $ \log \left(\mathrm{y}_{\mathrm{n}} !\right) $ is approximated as 0
- when $ \mathrm{y}_{\mathrm{n}}>1 $, Stirling's formula is used, and $ \log \left(\mathrm{y}_{\mathrm{n}} !\right) $ is approximated as $ \mathrm{y}_{\mathrm{n}} \log \left(\mathrm{y}_{\mathrm{n}}\right)-\mathrm{y}_{\mathrm{n}}+0.5 \log \left(2 \pi \mathrm{y}_{\mathrm{n}}\right) $.

eps: a small correction term that keeps $\log(input)$ from becoming nan when input is 0.

Principle:

The Poisson probability formula is:
$$
\mathrm{P}(\mathrm{Y}=\mathrm{k})=\frac{\lambda^{\mathrm{k}}}{\mathrm{k} !} \mathrm{e}^{-\lambda}
$$
For a batch $D(x, y)$ of $N$ samples, $y$ holds the labels, which follow a Poisson distribution, and $x$ has the same shape as $y$.

Suppose the network outputs the parameter $x_n$ for a sample whose label is $y_n$.

If $x$ is the network output without a log transform, the loss $l_{n}$ of the $n$-th sample is:

$$ \mathrm{P}\left(\mathrm{Y}=\mathrm{y}_{\mathrm{n}}\right)=\frac{\mathrm{x}_{\mathrm{n}}^{\mathrm{y}_{\mathrm{n}}}}{\mathrm{y}_{\mathrm{n}} !} \mathrm{e}^{-\mathrm{x}_{\mathrm{n}}} $$

$$ \mathrm{l}_{\mathrm{n}}=-\log \mathrm{P}\left(\mathrm{Y}=\mathrm{y}_{\mathrm{n}}\right)=\mathrm{x}_{\mathrm{n}}-\mathrm{y}_{\mathrm{n}} \log \mathrm{x}_{\mathrm{n}}+\log \left(\mathrm{y}_{\mathrm{n}} !\right) $$

If $x$ is the network output in log form, replace $ \mathrm{x}_{\mathrm{n}} $ with $ \exp \left(\mathrm{x}_{\mathrm{n}}\right) $, and the loss $l_{n}$ of the $n$-th sample becomes:

$$ \mathrm{l}_{\mathrm{n}}=-\log \mathrm{P}\left(\mathrm{Y}=\mathrm{y}_{\mathrm{n}}\right)=\exp \left(\mathrm{x}_{\mathrm{n}}\right)-\mathrm{y}_{\mathrm{n}} \mathrm{x}_{\mathrm{n}}+\log \left(\mathrm{y}_{\mathrm{n}} !\right) $$

The last term $\log(y_{n}!)$ can be dropped or approximated with Stirling's formula.

Formulas:

With log_input=True: $ \operatorname{loss}\left(x_{n}, y_{n}\right)=e^{x_{n}}-x_{n} \cdot y_{n} $
With log_input=False: $ \operatorname{loss}\left(x_{n}, y_{n}\right)=x_{n}-y_{n} \cdot \log \left(x_{n}+\text{eps}\right) $

import torch
import torch.nn as nn

loss = nn.PoissonNLLLoss(reduction='none')
log_input = torch.randn(5, 2, requires_grad=True)  # interpreted as log(lambda)
target = torch.empty(5, 2).random_(5)  # integer counts in [0, 4]
output = loss(log_input, target)
print('PoissonNLLLoss result:', output)
-->
PoissonNLLLoss result: tensor([[1.8573, 2.2177],
        [1.9914, 1.0427],
        [4.5823, 0.8821],
        [4.5176, 1.0008],
        [2.6423, 0.3343]], grad_fn=<SubBackward0>)
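The two formulas and the Stirling term can be written out for a single element as follows. This is a sketch in plain Python under the definitions given in this section, not the library implementation; `poisson_nll` and `stirling_log_factorial` are hypothetical names.

```python
import math


def stirling_log_factorial(y):
    """Approximation of log(y!): 0 for y <= 1, Stirling's formula otherwise."""
    if y <= 1:
        return 0.0
    return y * math.log(y) - y + 0.5 * math.log(2 * math.pi * y)


def poisson_nll(x, y, log_input=True, full=False, eps=1e-8):
    """Per-element Poisson negative log-likelihood for prediction x and count y."""
    if log_input:
        loss = math.exp(x) - y * x          # x is log(lambda)
    else:
        loss = x - y * math.log(x + eps)    # x is lambda itself
    if full:
        loss += stirling_log_factorial(y)
    return loss


print(poisson_nll(0.0, 1.0))
print(poisson_nll(2.0, 3.0, log_input=False))
# Stirling's approximation next to the exact log(3!)
print(stirling_log_factorial(3), math.log(math.factorial(3)))
```

Passing `math.log(2.0)` with `log_input=True` gives the same loss as passing `2.0` with `log_input=False`, up to the `eps` term, which is a quick way to check that the two branches describe the same quantity.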

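KL divergence

`torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)`

Purpose: computes the KL divergence, also called relative entropy. It is a distance measure for continuous distributions and is often useful when regressing over a continuous output distribution that has been discretely sampled.

Main parameters:

reduction: the computation mode, one of none/sum/mean/batchmean.

none: computes the loss element by element.
sum: adds up all elements and returns a scalar.
mean: averages over all elements and returns a scalar.
batchmean: divides the sum by the batch size.

Formula:

$$ \begin{aligned} D_{\mathrm{KL}}(P, Q)=\mathrm{E}_{X \sim P}\left[\log \frac{P(X)}{Q(X)}\right] & =\mathrm{E}_{X \sim P}[\log P(X)-\log Q(X)] \\ & =\sum_{i=1}^{n} P\left(x_{i}\right)\left(\log P\left(x_{i}\right)-\log Q\left(x_{i}\right)\right)\end{aligned} $$

Usage:

The loss takes input and target, where target plays the role of $P$ in the formula. target holds probabilities and input holds log-probabilities, so what is actually computed is $\sum target \times (\ln target -input)$.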

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn((1, 8))
y = torch.randn((1, 8))
# Convert to probabilities, then take the log
x_log = F.log_softmax(x, dim=1)
# Convert to probabilities only
y = F.softmax(y, dim=1)
kl = nn.KLDivLoss(reduction='batchmean')
out = kl(x_log, y)
print(x)
print(y)
print(out)
-->
tensor([[-0.9543, -0.4117,  0.0377, -0.3320,  0.2467, -0.4887,  0.1111,  1.2274]])
tensor([[0.0630, 0.0266, 0.0735, 0.2664, 0.1959, 0.1449, 0.1859, 0.0438]])
tensor(0.4630)

Verification example:

import math

import torch
import torch.nn as nn


def validate_loss(output, target):
    # Reimplements reduction='mean': average y * (log(y) - x) over all elements
    val = 0
    for li_x, li_y in zip(output, target):
        for x, y in zip(li_x, li_y):
            loss_val = y * (math.log(y, math.e) - x)
            val += loss_val
    return val / output.nelement()


torch.manual_seed(20)
loss = nn.KLDivLoss()  # default reduction='mean'
input = torch.Tensor([[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]])
target = torch.Tensor([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]])
output = loss(input, target)
print("default loss:", output)
output = validate_loss(input, target)
print("validate loss:", output)
loss = nn.KLDivLoss(reduction="batchmean")
output = loss(input, target)
print("batchmean loss:", output)
loss = nn.KLDivLoss(reduction="mean")
output = loss(input, target)
print("mean loss:", output)
loss = nn.KLDivLoss(reduction="none")
output = loss(input, target)
print("none loss:", output)
-->
default loss: tensor(0.6209)
validate loss: tensor(0.6209)
batchmean loss: tensor(1.8626)
mean loss: tensor(0.6209)
none loss: tensor([[1.4215, 0.3697, 0.5697],
        [0.4697, 0.4503, 0.0781],
        [0.1534, 1.4781, 0.3288],
        [0.3935, 0.4788, 1.2588]])
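The relation between the `mean` and `batchmean` results above follows from the element count: both divide the same sum, once by the 12 elements and once by the batch size 4. A plain-Python sketch on the same input and target (no PyTorch; variable names are illustrative):

```python
import math

inp = [[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]]
tgt = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]]

# Pointwise term: target * (log(target) - input)
pointwise = [[t * (math.log(t) - x) for x, t in zip(xr, tr)]
             for xr, tr in zip(inp, tgt)]
total = sum(v for row in pointwise for v in row)
n_elements = sum(len(row) for row in pointwise)

mean_loss = total / n_elements        # reduction='mean'
batchmean_loss = total / len(inp)     # reduction='batchmean'
print(mean_loss, batchmean_loss)
```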

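MarginRankingLoss

`torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')`

Purpose: compares two inputs and is used for ranking tasks; it measures the difference between two sets of data. The target $y$ is 1 when $x_{1}$ should rank higher than $x_{2}$, and -1 when it should rank lower.

Main parameters:

margin: the margin, i.e. the required gap between $ x_{1} $ and $ x_{2} $.

reduction: the computation mode, one of none/sum/mean.

Formula:

$ \operatorname{loss}(x_{1}, x_{2}, y)=\max (0,-y \cdot (x_{1}-x_{2})+\operatorname{margin}) $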

import torch
import torch.nn as nn

loss = nn.MarginRankingLoss()
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()  # random labels in {1, -1}
output = loss(input1, input2, target)
output.backward()
print('MarginRankingLoss result:', output)
-->
MarginRankingLoss result: tensor(0.7740, grad_fn=<MeanBackward0>)
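Because the example above uses random inputs, the effect of `y` and `margin` is easier to see on fixed numbers. A minimal per-pair sketch of the formula in plain Python, with the hypothetical helper name `margin_ranking`:

```python
def margin_ranking(x1, x2, y, margin=0.0):
    """y = 1 means x1 should rank higher than x2; y = -1 means the opposite."""
    return max(0.0, -y * (x1 - x2) + margin)


print(margin_ranking(0.8, 0.3, 1))              # correct order: no loss
print(margin_ranking(0.3, 0.8, 1))              # wrong order: loss equals the gap
print(margin_ranking(0.8, 0.3, 1, margin=1.0))  # correct order, but gap smaller than the margin
```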

Multi-label margin loss

`torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')`

Purpose: computes the loss for multi-label classification problems.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

Formula: $\operatorname{loss}(x, y)=\sum_{i j} \frac{\max (0,1-(x[y[j]]-x[i]))}{x \cdot \operatorname{size}(0)} $

where $ i=0, \ldots, x \cdot \operatorname{size}(0)-1 $ and $ j=0, \ldots, y \cdot \operatorname{size}(0)-1 $, and for all $ i $ and $ j $, $ y[j] \geq 0 $ and $ i \neq y[j] $.

import torch
import torch.nn as nn

loss = nn.MultiLabelMarginLoss()
x = torch.FloatTensor([[0.9, 0.2, 0.4, 0.8]])
# for target y, only consider labels 3 and 0, not after label -1
y = torch.LongTensor([[3, 0, -1, 1]])  # the true classes are class 3 and class 0
output = loss(x, y)
print('MultiLabelMarginLoss result:', output)
-->
MultiLabelMarginLoss result: tensor(0.4500)
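The result 0.45 can be traced term by term: each true class (3 and 0) is compared against each non-target class (1 and 2) with a hinge on the score difference, and the sum is divided by the number of classes. A plain-Python sketch for one sample, with the hypothetical helper name `multilabel_margin`:

```python
def multilabel_margin(x, y):
    """x: scores for one sample; y: target class indices, terminated by -1."""
    labels = []
    for idx in y:
        if idx < 0:      # everything from the first -1 onward is ignored
            break
        labels.append(idx)
    total = 0.0
    for j in labels:
        for i in range(len(x)):
            if i not in labels:
                # hinge on the gap between a target score and a non-target score
                total += max(0.0, 1.0 - (x[j] - x[i]))
    return total / len(x)


result = multilabel_margin([0.9, 0.2, 0.4, 0.8], [3, 0, -1, 1])
print(result)
```

The four hinge terms are 0.4, 0.6, 0.3 and 0.5, which sum to 1.8 and give 0.45 after dividing by 4.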

Binary classification loss (soft margin)

`torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')`

Purpose: computes the logistic loss for binary classification.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

Formula: $ \operatorname{loss}(x, y)=\sum_{i} \frac{\log (1+\exp (-y[i] \cdot x[i]))}{x \cdot \operatorname{nelement}()} $

where $ x.nelement() $ is the number of elements in the input $ x $. Note that $ y $ takes only the two values 1 and -1.

import torch
import torch.nn as nn

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])  # two samples, two neurons
# The loss is computed per neuron, so each neuron needs its own label
target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)
loss_f = nn.SoftMarginLoss()
output = loss_f(inputs, target)
print('SoftMarginLoss result:', output)
-->
SoftMarginLoss result: tensor(0.6764)
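The same number follows directly from the formula. A plain-Python sketch on the inputs and targets used above (no PyTorch; variable names are illustrative):

```python
import math

inputs = [[0.3, 0.7], [0.5, 0.5]]
target = [[-1, 1], [1, -1]]

# One logistic term per element: log(1 + exp(-y * x))
terms = [math.log(1.0 + math.exp(-y * x))
         for xr, yr in zip(inputs, target)
         for x, y in zip(xr, yr)]
soft_margin = sum(terms) / len(terms)   # divide by the number of elements
print(soft_margin)
```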

Multi-class hinge loss

`torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')`

Purpose: computes the hinge loss for multi-class classification.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

p: the exponent, either 1 or 2.

weight: a weight assigned to the loss of each class.

margin: the margin.

Formula: $ \operatorname{loss}(x, y)=\frac{\sum_{i} \max (0, \operatorname{margin}-x[y]+x[i])^{p}}{x \cdot \operatorname{size}(0)} $

where $ i \in\{0, \ldots, x \cdot \operatorname{size}(0)-1\} $, $ 0 \leq y \leq x \cdot \operatorname{size}(0)-1 $ and $ i \neq y $.

import torch
import torch.nn as nn

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([0, 1], dtype=torch.long)  # true class index of each sample
loss_f = nn.MultiMarginLoss()
output = loss_f(inputs, target)
print('MultiMarginLoss result:', output)
-->
MultiMarginLoss result: tensor(0.6000)
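Worked by hand, the first sample contributes 0.7 and the second 0.5, so the mean is 0.6. A plain-Python sketch of the per-sample formula, with the hypothetical helper name `multi_margin`:

```python
def multi_margin(x, y, p=1, margin=1.0):
    """Hinge loss for one sample with class scores x and true class index y."""
    total = sum(max(0.0, margin - x[y] + x[i]) ** p
                for i in range(len(x)) if i != y)
    return total / len(x)


inputs = [[0.3, 0.7], [0.5, 0.5]]
target = [0, 1]
per_sample = [multi_margin(x, y) for x, y in zip(inputs, target)]
mean_loss = sum(per_sample) / len(per_sample)   # reduction='mean' over the batch
print(per_sample, mean_loss)
```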

Triplet loss

`torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')`

Purpose: computes the triplet loss.

Triplet: a format for storing or using data, <entity 1, relation, entity 2>. In practice it can also be written as <anchor, positive examples, negative examples>.

With this loss we want the anchor to be close to the positive examples and far from the negative examples.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

p: the norm degree of the distance, typically 1 or 2.

margin: the margin.

Formula:

$L(a, p, n)=\max \left\{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\operatorname{margin}, 0\right\}$ where $ d\left(x_{i}, y_{i}\right)=\left\|\mathbf{x}_{i}-\mathbf{y}_{i}\right\|_{p} $.

import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)
output = triplet_loss(anchor, positive, negative)
output.backward()
print('TripletMarginLoss result:', output)
-->
TripletMarginLoss result: tensor(1.1667, grad_fn=<MeanBackward0>)
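With random 128-dimensional vectors the result is hard to interpret, so here is a tiny hand-checkable case. This is a sketch of the formula in plain Python; `distance` and `triplet_margin` are hypothetical names, and the small `eps` that PyTorch adds for numerical stability is left out.

```python
def distance(u, v, p=2.0):
    """p-norm distance between two vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def triplet_margin(anchor, positive, negative, margin=1.0, p=2.0):
    """max(d(a, p) - d(a, n) + margin, 0) for one triplet."""
    return max(distance(anchor, positive, p) - distance(anchor, negative, p) + margin, 0.0)


a, pos, neg = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
print(triplet_margin(a, pos, neg))   # positive at distance 1, negative at 5: no loss
print(triplet_margin(a, neg, pos))   # roles swapped: the loss is 5 - 1 + 1
```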

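HingeEmbeddingLoss

`torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')`

Purpose: computes the hinge loss on embedding outputs.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

margin: the margin.

Formula:

$$ l_{n}=\left\{\begin{array}{ll}x_{n}, & \text { if } y_{n}=1 \\ \max \left\{0, \Delta-x_{n}\right\}, & \text { if } y_{n}=-1\end{array}\right. $$

Note: the input $x$ should be the absolute value of the difference between two inputs.

Put another way: for a positive example, $y_n=1$, the loss is $x_n$ itself; for a negative example, $y_n=-1$, the loss is $\max\{0, \Delta-x_n\}$, which drops to zero once $x_n$ reaches the margin $\Delta$.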

import torch
import torch.nn as nn

loss_f = nn.HingeEmbeddingLoss()
inputs = torch.tensor([[1., 0.8, 0.5]])
target = torch.tensor([[1, 1, -1]])
output = loss_f(inputs, target)
print('HingeEmbeddingLoss result:', output)
-->
HingeEmbeddingLoss result: tensor(0.7667)
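The value 0.7667 is the mean of three per-element losses: 1.0 and 0.8 for the two positive examples, and max(0, 1 - 0.5) = 0.5 for the negative one. A plain-Python sketch of the piecewise formula, with the hypothetical helper name `hinge_embedding`:

```python
def hinge_embedding(x, y, margin=1.0):
    """x is a distance; y is 1 for a positive example and -1 for a negative one."""
    return x if y == 1 else max(0.0, margin - x)


inputs = [1.0, 0.8, 0.5]
target = [1, 1, -1]
per_element = [hinge_embedding(x, y) for x, y in zip(inputs, target)]
mean_loss = sum(per_element) / len(per_element)
print(per_element, mean_loss)
```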

Cosine similarity

`torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')`

Purpose: computes a loss from the cosine similarity of two vectors.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

margin: may take values in [-1, 1]; [0, 0.5] is recommended.

Formula:

$$ \operatorname{loss}(x, y)=\left\{\begin{array}{ll}1-\cos \left(x_{1}, x_{2}\right), & \text { if } y=1 \\ \max \left\{0, \cos \left(x_{1}, x_{2}\right)-\operatorname{margin}\right\}, & \text { if } y=-1\end{array}\right. $$

where

$$ \cos (\theta)=\frac{A \cdot B}{\|A\|\|B\|}=\frac{\sum_{i=1}^{n} A_{i} \times B_{i}}{\sqrt{\sum_{i=1}^{n}\left(A_{i}\right)^{2}} \times \sqrt{\sum_{i=1}^{n}\left(B_{i}\right)^{2}}} $$

This is probably the best known of these loss functions. It takes the cosine similarity of two vectors and uses it as a distance: for a pair labeled $y=1$, the closer the two vectors, the smaller the loss; for a pair labeled $y=-1$ the relation is reversed.

import torch
import torch.nn as nn

loss_f = nn.CosineEmbeddingLoss()
inputs_1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
inputs_2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
target = torch.tensor([1, -1], dtype=torch.float)
output = loss_f(inputs_1, inputs_2, target)
print('CosineEmbeddingLoss result:', output)
-->
CosineEmbeddingLoss result: tensor(0.5000)
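The result is exactly 0.5 because the same pair of vectors is used twice, once with label 1 and once with label -1, so with margin 0 the two losses are `1 - cos` and `cos`. A plain-Python sketch of the formula, with hypothetical helper names `cosine` and `cosine_embedding`:

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_embedding(a, b, y, margin=0.0):
    """1 - cos for y = 1, max(0, cos - margin) for y = -1."""
    c = cosine(a, b)
    return 1.0 - c if y == 1 else max(0.0, c - margin)


a, b = [0.3, 0.5, 0.7], [0.1, 0.3, 0.5]
per_pair = [cosine_embedding(a, b, 1), cosine_embedding(a, b, -1)]
mean_loss = sum(per_pair) / len(per_pair)
print(cosine(a, b), per_pair, mean_loss)
```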

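CTC loss

`torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)`

Purpose: classification of sequence (time-series) data.

It computes the loss between a continuous time series and a target sequence. CTCLoss sums the probabilities of all possible alignments between input and target, producing a loss value that is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which limits the length of the target sequence: it must be ≤ the input length.

Main parameters:

reduction: the computation mode, one of none/sum/mean.

blank: the index of the blank label.

zero_infinity: whether to set infinite losses and the associated gradients to zero.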

import torch
import torch.nn as nn

# Targets are to be padded
T = 50      # Input sequence length
C = 20      # Number of classes (including blank)
N = 16      # Batch size
S = 30      # Target sequence length of longest target in batch (padding length)
S_min = 10  # Minimum target length, for demonstration purposes
# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
# Initialize random batch of targets (0 = blank, 1:C = classes)
target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
# Targets are to be un-padded
T = 50      # Input sequence length
C = 20      # Number of classes (including blank)
N = 16      # Batch size
# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
# Initialize random batch of targets (0 = blank, 1:C = classes)
target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(sum(target_lengths),), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
print('CTCLoss result:', loss)
-->
CTCLoss result: tensor(16.0885, grad_fn=<MeanBackward0>)
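The idea of summing over all alignments can be illustrated by brute force on a toy case. The sketch below is plain Python and not how CTCLoss is implemented: it works on plain probabilities instead of log-probabilities, applies no reduction, and enumerates every path, which is only feasible for very short sequences. The names `collapse` and `ctc_brute_force` are hypothetical.

```python
import itertools
import math


def collapse(path, blank=0):
    """Many-to-one mapping: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def ctc_brute_force(probs, target, blank=0):
    """probs[t][c] is the probability of class c at time step t."""
    num_classes = len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_classes), repeat=len(probs)):
        if collapse(path, blank) == target:
            p = 1.0
            for t, c in enumerate(path):
                p *= probs[t][c]
            total += p      # add the probability of this alignment
    return -math.log(total)


# Toy case: T = 3 time steps, C = 3 classes (0 is blank), uniform probabilities
probs = [[1 / 3] * 3 for _ in range(3)]
loss_value = ctc_brute_force(probs, [1, 2])
print(loss_value)
```

In this toy case five of the 27 paths collapse to `[1, 2]`, namely (1,2,0), (1,0,2), (0,1,2), (1,1,2) and (1,2,2), so the loss is `-log(5/27)`.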

References

Article link:
https://www.zywvvd.com/notes/study/deep-learning/pytorch/torch-learning/torch-learning-6/
