Paper Translation | Momentum Contrast for Unsupervised Visual Representation Learning (First Three Sections)


Preface: The last time I read one of Kaiming He's papers was two years ago, when the design of ResNet completely won me over. Two years on, I have long since been washed up on the beach by the wave of competition.

Momentum Contrast for Unsupervised Visual Representation Learning

Abstract

We present MoCo, a method for unsupervised visual representation learning. Viewing contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder, which makes it possible to construct a large and consistent dictionary on the fly and thereby facilitates unsupervised learning. MoCo achieves competitive classification results on ImageNet under the common linear protocol (a linear classification head), and the learned features transfer well to downstream tasks: in 7 detection/segmentation tasks MoCo can outperform its supervised pre-training counterpart, sometimes by large margins. This suggests that the gap between unsupervised and supervised representation learning in vision tasks is closing.

We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.

Introduction

Unsupervised learning has been highly successful in NLP, but in vision, supervised pre-training is still dominant and unsupervised methods lag behind.
The reason likely lies in the difference between their respective signal spaces. Language tasks have discrete signal spaces made up of individual words, so tokenized dictionaries can be built, and unsupervised learning can proceed on top of these dictionaries. In vision, by contrast, the raw signal lives in a continuous, high-dimensional space and is not structured the way human communication is, which makes dictionary building a further concern.

Unsupervised representation learning is highly successful in natural language processing, e.g., as shown by GPT and BERT [12]. But supervised pre-training is still dominant in computer vision, where unsupervised methods generally lag behind. The reason may stem from differences in their respective signal spaces. Language tasks have discrete signal spaces (words, sub-word units, etc.) for building tokenized dictionaries, on which unsupervised learning can be based. Computer vision, in contrast, further concerns dictionary building [54, 9, 5], as the raw signal is in a continuous, high-dimensional space and is not structured for human communication (e.g., unlike words).

Several recent studies have produced promising results on unsupervised visual representation learning using contrastive approaches. Despite their varying motivations, these methods can all be viewed as building dynamic dictionaries. The keys in the dictionary are sampled from the data and represented by an encoder network; unsupervised learning then trains this encoder to perform dictionary look-up: an encoded query should be similar to its matching key and dissimilar to the others. Such learning can be formulated as minimizing a contrastive loss.

Several recent studies [61, 46, 36, 66, 35, 56, 2] present promising results on unsupervised visual representation learning using approaches related to the contrastive loss [29]. Though driven by various motivations, these methods can be thought of as building dynamic dictionaries. The “keys” (tokens) in the dictionary are sampled from data (e.g., images or patches) and are represented by an encoder network. Unsupervised learning trains encoders to perform dictionary look-up: an encoded “query” should be similar to its matching key and dissimilar to others. Learning is formulated as minimizing a contrastive loss [29].

From this perspective, we hypothesize that such a dictionary should be (i) large and (ii) consistent as it evolves during training. Intuitively, a larger dictionary can better sample the underlying continuous, high-dimensional visual space, while the keys in the dictionary should be encoded by the same (or a similar) encoder so that their comparisons to the query are consistent.
However, existing methods are limited in one or the other of these two respects.

From this perspective, we hypothesize that it is desirable to build dictionaries that are: (i) large and (ii) consistent as they evolve during training. Intuitively, a larger dictionary may better sample the underlying continuous, high dimensional visual space, while the keys in the dictionary should be represented by the same or similar encoder so that their comparisons to the query are consistent. However, existing methods that use contrastive losses can be limited in one of these two aspects (discussed later in context).

We propose Momentum Contrast (MoCo), as illustrated in Figure 1. We maintain the dictionary as a queue of samples: the encoded representations of the current mini-batch are enqueued, and the oldest ones are dequeued. The queue decouples the dictionary size from the mini-batch size, so the dictionary can be made far larger than what a machine's limited batch size would allow.
Second, because the keys in the dictionary come from the preceding mini-batches, the key encoder is updated slowly and gradually, implemented as a momentum-based moving average of the query encoder. This keeps the whole queue consistent.

We present Momentum Contrast (MoCo) as a way of building large and consistent dictionaries for unsupervised learning with a contrastive loss (Figure 1). We maintain the dictionary as a queue of data samples: the encoded representations of the current mini-batch are enqueued, and the oldest are dequeued. The queue decouples the dictionary size from the mini-batch size, allowing it to be large. Moreover, as the dictionary keys come from the preceding several mini-batches, a slowly progressing key encoder, implemented as a momentum-based moving average of the query encoder, is proposed to maintain consistency.
(Figure 1 of the paper: overview of the MoCo framework.)
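To make the queue-plus-momentum mechanism concrete, here is a minimal PyTorch-style sketch of the two updates described above. It is an illustrative outline under assumed names (`encoder_q`, `encoder_k`, `queue`, `queue_ptr`), not the paper's released code:

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Slowly drag the key encoder toward the query encoder:
    theta_k <- m * theta_k + (1 - m) * theta_q."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

@torch.no_grad()
def dequeue_and_enqueue(queue, queue_ptr, keys):
    """Replace the oldest keys in the queue with the newest mini-batch of keys.
    queue:     (dim, K) tensor holding K encoded keys column-wise
    queue_ptr: 1-element long tensor, current write position
    keys:      (N, dim) keys encoded from the current mini-batch
    """
    batch_size = keys.shape[0]
    k_total = queue.shape[1]
    ptr = int(queue_ptr)
    # simplifying assumption: K is a multiple of the batch size
    assert k_total % batch_size == 0
    queue[:, ptr:ptr + batch_size] = keys.T
    queue_ptr[0] = (ptr + batch_size) % k_total
```

With a large momentum (the paper's default is m = 0.999), the key encoder changes slowly, so keys produced over many iterations remain mutually comparable even though they were encoded at different times.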

MoCo is a mechanism for building dynamic dictionaries for contrastive learning and can be used with various pretext tasks. In this paper we follow a simple instance discrimination task: a query matches a key if they are encoded views (e.g., different crops) of the same image. With this pretext task, MoCo shows very competitive results under the common linear classification protocol on ImageNet.

MoCo is a mechanism for building dynamic dictionaries for contrastive learning, and can be used with various pretext tasks. In this paper, we follow a simple instance discrimination task [61, 63, 2]: a query matches a key if they are encoded views (e.g., different crops) of the same image. Using this pretext task, MoCo shows competitive results under the common protocol of linear classification in the ImageNet dataset [11].
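As a rough illustration of the data side of this instance discrimination task, the sketch below builds a (query, key) positive pair from two independent augmentations of the same image; views of different images serve as negatives. The augmentation recipe here is an assumption for illustration, not the exact one used in the paper:

```python
from PIL import Image
from torchvision import transforms

# Two independently sampled augmentations of the same image form a positive pair.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

def make_positive_pair(img: Image.Image):
    """Return (query view, key view): two augmented views of one image."""
    return augment(img), augment(img)
```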

A main purpose of unsupervised learning is to pre-train representations that can be transferred to downstream tasks. We show that across 7 downstream detection and segmentation tasks, MoCo unsupervised pre-training can surpass its ImageNet supervised counterpart, in some cases by a clear margin. In these experiments we also pre-train MoCo on a one-billion-image Instagram set, showing that MoCo works well in a more realistic, billion-image-scale, largely uncurated setting. These results confirm that, for many vision tasks, unsupervised pre-training can serve as an alternative to supervised pre-training.

A main purpose of unsupervised learning is to pre-train representations (i.e., features) that can be transferred to downstream tasks by fine-tuning. We show that in 7 downstream tasks related to detection or segmentation, MoCo unsupervised pre-training can surpass its ImageNet supervised counterpart, in some cases by nontrivial margins. In these experiments, we explore MoCo pre-trained on ImageNet or on a one-billion Instagram image set, demonstrating that MoCo can work well in a more real-world, billion image scale, and relatively uncurated scenario. These results show that MoCo largely closes the gap between unsupervised and supervised representation learning in many computer vision tasks, and can serve as an alternative to ImageNet supervised pre-training in several applications.

Related Work

Unsupervised/self-supervised learning methods generally involve two aspects: pretext tasks and loss functions. "Pretext" means that solving the task itself is not the real goal; what is really wanted is the good data representation learned in the process of solving it.
Loss functions can often be investigated independently of pretext tasks. MoCo focuses mainly on the loss function aspect. We discuss related work with respect to these two aspects below.

Unsupervised/self-supervised learning methods generally involve two aspects: pretext tasks and loss functions. The term “pretext” implies that the task being solved is not of genuine interest, but is solved only for the true purpose of learning a good data representation. Loss functions can often be investigated independently of pretext tasks. MoCo focuses on the loss function aspect. Next we discuss related studies with respect to these two aspects.

Loss functions. A common way of defining a loss function is to measure the difference between a model's prediction and a fixed target, e.g., reconstructing the input pixels with L1 or L2 losses, or classifying the input into pre-defined categories with cross-entropy or margin-based losses. Other alternatives, discussed next, are also possible.

Loss functions. A common way of defining a loss function is to measure the difference between a model’s prediction and a fixed target, such as reconstructing the input pixels (e.g., auto-encoders) by L1 or L2 losses, or classifying the input into pre-defined categories (e.g., eight positions [13], color bins [64]) by cross-entropy or margin-based losses. Other alternatives, as described next, are also possible.

Contrastive losses measure the similarity of sample pairs in a representation space. Instead of matching an input to a fixed target, in contrastive formulations the target can vary on the fly during training and is defined by the representation computed by the network. Contrastive learning is at the core of several recent works on unsupervised learning, and it is also what we build on in Section 3.

Contrastive losses [29] measure the similarities of sample pairs in a representation space. Instead of matching an input to a fixed target, in contrastive loss formulations the target can vary on-the-fly during training and can be defined in terms of the data representation computed by a network [29]. Contrastive learning is at the core of several recent works on unsupervised learning [61, 46, 36, 66, 35, 56, 2], which we elaborate on later in context (Sec. 3.1).

Adversarial losses measure the difference between probability distributions and have been widely successful for unsupervised data generation. Adversarial methods for representation learning have also been explored, and some works have noted the relation between generative adversarial networks and noise-contrastive estimation (NCE).

Adversarial losses [24] measure the difference between probability distributions. It is a widely successful technique or unsupervised data generation. Adversarial methods for representation learning are explored in [15, 16]. There are relations (see [24]) between generative adversarial networks and noise-contrastive estimation (NCE) [28].

Pretext tasks. A wide range of pretext tasks have been proposed, for example recovering the input under some corruption (denoising auto-encoders, context auto-encoders, colorization).
Other pretext tasks form pseudo-labels by, e.g., transformations of a single image, patch orderings, tracking or segmenting objects in videos, or clustering features.

Pretext tasks. A wide range of pretext tasks have been proposed. Examples include recovering the input under some corruption, e.g., denoising auto-encoders [58], context autoencoders [48], or cross-channel auto-encoders (colorization) [64, 65]. Some pretext tasks form pseudo-labels by, e.g., transformations of a single ("exemplar") image [17], patch orderings [13, 45], tracking [59] or segmenting objects [47] in videos, or clustering features [3, 4].

Contrastive learning vs. pretext tasks. Various pretext tasks can be based on some form of contrastive loss function. For example, the instance discrimination method is related to the exemplar-based task [17] and NCE [28]; the pretext task in contrastive predictive coding (CPC) [46] is a form of context auto-encoding [48]; and in contrastive multiview coding (CMC) [56] it is related to colorization [64].

Contrastive learning vs. pretext tasks. Various pretext tasks can be based on some form of contrastive loss functions. The instance discrimination method [61] is related to the exemplar-based task [17] and NCE [28]. The pretext task in contrastive predictive coding (CPC) [46] is a form of context auto-encoding [48], and in contrastive multiview coding (CMC) [56] it is related to colorization [64].

Method

3.1 Contrastive Learning as Dictionary Look-up
Contrastive learning, and its recent developments, can be viewed as training an encoder for a dictionary look-up task, described in detail next.

Contrastive learning [29], and its recent developments, can be thought of as training an encoder for a dictionary look-up task, as described next.

Consider an encoded query q and a set of encoded keys {k0, k1, k2, ...} forming a dictionary, among which a single key k+ matches q. A contrastive loss should be low when q is similar to its positive key k+ and dissimilar to all the other (negative) keys. Measuring similarity by dot product, this paper adopts a form of contrastive loss called InfoNCE.

Consider an encoded query q and a set of encoded samples {k0, k1, k2, …} that are the keys of a dictionary. Assume that there is a single key (denoted as k+) in the dictionary that q matches. A contrastive loss [29] is a function whose value is low when q is similar to its positive key k+ and dissimilar to all other keys (considered negative keys for q). With similarity measured by dot product, a form of a contrastive loss function, called InfoNCE [46], is considered in this paper:
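The InfoNCE loss referred to above (Eq. (1) in the paper) takes the following form, where τ is a temperature hyper-parameter and the sum runs over one positive and K negative keys:

$$\mathcal{L}_q = -\log \frac{\exp(q \cdot k_+ / \tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i / \tau)}$$

Intuitively, this is the log loss of a (K+1)-way softmax classifier that tries to classify q as k+. A minimal sketch in PyTorch, with names and shapes assumed for illustration:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """InfoNCE as a (K+1)-way softmax classification.
    q:     (N, dim) encoded queries (assumed L2-normalized)
    k_pos: (N, dim) encoded positive keys
    queue: (dim, K) encoded negative keys from the dictionary
    """
    # positive logits: N x 1, dot product of each query with its own key
    l_pos = torch.einsum("nc,nc->n", q, k_pos).unsqueeze(-1)
    # negative logits: N x K, dot products against all queued keys
    l_neg = torch.einsum("nc,ck->nk", q, queue)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # the positive key sits at index 0 for every query
    labels = torch.zeros(logits.shape[0], dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```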
