The Warmup Trick for Training Deep Neural Networks


2020-08-14 · via jdhao's digital space

Warmup is a technique often used when training deep neural networks. In this post, I will try to explain what warmup is and how it works.

Warmup was originally proposed in this paper: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. It gives a good explanation of why warmup is needed and describes different warmup strategies.

Why do we need warmup

Suppose that we use learning rate $\eta$ on a single GPU with batch size $n$. When we train the network on 8 GPUs, the total batch size becomes $8n$, and the learning rate also needs to change to suit the distributed training scenario. The authors find that, in practice, linear scaling of the learning rate works pretty well: when the batch size is multiplied by $k$, the learning rate is multiplied by $k$ as well. For example, if we use an initial learning rate of 0.01 on one GPU, we may use an initial learning rate of 0.08 for training on 8 GPUs, i.e., 0.01 * 8.
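The linear scaling rule is simple arithmetic, and a minimal sketch may make it concrete. The function name `scaled_learning_rate` and its parameters are hypothetical and chosen only for illustration. The sketch assumes that the per-GPU batch size stays fixed, so the total batch size grows by the number of GPUs.

```python
def scaled_learning_rate(base_lr: float, num_gpus: int) -> float:
    """Linear scaling rule: multiply the single-GPU learning rate by the
    factor by which the total batch size grew (assumed to be num_gpus)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return base_lr * num_gpus


if __name__ == "__main__":
    # Learning rate 0.01 on one GPU, scaled for training on 8 GPUs.
    print(scaled_learning_rate(0.01, 8))
```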

However, for linear scaling of the learning rate to work, certain conditions have to be met. In the initial training stage, the network parameters change rapidly, so the conditions that make linear scaling work no longer hold. The authors therefore propose warmup for the initial training stage to tackle this issue.

The basic idea is that, at the start of training, we should use a smaller learning rate than the value calculated by the linear scaling policy. There are two strategies for warmup:

constant: Use a learning rate lower than 0.08 for the first few epochs.
gradual: In the first few epochs, the learning rate starts below 0.08 and is increased gradually towards 0.08 as the epoch number increases. In maskrcnn-benchmark, a linear warmup strategy is used to control the warmup factor in the initial training stage.

After the warmup epochs, the learning rate follows the regular schedule again. You can choose that schedule based on the task at hand.
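The two strategies above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper `warmup_lr`, the starting rate of 0.01, and the 5 warmup epochs are hypothetical choices, and the regular schedule after warmup is represented by a constant target learning rate.

```python
def warmup_lr(epoch: int, target_lr: float, warmup_epochs: int,
              start_lr: float, method: str = "gradual") -> float:
    """Learning rate for a 0-based epoch (hypothetical helper)."""
    if epoch >= warmup_epochs:
        # Warmup is over: the regular schedule takes over, represented
        # here by the constant target learning rate.
        return target_lr
    if method == "constant":
        # Constant warmup: one fixed low learning rate during warmup.
        return start_lr
    if method == "gradual":
        # Gradual warmup: linear interpolation from start_lr to target_lr.
        alpha = epoch / warmup_epochs
        return start_lr * (1 - alpha) + target_lr * alpha
    raise ValueError(f"unknown warmup method: {method}")


if __name__ == "__main__":
    for epoch in range(7):
        constant = warmup_lr(epoch, 0.08, 5, 0.01, method="constant")
        gradual = warmup_lr(epoch, 0.08, 5, 0.01, method="gradual")
        print(epoch, constant, gradual)
```

With constant warmup the learning rate jumps from the low value to the target once warmup ends, while gradual warmup reaches the target smoothly.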

Warmup applications

Warmup in ResNet

In Deep Residual Learning for Image Recognition, when training a 110-layer ResNet on CIFAR-10 (section 4.2), the authors used constant warmup to ease the initial training iterations:

In this case, we find that the initial learning rate of 0.1 is slightly too large to start converging. So we use 0.01 to warm up the training until the training error is below 80% (about 400 iterations), and then go back to 0.1 and continue training.
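The rule quoted above depends on the training error rather than on a fixed number of iterations. A minimal sketch of that idea is shown below. The class `ConstantWarmup` is hypothetical and not taken from the paper's code. It also assumes that once the switch to 0.1 has happened, the learning rate never falls back to 0.01, even if the training error rises above 80% again.

```python
class ConstantWarmup:
    """Use a low learning rate until the training error drops below a
    threshold, then switch to the base learning rate (hypothetical sketch)."""

    def __init__(self, warmup_lr: float = 0.01, base_lr: float = 0.1,
                 error_threshold: float = 0.80) -> None:
        self.warmup_lr = warmup_lr
        self.base_lr = base_lr
        self.error_threshold = error_threshold
        self.warmed_up = False

    def step(self, train_error: float) -> float:
        """Return the learning rate to use given the current training error."""
        if not self.warmed_up and train_error < self.error_threshold:
            # Assumption: the switch is one-way and is never undone.
            self.warmed_up = True
        return self.base_lr if self.warmed_up else self.warmup_lr


if __name__ == "__main__":
    schedule = ConstantWarmup()
    for error in (0.90, 0.85, 0.79, 0.82):
        print(error, schedule.step(error))
```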

How does linear warmup work in maskrcnn

In maskrcnn-benchmark, there are several config parameters for warmup in the solver (WARMUP_FACTOR, WARMUP_ITERS, WARMUP_METHOD). The warmup method used by maskrcnn-benchmark is implemented in the get_lr method of its learning rate scheduler:

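from bisect import bisect_right

def get_lr(self):
    # Outside the warmup phase the factor is 1, so the learning rate is unchanged.
    warmup_factor = 1
    if self.last_epoch < self.warmup_iters:
        if self.warmup_method == "constant":
            # Constant warmup: scale by a fixed factor during the whole warmup phase.
            warmup_factor = self.warmup_factor
        elif self.warmup_method == "linear":
            # Linear warmup: interpolate from self.warmup_factor up to 1.
            alpha = float(self.last_epoch) / self.warmup_iters
            warmup_factor = self.warmup_factor * (1 - alpha) + alpha
    # Apply the warmup factor on top of the step decay at the milestones.
    return [
        base_lr
        * warmup_factor
        * self.gamma ** bisect_right(self.milestones, self.last_epoch)
        for base_lr in self.base_lrs
    ]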

In the above code, self.last_epoch is the current training iteration (maskrcnn-benchmark measures training progress in iterations instead of the usual epochs). self.warmup_iters is the number of iterations used for warmup in the initial training stage. self.warmup_factor is a constant (0.333 in this case).

The warmup factor is applied only while the current iteration number is below self.warmup_iters. After that, warmup_factor is 1 and does not affect the learning rate.

When the current iteration is below warmup_iters and the warmup method is linear, the warmup factor is calculated as follows (substituting 0.333 for self.warmup_factor):

warmup_factor = 0.667 * (current_iter / warmup_iters) + 0.333

So as the current iteration approaches warmup_iters, warmup_factor gradually approaches 1. As a result, the learning rate approaches the base learning rate.
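To see these numbers in action, here is a standalone sketch that mirrors the logic of get_lr as a plain function. The function name `lr_at_iteration` and all default values except the warmup factor (base learning rate 0.02, 500 warmup iterations, milestones at 30000 and 40000, gamma 0.1) are assumptions picked for illustration. They are not claimed to be the defaults of maskrcnn-benchmark.

```python
from bisect import bisect_right


def lr_at_iteration(iteration, base_lr=0.02, warmup_factor=1.0 / 3,
                    warmup_iters=500, milestones=(30000, 40000), gamma=0.1,
                    warmup_method="linear"):
    """Learning rate at a given iteration: warmup factor times step decay."""
    factor = 1.0
    if iteration < warmup_iters:
        if warmup_method == "constant":
            factor = warmup_factor
        elif warmup_method == "linear":
            # alpha goes from 0 to 1, so factor goes from warmup_factor to 1.
            alpha = iteration / warmup_iters
            factor = warmup_factor * (1 - alpha) + alpha
    # bisect_right counts how many milestones have already been reached.
    return base_lr * factor * gamma ** bisect_right(milestones, iteration)


if __name__ == "__main__":
    for it in (0, 250, 499, 500, 30000, 40000):
        print(it, lr_at_iteration(it))
```

At iteration 0 the learning rate is one third of the base learning rate. It reaches the base learning rate at iteration 500 and is then multiplied by gamma at each milestone.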
