Accelerate Batched Image Inference in PyTorch

jdhao's digital space


2022-03-18 · via jdhao's digital space

I have a web service that receives images in batches, so I need to run inference on several PIL images at a time. Initially, I used a naive approach: transform the images one by one, combine them into a single tensor, and run the model. This becomes really slow when a batch has more than 30 images: inference takes about 1-2 seconds per batch.

Should we set cudnn.benchmark to True?

Some blog posts recommend an easy way to speed up inference: setting torch.backends.cudnn.benchmark to True. With this option enabled, cuDNN benchmarks the available convolution algorithms and picks the fastest one for your input shape. However, this only helps when the input shape to the model does not change. If the input shape varies, the benchmark is rerun for each new shape, so the time cost can actually be worse¹.
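As a rough illustration of the point above, the flag can be set once at startup, and only when the service is known to feed the model a single fixed input shape. The helper name `configure_cudnn` and its `fixed_input_shape` argument are hypothetical, not part of PyTorch; only `torch.backends.cudnn.benchmark` is the real setting.

```python
import torch


def configure_cudnn(fixed_input_shape: bool) -> bool:
    """Enable cuDNN autotuning only when every batch has the same shape.

    `fixed_input_shape` is a hypothetical flag supplied by the caller; it is
    not something PyTorch can detect on its own.
    """
    torch.backends.cudnn.benchmark = fixed_input_shape
    return torch.backends.cudnn.benchmark


# Assumption: this service resizes every image to one size and always
# sends batches of the same length.
configure_cudnn(fixed_input_shape=True)
```

Keep in mind that the batch dimension is part of the input shape, so a service whose batches vary in length probably falls into the "shape changes" case.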

Dataset and DataLoader for inference

After some debugging, I found that data transformation was likely the bottleneck. In the naive approach, the images are processed sequentially, something like this:

processed_imgs = [transform(im) for im in pil_imgs]

We can use DataLoader from torch to speed up image processing, since it can run the transforms in several worker processes in parallel. To do that, we need to define a Dataset and a DataLoader for inference.

import torch


class InferDataset(torch.utils.data.Dataset):
    def __init__(self, pil_imgs):
        super().__init__()

        self.pil_imgs = pil_imgs
        self.transform = make_transform()  # some inference transform, defined elsewhere

    def __len__(self):
        return len(self.pil_imgs)

    def __getitem__(self, idx):
        img = self.pil_imgs[idx]

        return self.transform(img)


infer_data = InferDataset(pil_imgs)
infer_loader = torch.utils.data.DataLoader(infer_data,
                                           batch_size=64,
                                           shuffle=False,
                                           num_workers=4,
                                           pin_memory=True)
with torch.no_grad():
    for data in infer_loader:
        data = data.cuda()
        output = model(data)
        # ... more processing
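For readers who want something they can run end to end, here is a minimal sketch of the same loop that also collects the outputs in input order. It rests on several assumptions: `RandomImageDataset` is a hypothetical stand-in that yields random tensors instead of transformed PIL images, `toy_model` is a made-up linear classifier, and `run_inference` is an illustrative helper rather than the code used in the actual service.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class RandomImageDataset(Dataset):
    """Hypothetical stand-in for InferDataset: random 3x32x32 tensors
    replace the transformed PIL images so the sketch is self-contained."""

    def __init__(self, num_images: int):
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(num_images, 3, 32, 32, generator=generator)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx]


def run_inference(model, dataset, batch_size=64, num_workers=0):
    """Run the model over the dataset and return all outputs on the CPU."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    loader = DataLoader(dataset,
                        batch_size=batch_size,
                        shuffle=False,  # keep outputs aligned with inputs
                        num_workers=num_workers,
                        pin_memory=(device.type == "cuda"))
    outputs = []
    with torch.no_grad():
        for batch in loader:
            batch = batch.to(device)
            outputs.append(model(batch).cpu())
    return torch.cat(outputs)


# Hypothetical toy classifier with 10 output classes.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
predictions = run_inference(toy_model, RandomImageDataset(100), batch_size=32)
print(predictions.shape)
```

Because `shuffle=False`, row `i` of `predictions` corresponds to image `i` of the dataset, which is what a web service usually needs when mapping results back to requests.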

Use torch.cuda.synchronize() for correct benchmarking

Note that CUDA operations in PyTorch are asynchronous: a call returns as soon as the work is queued on the GPU, without waiting for it to finish. To time a CUDA operation correctly, we need to call torch.cuda.synchronize() before reading the clock, so that all queued work has completed. The timing code should look like this:

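import time

import torch

torch.cuda.synchronize()  # make sure earlier GPU work is not counted
start = time.time()
# your CUDA operations go here, for example, out = model(input)
torch.cuda.synchronize()  # wait for the timed GPU work to finish
end = time.time()

print(f"elapsed: {end - start:.4f} s")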
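The same pattern can be wrapped in a small reusable helper. The sketch below is one possible way to do it, not an established API: `time_inference`, `warmup`, and `iters` are names I made up, and the convolution layer is only a placeholder model. It adds a few warm-up calls because the first calls tend to include one-time costs such as CUDA initialization, and it falls back to plain timing when no GPU is present.

```python
import time

import torch


def time_inference(fn, warmup=3, iters=10):
    """Return the mean wall-clock seconds per call of fn().

    Synchronizes before and after the timed region when CUDA is available,
    so queued GPU work is fully counted.
    """
    use_cuda = torch.cuda.is_available()
    for _ in range(warmup):  # untimed warm-up calls
        fn()
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if use_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Placeholder model and input, chosen only to have something to time.
model = torch.nn.Conv2d(3, 8, kernel_size=3).to(device).eval()
batch = torch.rand(16, 3, 64, 64, device=device)

with torch.no_grad():
    mean_seconds = time_inference(lambda: model(batch))
print(f"mean time per call: {mean_seconds * 1000:.2f} ms")
```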

Important parameters

The parameters that impact the speed most are batch_size and num_workers.

If GPU memory permits, using a large batch size will be faster since we have fewer iterations to run. The exact value for batch size should be benchmarked on your system.

The parameter num_workers sets the number of worker processes used for loading data. When it is 0, the data is loaded in the main process, which can be slow. However, more workers do not necessarily mean faster processing, since starting the workers and passing data between processes has its own cost. We need to benchmark and choose a suitable value. Generally, it should not exceed the number of CPU cores available. For example, I found that setting num_workers to 1 was the fastest for me.
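One way to pick a value is to time a full pass over the loader for a few candidate settings. The sketch below assumes a hypothetical `SlowTransformDataset` whose `__getitem__` does a little CPU work in place of a real image transform; `time_loader` and `sweep_num_workers` are illustrative helpers, and the numbers they produce will depend entirely on your machine.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowTransformDataset(Dataset):
    """Hypothetical dataset: each item does some CPU work that stands in
    for resizing and normalizing an image."""

    def __init__(self, num_items=64):
        self.num_items = num_items

    def __len__(self):
        return self.num_items

    def __getitem__(self, idx):
        image = torch.rand(3, 64, 64)
        return (image - image.mean()) / (image.std() + 1e-6)


def time_loader(dataset, num_workers, batch_size=16):
    """Time one full pass over the dataset, including worker startup."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start


def sweep_num_workers(dataset, worker_counts):
    """Map each candidate num_workers value to its measured seconds."""
    return {n: time_loader(dataset, n) for n in worker_counts}


if __name__ == "__main__":
    # The guard matters on platforms that start workers by spawning.
    timings = sweep_num_workers(SlowTransformDataset(), worker_counts=(0, 1, 2))
    for n, seconds in timings.items():
        print(f"num_workers={n}: {seconds:.3f} s")
```

For small batches like the ones in a web request, worker startup can dominate the measurement, which may explain why a low value such as 1 can win. The same sweep can be repeated over batch_size.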

Setting pin_memory=True makes the DataLoader put batches in pinned (page-locked) host memory, which reduces the time needed to transfer data from the CPU to the GPU and thus speeds up data processing. So in general, it should be enabled when running inference on a GPU.
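Pinned memory is usually combined with `non_blocking=True` on the copy, which lets the host-to-device transfer run asynchronously with respect to the CPU. The following is a small sketch under the assumption that random tensors stand in for already-transformed images; it enables pinning only when a GPU is present, and on a CPU-only machine the copy is simply a no-op.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Random tensors stand in for already-transformed images (an assumption).
images = torch.rand(8, 3, 32, 32)
loader = DataLoader(TensorDataset(images), batch_size=4, pin_memory=use_cuda)

moved_batches = []
for (batch,) in loader:
    # non_blocking=True only matters for a pinned CPU tensor copied to a GPU;
    # otherwise the call behaves like an ordinary copy.
    batch = batch.to(device, non_blocking=True)
    moved_batches.append(batch)

print(len(moved_batches), moved_batches[0].shape)
```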

Conclusion

With all these optimizations, I was able to reduce the batched image inference time from 2 seconds to about 100 ms.

References
