Deep Learning Experiment Log

浅蓝 · 2017-10-03 · via cnblogs - 浅蓝

# **IMPORTANT**
# Please note that this learning rate schedule is heavily dependent on the
# hardware architecture, batch size and any changes to the model architecture
# specification. Selecting a finely tuned learning rate schedule is an
# empirical process that requires some experimentation. Please see README.md
# for more guidance and discussion.
#
# With 8 Tesla K40's and a batch size = 256, the following setup achieves
# precision@1 = 73.5% after 100 hours and 100K steps (20 epochs).
# Learning rate decay factor selected from http://arxiv.org/abs/1404.5997.
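The comment above refers to a step-wise ("staircase") exponential learning-rate decay. Below is a minimal plain-Python sketch of that kind of schedule. The defaults are assumed values modeled on the TensorFlow Inception training script: initial rate 0.1, a decay factor of 0.16 every 30 epochs, and 1,281,167 training images. Check them against the flags in your own copy before relying on them. Because the number of steps between drops is derived from the batch size, changing the batch size also moves every drop, which is one reason the comment warns that the schedule depends on it.

```python
def decay_steps(examples_per_epoch=1_281_167, batch_size=256, epochs_per_decay=30.0):
    # Number of optimizer steps between two learning-rate drops
    # (all default values are assumptions, not confirmed settings).
    batches_per_epoch = examples_per_epoch / batch_size
    return int(batches_per_epoch * epochs_per_decay)


def staircase_lr(global_step, initial_lr=0.1, decay_factor=0.16, steps_per_decay=None):
    """Learning rate multiplied by decay_factor once every steps_per_decay steps."""
    if steps_per_decay is None:
        steps_per_decay = decay_steps()
    # Integer division keeps the rate constant inside each window and then
    # drops it in one step, which gives the "staircase" shape.
    return initial_lr * decay_factor ** (global_step // steps_per_decay)


print(decay_steps())               # steps between drops at batch size 256
print(decay_steps(batch_size=32))  # steps between drops at batch size 32
print(staircase_lr(0), staircase_lr(decay_steps()))
```

Under these assumed values, `decay_steps(batch_size=32)` is 1201094, so a batch-32 run would not reach its first drop until roughly 1.2M steps.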

Launch TensorBoard: tensorboard --logdir=/tmp/imagenet_train

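ImageNet training data: about 1000k images. Training the Inception v3 network on a GTX 1060 with batch_size=32 runs at about 32 examples/sec.

After 20 hours and 70k steps, the model had seen 32 × 70k = 2240k examples, about 2.2 epochs of training data (at 1000k images per epoch). The loss fell from 13 to 8, and the decrease has leveled off.

After 55 hours and 204k steps, the model had seen 32 × 204k = 6528k examples, about 6.5 epochs. The loss fell from 13 to 7, and the curve has been nearly flat since about step 120k.

After 4 days and 1 hour (97 h) and 360k steps, the model had seen 32 × 360k = 11520k examples, about 11.5 epochs. The loss is still around 7; it has been nearly flat since step 120k.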
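The step/epoch bookkeeping above is easy to get wrong by hand. Below is a small helper that turns a step count, batch size and wall-clock hours into examples seen, epochs completed and throughput. The helper is hypothetical and not part of any TensorFlow script. `examples_per_epoch` defaults to the 1000k figure used in this log; the full ILSVRC-2012 training split has 1,281,167 images.

```python
def training_progress(steps, batch_size, hours, examples_per_epoch=1_000_000):
    """Return (examples seen, epochs completed, examples per second)."""
    examples_seen = steps * batch_size
    epochs = examples_seen / examples_per_epoch
    examples_per_sec = examples_seen / (hours * 3600)
    return examples_seen, epochs, examples_per_sec


# (steps, wall-clock hours) checkpoints from the GTX 1060 run above.
for steps, hours in [(70_000, 20), (204_000, 55), (360_000, 97)]:
    seen, epochs, rate = training_progress(steps, batch_size=32, hours=hours)
    print(f"{steps:>7,} steps: {seen:>10,} examples, "
          f"{epochs:4.1f} epochs, {rate:4.1f} examples/sec")
```

For the three checkpoints this works out to roughly 31 to 33 examples/sec, consistent with the 32 examples/sec measured above.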

Eval: precision @ 1 = 0.5584 recall @ 5 = 0.8052 [50016 examples]
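As far as I can tell, in the TensorFlow Inception eval script `precision @ 1` and `recall @ 5` are simply top-1 and top-5 accuracy. Each is the fraction of validation images whose true label is the highest-scoring class, or among the five highest-scoring classes. The script computes them with `tf.nn.in_top_k`. Below is a plain-Python sketch of the same metric with made-up scores.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest scores.

    scores: one list of per-class scores (e.g. logits) per example.
    labels: one integer class id per example.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class ids ordered by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Three examples, three classes (illustrative numbers only).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))  # analogue of "precision @ 1"
print(top_k_accuracy(scores, labels, 2))
```

Ties at the top-k boundary are broken by sort order here. `tf.nn.in_top_k` instead counts all tied classes as inside the top k, so the two can differ slightly when scores tie.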

A similar question: https://stackoverflow.com/questions/38259166/training-tensorflow-inception-v3-imagenet-on-modest-hardware-setup The asker did not reach the best accuracy either:

2016-06-06 12:07:52.245005: precision @ 1 = 0.5767 recall @ 5 = 0.8143 [50016 examples]
2016-06-09 22:35:10.118852: precision @ 1 = 0.5957 recall @ 5 = 0.8294 [50016 examples]
2016-06-14 15:30:59.532629: precision @ 1 = 0.6112 recall @ 5 = 0.8396 [50016 examples]
2016-06-20 13:57:14.025797: precision @ 1 = 0.6136 recall @ 5 = 0.8423 [50016 examples]

On a small hardware setup like yours, it will be difficult to achieve maximum performance. Generally speaking, for CNNs the best performance comes with the largest batch size possible. This means that for CNNs the training procedure is often limited by the maximum batch size that can fit in GPU memory.
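Since the usable batch size is capped by GPU memory, one practical way to find that cap is a two-step search. First, double the batch size until a training step fails. Then binary-search between the last size that worked and the first that did not. In the sketch below, `fits` is a hypothetical callback you would supply, for example one that runs a single training step and returns False on an out-of-memory error. The sketch assumes that if a batch size fits, every smaller one does too.

```python
def largest_batch_size(fits, limit=4096):
    """Largest batch size b <= limit with fits(b) True (0 if even 1 fails).

    fits: hypothetical callback, batch_size -> bool; assumed monotone.
    """
    if not fits(1):
        return 0
    good = 1    # largest size known to fit
    bad = None  # smallest size known not to fit
    # Phase 1: double the batch size until it stops fitting or hits the limit.
    while good < limit:
        candidate = min(good * 2, limit)
        if fits(candidate):
            good = candidate
        else:
            bad = candidate
            break
    if bad is None:
        return good  # everything up to the limit fits
    # Phase 2: binary search strictly between good and bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if fits(mid):
            good = mid
        else:
            bad = mid
    return good


# Stand-in for a real memory probe: pretend batches above 48 run out of memory.
print(largest_batch_size(lambda b: b <= 48))
```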