Deep Learning Experiment Log

浅蓝 · 2017-10-03 · via 博客园 - 浅蓝

# **IMPORTANT**
# Please note that this learning rate schedule is heavily dependent on the
# hardware architecture, batch size and any changes to the model architecture
# specification. Selecting a finely tuned learning rate schedule is an
# empirical process that requires some experimentation. Please see README.md
# for more guidance and discussion.
#
# With 8 Tesla K40's and a batch size = 256, the following setup achieves
# precision@1 = 73.5% after 100 hours and 100K steps (20 epochs).
# Learning rate decay factor selected from http://arxiv.org/abs/1404.5997.
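To make the batch-size dependence concrete, here is a minimal sketch of a staircase exponential decay schedule of the general kind the comment describes. The default values for `initial_lr`, `decay_factor`, and `epochs_per_decay`, and the dataset size used in the example, are illustrative assumptions, not settings confirmed by this post.

```python
def staircase_lr(step, batch_size, examples_per_epoch,
                 initial_lr=0.1, decay_factor=0.16, epochs_per_decay=30.0):
    """Learning rate at a global step under staircase exponential decay.

    All default hyperparameters are assumptions chosen for illustration.
    """
    # Steps per epoch shrink as the batch grows, so the decay interval
    # (measured in steps) depends directly on the batch size.
    batches_per_epoch = examples_per_epoch / batch_size
    decay_steps = int(batches_per_epoch * epochs_per_decay)
    # Staircase: the rate only changes at whole multiples of decay_steps.
    num_decays = step // decay_steps
    return initial_lr * decay_factor ** num_decays


# Assumed dataset size of 1,000,000 examples per epoch.
for batch_size in (256, 32):
    lr = staircase_lr(360_000, batch_size, 1_000_000)
    print(f"batch_size={batch_size}: lr at step 360k = {lr:.7f}")
```

Under these assumed values, a run with batch_size=256 decays every 117,187 steps, while a run with batch_size=32 would not reach its first decay until step 937,500. The same schedule, expressed in epochs, therefore behaves very differently when counted in steps.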

Open TensorBoard: tensorboard --logdir=/tmp/imagenet_train

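The ImageNet training set has roughly 1000k images. Training the Inception v3 network on a 1060 GPU with batch_size=32 runs at about 32 examples/sec.

After 20 hours and 70k steps, the model had seen 32 × 70k = 2240k examples, about 2 epochs. The loss dropped from 13 to 8, and the downward trend was leveling off.

After 55 hours and 204k steps, it had seen 32 × 204k = 6528k examples, about 6.5 epochs. The loss had dropped from 13 to 7, and from around step 120k the curve was nearly flat.

After 4 days and 1 hour (97 h) and 360k steps, it had seen 32 × 360k = 11520k examples, about 11.5 epochs. The loss was still around 7, having stayed nearly flat since step 120k.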
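The running totals above are just steps multiplied by batch size. A short sketch for recomputing them, along with the implied throughput, follows; the 1,000,000 examples per epoch is the post's rough figure, and the helper names are hypothetical.

```python
def examples_seen(steps, batch_size):
    """Total training examples processed after a number of steps."""
    return steps * batch_size


def epochs_seen(steps, batch_size, examples_per_epoch):
    """Passes over the dataset, given an (approximate) dataset size."""
    return examples_seen(steps, batch_size) / examples_per_epoch


BATCH_SIZE = 32
EXAMPLES_PER_EPOCH = 1_000_000  # rough figure used in this post

# (elapsed hours, global step) at each checkpoint reported in the post
for hours, steps in [(20, 70_000), (55, 204_000), (97, 360_000)]:
    seen = examples_seen(steps, BATCH_SIZE)
    rate = seen / (hours * 3600)  # average examples per second
    epochs = epochs_seen(steps, BATCH_SIZE, EXAMPLES_PER_EPOCH)
    print(f"{hours:>3} h  {steps:>7} steps  {seen // 1000:>6}k examples  "
          f"{epochs:5.2f} epochs  {rate:4.1f} examples/sec")
```

The exact products are 2240k, 6528k, and 11520k examples, and the average throughput at each checkpoint stays close to the quoted 32 examples/sec.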

Eval: precision @ 1 = 0.5584 recall @ 5 = 0.8052 [50016 examples]

A similar question: https://stackoverflow.com/questions/38259166/training-tensorflow-inception-v3-imagenet-on-modest-hardware-setup. That poster did not reach the best reported accuracy either:

2016-06-06 12:07:52.245005: precision @ 1 = 0.5767 recall @ 5 = 0.8143 [50016 examples]
2016-06-09 22:35:10.118852: precision @ 1 = 0.5957 recall @ 5 = 0.8294 [50016 examples]
2016-06-14 15:30:59.532629: precision @ 1 = 0.6112 recall @ 5 = 0.8396 [50016 examples]
2016-06-20 13:57:14.025797: precision @ 1 = 0.6136 recall @ 5 = 0.8423 [50016 examples]
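To see how quickly the gains taper off, the eval lines above can be parsed and compared pairwise. This is a minimal sketch; the regular expression is written against the line format shown here and is an assumption about the log layout, not an official parser.

```python
import re
from datetime import datetime

# Pattern assumed from the four log lines quoted in this post.
EVAL_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"precision @ 1 = (?P<p1>\d+\.\d+) recall @ 5 = (?P<r5>\d+\.\d+)"
)


def parse_eval_line(line):
    """Return (timestamp, precision_at_1, recall_at_5), or None if no match."""
    match = EVAL_LINE.search(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, float(match.group("p1")), float(match.group("r5"))


LOG = """\
2016-06-06 12:07:52.245005: precision @ 1 = 0.5767 recall @ 5 = 0.8143 [50016 examples]
2016-06-09 22:35:10.118852: precision @ 1 = 0.5957 recall @ 5 = 0.8294 [50016 examples]
2016-06-14 15:30:59.532629: precision @ 1 = 0.6112 recall @ 5 = 0.8396 [50016 examples]
2016-06-20 13:57:14.025797: precision @ 1 = 0.6136 recall @ 5 = 0.8423 [50016 examples]
"""

records = [r for r in map(parse_eval_line, LOG.splitlines()) if r is not None]

# Compare each evaluation with the previous one.
for (t0, p0, _), (t1, p1, _) in zip(records, records[1:]):
    days = (t1 - t0).total_seconds() / 86400
    print(f"{t1:%Y-%m-%d}: precision@1 {p0:.4f} -> {p1:.4f} "
          f"(+{p1 - p0:.4f} over {days:.1f} days)")
```

The successive precision@1 gains are 0.0190, 0.0155, and 0.0024, so the last interval of nearly six days adds very little.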

On a small hardware set up like yours, it will be difficult to achieve maximum performance. Generally speaking for CNN's, the best performance is with the largest batch sizes possible. This means that for CNN's the training procedure is often limited by the maximum batch size that can fit in GPU memory.
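One rough way to compare the single-GPU run with the reference setup quoted at the top (batch size 256, 100K steps) is to count examples processed. The sketch below does only that arithmetic; the function names are hypothetical, and matching the example count does not imply matching accuracy, since a smaller batch also changes the optimization itself.

```python
def steps_to_match(ref_steps, ref_batch_size, batch_size):
    """Steps needed at batch_size to process as many examples as the reference."""
    total_examples = ref_steps * ref_batch_size
    return -(-total_examples // batch_size)  # ceiling division


def hours_at_rate(steps, batch_size, examples_per_sec):
    """Wall-clock hours to run the given steps at a fixed throughput."""
    return steps * batch_size / examples_per_sec / 3600


# Reference run from the quoted comment: 100K steps at batch size 256.
needed = steps_to_match(ref_steps=100_000, ref_batch_size=256, batch_size=32)
hours = hours_at_rate(needed, batch_size=32, examples_per_sec=32)
print(f"{needed} steps, about {hours:.0f} hours at 32 examples/sec")
```

By this count, a batch_size=32 run needs 800,000 steps to see the same 25.6 million examples, which is more than twice the 360k steps reached in this experiment.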
