TensorFlow NaN
浅蓝 · 2017-06-12 · via cnblogs (博客园) - 浅蓝

https://github.com/tensorflow/tensorflow/issues/3212

NaNs usually indicate that something is wrong with your training. Perhaps your learning rate is too high, or perhaps you have invalid data. Maybe you have an invalid operation such as a divide by zero. TensorFlow refusing to write any NaNs is a warning that something has gone wrong with your training.

If you still suspect there is an underlying bug, you need to provide us with a reproducible test case (as small as possible), plus information about your environment (please see the issue submission template).
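The "learning rate too high" cause can be reproduced without TensorFlow at all. The following is a minimal sketch in plain Python, assuming a toy loss f(w) = w**2; the function name `first_nan_step` and the two learning rates are illustrative choices, not taken from the linked issue.

```python
import math

def first_nan_step(learning_rate, steps=2000):
    """Run gradient descent on f(w) = w**2 and return the first step at
    which w becomes NaN, or None if it never does (hypothetical helper)."""
    w = 1.0
    for step in range(1, steps + 1):
        grad = 2.0 * w                 # derivative of w**2
        w = w - learning_rate * grad   # plain gradient-descent update
        if math.isnan(w):
            return step
    return None

# With learning_rate=1.5 each update maps w to -2*w, so |w| doubles until it
# overflows to inf; the next update then computes inf - inf, which is NaN.
diverged_at = first_nan_step(learning_rate=1.5)

# With learning_rate=0.1 each update maps w to 0.8*w, so w shrinks towards 0.
stable = first_nan_step(learning_rate=0.1)

print(diverged_at, stable)
```

The same pattern (overflow to infinity first, NaN one step later) is a common signature of a diverging optimizer, which is why lowering the learning rate is usually the first thing to try.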

https://stackoverflow.com/questions/33712178/tensorflow-nan-bug?newreg=c7e31a867765444280ba3ca50b657a07

Actually, it turned out to be something stupid. I'm posting this in case anyone else would run into a similar error.

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))

is actually a horrible way of computing the cross-entropy. In some samples, certain classes could be excluded with certainty after a while, resulting in y_conv=0 for that sample. That's normally not a problem since you're not interested in those, but in the way cross_entropy is written there, it yields 0*log(0) for that particular sample/class. Hence the NaN.

Replacing it with the following clipped version avoids the 0*log(0) case:

cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))
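The effect of the clipping can be checked by hand. This is a framework-free sketch in plain Python: `ieee_log` is a hypothetical helper that returns -inf at 0 (Python's `math.log(0.0)` raises instead, whereas a tensor log yields -inf), and the two-class sample is made up for illustration.

```python
import math

def ieee_log(p):
    # Hypothetical helper: math.log(0.0) raises ValueError, so return -inf
    # explicitly to mimic IEEE floating-point behaviour.
    return -math.inf if p == 0.0 else math.log(p)

def cross_entropy(y_true, y_pred, clip=False):
    """Sum of -y_true * log(y_pred) over classes, optionally clipped."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if clip:
            p = min(max(p, 1e-10), 1.0)   # same bounds as the clip_by_value call above
        total += t * ieee_log(p)
    return -total

y_ = [1.0, 0.0]        # one-hot label
y_conv = [1.0, 0.0]    # prediction that has ruled out class 1 with certainty

naive = cross_entropy(y_, y_conv)               # hits 0 * log(0) = 0 * -inf
clipped = cross_entropy(y_, y_conv, clip=True)  # log(1e-10) is finite

print(math.isnan(naive), math.isfinite(clipped))
```

Clipping changes the loss only for probabilities below 1e-10, so it is a pragmatic guard rather than an exact reformulation of the cross-entropy.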

https://stackoverflow.com/questions/33922937/why-does-tensorflow-return-nan-nan-instead-of-probabilities-from-a-csv-file

Try throwing in a few tf.Print ops. Instead of this line:

tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)

Try:

tf_bias = tf.Print(tf_bias, [tf_bias], "Bias: ")
tf_weight = tf.Print(tf_weight, [tf_weight], "Weight: ")
tf_in = tf.Print(tf_in, [tf_in], "TF_in: ")
matmul_result = tf.matmul(tf_in, tf_weight)
matmul_result = tf.Print(matmul_result, [matmul_result], "Matmul: ")
tf_softmax = tf.nn.softmax(matmul_result + tf_bias)

to see what TensorFlow thinks the intermediate values are. If the NaNs show up earlier in the pipeline, this should give you a better idea of where the problem lies. Good luck! If you get some data out of this, feel free to follow up and we'll see if we can get you further.
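The idea behind the prints above, checking each intermediate value in order and stopping at the first one that contains a NaN, can be sketched in plain Python. The function `first_nan_stage` and the sample values are hypothetical; the stage names simply reuse the variable names from the snippet above.

```python
import math

def has_nan(values):
    return any(math.isnan(v) for v in values)

def first_nan_stage(tf_in, tf_weight, tf_bias):
    """Return the name of the first pipeline stage containing a NaN, or None.

    tf_in: one input row; tf_weight: one row of weights per input feature;
    tf_bias: one bias per output class. Plain lists stand in for tensors.
    """
    n_out = len(tf_bias)
    matmul = [sum(x * row[j] for x, row in zip(tf_in, tf_weight))
              for j in range(n_out)]
    logits = [m + b for m, b in zip(matmul, tf_bias)]
    stages = [
        ("tf_in", tf_in),
        ("tf_weight", [w for row in tf_weight for w in row]),
        ("tf_bias", tf_bias),
        ("matmul", matmul),
        ("logits", logits),
    ]
    for name, values in stages:
        if has_nan(values):
            return name
    return None

weights = [[0.1, 0.2], [0.3, 0.4]]
bias = [0.0, 0.0]

# A NaN that came in with the data is reported at the input stage.
bad_input = first_nan_stage([1.0, float("nan")], weights, bias)
clean = first_nan_stage([1.0, 2.0], weights, bias)
print(bad_input, clean)
```

Reporting the earliest stage matters because a single NaN propagates through every later operation, so the last stage alone says little about the origin.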

Updated to add:  Here's a stripped-down debugging version to try, where I got rid of the input functions and just generated some random data:

https://stackoverflow.com/questions/38810424/how-does-one-debug-nan-values-in-tensorflow

There are several reasons why you can get a NaN result. Often it is because the learning rate is too high, but plenty of other causes are possible, for example corrupt data in your input queue or computing the log of 0.
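Corrupt input data is the easiest of these causes to rule out before training starts. A minimal sketch in plain Python, assuming a batch is a list of rows of floats; `find_bad_rows` and the sample batch are hypothetical.

```python
import math

def find_bad_rows(batch):
    """Return the indices of rows that contain a NaN or an infinity."""
    return [i for i, row in enumerate(batch)
            if not all(math.isfinite(v) for v in row)]

batch = [
    [0.5, 1.2],
    [float("nan"), 0.3],   # e.g. a field that failed to parse
    [2.0, float("inf")],   # e.g. a division by zero upstream
]
bad = find_bad_rows(batch)
print(bad)
```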

Anyhow, a plain Python print cannot be used for debugging in the way you describe: it only prints the tensor's information in the graph, not any actual values.

However, if you add tf.Print as an op when building the graph, the actual values are printed when the graph is executed (and it is a good exercise to watch these values to debug and understand the behavior of your net).

However, you are not using the print op quite correctly. It is an op, so you need to pass it a tensor and get back a result tensor that you then use later in the executing graph. Otherwise the op is never executed and nothing is printed. Try this:

Z = tf.sqrt(Delta_tilde)
Z = tf.Print(Z, [Z], message="my Z-values:") # <-------- TF PRINT STATEMENT
Z = Transform(Z) # potentially some transform, currently I have it to return Z for debugging (the identity)
Z = tf.pow(Z, 2.0)
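Why the result of the print op has to be reassigned can be illustrated with a toy lazy graph in plain Python. This is only a model of the behaviour described above, not TensorFlow code: `Node`, `constant`, `print_op`, `sqrt` and `square` are all hypothetical names, and messages are collected in a list instead of being written to stderr.

```python
class Node:
    """A lazily evaluated graph node: nothing runs until eval() is called."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def eval(self):
        return self.fn(*(n.eval() for n in self.inputs))

def constant(value):
    return Node(lambda: value)

printed = []  # collects messages so the effect is easy to inspect

def print_op(node, message):
    """Identity node with a side effect, loosely modelled on tf.Print."""
    def run(value):
        printed.append(f"{message} {value}")
        return value
    return Node(run, node)

def sqrt(node):
    return Node(lambda v: v ** 0.5, node)

def square(node):
    return Node(lambda v: v ** 2.0, node)

delta = constant(9.0)

# Wrong: the print node is created, but nothing downstream depends on it.
z = sqrt(delta)
print_op(z, "my Z-values:")
square(z).eval()
unused_log = list(printed)   # still empty at this point

# Right: rebind z to the print node so the rest of the graph consumes it.
z = sqrt(delta)
z = print_op(z, "my Z-values:")
result = square(z).eval()
```

In the first variant the evaluation of `square(z)` never reaches the print node, which is the same reason a tf.Print whose return value is discarded prints nothing.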