I assume your PBS datastore resides on a filesystem provided by an iSCSI device. Is this filesystem shared with other datastores or other, unrelated data? Note that by default PBS performs a filesystem-level sync (syncfs) at the end of a backup to ensure data is persisted to disk. It might be that this takes a long time on your filesystem, and the PVE client side runs into a timeout?
I'll let it run to see if the errors persist.
Unfortunately, since they're very random, it could take weeks to see if it's resolved.
Make sure to restart the PBS services as well; there is currently a bug which prevents the sync level from taking immediate effect. Also, if acceptable performance-wise, I would recommend using file rather than none, to be on the safe side.
Please share once again the backup task log from PVE and the corresponding one from PBS. And share the output of pveversion -v from the node the VM is running on.
Please help, failed backups are continuing.
I no longer know where to look to find the problem.
PBS is a really overkill server (a PowerEdge R740xd2 with two Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz and 128GB RAM), as stated in the first post.
Furthermore, the monitor is not even heavily used.
This is a graph of CPU usage during backup failures (influxDB + Grafana).
Please tell me where I can find other useful information to get help.
Looking at the logs I think the timeout is on the PVE side, not PBS.
If you create local backups, do the timeouts also occur?
If CPU load on PVE is low, what is the IO / Memory Pressure when the timeouts occur?
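One way to check IO and memory pressure is the kernel's pressure stall information (PSI) under /proc/pressure. The following is only a minimal sketch, assuming the PVE node runs a kernel with PSI enabled; the helper names parse_psi and read_pressure are made up for illustration and are not part of PVE or PBS. Run it (or log its output periodically) while a backup is in progress:

```python
from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse the contents of a /proc/pressure/* file.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    The result maps the line kind ("some" / "full") to its fields.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_pressure(resource: str) -> dict[str, dict[str, float]]:
    """Read and parse /proc/pressure/<resource> (io, memory or cpu)."""
    return parse_psi(Path(f"/proc/pressure/{resource}").read_text())


if __name__ == "__main__":
    for resource in ("io", "memory", "cpu"):
        try:
            print(resource, read_pressure(resource))
        except OSError as exc:
            # PSI may be unavailable on older or differently configured kernels.
            print(f"{resource}: not available ({exc})")
```

The avg10, avg60 and avg300 values are the percentage of time tasks were stalled on that resource over the last 10, 60 and 300 seconds. Values that stay near zero during a failing backup would suggest that pressure on the node is not the cause.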
Very low too, I would even say insignificant.
This is a graph of CPU usage of a PVE during backup failures (influxDB + Grafana).
If I open discussions here in the forum it is because the company where I work, unfortunately, has no intention of purchasing support.
I will continue to push for it but now I have to do without it.
Anyway, by "where I can find other useful information" I meant inside the PBS or PVE systems: logs or something like that.
I see no reason that could justify these failures.