
























Wouldn't this prevent the nvidia GPU from being power gated since it's never "idle"? So like your battery life regresses? |
Good point, you might wanna disable it on battery, but I'm guessing a lot of people have their laptop plugged in practically 24x7. |
It's easy enough to 'offline' swap space on Linux normally so I suspect that would work fine, as long as you didn't instantly run out of RAM when doing so. |
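A rough way to check the "didn't instantly run out of RAM" condition before offlining swap is to compare the swap currently in use against `MemAvailable` from `/proc/meminfo`. This is only a sketch: the helper names and the 512 MiB default headroom are assumptions made for illustration, and `MemAvailable` is itself a kernel estimate, so treat the answer as a hint rather than a guarantee.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def safe_to_swapoff(meminfo_text, headroom_kb=512 * 1024):
    """Return True if the swapped-out data plus a safety margin fits in available RAM.

    headroom_kb is a hypothetical safety margin, not a kernel-defined value.
    """
    info = parse_meminfo(meminfo_text)
    swap_used_kb = info["SwapTotal"] - info["SwapFree"]
    return swap_used_kb + headroom_kb <= info["MemAvailable"]
```

On a real machine you would pass in the contents of `/proc/meminfo` and only run `swapoff` on the device if the function returns `True`.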
It's called prerendering tabs or something :V At 4k it adds up, especially since if you look into DevTools Chrome actually renders more at least vertically if not across major scroll regions. |
Except swap is, like, the opposite of a RAM disk. That said, still a nice and fun concept. Though caching has gotten better since, I assume :) |
So can VRAM actually be used like regular RAM? E.g. if I have a 16GB module and my GPU has 16GB VRAM, could it be made so that my system reports 32GB RAM? What would be the implications of that? |
Isn't the problem with write back cache mode just due to the GPU being unable to invalidate cache lines in the CPU? |
I assume on Linux you could use something like daxctl to tell the kernel to treat the vRAM as normal RAM, but I think this would be Intel/AMD only. |
I don't think it would help. It's not just a software issue that can be fixed in the kernel, the hardware fundamentally isn't part of the cache coherency system of the CPU. |
It's true that the Linux kernel is the throughput bottleneck. Unfortunately, the optimizations described above aren't sufficient to get within even 10% of hardware bandwidth. Even if the swap system overhead drops to just a data copy, the memory management layer prevents swap from scaling to higher bandwidths. The issue is not data movement; it is in the page unmapping step (which requires expensive TLB shootdowns). Larger kernel changes are required. My group wrote a paper on this: https://dl.acm.org/doi/10.1145/3731569.3764842 Linux's swap system has been undergoing some large refactors lately. Hopefully some insights either from our work or Hermit (NSDI '23) can make it into the mainline. I think Hermit's `rmap` optimization in particular is a candidate for upstream use. |
Swapping to an NVMe drive will also consume P/E cycles on your NAND, i.e. wear it out over time. RAM/VRAM don't degrade from use. |
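To get a feel for how much the wear matters, here is a back-of-the-envelope estimate of how long a drive's rated endurance would last under a steady swap write load. The function name and the example figures are hypothetical; real drives specify endurance as a TBW (terabytes written) rating, and actual wear depends on the workload and the controller.

```python
def days_until_rated_endurance(tbw_terabytes, swap_write_gb_per_day):
    """Days of steady swap writes before the drive's rated TBW is used up.

    Both inputs are assumptions you supply: the TBW rating comes from the
    drive's datasheet, the daily write volume from your own measurements.
    Uses decimal units (1 TB = 1000 GB), as drive vendors do.
    """
    if swap_write_gb_per_day <= 0:
        raise ValueError("swap_write_gb_per_day must be positive")
    return tbw_terabytes * 1000.0 / swap_write_gb_per_day
```

For example, a hypothetical 600 TBW drive absorbing 100 GB of swap writes per day would reach its rating after 6000 days, so the concern mostly applies to heavy, sustained swapping.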
Given my dev machine has 32GB of RAM and 32GB of VRAM that sits mostly idle when I'm not running AI models, this is not that bad of an idea. |
What's this tradeoff about? I thought it was simply that 2 DIMMs are probably better than 4, but I'm unsure how you'd ever land on 48? |
Not that simple. 4 DIMMs were getting higher clocks on 2-CCD Ryzen models (12 & 16 cores) compared to those with one CCD. Motherboard topology is a factor too. |
I’ve got 64GB with a 3950x working great, although the speeds are not high. Just 3200MHz, IIRC. |
Exactly, that's the tradeoff. I have one consumer machine running 192 GB but the latency and bandwidth are terrible compared to when it runs 48 GB. |
It's fine for dense models where you need them in VRAM, less so for MoE where you're offloading layers to ram. But 32/32 is pretty good for both in the popular ~30b range right now. |
Yeah, I used to map my 8 megabytes of video memory through the mtd back in the day, it helped build those .. you know .. X11 drivers .. ;) Man, that brings back memories. |
Have you heard of the "Radeon Pro SSG" ?? It must have failed because I never heard of an update to this GPU. But AMD definitely made a GPU with 4x NVMe SSDs attached to the GPU. |
"Sequential throughput: ~1.3 GB/s" sounds VERY low. Also, wouldn't random read/write speed be MUCH more relevant here? |
I'm more interested in the opposite. Nvidia linux drivers crash when you try to address more VRAM than you have. It'd be nice if they didn't. |
They already do that on windows and it kinda sucks. If you are targeting something like LMStudio or ComfyUI, both of those have superior methods to do exactly this. |
https://news.ycombinator.com/item?id=40697318 This HN comment and the linked post brought up a lot of good points. The main takeaway is that swap should primarily be considered a mechanism for equality of reclamation, not for emergency extra memory, where equality of reclamation means file-backed pages and anonymous pages are subject to similar criteria for being evicted from physical memory. I used to have zero swap on my Linux desktop and this convinced me to add at least a small swap partition. |
That’s like the complete opposite advice. Chris said the lowest recommended swappiness is 1. I have it set to 100. |
"S4 suspend" is not popular in general, so yes. But also no: I never use swap. If I have to go over the RAM (32GB being low, with 64GB the norm), I might as well consider the system dead. |
For me opening huge datasets, e.g. many gigabytes worth of profiling data, combined with other stuff running on the system, can end up pushing things to swap. |
RAM disks have always fascinated me. In a different timeline every PC has 100GB of RAM and 50TB HDDs are the norm. |
Finally a use for the expensive ram when it's not needed in workloads! Now if it could be dynamically used and vacated on other GPU workloads? |
The catch is volatility: one CUDA process reclaims the VRAM, and your swap just evaporates. |
I think you can definitely improve the throughput/iops by using BAR vs treating it like a file store/mount through cuda which adds a lot of overhead. |
The general principle is that what is involved in paging should not be paged itself. Wiring the memory of that whole daemon is then a trivial solution to the problem. |
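On Linux, "wiring" a process's memory is usually done with `mlockall(2)`. Below is a minimal sketch of how a daemon written in Python could do that through `ctypes`. The function name and the injectable `libc` parameter are illustrative assumptions, and the flag values are the ones used on x86 and ARM Linux (a few architectures define them differently). The call typically needs `CAP_IPC_LOCK` or a large enough `RLIMIT_MEMLOCK`.

```python
import ctypes
import ctypes.util
import os

# Flag values from <sys/mman.h> on x86/ARM Linux; assumed here, not portable everywhere.
MCL_CURRENT = 1  # lock pages that are currently mapped
MCL_FUTURE = 2   # also lock pages mapped later


def wire_process_memory(libc=None):
    """Try to lock all current and future pages of this process into RAM.

    Returns (True, "") on success or (False, reason) on failure.
    The libc argument exists only so a stand-in can be injected for testing.
    """
    if libc is None:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        return False, os.strerror(ctypes.get_errno())
    return True, ""
```

A swap daemon would call this once at startup and refuse to serve pages if it fails, since a pager whose own memory can be paged out risks deadlocking under memory pressure.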
Nice idea, but I'm sure a ton of things can go wrong with it. It needs extensive edge case handling in order to be usable widely. |
I mean, you prompted something useful out of an AI, good job. But then use that to ask for donation? Feels weird, man. |
Wouldn't it be faster to swap to VRAM, if you are sitting there with 8 gigs of it unused, than to swap to an SSD and burn its write cycles, assuming you absolutely need swap? |