



















- `user_data/mcp.json`, using the same format as Claude Desktop and Cursor. [Tutorial]
- `preserve_thinking` chat template parameter: New UI checkbox and `--preserve-thinking` CLI flag to control whether thinking blocks from prior turns are kept in the context.
- `--draft-min 48` by default for draftless speculative decoding.

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.
Note
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.
ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.
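The CUDA rule in the note above can be sketched as a small helper. This is a minimal illustration and not part of the project: the function name `pick_cuda_build` is hypothetical, and it assumes the `CUDA Version: X.Y` field that `nvidia-smi` prints in its header.

```python
import re


def pick_cuda_build(smi_output: str) -> str:
    """Return 'cuda13.1' or 'cuda12.4' following the rule in the note.

    Hypothetical helper: parses the 'CUDA Version: X.Y' field from the
    text printed by nvidia-smi.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison orders versions numerically, so 13.0 < 13.1 <= 13.2.
    return "cuda13.1" if version >= (13, 1) else "cuda12.4"
```

A typical call would pass the text captured from running `nvidia-smi`, for example the `stdout` of `subprocess.run(["nvidia-smi"], capture_output=True, text=True)`.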
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (766 MB) | Download (1.1 GB) |
| NVIDIA (CUDA 13.1) | Download (686 MB) | Download (1.19 GB) |
| AMD/Intel (Vulkan) | Download (196 MB) | — |
| AMD (ROCm 7.2) | Download (499 MB) | — |
| CPU only | Download (178 MB) | Download (194 MB) |

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (747 MB) | Download (1.09 GB) |
| NVIDIA (CUDA 13.1) | Download (696 MB) | Download (1.21 GB) |
| AMD/Intel (Vulkan) | Download (208 MB) | — |
| AMD (ROCm 7.2) | Download (307 MB) | — |
| CPU only | Download (190 MB) | Download (217 MB) |

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (156 MB) |
| Intel (x86_64) | Download (162 MB) |
`user_data` folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move `user_data` one folder up, next to the install folder. It will be detected automatically, making updates easier:

    text-generation-webui-4.0/
    text-generation-webui-4.1/
    user_data/                    <-- shared by both installs
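
The layout above suggests a lookup along the following lines. This is only a sketch of one plausible detection order (a shared folder next to the install first, then the folder inside the install); the function name `find_user_data` and the precedence are assumptions, not the project's actual code.

```python
from pathlib import Path


def find_user_data(install_dir: Path) -> Path:
    """Return the user_data directory to use for a given install folder.

    Assumed order (hypothetical): a shared user_data/ next to the install
    folder wins; otherwise fall back to user_data/ inside the install.
    """
    shared = install_dir.parent / "user_data"
    if shared.is_dir():
        return shared
    return install_dir / "user_data"
```

Under this assumption, `find_user_data(Path("text-generation-webui-4.1"))` would resolve to the sibling `user_data/` folder whenever it exists, so both installs end up reading the same settings and models.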