This tokenizers/transformers issue has been going on for years now. This package was built on 2026-04-28 19:28 (UTC) and python-tokenizers on 2026-04-28 21:24 (UTC), so there was basically a 2-hour window where the install could work, and anyone updating after that has been unable to update for 4 days now. I get that everyone is busy and this is free work, and I appreciate all of it, but this is not normal. There should be some coordination between these packages; it's a known issue that pops up every few months.
I hit the maximum character limit, so please read these two comments together.
Edit the PKGBUILD:
source=(
  "python-transformers-$pkgver.tar.gz"::"https://github.com/huggingface/transformers/archive/refs/tags/v$pkgver.tar.gz"
  "remove-tokenizers-upper-bound.patch"
  "rename-arguments.patch"
)
sha256sums=('39c29ea1a0533c8667106cb005064c64ab2fcd95fb91ccb95922a032da1de395' 'SKIP' 'SKIP')
Add the following to rename-arguments.patch:
diff --git a/src/transformers/convert_slow_tokenizer.py b/src/transformers/convert_slow_tokenizer.py
index 1d96d1c..43934ca 100644
--- a/src/transformers/convert_slow_tokenizer.py
+++ b/src/transformers/convert_slow_tokenizer.py
@@ -482,7 +482,7 @@ class HerbertConverter(Converter):
tokenizer.decoder = decoders.BPEDecoder(suffix=token_suffix)
tokenizer.post_processor = processors.BertProcessing(
sep=(self.original_tokenizer.sep_token, self.original_tokenizer.sep_token_id),
- cls=(self.original_tokenizer.cls_token, self.original_tokenizer.cls_token_id),
+ cls_token=(self.original_tokenizer.cls_token, self.original_tokenizer.cls_token_id),
)
return tokenizer
@@ -553,7 +553,7 @@ class RobertaConverter(Converter):
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.RobertaProcessing(
sep=(ot.sep_token, ot.sep_token_id),
- cls=(ot.cls_token, ot.cls_token_id),
+ cls_token=(ot.cls_token, ot.cls_token_id),
add_prefix_space=ot.add_prefix_space,
trim_offsets=True, # True by default on Roberta (historical)
)
@@ -1455,7 +1455,7 @@ class CLIPConverter(Converter):
# Hack to have a ByteLevel and TemplateProcessor
tokenizer.post_processor = processors.RobertaProcessing(
sep=(self.original_tokenizer.eos_token, self.original_tokenizer.eos_token_id),
- cls=(self.original_tokenizer.bos_token, self.original_tokenizer.bos_token_id),
+ cls_token=(self.original_tokenizer.bos_token, self.original_tokenizer.bos_token_id),
add_prefix_space=False,
trim_offsets=False,
)
diff --git a/src/transformers/models/clip/tokenization_clip.py b/src/transformers/models/clip/tokenization_clip.py
index 018c630..739bc22 100644
--- a/src/transformers/models/clip/tokenization_clip.py
+++ b/src/transformers/models/clip/tokenization_clip.py
@@ -116,7 +116,7 @@ class CLIPTokenizer(TokenizersBackend):
self._tokenizer.post_processor = processors.RobertaProcessing(
sep=(str(eos_token), self.eos_token_id),
- cls=(str(bos_token), self.bos_token_id),
+ cls_token=(str(bos_token), self.bos_token_id),
add_prefix_space=False,
trim_offsets=False,
)
diff --git a/src/transformers/models/herbert/tokenization_herbert.py b/src/transformers/models/herbert/tokenization_herbert.py
index eb05431..2e5bfa2 100644
--- a/src/transformers/models/herbert/tokenization_herbert.py
+++ b/src/transformers/models/herbert/tokenization_herbert.py
@@ -104,7 +104,7 @@ class HerbertTokenizer(TokenizersBackend):
self._tokenizer.post_processor = processors.BertProcessing(
sep=(self.sep_token, 2),
- cls=(self.cls_token, 0),
+ cls_token=(self.cls_token, 0),
)
diff --git a/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py b/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py
index cda7c0b..652db6c 100644
--- a/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py
+++ b/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py
@@ -227,7 +227,7 @@ class LayoutLMv3Tokenizer(TokenizersBackend):
self._tokenizer.post_processor = processors.RobertaProcessing(
sep=(sep, sep_token_id),
- cls=(cls, cls_token_id),
+ cls_token=(cls, cls_token_id),
add_prefix_space=add_prefix_space,
trim_offsets=True,
)
diff --git a/src/transformers/models/roberta/tokenization_roberta.py b/src/transformers/models/roberta/tokenization_roberta.py
index 40b4e78..ccd699f 100644
--- a/src/transformers/models/roberta/tokenization_roberta.py
+++ b/src/transformers/models/roberta/tokenization_roberta.py
@@ -169,7 +169,7 @@ class RobertaTokenizer(TokenizersBackend):
)
self._tokenizer.post_processor = processors.RobertaProcessing(
sep=(str(sep_token), self.sep_token_id),
- cls=(str(cls_token), self.cls_token_id),
+ cls_token=(str(cls_token), self.cls_token_id),
add_prefix_space=add_prefix_space,
trim_offsets=trim_offsets,
)
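The patch hard-codes the new keyword name. An alternative is to pick the name at runtime, so the same call works against either tokenizers release. A minimal sketch, assuming (as the patch implies) that tokenizers 0.23.1 expects `cls_token=` where earlier releases expect `cls=`; `call_with_cls`, `old_style`, and `new_style` are hypothetical helpers, not part of transformers or tokenizers, and the two stand-ins take the place of `processors.RobertaProcessing` / `processors.BertProcessing`.

```python
import inspect
from typing import Any, Callable, Tuple

Pair = Tuple[str, int]  # (token string, token id), as passed to sep= / cls=


def call_with_cls(factory: Callable[..., Any], sep: Pair, cls: Pair, **kwargs: Any) -> Any:
    """Call `factory`, passing the CLS pair under whichever keyword it accepts.

    Assumption: newer tokenizers releases name the keyword `cls_token`,
    older ones name it `cls`.
    """
    try:
        params = inspect.signature(factory).parameters
    except (TypeError, ValueError):
        # Compiled (e.g. PyO3) callables may not expose an inspectable signature.
        params = {}
    if "cls_token" in params:
        return factory(sep=sep, cls_token=cls, **kwargs)
    if "cls" in params:
        return factory(sep=sep, cls=cls, **kwargs)
    # Signature unknown: try the new name first, then fall back to the old one.
    # Caveat: a TypeError raised for any other reason also triggers the fallback.
    try:
        return factory(sep=sep, cls_token=cls, **kwargs)
    except TypeError:
        return factory(sep=sep, cls=cls, **kwargs)


# Hypothetical stand-ins for the two argument spellings.
def old_style(sep: Pair, cls: Pair, trim_offsets: bool = True) -> tuple:
    return ("old", sep, cls, trim_offsets)


def new_style(sep: Pair, cls_token: Pair, trim_offsets: bool = True) -> tuple:
    return ("new", sep, cls_token, trim_offsets)


if __name__ == "__main__":
    print(call_with_cls(old_style, ("</s>", 2), ("<s>", 0)))
    print(call_with_cls(new_style, ("</s>", 2), ("<s>", 0), trim_offsets=False))
```

Checking the signature first avoids relying on a TypeError, which could come from an unrelated mistake in the call.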
I'm still waiting for their update. They said they will support tokenizers 0.23.1 in the next version.
Add these lines to the PKGBUILD:
prepare() {
  cd "transformers-$pkgver"
  patch -Np1 -i "${srcdir}/remove-tokenizers-upper-bound.patch"
  patch -Np1 -i "${srcdir}/rename-arguments.patch"
}
and this to remove-tokenizers-upper-bound.patch:
diff --git a/src/transformers/dependency_versions_table.py b/src/transformers/dependency_versions_table.py
index 1a721ca..f4ca49d 100644
--- a/src/transformers/dependency_versions_table.py
+++ b/src/transformers/dependency_versions_table.py
@@ -74,7 +74,7 @@ deps = {
     "tomli": "tomli",
     "tiktoken": "tiktoken",
     "timm": "timm>=1.0.23",
-    "tokenizers": "tokenizers>=0.22.0,<=0.23.0",
+    "tokenizers": "tokenizers>=0.22.0",
     "torch": "torch>=2.4",
     "torchaudio": "torchaudio",
     "torchvision": "torchvision",
It just removes the version check. I'm not sure whether 0.23.1 is actually compatible with transformers.
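The ImportError quoted in this thread comes from a simple range check on the installed version. A simplified re-implementation of that kind of check shows why 0.23.1 fails the old constraint and passes once the `<=0.23.0` bound is dropped. This is not transformers' own `require_version`; `satisfies` and `parse_version` are hypothetical names, and the sketch assumes plain dotted-integer versions.

```python
import operator
import re
from typing import Tuple

# Comparison operators that may appear in a clause such as ">=0.22.0".
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}
_NAME = re.compile(r"[A-Za-z0-9_.\-]+")
# Two-character operators come first so "<=" is not read as "<".
_CLAUSE = re.compile(r"^(<=|>=|==|!=|<|>)\s*(\d+(?:\.\d+)*)$")


def parse_version(text: str) -> Tuple[int, ...]:
    """'0.23.1' -> (0, 23, 1). Assumes plain dotted integers (no rc/post/dev tags)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(requirement: str, installed: str) -> bool:
    """True if `installed` meets every comma-separated clause of `requirement`."""
    name = _NAME.match(requirement)
    if name is None:
        raise ValueError(f"no package name in {requirement!r}")
    got = parse_version(installed)
    clauses = [c.strip() for c in requirement[name.end():].split(",") if c.strip()]
    for clause in clauses:
        m = _CLAUSE.match(clause)
        if m is None:
            raise ValueError(f"unsupported clause {clause!r}")
        op, want_text = m.groups()
        want = parse_version(want_text)
        # Pad with zeros so that 2.4 and 2.4.0 compare as equal.
        width = max(len(got), len(want))
        g = got + (0,) * (width - len(got))
        w = want + (0,) * (width - len(want))
        if not OPS[op](g, w):
            return False
    return True


if __name__ == "__main__":
    old_req = "tokenizers>=0.22.0,<=0.23.0"  # constraint before the patch
    new_req = "tokenizers>=0.22.0"  # constraint after the patch
    print(satisfies(old_req, "0.23.1"))  # False: 0.23.1 is above the upper bound
    print(satisfies(new_req, "0.23.1"))  # True
```

Passing the relaxed check only means the constraint no longer rejects 0.23.1; it says nothing about whether the two versions actually work together.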
Came here to say the same thing as @malium. Not sure how to fix this...?
It seems that python-tokenizers got bumped to 0.23.1, but python-transformers is not compatible with that version:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import transformers
  File "/home/user/.cache/yay/python-transformers/src/transformers-5.7.0/src/transformers/__init__.py", line 30, in <module>
    from . import dependency_versions_check
  File "/home/user/.cache/yay/python-transformers/src/transformers-5.7.0/src/transformers/dependency_versions_check.py", line 56, in <module>
    require_version_core(deps[pkg])
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/user/.cache/yay/python-transformers/src/transformers-5.7.0/src/transformers/utils/versions.py", line 116, in require_version_core
    return require_version(requirement, hint)
  File "/home/user/.cache/yay/python-transformers/src/transformers-5.7.0/src/transformers/utils/versions.py", line 110, in require_version
    _compare_versions(op, got_ver, want_ver, requirement, pkg, hint)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.cache/yay/python-transformers/src/transformers-5.7.0/src/transformers/utils/versions.py", line 43, in _compare_versions
    raise ImportError(
        f"{requirement} is required for a normal functioning of this module, but found {pkg}=={got_ver}.{hint}"
    )
ImportError: tokenizers>=0.22.0,<=0.23.0 is required for a normal functioning of this module, but found tokenizers==0.23.1.
Try: `pip install transformers -U` or `pip install -e '.[dev]'` if you're working with git main
@shayaknyc I have bumped the pkgrel of python-safetensors to trigger a rebuild (or at least to signal that the package should be rebuilt).
@lightdot - yes, you are correct. I had to manually recompile python-safetensors, and then this just worked. Thank you!
@shayaknyc, I suspect the python-safetensors package in your build environment wasn't rebuilt after Python was updated to 3.14. This has to be done manually, since the package version didn't change.
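On Arch, Python packages are installed under a version-specific directory (/usr/lib/python3.X/site-packages), so a package built for 3.13 is invisible to 3.14 even though its own version number hasn't changed. That is why a manual rebuild or pkgrel bump is needed. A small sketch to spot such leftovers; `stale_installs` is a hypothetical helper, and the default `/usr/lib` prefix assumes the standard Arch layout.

```python
import glob
import os
import sys
from typing import List, Optional, Tuple


def stale_installs(
    package: str,
    prefix: str = "/usr/lib",
    current: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return copies of `package` installed for a Python minor version other than `current`.

    Assumption: an Arch-style layout, <prefix>/pythonX.Y/site-packages/<package>.
    """
    if current is None:
        current = (sys.version_info.major, sys.version_info.minor)
    wanted = f"python{current[0]}.{current[1]}"
    pattern = os.path.join(prefix, "python3.*", "site-packages", package)
    stale = []
    for path in sorted(glob.glob(pattern)):
        # Walk up two levels: <package> -> site-packages -> pythonX.Y
        version_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        if version_dir != wanted:
            stale.append(path)
    return stale


if __name__ == "__main__":
    for path in stale_installs("safetensors"):
        print("installed for a different Python version:", path)
```

If this lists a python3.13 path while the running interpreter is 3.14, the package still needs rebuilding for the new interpreter.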
Anyone else getting these build errors?
...
...
adding 'transformers/utils/versions.py'
adding 'transformers-5.0.0.dist-info/licenses/LICENSE'
adding 'transformers-5.0.0.dist-info/METADATA'
adding 'transformers-5.0.0.dist-info/WHEEL'
adding 'transformers-5.0.0.dist-info/entry_points.txt'
adding 'transformers-5.0.0.dist-info/top_level.txt'
adding 'transformers-5.0.0.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
Successfully built transformers-5.0.0-py3-none-any.whl
==> Starting check()...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import transformers
  File "/home/user/git/python-transformers/src/transformers-5.0.0/src/transformers/__init__.py", line 30, in <module>
    from . import dependency_versions_check
  File "/home/user/git/python-transformers/src/transformers-5.0.0/src/transformers/dependency_versions_check.py", line 16, in <module>
    from .utils.versions import require_version, require_version_core
  File "/home/user/git/python-transformers/src/transformers-5.0.0/src/transformers/utils/__init__.py", line 22, in <module>
    from .auto_docstring import (
    ...<10 lines>...
    )
  File "/home/user/git/python-transformers/src/transformers-5.0.0/src/transformers/utils/auto_docstring.py", line 30, in <module>
    from .generic import ModelOutput
  File "/home/user/git/python-transformers/src/transformers-5.0.0/src/transformers/utils/generic.py", line 47, in <module>
    from ..model_debugging_utils import model_addition_debugger_context
  File "/home/user/git/python-transformers/src/transformers-5.0.0/src/transformers/model_debugging_utils.py", line 29, in <module>
    from safetensors.torch import save_file
ModuleNotFoundError: No module named 'safetensors'
==> ERROR: A failure occurred in check().
Aborting...
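check() appears to run `python -c "import transformers"`, and that import pulls in `safetensors.torch` early, so a missing or stale safetensors fails the whole build. A quick pre-flight sketch that reports which top-level modules the current interpreter cannot locate; `missing_modules` is a hypothetical helper, and the module list in the example is an assumption, not transformers' full dependency list.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that the running interpreter cannot locate.

    Only pass top-level names ("safetensors", not "safetensors.torch"):
    find_spec() on a dotted name imports the parent package first.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed subset of modules imported early by `import transformers`.
    missing = missing_modules(["tokenizers", "safetensors", "huggingface_hub"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same `python` that makepkg uses, since what matters is what that interpreter can see.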
Thanks @lalala_233, that works. But it's crazy that a minor version bump (0.23.0 => 0.23.1) is so incompatible...