






















This is my first attempt at running an eval of this nature, so I would love some methodology feedback.
I can't guarantee the sources weren't already in the models' training data without getting novel translations from native speakers, but in my experience the top models feel very accurate. Even on somewhat obscure texts from a relatively small language, their translations generally beat Google Translate at capturing idiomatic meaning.