























Abstract:Multimodal $\Delta\Delta G$ predictors integrating protein language models with inverse-folding representations achieve strong in-distribution accuracy on the Megascale dataset but exhibit limited robustness on out-of-distribution (OOD) proteins, persistent forward-reverse bias on paired-mutation benchmarks, and under-representation of rare stabilizing mutations. Existing approaches address these limitations primarily through additional architectural components, leaving optimization-level intervention comparatively underexplored. We introduce a constraint-aware optimization framework combining Balanced Mean Squared Error, a Siamese anti-symmetric regularizer, and a novel OOD-margin consistency loss on the per-position feature representation, requiring no architectural changes to the SPURS backbone. Across eleven benchmarks and three random seeds, the framework improves Spearman correlation on S669 from 0.486 to 0.540 ($\sigma=0.002$ across seeds), matching the published SPURS baseline (0.50) without architectural modification, and on S461 from 0.653 to 0.711, with consistent smaller gains on five additional OOD datasets. A controlled diagnostic on Ssym reveals that anti-symmetric training does not eliminate systematic forward-reverse bias, indicating that gains arise through implicit regularization rather than exact thermodynamic constraint enforcement.
From: A Shiv Ram [view email]
[v1]
Sat, 6 Jun 2026 11:04:52 UTC (3,798 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。