





















Abstract: Proteins encode diverse functions within complex three-dimensional structures, yet most deep learning representations remain highly entangled, obscuring the biophysical signals that underlie function. Here we introduce ProtDiS, a knowledge-guided framework that decomposes pretrained protein micro-environment embeddings into biologically grounded and task-relevant dimensions. Inspired by the information bottleneck principle, ProtDiS learns representations that balance informativeness and compression, yielding structural features that are more specific, independent, and information-efficient. It achieves consistent improvements across twelve downstream tasks, with the largest gains under structure-based splits. Protein- and residue-level analyses further show that ProtDiS differentiates proteins with similar folds but divergent functions and captures critical fine-grained biophysical signals. These findings suggest that knowledge-guided decomposition provides a general and interpretable approach for structuring latent spaces in protein structural modeling. The source code and implementation details are publicly available at this https URL.
| Comments: | 28 pages, 17 figures, ICML 2026 regular |
| Subjects: | Biomolecules (q-bio.BM); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.23960 [q-bio.BM] |
| (or arXiv:2605.23960v1 [q-bio.BM] for this version) | |
| https://doi.org/10.48550/arXiv.2605.23960 arXiv-issued DOI via DataCite |
From: Mingqing Wang
[v1] Tue, 12 May 2026 07:12:12 UTC (7,683 KB)