Abstract: Inference from interaction maps, such as centromere identification from genome-wide chromosome conformation capture techniques (notably Hi-C), can be formulated as a generic inverse problem: infer a set of parameters given a map that summarizes pairwise interactions between entities through blocks of variable number and size. In this work, we introduce a data-driven approach that leverages structure shared across these maps, such as global alignment between localized patterns, while handling the variability in the number and size of entities that arises in real-world data. Our approach relies on a transformer architecture capable of handling such variability and on a custom simulator that generates abundant, computationally cheap synthetic data for training. Applied to centromere localization, the method accurately recovers centromere genomic positions across a wide range of species with diverse genome sizes.
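The abstract does not describe how the authors' simulator works. The sketch below is a hypothetical toy generator that illustrates what the training data could look like: a block-structured contact map over a random number of chromosomes with random lengths, labelled with one centromere bin per chromosome. It assumes a power-law decay of intra-chromosomal contacts with genomic distance and models centromere clustering as a Gaussian spot of enriched inter-chromosomal contact at each centromere-centromere position, which produces the kind of aligned, localized pattern the abstract mentions. The function names (`sample_genome`, `simulate_map`, `add_noise`) and parameters (`alpha`, `eps`, `beta`, `sigma`, `scale`) are illustrative assumptions, not the paper's.

```python
import math
import random


def sample_genome(rng, min_chroms=2, max_chroms=6, min_len=20, max_len=60):
    """Draw a random toy genome: chromosome lengths (in bins) and one
    centromere bin per chromosome. Hypothetical sampling scheme."""
    n_chroms = rng.randint(min_chroms, max_chroms)
    lengths = [rng.randint(min_len, max_len) for _ in range(n_chroms)]
    # Keep centromeres away from the chromosome ends (assumption).
    centromeres = [rng.randint(length // 5, length - 1 - length // 5) for length in lengths]
    return lengths, centromeres


def simulate_map(lengths, centromeres, alpha=1.0, eps=0.01, beta=0.5, sigma=2.0):
    """Return a symmetric toy contact map as a list of lists.

    Intra-chromosomal blocks decay as (|i - j| + 1) ** -alpha with genomic
    distance; inter-chromosomal blocks get a small background eps plus a
    Gaussian spot of height beta at the centromere-centromere position.
    """
    # Map each global bin index to (chromosome index, local position).
    owner = [(c, pos) for c, length in enumerate(lengths) for pos in range(length)]
    n = len(owner)
    contacts = [[0.0] * n for _ in range(n)]
    for i, (ci, pi) in enumerate(owner):
        for j, (cj, pj) in enumerate(owner):
            if ci == cj:
                contacts[i][j] = (abs(pi - pj) + 1) ** -alpha
            else:
                d2 = (pi - centromeres[ci]) ** 2 + (pj - centromeres[cj]) ** 2
                contacts[i][j] = eps + beta * math.exp(-d2 / (2 * sigma ** 2))
    return contacts


def add_noise(contacts, rng, scale=0.05):
    """Apply multiplicative Gaussian noise, clipped at zero, keeping symmetry."""
    n = len(contacts)
    noisy = [row[:] for row in contacts]
    for i in range(n):
        for j in range(i, n):
            value = max(0.0, contacts[i][j] * (1.0 + scale * rng.gauss(0.0, 1.0)))
            noisy[i][j] = noisy[j][i] = value
    return noisy
```

A single training pair could then be drawn with `rng = random.Random(0)`, `lengths, centromeres = sample_genome(rng)` and `contacts = add_noise(simulate_map(lengths, centromeres), rng)`, giving a `sum(lengths) x sum(lengths)` map whose target is `centromeres`. Because the number and size of blocks change from sample to sample, a model trained on such data must accept variable-size inputs, which is the variability the abstract says the transformer is designed to handle.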
| Subjects: | Machine Learning (cs.LG); Quantitative Methods (q-bio.QM) |
| Cite as: | arXiv:2605.21617 [cs.LG] |
| | (or arXiv:2605.21617v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.21617 arXiv-issued DOI via DataCite (pending registration) |
From: Eloïse Touron
[v1] Wed, 20 May 2026 18:28:43 UTC (10,152 KB)