





















Abstract: Identifying the scientific source behind a social media claim requires matching short, informal, and often multilingual claims against large collections of scientific publications, where semantically related papers can act as challenging distractors or as false negatives during training. We present our submission to CheckThat! 2026 Task 1 on multilingual scientific-source retrieval, focusing on how hard-negative mining should be adapted to multi-stage retrieval pipelines. We propose cluster-aware hard-negative mining strategies that exploit the semantic structure of retrieved candidate pools to construct more informative training negatives for dense retrieval and reranking. Our experiments show that different hard-negative structures induce different retrieval behaviors: localized cluster negatives tend to favor precision-oriented retrieval, whereas broader non-gold semantic negatives provide stronger candidate coverage and more consistent reranking performance across languages. We further study multiple LLM-based evidence-selection formulations, including direct classification, pairwise comparison, and listwise reranking prompts, and find that constrained classification prompts provide the most reliable final document selection. The final system combines a dense retriever, a multilingual cross-encoder reranker, and a selective LLM-based disagreement resolver, and ranked 6th among 37 submissions in the shared task evaluation. Overall, our results suggest that hard-negative mining should be treated as a stage-aware design problem rather than as a single retrieval optimization strategy.
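The contrast the abstract draws between localized cluster negatives and broader non-gold semantic negatives can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` type, the two function names, the toy scores, and the assumption that each retrieved candidate already carries a cluster label (for example from k-means over document embeddings) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    score: float  # retriever similarity to the claim (higher = closer)
    cluster: int  # hypothetical cluster label, assumed to be precomputed


def local_cluster_negatives(pool, gold_id, k):
    """Non-gold candidates that share the gold document's cluster, best-scored first."""
    gold_clusters = {c.cluster for c in pool if c.doc_id == gold_id}
    negatives = [c for c in pool if c.doc_id != gold_id and c.cluster in gold_clusters]
    negatives.sort(key=lambda c: c.score, reverse=True)
    return [c.doc_id for c in negatives[:k]]


def broad_semantic_negatives(pool, gold_id, k):
    """Top-scored non-gold candidates, regardless of cluster."""
    negatives = [c for c in pool if c.doc_id != gold_id]
    negatives.sort(key=lambda c: c.score, reverse=True)
    return [c.doc_id for c in negatives[:k]]


# Toy candidate pool for one claim (illustrative values, not real data).
pool = [
    Candidate("gold", 0.91, 0),
    Candidate("a", 0.88, 1),
    Candidate("b", 0.85, 0),
    Candidate("c", 0.80, 2),
    Candidate("d", 0.60, 0),
]

local = local_cluster_negatives(pool, "gold", k=2)
broad = broad_semantic_negatives(pool, "gold", k=2)
```

Under these assumptions, the localized strategy draws negatives only from the gold document's immediate semantic neighborhood, while the broad strategy samples the strongest distractors across the whole candidate pool. If the gold document is missing from the pool, the localized variant returns no negatives.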
| Comments: | Technical report for CLEF 2026 CheckThat! Task 1 shared task submission. 13 pages, 14 tables |
| Subjects: | Information Retrieval (cs.IR) |
| MSC classes: | 68T50 |
| ACM classes: | H.3.3; I.2.7 |
| Cite as: | arXiv:2605.24236 [cs.IR] (or arXiv:2605.24236v1 [cs.IR] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.24236 (arXiv-issued DOI via DataCite, pending registration) |
From: Juli Bakagianni
[v1] Fri, 22 May 2026 21:24:14 UTC (19 KB)