Abstract: This project report presents a hybrid AI-assisted workflow for extracting and reintegrating archival metadata from League of Nations index cards. The project is situated in the broader context of the Total Digital Access to the League of Nations Archives project (LONTAD). Rather than attempting full OCR of the underlying archival collections, the workflow targets the index cards themselves as documentary access points to files, series, archival descriptions, and digital objects. The project evolved from a layout-aware pipeline combining YOLO, TrOCR, and local LLM post-correction to a hybrid architecture using a fine-tuned vision-language model for broad extraction while retaining specialized OCR for file and series identifiers.
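The abstract does not say how the two extraction paths in the hybrid architecture are reconciled. One plausible reading is a merge step that keeps the vision-language model's output for most fields but prefers the specialized OCR reading for file and series identifiers. The sketch below illustrates only that idea, assuming the VLM returns a flat dictionary of fields and the OCR returns readings for identifier fields alone. The field names (`file_id`, `series_id`), the function `merge_card_outputs`, and the sample values are hypothetical and do not reflect the report's actual schema or code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical identifier fields; the report's real schema is not given in the abstract.
IDENTIFIER_FIELDS = ("file_id", "series_id")


@dataclass
class CardRecord:
    """Merged metadata for one index card, plus which path produced each field."""
    fields: Dict[str, str] = field(default_factory=dict)
    sources: Dict[str, str] = field(default_factory=dict)


def merge_card_outputs(
    vlm_fields: Dict[str, str],
    ocr_identifiers: Dict[str, Optional[str]],
) -> CardRecord:
    """Combine broad VLM extraction with specialized OCR for identifier fields.

    Assumption for illustration: a non-empty OCR reading always overrides the
    VLM value for identifier fields; otherwise the VLM value is kept.
    """
    record = CardRecord()
    # Start from everything the VLM extracted.
    for name, value in vlm_fields.items():
        record.fields[name] = value
        record.sources[name] = "vlm"
    # Let the specialized OCR path override identifiers when it produced a reading.
    for name in IDENTIFIER_FIELDS:
        ocr_value = ocr_identifiers.get(name)
        if ocr_value and ocr_value.strip():
            record.fields[name] = ocr_value.strip()
            record.sources[name] = "ocr"
    return record


if __name__ == "__main__":
    # Invented sample values: the VLM misreads a zero as the letter 'O'.
    vlm_output = {"title": "Example title", "file_id": "FILE-0O1"}
    ocr_output = {"file_id": "FILE-001", "series_id": "SERIES-A"}
    record = merge_card_outputs(vlm_output, ocr_output)
    print(record.fields)
    print(record.sources)
```

Recording the source of each field keeps the merge auditable, which matters when the merged metadata is reintegrated into archival descriptions and a reviewer needs to know which path supplied a given identifier.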
From: Florian CAFIERO [via CCSD proxy]
[v1] Thu, 4 Jun 2026 10:03:41 UTC (1,572 KB)