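This post is a continuation of the series on the microservice I'm building. To catch up on the earlier posts in the series, check out my previous article here.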
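In a nutshell, I set up a weekly pipeline that pulls the ten latest articles from a news source, picks the one that makes a clear falsifiable claim, backfills its summary if it is missing, finds two independent proofs, and opens pull requests that add them to the claims/ and proofs/ files. All of this runs on a few Python scripts, two custom OpenRouter skills, and two GitHub Actions workflows.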
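In my previous post I described how I combined Firecrawl, the OpenRouter API, and GitHub Actions workflows to automate ingesting top news sources. In this post I apply the same approach to the claims those news sources make and to the proofs for those claims.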
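The first challenge was finding a way to fetch the recent articles a news source has published. Luckily, I found NewsData.io, which provides an API to search, collect, and track news from around the world. NewsData.io's free tier gives me 200 API credits per day, which is enough to cover 12 sources once a week. That works for now.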
1️⃣ Fetching the latest articles: newsdata_io.py
The first piece is a thin wrapper around the NewsData.io `latest` endpoint. It fetches the ten most recent articles from a given domain.
```python
import os

import requests

# Free-tier-friendly query params
CATEGORY = "environment,technology,world"
LANGUAGE = "en"
REMOVE_DUPLICATE = "1"
SIZE = "10"
DATATYPE = "news,research,analysis,pressRelease"

NEWSDATA_API_BASE_URL = os.getenv(
    "NEWSDATA_API_BASE_URL", "https://newsdata.io/api/1"
)
NEWSDATA_API_KEY = os.environ["NEWSDATA_API_KEY"]


def get_claims(src_domain_url: str):
    """
    Call the NewsData.io `/latest` endpoint for a specific domain.
    Returns a list of article dicts (or None on error).
    """
    endpoint = f"{NEWSDATA_API_BASE_URL}/latest"
    params = {
        "category": CATEGORY,
        "language": LANGUAGE,
        "removeduplicate": REMOVE_DUPLICATE,
        "size": SIZE,
        "datatype": DATATYPE,
        "apikey": NEWSDATA_API_KEY,
        "domainurl": src_domain_url,
    }
    response = requests.get(endpoint, params=params, timeout=10)
    if response.status_code != 200:
        print(
            f"Error: couldn't fetch claims for {src_domain_url}: {response.status_code}"
        )
        # Show suggestions if the API knows a better domain. The error body may
        # not be JSON, and "results" may be missing, empty, or not a list.
        try:
            resp_body = response.json()
        except ValueError:
            return None
        results = resp_body.get("results")
        if (
            isinstance(results, list)
            and results
            and isinstance(results[0], dict)
            and results[0].get("suggestion") is not None
        ):
            print(
                f"Suggested domain url(s) for {src_domain_url}: {results[0]['suggestion']}"
            )
        return None
    return response.json()["results"]
```
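Why it matters: the free plan grants only 200 credits per day, so I keep each request lean. That means one domain per call, ten results, and a narrow set of categories. The DATATYPE filter is deliberately broad, and non-falsifiable items are weeded out later.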
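Below is a minimal sketch of how a weekly run might call `get_claims` across several outlets. It assumes each request costs one NewsData.io credit, which the post does not confirm. `collect_latest` and its parameters are hypothetical. `fetch` is any callable with the same contract as `get_claims` (a list of articles, or `None` on error), which lets the loop be tested without network access.

```python
from typing import Callable, Optional

# Hypothetical type: a fetcher such as get_claims, returning a list of
# article dicts on success or None on error.
Fetcher = Callable[[str], Optional[list]]


def collect_latest(
    domains: list[str], fetch: Fetcher, credit_budget: int = 200
) -> dict[str, list]:
    """Fetch the latest articles for each domain, one request per domain.

    Assumes each request costs one credit, so at most `credit_budget`
    requests are made (failed ones included); remaining domains are skipped.
    Domains whose fetch returns None are left out of the result.
    """
    results: dict[str, list] = {}
    for spent, domain in enumerate(domains):
        if spent >= credit_budget:
            break  # budget exhausted: stop before making another request
        articles = fetch(domain)
        if articles is None:
            continue  # the fetcher already logged the error
        results[domain] = articles
    return results
```

In the pipeline this would be called as `collect_latest(domains, get_claims)`. Failed requests count against the budget, so the sketch errs toward never exceeding the daily cap.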
2️⃣ Filtering for falsifiable claims: the falsifiable claim skill
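My current scoring only counts claims that are falsifiable. So from the articles returned by the NewsData API, I keep only the one that best fits the falsifiability criterion.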
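I use LLMs for this classification. Specifically, I send the array of objects returned by the NewsData API to the OpenRouter API. A single element looks like this: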
```json
[{
  "article_id": "40305aa160787297dd3f9cc15faa8637",
  "link": "https://www.theguardian.com/us-news/2026/may/22/kansas-bird-nest-truck",
  "title": "Federally protected bird’s nest holds up sale of Ford truck in Kansas",
  "description": "A robin built a nest on a Ford-F-250’s tire and laid its eggs in it; a law prohibits removing it while inhabited by bird brood A truck sold by a Kansas dealership cannot be taken from the lot by its new owner because a family of robins is living atop one of the vehicle’s tires. The relatively novel situation has gained widespread attention after the dealership in the Kansas community of Olathe wrote about it on its Facebook page – and it perhaps taught many that active robin nests are protected by federal law from the US. Continue reading...",
  "keywords": [
    "birds",
    "kansas",
    "wildlife",
    "ford",
    "animals",
    "us news",
    "law (us)"
  ],
  "creator": [
    "josé olivares"
  ],
  "language": "english",
  "country": [
    "united states of america"
  ],
  "category": [
    "top",
    "environment"
  ],
  "datatype": "news",
  "pubDate": "2026-05-22 19:03:07",
  "pubDateTZ": "UTC",
  "fetched_at": "2026-05-22 19:32:47",
  "image_url": "https://i.guim.co.uk/img/media/c9e972eb2d494c4a9c713a7b5550f0fa9efcae1f/0_503_1536_1229/master/1536.jpg?width=140&quality=85&auto=format&fit=max&s=ad95c6dcdf71df9bc3461b683effe424",
  "video_url": null,
  "source_id": "theguardian",
  "source_name": "The Guardian",
  "source_priority": 106,
  "source_url": "https://www.theguardian.com",
  "source_icon": "https://n.bytvi.com/theguardian.jpg",
  "duplicate": false
}]
```
To help with the classification, I wrote an AI agent skill. It instructs the LLM to:
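- Visit each article URL using the web search tool.
- Check whether the article makes a claim that can be objectively proven true or false.
- Return only a single JSON object that keeps the original schema.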
```python
import json

# Each string ends with a space so the concatenated sentences don't run together.
CLAIM_FILTER_PROMPT = (
    "Use web search tool to visit the link for each article, access the content and then assess if it is a falsifiable claim. "
    "Out of these 10 articles, only return 1 article that best fits the falsifiable claim criterion. "
    "Prefer claims that have been made by the news source directly. "
    "Keep the json structure of the claims the same as the original schema in the input. Do not add, remove, or modify any key or value. "
    "Only output the plain json array string that I can safely unmarshal. "
    "Do not format the string. Do not output anything else."
)
# json.dumps sends real JSON; an f-string of a list of dicts would send
# Python's repr (single quotes, None, False) instead.
req_content = (
    "Following is a list of 10 articles published by the same news outlet. Each article is represented by a json string type element in the array"
    f"\n\n{json.dumps(claims)}\n\n"
    f"{CLAIM_FILTER_PROMPT}"
)
filtered_claims = openrouter.req_w_addons(
    req_content, skill=falsifiable_claim_skill, tools=[openrouter.WEB_SEARCH_TOOL]
)
```
Result: a single, well-structured claim with all the required fields, ready to be written as a claim file.
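The prompt forbids changing keys or values, and that promise is cheap to check before writing the claim file. Below is a sketch of such a check. `parse_filtered_claim` is a hypothetical helper, not part of the project, and it assumes `req_w_addons` returns the model's reply as a raw JSON string.

```python
import json


def parse_filtered_claim(raw: str, originals: list[dict]) -> dict:
    """Parse the model's reply and check it is one of the input articles, unchanged.

    Raises ValueError if the reply is not a one-element JSON array, or if the
    element does not exactly match an article that was sent to the model.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(parsed, list) or len(parsed) != 1:
        raise ValueError("expected a JSON array with exactly one article")
    claim = parsed[0]
    if not isinstance(claim, dict):
        raise ValueError("array element is not a JSON object")
    by_id = {a.get("article_id"): a for a in originals}
    original = by_id.get(claim.get("article_id"))
    if original is None:
        raise ValueError("returned article_id was not in the input")
    if claim != original:
        raise ValueError("model altered keys or values of the article")
    return claim
```

A more forgiving variant would trust only the returned `article_id` and write the original article from the input. That way, a model that reformats values cannot corrupt the claim file.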
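For now I pick only one claim per source. Once the ingestion job is stable and the OpenRouter responses are reliable, I'll increase both the number of claims and the ingestion frequency.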
3️⃣ Backfilling missing descriptions: a quick summarization pass
NewsData.io sometimes returns null in the `description` field. When that happens, I ask OpenRouter to summarize the article in under 500 characters.
```python
CLAIM_SUMMARY_PROMPT = (
    "Use web search tool to visit the link to the article and access its content. "
    "Summarize the article in under 500 characters. "
    "Return only the summary without any additional text."
)

# Only backfill when NewsData.io returned no description (JSON null -> None).
if not claim.get("description"):
    req_content = (
        "Following is the url to an article published by a news media outlet."
        f"\n\n{claim['link']}\n\n"
        f"{CLAIM_SUMMARY_PROMPT}"
    )
    claim_summary = openrouter.req_w_addons(
        req_content, tools=[openrouter.WEB_SEARCH_TOOL]
    )
    claim["description"] = claim_summary
```
Now every claim has a concise, readable description, even when the source API didn't provide one.
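Models may not honour a character limit exactly, so a small guard on both sides of the call can help. The helpers below are a hypothetical sketch. `needs_summary` decides when to backfill, and `clamp_summary` trims the model's reply to 500 characters at a word boundary.

```python
SUMMARY_LIMIT = 500  # characters, matching CLAIM_SUMMARY_PROMPT


def needs_summary(claim: dict) -> bool:
    """True when there is no usable description (null, missing, or blank)."""
    description = claim.get("description")
    return description is None or not description.strip()


def clamp_summary(text: str, limit: int = SUMMARY_LIMIT) -> str:
    """Trim a model summary to at most `limit` characters, cutting at a word boundary."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    cut = text[: limit - 1]  # leave room for the ellipsis character
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # drop the partial last word
    return cut.rstrip() + "…"
```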
4️⃣ Weekly claim ingestion: GitHub Actions workflow
The whole thing runs on GitHub Actions once a week. The workflow checks out the repository, installs dependencies, runs ingest_claims.py, and then opens a pull request.
End result: a new claims PR shows up every Sunday, ready for review.
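The post doesn't show how a claim becomes a file under `claims/`. The sketch below shows one possible approach, assuming a hypothetical `claims/<article_id>.json` layout. It writes each claim only if its file does not exist yet. That way, re-running the workflow never duplicates a claim, and a week with nothing new produces no diff for the PR step.

```python
import json
from pathlib import Path


def write_claim_file(claim: dict, claims_dir: Path) -> bool:
    """Write `claim` to <claims_dir>/<article_id>.json unless it already exists.

    Returns True when a new file was written and False when the claim was
    already ingested, so the calling workflow can skip opening an empty PR.
    """
    claims_dir.mkdir(parents=True, exist_ok=True)
    path = claims_dir / f"{claim['article_id']}.json"
    if path.exists():
        return False  # already ingested in an earlier run
    # Stable key order and a trailing newline keep PR diffs small and readable.
    path.write_text(
        json.dumps(claim, indent=2, sort_keys=True, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return True
```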
5️⃣ Ingesting proofs: ingest_proofs.py + workflow
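Once a claim is ingested, I want to either back it up or refute it with evidence. The verification prompt asks the LLM to find two independent sources and tag each one with a boolean `supports_claim`.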
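To help the LLM search the web for evidence that supports or refutes a given claim, I wrote a second skill, claim verification. It does the following: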
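- Extract and validate the falsifiable claim: fetch the URL of the media article or post and identify its core, verifiable claim, making sure it is specific enough to be proven true or false.
- Run targeted web searches: with time-aware queries, find up to two high-quality external documents that directly prove or refute the claim. Temporal relevance is checked strictly, so the evidence must match the claim's time frame.
- Return verifiable evidence URLs: output a JSON array of the URLs found, each with a boolean flag (`supports_claim`: true/false) indicating whether the document confirms or refutes the original claim. Official and authoritative sources are preferred over opinion pieces.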
```python
# Each string ends with a space so the concatenated sentences don't run together.
CLAIM_VERIFICATION_PROMPT = (
    "Use web search tool to access the claim link, fetch the content and process it. "
    "Use the web search tool again to look for proofs in the form of official statements, press releases, or reports from reputable sources to prove the claim right or wrong conclusively. "
    "Ensure that the proofs belong to the same timeline as the claim. Do not include outdated sources. "
    "Output links to the 2 sources that prove the claim right or wrong and specify as a boolean whether they support the claim or not. "
    "The output format should be a json array with each element being a json object corresponding to a source supporting or refuting the claim. "
    "Each json element should follow the following schema: {\"uri\": \"string\", \"supports_claim\": boolean}"
)
req_content = (
    "Following is a link to a falsifiable claim by a news media outlet as an article"
    f"\n\n{claim['uri']}\n\n"
    f"{CLAIM_VERIFICATION_PROMPT}"
)
claim_proofs = openrouter.req_w_addons(
    req_content, skill=claim_verification_skill, tools=[openrouter.WEB_SEARCH_TOOL]
)
```
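The verification prompt asks for a specific schema, but nothing forces the model to follow it. The sketch below uses a hypothetical `parse_proofs` helper to validate each entry against `{"uri": "string", "supports_claim": boolean}` and keep only http(s) links. It treats two links from the same host, or a link back to the claim's own outlet, as not independent. That host-based independence rule is an assumption of this sketch, not something the skill is known to enforce.

```python
import json
from urllib.parse import urlparse


def _host(uri: str) -> str:
    """Lower-cased host without a leading 'www.'."""
    return urlparse(uri).netloc.lower().removeprefix("www.")


def parse_proofs(raw: str, claim_uri: str = "", max_proofs: int = 2) -> list[dict]:
    """Validate the claim-verification reply and keep independent sources.

    Drops entries with a non-http(s) URI, a non-boolean `supports_claim`,
    or a host already seen (the claim's own host counts as seen).
    Returns at most `max_proofs` entries; raises ValueError on non-array replies.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of proofs")
    seen_hosts = {_host(claim_uri)} if claim_uri else set()
    proofs = []
    for item in items:
        if not isinstance(item, dict):
            continue
        uri, supports = item.get("uri"), item.get("supports_claim")
        if not isinstance(uri, str) or not isinstance(supports, bool):
            continue  # schema violation, e.g. "true" given as a string
        if urlparse(uri).scheme not in ("http", "https") or not _host(uri):
            continue
        if _host(uri) in seen_hosts:
            continue  # same site twice is not an independent source
        seen_hosts.add(_host(uri))
        proofs.append({"uri": uri, "supports_claim": supports})
        if len(proofs) == max_proofs:
            break
    return proofs
```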
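Proofs workflow: like the claim ingestion workflow, I also run a proof ingestion workflow every week. It runs the ingest_proofs.py script, creates the proof files on a new branch, and then opens a pull request from that branch into main.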
Result: a new PR with proofs for the claims shows up every Sunday, ready for review.
6️⃣ OpenRouter reliability improvements
As my OpenRouter API usage grew, I narrowed the list of free-tier models down to the following:
```python
FREE_MODELS_DOC = [
    "google/gemma-4-31b-it:free",
    "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free",
    "openrouter/free",
]
```
These three models consistently give good results while staying on the free tier.
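The conclusion mentions free-tier model rotation but doesn't show it. Below is a minimal sketch of one way to do it: pick the model by retry attempt, cycling through `FREE_MODELS_DOC`. The function name and the per-attempt policy are assumptions, not the project's code.

```python
FREE_MODELS_DOC = [
    "google/gemma-4-31b-it:free",
    "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free",
    "openrouter/free",
]


def model_for_attempt(attempt: int, models: list[str] = FREE_MODELS_DOC) -> str:
    """Return the model for a 1-based retry attempt, cycling through `models`.

    Attempt 1 uses the first model, attempt 2 the second, and so on; after the
    last model the rotation wraps around to the first one again.
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return models[(attempt - 1) % len(models)]
```

Switching models on each retry means a rate limit or outage on one free model doesn't consume the whole retry budget.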
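Retry loop with backoff in the OpenRouter module

I also added progressively longer waits between retries, which reduces the chance that a transient server-side error fails the whole run.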
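```python
for i in range(1, OPENROUTER_MAX_RETRIES + 1):
    status, body = helper.post_request(...)
    # Retry on network failure (0), rate limiting (429), and any 5xx error.
    if status in (0, 429) or 500 <= status < 600:
        print(f"OpenRouter API returned status {status}, retrying...", file=sys.stderr)
        time.sleep(10 * i)  # wait grows with each attempt: 10 s, 20 s, 30 s, ...
        continue
    # success handling...
```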
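The loop above elides the request itself. Below is a self-contained sketch of the same linear backoff (10 s, 20 s, 30 s, ...) with the HTTP call and `sleep` passed in, so the retry logic can be unit-tested without waiting or hitting the API. `post_with_retries` and its parameters are hypothetical. Unlike the loop above, it skips the pointless sleep after the final attempt.

```python
import sys
import time
from typing import Callable, Tuple

# Hypothetical signature: post() returns (status, body); status 0 stands for
# a network-level failure, mirroring the loop above.
Post = Callable[[], Tuple[int, str]]


def is_retryable(status: int) -> bool:
    """Network failure (0), rate limiting (429), or any 5xx server error."""
    return status in (0, 429) or 500 <= status < 600


def post_with_retries(
    post: Post,
    max_retries: int = 3,
    base_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `post` until it returns a non-retryable status or retries run out.

    The wait grows linearly (base_delay * attempt), like `time.sleep(10 * i)`.
    Returns the last (status, body) pair so the caller can handle a failure.
    """
    status, body = 0, ""
    for attempt in range(1, max_retries + 1):
        status, body = post()
        if not is_retryable(status):
            return status, body
        if attempt == max_retries:
            break  # out of retries: don't sleep, just report the last result
        print(f"OpenRouter API returned status {status}, retrying...", file=sys.stderr)
        sleep(base_delay * attempt)
    return status, body
```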
Result: the rate of 500 errors dropped sharply, and weekly API usage stays consistently within the free-tier limits.
I'm really happy with the OpenRouter dashboard.
7️⃣ Trade-offs and limitations
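| Aspect | Trade-off |
|---|---|
| Free-tier limits | The 200 NewsData.io credits per day limit each run to one domain. Scaling to more outlets requires a paid plan or smarter batching. |
| One claim per source | I deliberately pick only the single "best" falsifiable claim to keep things simple. Future work will support multiple high-quality claims per article. |
| LLM hallucinations | Even with the claim verification skill, stale links can still show up. Temporal awareness helps, but it is not a silver bullet. |
| Proof quality | Proofs are fetched automatically, but high-stakes claims still deserve human review. |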
8️⃣ Conclusion and what's next
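By stitching together NewsData.io, the custom falsifiable claim and claim verification skills, OpenRouter (with progressive backoff and free-tier model rotation), and GitHub Actions, I built a cost-efficient, fully automated pipeline. It turns raw news into well-structured, falsifiable claim files backed by supporting evidence.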
What's next?
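- Add confidence scores to claims and proofs so downstream consumers can weigh the evidence.
- Move from weekly to daily ingestion for high-volume sources, once the retry logic is rock solid.
- Negative testing: I haven't yet seen a case where a claim was proven false, and I suspect the LLMs are being overly cautious.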
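If you're curious, the full code lives in the Satya Lens source code repository. Feel free to leave any suggestions or questions in the comments.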