AI agents are ignoring government data: There's good reason
Elena Simperl · 2026-05-14 · via The Stack

The UK has set out an ambitious plan for artificial intelligence, but there’s a problem, writes Professor Elena Simperl, Director of Research, The Open Data Institute. Disjointed and fragmented datasets in public-sector repositories are so difficult to use that AI agents choose to avoid them and search the internet for statistics from other, less reliable sources instead.

The Open Data Institute (ODI) recently created and published a ready-to-use prototype of the National Data Library (NDL). The prototype, called NDL-Lite, brings together 38GB of data from six public-sector sources: more than 100,000 files aggregated, processed and standardised into a central resource that researchers and innovators around the world can access using agentic AI. The prototype demonstrates that a nationwide, cross-government data asset could be created quickly and cheaply, but several barriers must be addressed to move it from a promising prototype to a workable reality.

Access to high-quality public-sector data will determine whether the UK can support innovation, empower public services, and compete globally. The ODI prototype shows that this can be done. In one example, an AI agent combined official data on electric vehicle charging infrastructure with Hansard debate transcripts to produce a short report on business opportunities in rural EV charging in under a minute.

Missing metadata and inconsistent formats

But many public datasets cannot support meaningful work by human data users, let alone AI agents, whether for financial analysis, customer service, or policy research. During testing, the ODI found that many official datasets, particularly on data.gov.uk, are difficult for both humans and AI systems to interpret. Titles are often vague or misleading, metadata is sparse or missing, formats are inconsistent, and even basic standards, such as date formats, vary. Datasets that look useful at first glance often turn out to be too narrow, old or badly structured to support the kind of analyses users are trying to carry out.
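To make the date-format point concrete, here is a minimal sketch of the normalisation step a data user has to write before files from different publishers can be combined. The list of layouts is a hypothetical example rather than an inventory of what data.gov.uk actually contains, and reading ambiguous values day-first is an assumption based on UK convention.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical layouts for illustration; a real catalogue needs its own list.
# Ambiguous values such as 03/04/2018 are read day-first (assumed UK convention).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %Y")


def normalise_date(raw: str) -> Optional[date]:
    """Return the parsed date, or None if no known layout matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next layout
    return None


if __name__ == "__main__":
    for sample in ["2018-03-31", "31/03/2018", "31 March 2018", "Q1 2018"]:
        print(sample, "->", normalise_date(sample))
```

Values that match none of the layouts, such as "Q1 2018", come back as `None`. A person can work out what such a value means, but an AI agent working unattended is more likely to drop the row or the whole dataset.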

Crime data is one example. In one experiment, a search for crime statistics returned datasets that appeared relevant, but many were local authority releases rather than national data that could be compared or combined. One important Home Office crime dataset on data.gov.uk had not been updated since 2018, making it unsuitable for current AI use. A newer version exists, but it could not be accessed through the Office for National Statistics’ API, which meant it could not be properly integrated into the prototype.

As a result, AI agents bypassed public-sector datasets and searched the internet for statistics from other, less reliable sources instead; quite the opposite of what the government would want. Public data is supposed to provide a trusted, authoritative foundation for AI use in research, policy making and product development; it cannot be acceptable for systems to pass over it in favour of whatever they can scrape online.

ONS data was clean and standardised

In contrast, although datasets from the Office for National Statistics (ONS) lacked attached metadata, the clear titles made it easy to find, understand, and use up-to-date economic statistics. The datasets are clean and standardised, making them interoperable and reusable across many research domains. The difference is not in the volume of data, but in its quality and structure.

Data.gov.uk is working hard to tackle the problems of historic underinvestment. While budgets remain tight, data grouped around core national teams is being curated and cleaned, and a data manual for future best practices is being developed. However, there is still some way to go.  

Since 2024, the ODI has consistently recommended that any work on the NDL must begin with the quality of the underlying public sector datasets. This means improving inconsistent formats, missing metadata, misleading dataset titles, fragmented statistics and outdated datasets.
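Two of the problems in that list, missing metadata and outdated datasets, can be checked mechanically. The sketch below shows one way to do it. The required fields and the two-year staleness threshold are hypothetical choices for illustration, not an official data.gov.uk schema or ODI rule. Misleading titles and fragmented statistics still need human judgement.

```python
from datetime import date
from typing import List

# Hypothetical minimum metadata; not an official schema.
REQUIRED_FIELDS = ("title", "description", "publisher", "licence", "last_updated")


def audit_record(record: dict, today: date, max_age_days: int = 730) -> List[str]:
    """Return a list of human-readable problems found in one catalogue record."""
    problems = []
    # Flag every required field that is absent or empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    # Flag records whose last update is older than the threshold.
    updated = record.get("last_updated")
    if isinstance(updated, date) and (today - updated).days > max_age_days:
        problems.append(f"stale: last updated {updated.isoformat()}")
    return problems


if __name__ == "__main__":
    record = {
        "title": "Crime statistics",          # illustrative values only
        "publisher": "Example Authority",
        "last_updated": date(2018, 6, 30),
    }
    print(audit_record(record, today=date(2026, 5, 14)))
```

Run across a whole catalogue, a report like this would show publishers where to start and would let an agent filter out records it should not rely on.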

There is also a growing need to prepare data for AI applications, including the use of machine-readable metadata standards such as Croissant and clearer documentation around dataset limitations and potential biases, which AI systems can otherwise amplify. As a critical data provider for AI, the government has a responsibility to ensure its data is both usable and available equitably. Without this, UK-based start-ups and academic institutions will lose out to large, entrenched entities from the US and China. Developers also need the code and tooling to use the data. That’s why we published the NDL-Lite dataset with purpose-built infrastructure to enable access by humans and AI agents alike.
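For readers who have not seen machine-readable dataset metadata, the sketch below builds a simplified, Croissant-style JSON-LD record in Python. It is an illustration under stated assumptions, not a validated Croissant file and not taken from NDL-Lite. The dataset name, URLs and column are hypothetical, the `@context` is heavily abbreviated, and the property names follow our reading of Croissant 1.0, so they should be checked against the MLCommons specification before use.

```python
import json


def build_metadata() -> dict:
    """Build a simplified, Croissant-style description of one CSV dataset."""
    return {
        # Abbreviated context; a real Croissant file carries a fuller one.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
        },
        "@type": "sc:Dataset",
        "name": "example_ev_charging_points",            # hypothetical
        "description": "Illustrative record; note known gaps and biases here.",
        "url": "https://example.org/datasets/ev-charging",  # hypothetical
        "distribution": [
            {
                "@type": "cr:FileObject",
                "@id": "charging-points.csv",
                "contentUrl": "https://example.org/files/charging-points.csv",
                "encodingFormat": "text/csv",
            }
        ],
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "name": "charging_points",
                "field": [
                    {
                        "@type": "cr:Field",
                        "name": "local_authority",        # hypothetical column
                        "dataType": "sc:Text",
                        "source": {
                            "fileObject": {"@id": "charging-points.csv"},
                            "extract": {"column": "local_authority"},
                        },
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_metadata(), indent=2))
```

A record like this tells an agent where the file is, what format it is in and what each column means, without the agent having to guess from a title.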

NDL has huge potential, but...

The National Data Library has huge potential, allowing tools that can write code, analyse datasets and generate insights in seconds to work across the UK’s economic, scientific and policy data, unlocking new insights and faster decision-making. NDL-Lite demonstrates that building an initial prototype is not especially difficult. The tools are widely available, and the code can be developed openly over time.

The harder task is ensuring the data is reliable, usable and accessible. Without that, AI agents are likely to simply ignore the National Data Library. If ministers want the National Data Library to deliver real value, the immediate priority is not branding or another round of strategy papers; it is making government datasets fit for purpose.
