AI is ready to take over Python programming, but not much else
2026-05-13 · via Computerworld

Tests of how well 19 large language models (LLMs) complete complicated multi-step tasks have shown that they are both error-prone and, in many cases, unreliable.

The findings are contained in a preprint paper, LLMs Corrupt Your Documents When You Delegate, written by Microsoft researchers Philippe Laban, Tobias Schnabel and Jennifer Neville, based on a benchmark they created called DELEGATE-52 that allowed them to simulate workflows that might be part of a knowledge worker’s tasks. The paper is currently under review.

They said that the benchmark contains 310 work environments across 52 professional domains including coding, crystallography, genealogy and music sheet notation. Each environment consists of real documents totaling around 15K tokens in length, and five to 10 complex editing tasks that a user might ask an LLM to perform.

And, they stated in the paper’s abstract: “Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interaction.”

Those mistakes are significant, they said. “The findings show that current LLMs introduce substantial errors when editing work documents, with frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4) losing an average 25% of document content over 20 delegated interactions, and an average degradation across all models of 50%.”

Benchmark exercise receives a thumbs up

Brian Jackson, principal research director at Info-Tech Research Group, found the findings very interesting. “Putting a list of LLMs to the test across different work domains yields a lot of useful insights,” he said. “I think this type of benchmark exercise could be helpful to enterprise developers who are looking to leverage agentic AI to automate specific workflows and understand the limits of what can be achieved.”

However, he said, “what we shouldn’t conclude from this is that, because these foundation models caused document degradation after 20 edits, they can’t be used to automate work in a certain field. It just means they can’t do all of the work as they are currently constructed.”

But, Jackson stated, “in an enterprise environment where having an accurate output is crucial, you wouldn’t take that approach. You would design the automation flow with stronger guardrails in place to prevent errors. This could be done by using multiple agents that play different roles, such as one that makes the edits and another that checks for errors and makes corrections.”

Sanchit Vir Gogia, chief analyst at Greyhound Research, said, “the Microsoft paper should be read as a serious warning about delegated AI, not as a claim that enterprise AI has failed. That distinction matters. The paper is still a preprint, so it deserves careful handling, but its central question is exactly the one CIOs should be asking: can AI preserve the integrity of complex work over repeated delegation?”

The study, he said, is stronger than what he described as “the usual AI benchmark theatre,” because it tests work products rather than just looking at clever one-off answers. “It uses reversible editing tasks, domain-specific evaluators, and a round-trip method to see whether a document returns intact after repeated edits. In too many cases, it does not.”

That is the point, explained Gogia. “This is not merely about hallucinations. It is about artefact integrity.”
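
To make the round-trip idea concrete, here is a minimal Python sketch. It is not the benchmark's code: the function names, the bullet-adding edit pair, and the use of `difflib` as a similarity score are all assumptions made for illustration, standing in for the paper's domain-specific evaluators. A faithful editor returns the document intact, while a simulated lossy one silently drops a line on every round.

```python
import difflib


def similarity(original: str, restored: str) -> float:
    """Return a 0..1 score; 1.0 means the two versions are identical."""
    return difflib.SequenceMatcher(None, original, restored).ratio()


def round_trip(document, forward, backward, rounds):
    """Apply a forward edit and its inverse `rounds` times in a row."""
    current = document
    for _ in range(rounds):
        current = backward(forward(current))
    return current


# Hypothetical reversible task: add bullets to every line, then remove them.
def add_bullets(text: str) -> str:
    return "\n".join("- " + line for line in text.splitlines())


def strip_bullets(text: str) -> str:
    return "\n".join(
        line[2:] if line.startswith("- ") else line
        for line in text.splitlines()
    )


# Stand-in for an unreliable delegate: undoes the edit but drops the last line.
def lossy_strip_bullets(text: str) -> str:
    lines = strip_bullets(text).splitlines()
    return "\n".join(lines[:-1])


doc = "alpha\nbeta\ngamma\ndelta\nepsilon"
faithful = round_trip(doc, add_bullets, strip_bullets, rounds=3)
lossy = round_trip(doc, add_bullets, lossy_strip_bullets, rounds=3)

print(similarity(doc, faithful))  # identical documents score 1.0
print(similarity(doc, lossy))     # lower, because three lines were lost
```

Because each edit has a known inverse, any difference between the original and the final document is damage introduced along the way, and it grows with the number of rounds.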

AI is ‘not yet trustworthy enough’

He added that the headline finding is “uncomfortable: even the strongest models corrupt about a quarter of document content by the end of long workflows, while average degradation across all tested models reaches roughly 50%. The paper also finds that performance varies sharply by domain. Python is the only domain where most models are ‘ready,’ and the best model reaches that threshold in only 11 of 52 domains.”

AI is not failing because it cannot write, said Gogia; it is failing because it cannot yet preserve.

The study, he pointed out, “is especially useful because it shows how errors accumulate. Bigger documents worsen outcomes. Longer interaction worsens outcomes. Distractor files worsen outcomes. Short tests flatter the system, while longer workflows expose it. That maps rather neatly to the enterprise world, where work is messy, files are stale, context is noisy and the most important documents are rarely the simplest ones.”

The honest conclusion, he said, “is not that AI should be kept out of enterprise workflows. It is that delegated AI is not yet trustworthy enough to be left alone with consequential artefacts.”

When AI edits an important document such as a contract, a ledger, a policy, a codebase, a board paper, or a compliance record, Gogia warned, the enterprise still owns the damage.

Mitigation approaches

In order to prevent that damage, Jackson suggested, enterprises can do additional training and fine-tuning of models to be better adapted to their specific workflows: “These foundation models are very good at doing a lot of different tasks, but less good at doing one specific task very well. So, enterprises that want to achieve that may need to improve the models themselves by training on their own data.”

For example, “[the Microsoft paper] points out one multi-agent setup that led to more degradation instead of less, so the method to detect degradation must be well-designed to be effective,” he said. “Another approach that some enterprise platforms have introduced is a way to deterministically verify the output for accuracy using mathematical verification. So, knowing what domains prove more difficult for a single LLM to automate is useful, as developers can plan to add more verification steps to the process.”
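
One simple form a deterministic check could take is sketched below in Python. This is an illustration only, not the "mathematical verification" Jackson refers to, which the article does not describe. The function name, the section-keyed document layout, and the sample contract are hypothetical. The check compares every section an edit was not supposed to touch and reports any that changed, disappeared, or appeared.

```python
def unintended_changes(before: dict, after: dict, allowed: set) -> list:
    """List sections that differ between versions but were not meant to change.

    `before` and `after` map section names to section text; `allowed` holds
    the names of sections the requested edit was permitted to modify.
    """
    changed = []
    for name in sorted(set(before) | set(after)):
        if name in allowed:
            continue
        # A missing or newly added section also counts as a change.
        if before.get(name) != after.get(name):
            changed.append(name)
    return changed


# Hypothetical contract, split into named sections.
before = {"parties": "A and B", "term": "12 months", "fees": "USD 100"}

# The requested edit only extends the term, but the fee was altered too.
after = {"parties": "A and B", "term": "24 months", "fees": "USD 10"}

print(unintended_changes(before, after, allowed={"term"}))  # ['fees']
```

A check like this cannot tell whether the requested change itself is correct, but it does flag silent damage to the rest of the document without relying on a second model.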

He said, “depending on the model, for example, if it’s totally open source or if it’s proprietary, you can have more flexibility in terms of how much you can customize it. So, an enterprise developer might look at these results, pick the LLM best at automating their desired domain, and then send it in for additional training to master the process.”

People do not disappear

According to Gogia, the paper also shows something more precise than ‘AI still needs people.’ “It shows that AI changes the human layer from production to supervision, validation, and accountability. That is a rather different operating model from the one being sold in many boardroom conversations.”

People, he said, “do not disappear. Their work moves. This is the uncomfortable part for enterprises chasing headcount reduction. The people best placed to catch AI errors are often the same people organizations are hoping to replace, reduce, or redeploy. Remove too much domain expertise from the workflow, and the enterprise also removes the people who know when the AI has quietly damaged the work.”

Expertise becomes more valuable, not less, said Gogia: “The paper reinforces this because stronger models do not merely delete content. They often corrupt it. Weaker models are easier to catch when they visibly drop material. Frontier models are more awkward because the content remains present but becomes wrong, distorted, or subtly altered. That requires knowledgeable review, not casual inspection.”

This article originally appeared on CIO.com.
