LLM code is just untrusted text. Until you validate it.

People often ask me what the best programming language is to use with LLMs. One of the strongest options, in my opinion, is Rust. However, before choosing it, you first need to ask yourself whether it’s worth picking a language with manual memory management instead of one with a garbage collector.

And regardless of that decision, the first thing is to understand that LLMs are not deterministic systems.

They should be used for what they actually are: smart suggesters.

You just shouldn’t trust suggesters by default.
They read input text.

They respond with output text.

Disclaimer: it’s not code. Not yet.

Even if it looks like code, even if it compiles as if it were actual code: trust me.

It’s not.

The rule is:

code is in an untrusted state until you validate it.

untrusted state

No matter the language.
Rust is no different. Its compiler’s strictness doesn’t give you permission to break the rule.

But let’s take a step back to understand why you have to evaluate other languages, not only Rust.

Manual memory management forces you to obsess over ownership & lifetimes at every step.

The bugs:

use-after-free
double-free
leaks
dangling pointers
overflows

… are brutal to debug, especially in large or concurrent codebases. Code tends to be complex, and sometimes application logic depends as much on how memory is allocated as on how data is processed. That means there are many elements to consider in the I/O design, and it gets much harder in complex infrastructures with async processes, threads, locks, and concurrent access.

In this scenario, people often choose GC languages, because they remove an entire class of problems. You allocate and move on. Cleaner code, simpler APIs, safer concurrency.

But GC has costs. Latency and memory.

For systems, games, kernels, databases, manual control still wins.

On the other hand, modern GCs are so well optimized that, for most software, you can choose them unless extreme performance matters more to you than productivity and simplicity.

And Rust is a modern language that tries to thread the needle with the borrow checker. It gives you extreme performance at the cost of massive complexity.

Rust teams fight the compiler all day long. Consider, for instance, a typical Rust + Tokio project: Rust’s strictness plus Tokio’s zero-cost async create a combination that is extremely safe and fast when done right, but mentally expensive, even in apparently small details like lifetime management.

This is exactly why some people prefer simpler GC languages.

But a lot of people choose Rust because of its pattern-heavy nature and, as a consequence, its reputation for being LLM-friendly.

What they often miss, however, is that this same strictness brings its own inner complexities: the ownership model, borrow checker, and lifetime management can add significant cognitive overhead and development friction that goes beyond what most teams anticipate. If you only consider how easy it is to generate Rust code, without properly evaluating how much time you have to spend reviewing and validating it, you’re introducing a new generation of technical debt.

This is one of the most underestimated issues: LLMs are excellent at generating code, but they don’t truly understand simplicity or the deeper abstractions in software. They’re just very capable text generators, helping you to get sophisticated scaffolding without any inherent logic, intuition, or understanding of consequences.

It’s entirely up to you to steer the process: to keep things simple, eliminate unnecessary code, and avoid hidden side effects. And in Rust, believe me, it takes time and effort.

What worries me is how many developers underestimate this aspect, treating this generated “text” as if it were the reliable output of a deterministic system, with no need to double-check it.

They speak of LLMs as just another clean layer in the stack, comfortably sitting between the human and the machine.

It’s not.

It’s probabilistic text, the result of statistical patterns learned from a vast, uncontrolled corpus of human writing.

This shift has quietly made humans the bottleneck. We no longer need to type code to create software. Or at least, typing is not the only way.
The act of writing it, though, used to force us to think, design, and truly understand what we were building. And this cannot be skipped or delegated to text-generating machines.

They just don’t do it.

Now we write a vague prompt with some requirements and expect the machine to figure out what we actually need. This works surprisingly well for simple, common tasks, which is why it’s fair to call LLMs a powerful autocomplete.

"Given the id parameter, write the SQL statement that updates the user's email"

SQL prompt
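As a rough idea of what such a prompt asks for, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table, its columns, and the `update_user_email` helper are hypothetical; the prompt above only names an id parameter and an email.

```python
import sqlite3


def update_user_email(conn: sqlite3.Connection, user_id: int, new_email: str) -> int:
    """Update one user's email and return the number of rows changed."""
    # Placeholders keep the values out of the SQL string itself.
    cursor = conn.execute(
        "UPDATE users SET email = ? WHERE id = ?",
        (new_email, user_id),
    )
    conn.commit()
    return cursor.rowcount


# Hypothetical in-memory schema, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, email) VALUES (1, 'old@example.com')")

changed = update_user_email(conn, 1, "new@example.com")
```

Even in this small case, the reviewer still has to decide things the prompt never mentioned, such as what should happen when no row matches the id (this sketch simply reports zero rows changed).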

Easy. Isn’t it?

Real-world development is rarely that linear, though. It often involves complex context that’s extremely hard to describe in natural language, often split across the various physical and logical spaces where engineering happens.

In those cases, many experienced engineers still find it faster and more precise to write the code directly rather than struggling to find the perfect words to describe what they want.

That’s exactly why formalism exists.

A pure scientist would probably prefer to write:

Y=λf. (λx. f (x x)) (λx. f (x x))

Fixed-point combinator

… instead of producing dozens of pages describing fixed-point combinators that turn any function into a recursive version of itself, without needing named functions or loops.
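The same compactness carries over to code. The sketch below is an illustration, not a literal transcription: Python evaluates arguments eagerly, so Y exactly as written above would recurse forever. It uses the eta-expanded variant instead (usually called the Z combinator), which delays the self-application. The names `Z`, `factorial_step`, and `factorial` are illustrative.

```python
# Strict-evaluation variant of the fixed-point combinator (often called Z).
# The inner "lambda v" delays the self-application x(x) until it is needed.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A factorial "step": it receives the function to call for the recursive case,
# so it never refers to itself by name.
factorial_step = lambda recurse: lambda n: 1 if n == 0 else n * recurse(n - 1)

# Z turns the non-recursive step into a recursive function.
factorial = Z(factorial_step)
result = factorial(5)
```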

So when they say that:

"In the AI age the problem is no more the code"

AI claim

…they’re omitting some critical details.

The problem is no longer the code itself, as a sequence of letters, words, and lines.

“Coding”, in the AI age, is a marketing simplification.

"Coding" !== "Writing software" ∈ "Software engineering"

Writing code, as the act of typing commands, is tied to thousands of software engineering activities. Written code can have ~0 external context, or it can depend on and/or affect billions of elements outside the namespace where it lives.

This is the difference.

And it changes everything.

If you blindly rely on generated text, and you ignore or underestimate its implications, you’re probably using AI the wrong way.
And no, test coverage alone is no longer enough. Now less than ever.
Test coverage assumptions are (were) a mutual contract between the team and the system. Starting from a set of functional requirements, engineers designed the system and iteratively wrote tests to prove the absence of a predefined set of possible bugs.

Assume they create a system that draws a square, then assert that it has four sides of equal length meeting at 90° angles.

– What if the system creates a rectangle?
– What if it’s a cube instead?
– What if there is an extra line in some cases?

Tests you defined in the first iterations, based on the LLM’s first output, cannot account for whether or where the AI later added extra logic that causes unpredictable cases like these.
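The extra-line case can be sketched in a few lines. Everything here is hypothetical: `draw_square` stands in for generated code, the "large square" branch is an invented example of logic nobody asked for, and `original_check` is just one plausible way a first-iteration test might have been written.

```python
import math

Point = tuple[float, float]
Segment = tuple[Point, Point]


def draw_square(side: float) -> list[Segment]:
    """Hypothetical generated code: returns the segments of a square."""
    a, b, c, d = (0.0, 0.0), (side, 0.0), (side, side), (0.0, side)
    segments = [(a, b), (b, c), (c, d), (d, a)]
    if side > 100:
        # Extra logic nobody asked for: a diagonal on large squares.
        segments.append((a, c))
    return segments


def length(seg: Segment) -> float:
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def is_right_angle(s1: Segment, s2: Segment) -> bool:
    """True when the direction vectors of the two segments are perpendicular."""
    (ax, ay), (bx, by) = s1
    (cx, cy), (dx, dy) = s2
    dot = (bx - ax) * (dx - cx) + (by - ay) * (dy - cy)
    return math.isclose(dot, 0.0, abs_tol=1e-9)


def original_check(segments: list[Segment]) -> bool:
    """First-iteration test: four equal sides meeting at 90 degrees."""
    if len(segments) < 4:
        return False
    sides = segments[:4]
    equal = all(math.isclose(length(s), length(sides[0])) for s in sides)
    corners = all(is_right_angle(sides[i], sides[(i + 1) % 4]) for i in range(4))
    return equal and corners


def stricter_check(segments: list[Segment]) -> bool:
    """Same check, plus: nothing beyond the four sides is drawn."""
    return len(segments) == 4 and original_check(segments)
```

With these definitions, `original_check(draw_square(200))` still passes even though a fifth segment is drawn, while `stricter_check(draw_square(200))` fails. The stricter test only exists because someone read the generated code and noticed the extra branch.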

So every time AI generates some new text, you should re-evaluate everything and update the test suite accordingly.

This can be convenient in some cases, inefficient in others, risky in most, and time-consuming always.

And you shouldn’t skip this step.

The rule is to:

never, ever treat an LLM's output as if it were a compiler artifact

text generator

It’s just text that may happen to compile.

It becomes code only after you validate it.

So, to recap, it’s a trust boundary model for LLM-assisted development, where the states are:

…and the flow is:

Raw output from an LLM is always treated as untrusted text (MU-TXT). Only after explicit human review, validation, and acceptance is it promoted to trusted code (MT-TXT → HVC) and allowed to enter the codebase via commit and branch merge.

The loop emphasizes that code generation is an iterative, human-driven process, not an LLM-, language-, or AI-based automation.
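One possible way to encode that flow is a tiny state machine. This is a sketch under assumptions: the labels MU-TXT, MT-TXT, and HVC come from the text, but their expansions are not given there, so the meaning attached to MT-TXT (reviewed, awaiting final acceptance) is a guess. The `Artifact` class, `TrustError`, and all method names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    MU_TXT = "MU-TXT"  # raw LLM output: untrusted text
    MT_TXT = "MT-TXT"  # assumed here: reviewed, waiting for final acceptance
    HVC = "HVC"        # trusted code, allowed into the codebase


class TrustError(Exception):
    """Raised when a step would skip the trust boundary."""


@dataclass
class Artifact:
    text: str
    state: State = State.MU_TXT
    log: list = field(default_factory=list)

    def regenerate(self, new_text: str) -> None:
        # Any new LLM output starts over as untrusted text (the loop).
        self.text = new_text
        self.state = State.MU_TXT
        self.log.append("regenerated")

    def review(self, reviewer: str, approved: bool) -> None:
        if self.state is not State.MU_TXT:
            raise TrustError("only untrusted text can be reviewed")
        if approved:
            self.state = State.MT_TXT
        outcome = "ok" if approved else "rejected"
        self.log.append(f"reviewed by {reviewer}: {outcome}")

    def accept(self, reviewer: str) -> None:
        if self.state is not State.MT_TXT:
            raise TrustError("acceptance requires a completed review")
        self.state = State.HVC
        self.log.append(f"accepted by {reviewer}")

    def commit(self) -> str:
        # Only trusted code may cross into the codebase.
        if self.state is not State.HVC:
            raise TrustError(f"cannot commit text in state {self.state.value}")
        return self.text
```

In this sketch, calling `commit()` on a fresh `Artifact` raises `TrustError`, and every call to `regenerate()` drops the artifact back to MU-TXT no matter how far it had progressed. Every transition toward HVC takes a named reviewer, which keeps a human in each step.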

That’s why the problem is not the language you choose, and why you shouldn’t choose a programming language just because it has good LLM support at a given moment in time. It’s a human issue, not a technical one.

Before validation, you don’t own that codebase. So please choose a language that allows you to always be the owner, no matter how complex things get in the future.

Conclusions:

It’s a Human In The Loop flow

Whatever language you choose, the review step has to be considered critical in the development lifecycle. If your management tends to underestimate it and/or tends to trust agent-driven automation too much, consider it a red flag.

Ownership is a cognitive responsibility.

It isn’t just “who wrote it”, it’s who holds the mental model. The person who owns the codebase is the one who can answer:

Why does this code exist?
What tradeoffs were made, and why?
Where does it break under pressure?
What would need to change if requirements shifted?

Good technical management never allows the team to ship a single line of code that is not fully understood.

AdP
