





















When receiving patches from first-time contributors, it is sometimes hard to determine from the code alone whether the person used an LLM to write the patch. We usually rely on the person's good faith to tell the truth, as the patch mimics the style a person would have used, including comments and variable names.
In Dillo we only want to accept fully human-created contributions, but relying on unknown people to tell the truth isn't very comforting. So I would like to find a better mechanism to distinguish LLM patches from human-made ones.
I've been playing around with asciinema to record and replay some programming sessions in vim. If you have never heard of it, it is a small CLI program that records the keystrokes and the terminal output to a file, so you can play it back later and it will look exactly as originally recorded, including the colors and the time between keys. After a bit of experimentation, it occurred to me that it captures the human "essence" of writing a program: the many mistakes, the rabbit hole of tracking down a complicated bug, the typos and other syntax errors. So I'm considering it as a candidate for providing proof that a patch was written by a human.
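To make the "time between keys" idea concrete, here is a minimal Python sketch that reads a recording and summarises the pauses between events. It assumes the asciicast v2 layout (a JSON header on the first line, then one JSON array of time, event type and data per line); recordings in other format versions would need a different parser. The function names and the sample data are made up for illustration, and nothing here is meant as a reliable human-versus-LLM detector.

```python
import json
import statistics


def load_events(lines):
    """Parse asciicast v2 lines into (header, events).

    Assumes the v2 layout: a JSON header object on the first line,
    then one JSON array [time_in_seconds, event_type, data] per line.
    """
    rows = [line for line in lines if line.strip()]
    header = json.loads(rows[0])
    events = [tuple(json.loads(row)) for row in rows[1:]]
    return header, events


def gap_summary(events):
    """Return basic statistics on the pauses between consecutive events."""
    times = [t for t, _kind, _data in events]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        # Fewer than two events: there is no pause to measure.
        return {"events": len(events), "median_gap": None, "longest_gap": None}
    return {
        "events": len(events),
        "median_gap": statistics.median(gaps),
        "longest_gap": max(gaps),
    }


# Tiny hand-written sample, not a real recording.
sample = [
    '{"version": 2, "width": 80, "height": 24}',
    '[0.5, "o", "i"]',
    '[0.75, "o", "n"]',
    '[1.0, "o", "t"]',
    '[3.0, "o", " "]',
]
header, events = load_events(sample)
summary = gap_summary(events)
print(header["width"], summary)
```

For a real file you could pass `open("proof.cast", encoding="utf-8")` to `load_events` instead of the sample list, after decompressing it.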
The advantages of a tool like asciinema are that the user only needs to start and stop the recording, so it has barely any additional cost. Additionally, the recordings are fairly small once compressed with gzip or similar, as the files contain only text. In contrast, recording the desktop requires significantly more effort, as you need to keep a heavy process running in the background, and it would produce a larger file.
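As a rough way to check the size claim on your own recordings, the following Python sketch compares the raw and gzip-compressed sizes of a byte string. The `compression_ratio` helper and the synthetic data are illustrative assumptions; the stand-in text is far more repetitive than a real editing session, so actual ratios will differ.

```python
import gzip


def compression_ratio(raw: bytes) -> float:
    """Return compressed size divided by original size (lower means smaller)."""
    return len(gzip.compress(raw)) / len(raw)


# Synthetic stand-in for a recording: 500 near-identical event lines.
fake_cast = b"".join(
    b'[%d.0, "o", "static int foo(void);\\r\\n"]\n' % i for i in range(500)
)
ratio = compression_ratio(fake_cast)
print(f"{len(fake_cast)} bytes raw, compressed to {ratio:.0%} of that")
```

To measure a real recording, read it with `open(path, "rb").read()` and pass the bytes to `compression_ratio`.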
On the other hand, it also has some shortcomings. It won't work if the user is programming in a graphical editor like VS Code or similar. Also, we would see a part of the programming effort that the user may not want to share with other people. For this case, my suggestion would be to send the recording by private email, so that it is only shared with the reviewers and is not publicly available.
Despite these shortcomings and assuming the user is using a terminal editor, I'm curious if this method would work.
In the same way that LLMs generate patches, they can also generate the asciinema recordings themselves. Contributors could then lie to the reviewers, pretending to have made the edits. Perhaps surprisingly, this is not an easy task for LLMs, at least from my observations. The corpus of recordings of developers making mistakes and thinking through the whole process of editing a file is not as large as the corpus of FOSS programs and patches on which to train an LLM. In my very simple tests I haven't been able to generate an asciinema session that remotely resembles what I would expect from a human, and even less so from a human with a nice editor theme editing an existing Dillo source file.
This method may work for a while, but LLMs may get an incentive to improve their capacity to mimic human behavior. At least for now, it may be enough to protect our contributions from LLMs.
I would like to test this theory a bit more, perhaps by running some
experiments. In fact, the whole edit session for this page has been recorded
in asciinema, which you can download here,
decompress and replay with asciinema play --speed 16 proof.cast
(adjust the speed as desired).
The ideal solution would be for us to trust that a contributor will not lie about their submission, but it is common for FOSS projects to receive one-off patches, so this method reduces the need to trust the user.
As a side benefit, watching how other people program is also a good way to learn from them. This of course applies only if the person wants to share the recording publicly. A potential problem with sharing edit sessions is that LLMs may use them to mimic how you program, so it is a double-edged sword.