Human-powered enshittification long predates AI-powered slop

April 19, 2026

Shot in Antarctica, 2026-01. The soaked gentoo penguin surrounded by its pink poop perfectly resembles my daily mood at work.

I opened YouTube on my phone today to play something while cooking. A short-form video that I definitely did not click jumped out and started playing at full volume. I had been warned about this “feature”, but this was the first time I saw it myself. I immediately swiped left to exit the app.

I’m calling this a “feature” and don’t even suspect it’s a bug. Having worked in this industry for over 10 years, I can totally imagine how such a sloppy “feature” got proposed and greenlit by dozens of people inside the company, from individual contributors to VPs, got launched to hundreds of millions of people, ruined their experience and even pushed some off the service, and still found its way into a “success story” presentation 6 months later. And I really don’t want to believe that the general public is already so brain-rotted that such a disturbing feature is actually going to be the winning variant of the A/B test.

The data-driven

If you’re not familiar with the concept of an A/B test, it’s basically throwing a bunch of stuff at the wall and seeing what sticks. When you have millions of things to throw at millions of walls, you quickly find the stickiest material.

The concept is not new, but modern technology makes experimenting much easier and cheaper. Every time you open an app or website, there are probably dozens if not hundreds of different experiments running, trying to figure out which button color makes you less likely to drop out of a checkout, which algorithm keeps you hooked on the service longer and makes you spend more money, etc.

When you hear the term “data-driven”, you usually think of something objective, deterministic, systematic, and rational. Well, that’s the goal, but as companies grow, services scale, and complexity increases… that’s not always the case.

You see, a big company has many goals. Take YouTube as an example: “view hours” is likely a company-wide “topline” metric, and so are “monthly active users” and “average ads revenue per user”. Each individual team works on more granular metrics that contribute to, or at least don’t hurt, the topline metrics.

How do you tell whether a feature helps or hurts the metrics? Here comes the A/B test. Users are randomly sliced into different buckets and get different treatments. For example, for a team working on short-form video adoption, one experiment could look like this (this is of course an oversimplification, and I have not worked for YouTube, but I have worked in this industry long enough to know the idea):

Group A, say 1% of US YouTube users, gets no short-form videos.
Group B, another 1%, gets 4 short-form videos inserted at the top of their feed.
Group C, another 1%, gets 4 short-form videos inserted into their feed after every 4 long-form videos.
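
For the curious, this kind of random slicing is commonly done by hashing a user ID into a stable bucket, so the same person gets the same treatment every time they open the app. Here is a minimal Python sketch of that idea. The experiment name, the group names, and the `assign_group` function are all made up for illustration, and I have no idea what YouTube actually does:

```python
import hashlib
from typing import Optional

# Hypothetical layout mirroring the three groups above: each gets 1% of users.
GROUPS = [
    ("A_no_shorts", 0.01),
    ("B_shorts_on_top", 0.01),
    ("C_shorts_interleaved", 0.01),
]


def assign_group(user_id: str, experiment: str = "shorts_adoption_v1") -> Optional[str]:
    """Return the user's group, or None if they are not in the experiment."""
    # Salting with the experiment name means a user's bucket in one
    # experiment does not determine their bucket in another.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number in [0, 1).
    point = int(digest[:8], 16) / 2**32
    upper = 0.0
    for name, share in GROUPS:
        upper += share
        if point < upper:
            return name
    return None  # everyone else sees the default experience


# The same user always lands in the same place.
print(assign_group("user-42"), assign_group("user-42"))
```

Hashing instead of rolling a die on every request is what keeps the treatment consistent across sessions without storing an assignment table.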

Then, after running this experiment for a while and looking at each group’s short-form video view hours, adoption rate, etc., you can pick the treatment that performs best and roll it out to everyone except a small “holdout” group that is excluded from all of the team’s experiments, so that looking back you can measure the long-term aggregated impact of this team/org.
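
“Pick the treatment that performs best” usually means some kind of significance check rather than eyeballing a dashboard. A rough sketch using a textbook two-proportion z-test on adoption rate follows. The counts are invented, `z_score` and `pick_winner` are hypothetical helpers, and the holdout group is not modeled here at all:

```python
import math
from typing import Dict, Optional, Tuple

Z_CRITICAL = 1.96  # roughly a 95% two-sided confidence level


def z_score(control: Tuple[int, int], treatment: Tuple[int, int]) -> float:
    """Two-proportion z-test on (adopters, users) pairs."""
    c_adopt, c_users = control
    t_adopt, t_users = treatment
    pooled = (c_adopt + t_adopt) / (c_users + t_users)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / c_users + 1 / t_users))
    return (t_adopt / t_users - c_adopt / c_users) / std_err


def pick_winner(control: Tuple[int, int],
                treatments: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the treatment with the highest significant z-score, if any."""
    scores = {name: z_score(control, data) for name, data in treatments.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > Z_CRITICAL else None


# Invented numbers: (users who watched a short, users in the group).
control = (500, 10_000)  # Group A, no shorts inserted
treatments = {
    "B": (650, 10_000),  # shorts at the top of the feed
    "C": (600, 10_000),  # shorts interleaved in the feed
}
winner = pick_winner(control, treatments)
print(winner)
```

Note that this only looks at one metric for one team, which is exactly where the trouble starts.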

The mess

Seems nice and clean and data-driven? Not once you add up all the teams in the company and all the layers of measurement and different goals accumulated over the years. Realistically, the measurement for a single org could very well look like this:

Topline 1
├── Holdout 1
├── Test A
│   ├── Sub A1
│   ├── Sub A2
│   └── Holdout 2
└── Test B
    ├── Holdout 3
    ├── Test D
    │   ├── Sub D1
    │   └── Sub D2
    ├── Test E
    │   └── Test F
    │       └── Test G
    │           ├── Holdout 4
    │           └── Test H
    └── Test C ...
...

You can imagine this tree growing much messier, bigger, and deeper as more and more teams are added to the mix.

Now what does that have to do with the shitty auto-play Shorts feature I mentioned at the beginning? Imagine your team’s goal is to increase first-time Shorts adoption, aka make more people watch YouTube Shorts. Your org’s topline metrics are probably total Shorts view hours, the percentage of YouTube users who have watched Shorts in the past X days, average finish rate, etc.

You inserted more and more Shorts into the feed. That won some people over. Next quarter you tried auto-playing Shorts in the feed. But the team working on finish rate noticed that the finish rate tanked, because your team was now flooding the data with auto-played videos that users swiped past after a few seconds. The two teams argued for weeks and came to the conclusion that auto-played Shorts should be weighted less in, or excluded from, the adoption and finish rate holdouts. Your impact got squashed, but that’s OK because you already got your performance rating 2 months ago.
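
The finish-rate fight is easier to see with numbers. Here is a rough sketch with invented counts, where `View` and `finish_rate` are hypothetical names and `autoplay_weight` stands in for whatever the two teams might end up negotiating:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class View:
    finished: bool    # did the user watch to the end?
    autoplayed: bool  # did the video start without a click?


def finish_rate(views: List[View], autoplay_weight: float = 1.0) -> float:
    """Weighted share of finished views; auto-played views count as autoplay_weight."""
    total = 0.0
    finished = 0.0
    for view in views:
        weight = autoplay_weight if view.autoplayed else 1.0
        total += weight
        if view.finished:
            finished += weight
    return finished / total if total else 0.0


# Invented data: 100 clicked views (60 finished), 100 auto-played views (5 finished).
clicked = [View(True, False)] * 60 + [View(False, False)] * 40
autoplayed = [View(True, True)] * 5 + [View(False, True)] * 95

print(finish_rate(clicked))                     # before auto-play existed
print(finish_rate(clicked + autoplayed))        # auto-play counted like any other view
print(finish_rate(clicked + autoplayed, 0.2))   # auto-play down-weighted
print(finish_rate(clicked + autoplayed, 0.0))   # auto-play excluded entirely
```

With the weight at zero the metric looks exactly as healthy as before, even though nothing about the user experience changed.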

Now comes a new quarter with new goals to improve the adoption rate even more. Your team decided to full-screen auto-play a “high-confidence” recommended short video as soon as the user opens the app. There are concerns, but fortunately “we’re data-driven”, so we can let the data speak.

You pushed the experiment out to 0.1% of users, the low-hanging-fruit target audience who are highly likely to keep watching the Shorts, just to be cautious. The numbers looked great. You started to roll out to more and more higher-hanging fruit to solidify the win. Lower-confidence videos got pushed to lower-intent users like me, and I rage-quit the app. View hours decreased, and daily active users decreased because I was so turned off by the shitty experience that I spent the night writing a blog post instead of watching YouTube. Poor content creators’ finish rates tanked because their videos got shoved down uninterested users’ throats.
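
The arithmetic behind “great in the pilot, bad at full rollout” is just a weighted average. A small sketch with completely invented segment shares and effects (`SEGMENTS` and `blended_effect` are hypothetical, not anything measured):

```python
# Invented segments: (share of all users, change in daily view hours per user
# when the full-screen auto-play short is shown).
SEGMENTS = {
    "high_intent": (0.05, +0.30),
    "neutral": (0.55, 0.00),
    "low_intent": (0.40, -0.10),
}


def blended_effect(segment_names):
    """Average per-user change across the given segments, weighted by share."""
    total_share = sum(SEGMENTS[name][0] for name in segment_names)
    weighted = sum(SEGMENTS[name][0] * SEGMENTS[name][1] for name in segment_names)
    return weighted / total_share


pilot = blended_effect(["high_intent"])      # the cautious, targeted experiment
full_rollout = blended_effect(list(SEGMENTS))  # everyone, including people like me
print(f"pilot: {pilot:+.3f} h/user, full rollout: {full_rollout:+.3f} h/user")
```

Under these made-up numbers the pilot shows a solid gain while the full rollout is a net loss, because the small group that loves the feature is outweighed by the large group that mildly hates it.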

The slop

Now your experiment starts to hurt your org’s and the company’s topline. Do you roll back and call it a ~~defeat~~ lesson learned? Of course not. An ML engineer spent 2 months polishing the algorithm and building a model to target users, a DS spent weeks analyzing the potential gain, a PM wrote a 12-page product memo and a 30-page slide deck to convince leadership this was a good idea to compete with TikTok, 1 data engineer built the pipeline, 2 mobile engineers implemented the player behavior and logging, the web and TV engineers have already started working on similar features to fast-follow for feature parity, and the Shorts infra team spent millions of company dollars pre-scaling to be ready for the win. All of their performance reviews depend on the feature rollout, so of course you cannot just give up.

Leadership had meetings. The Shorts VP made it the org’s highest priority to ensure the success of the project. PMs came up with ways to educate users and make the feature less intrusive. Engineers built a whole set of settings to let you fine-tune your Shorts time. ML engineers improved intent targeting. Data scientists came up with fancy ways to gerrymander the experiment setup. ~~Day~~ Year saved. Shorts still got shoved down the majority of people’s throats.

This is probably an exaggeration for this relatively small feature (or is it?), but a disastrous feature launch at a larger scale did happen not long ago. Introducing the revolutionary “auto-dubbing”, a feature that uses the most advanced AI models to auto-generate realistic translated audio and bridge any language barrier.

Sounds great, but when I first heard an uncanny-valley-ish English dub play unprompted on a video from a Chinese channel I subscribe to, I almost dropped my phone. I spent 10 seconds trying to find where to turn it off, 2 minutes searching for how to turn it off for good, and 5 minutes writing a toot on Mastodon to rant. It was especially bad for Chinese videos because Chinese has a higher information density than English, so the auto-dubbed English audio of a Chinese video runs at a robotic, comically fast speed.

For a few months, you could only turn off auto-dubbing video by video. Obviously bilingual people like me didn’t make the cut for the “MVP” or “P0” of this project, but our experience was definitely ruined enough that if you searched “youtube auto-dub”, all the posts you could find were people complaining or asking how to turn it off. YouTube eventually added a setting to let you choose your watch languages, but it took them a while to get it fully working.

As a tech worker myself, I know for a fact that there’s no lack of multilingual people working for YouTube. Yet this feature still got past all of the multilingual software engineers, product managers, and data scientists and made it to hundreds of millions of users.

And this is not just a “company adopting AI as much as possible for the sake of it” problem. Long before the current AI bubble, YouTube had a multilingual problem. As a native Chinese speaker working in English, I have my operating system set to English, and I watch and read in both languages. YouTube always seems to be confused by my language settings and auto-enables English captions on Chinese videos no matter how many times I turn them off. I mean, this is Google, you have all my data. How hard can it be to accept that people know 2 languages?

The shit

Oh, they sure do know. They just don’t care. Because on the topline metrics we don’t matter. We’re not worth the extra time that could be spent on newer, fancier features that look better on a performance review.

And I understand them. Would I raise this concern in a Google Doc comment or even in several meetings? Yes. Am I willing to spend hours or even days pulling data to show that this is a bigger problem? Maybe. Would I fight my PM over multiple meetings to argue that the multi-language setting, or auto-play being opt-in instead of opt-out, should be a P0, adding weeks of workload for the team and myself and potentially risking the entire team’s performance for that 2% of multilingual users? Uh, no. At the end of the day it’s just a job about entertaining people. It’s not like we’re working on making it easier for children to buy crypto (yes, it’s a real product, and yes, I would quit that job).

And it’s not just the AI bubble. It’s the 2008 subprime mortgage crisis. It’s the 2000s dot-com bubble. And it’s not just tech. It’s every bubble before it: the radio, the railroad, the South Sea… It’s The Office. It’s OceanGate. It’s Theranos. It’s the Titanic. It’s the human-powered enshittification that long predates AI-powered slop.

There’s a slang term in Chinese, “草台班子” (literally a makeshift theater troupe). It basically says that everything is run by a bunch of imposters faking bullshit jobs: added up together it looks fancy, but deep down it’s a mess. Like this meme you must have seen a version of:

Yeah, it’s totally true. Human society is definitely built on top of enshittification. Even life itself, if you really think about it, is basically nature throwing everything into a hot pot and seeing what sticks. And billions of years later, here we are, along with all the wonderful creatures of this planet.

I don’t know what to make of it. On my good days, it helps me make peace with the meaninglessness of life. On my bad days, I just find everything meaningless.

I was just going to write a little rant in Chinese about tech, triggered by a YouTube slop feature. Then I realized that because it’s work-related, it would probably be 50% English anyway. Then I ended up word-vomiting this whole slop. Guess this is one of my bad days.
