My girlfriend asked why there's a red light blinking in the bedroom at 3 AM. I told her it's for the AI. She didn't talk to me for two days.
I know how that sounds. But I'm trying to solve a problem that nobody else seems to want to touch: giving AI access to the physical world.
Every AI product right now knows you through text or voice. What you type into a prompt. What you paste into a context window. Maybe your calendar, your emails, your screen. But your actual life? The one that happens in physical space? Your AI knows nothing about it.
Last year I built a product that used OCR to read my screen, pulled in my emails, and tried to find patterns in all of it. Investors loved it. Reddit loved it. And it was still fundamentally blind. It could see my screen but it couldn't see me. It knew what I typed but not what I did.
That gap bothered me for months. Then I did something about it.
The hardware
I built a network of cameras and microphones in my house and wired them into a pipeline:
- 5x Raspberry Pi Zero 2W ($15 each)
- 5x Arducam IMX708 12MP 120° wide-angle cameras
- 5x WM8960 audio HATs for ambient sound capture
- 1x Ugreen NAS for storage
- Custom Python daemon: motion detection, triggered recording, sleep when idle
Total hardware cost: under $500. I spent more on the camera modules I threw away than the ones that worked.
I spent weeks debugging device tree overlays. Swapped camera modules three times before finding ones that actually performed. Burned through two Pi Zeros that couldn't handle the thermal load. This wasn't a weekend project someone vibed together. This was real infrastructure.
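For a sense of what the daemon's "motion detection, triggered recording, sleep when idle" loop involves, here is a minimal sketch. It is not the actual daemon: it assumes frames arrive as flat sequences of grayscale pixel values, uses plain frame differencing as the motion test, and the names `MotionTrigger`, `threshold`, and `cooldown_frames` are hypothetical. Camera capture and file writing are left out so the logic can run anywhere.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


def mean_abs_diff(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Average per-pixel brightness change between two grayscale frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not curr:
        return 0.0
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


@dataclass
class MotionTrigger:
    """Tiny state machine: idle until motion, record until things go quiet."""

    threshold: float = 8.0    # assumed: mean change that counts as motion
    cooldown_frames: int = 3  # assumed: quiet frames before recording stops
    recording: bool = False
    _quiet: int = field(default=0, repr=False)
    _prev: Optional[Sequence[int]] = field(default=None, repr=False)

    def update(self, frame: Sequence[int]) -> str:
        """Feed one frame; returns 'start', 'record', 'stop', or 'idle'."""
        prev, self._prev = self._prev, frame
        moved = prev is not None and mean_abs_diff(prev, frame) > self.threshold
        if moved:
            self._quiet = 0
            if not self.recording:
                self.recording = True
                return "start"
            return "record"
        if self.recording:
            # No motion this frame: keep recording until the cooldown expires.
            self._quiet += 1
            if self._quiet >= self.cooldown_frames:
                self.recording = False
                return "stop"
            return "record"
        return "idle"


# Synthetic 4x4 frames: a dark room, then a sudden change, then stillness.
still = [10] * 16
bright = [200] * 16
trigger = MotionTrigger(threshold=8.0, cooldown_frames=2)
events = [trigger.update(f) for f in (still, still, bright, bright, bright, bright)]
print(events)  # ['idle', 'idle', 'start', 'record', 'stop', 'idle']
```

The cooldown is the part that matters on a Pi Zero: without it, every pause in movement would close one clip and open another, and the device would never get back to idle.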
The cameras have been recording for months. Writing to SD cards. Capturing fragments of my daily life. Motion clips. Audio snippets. And I won't be analyzing it manually. Claude will.
Why physical-world data matters more than prompts
Nobody tells their AI "I've been pacing around my office for 20 minutes." Nobody types "I skipped lunch again today." Nobody prompts "I've been staring at the same file for an hour without making a single edit."
But a camera sees all of that. And that context is worth more than a thousand carefully worded prompts.
Think about the people who actually know you. Not your boss. Your boss knows nothing about you other than your output. The people who really know you. They know your tells. They know you fidget when you're nervous, that you pace the room when you're stuck. That stuff isn't in any context window. But it's the difference between software that assists you and something that actually understands you.
The stack nobody's building
The whole industry is trying to make AI feel more human by tweaking the output. "Don't say awesome." "Match the user's tone." But the problem isn't the output. It's the input. Labs train on polished, sanitized datasets and then wonder why the result still feels like AI.
Making AI more human isn't about adjusting personality settings or temperature. It goes deeper than tone. Who you are. What you value. How you think. Everyone has different values and a generalized AI is never going to capture that.
Here's what I think the real stack looks like for AI that actually knows you:
- Layer 1, observation: cameras, mics, sensors, the physical world
- Layer 2, memory: persistent, cross-session, not just a context window
- Layer 3, reasoning: the model, which is already good enough
Everyone is pouring billions into layer 3. Almost nobody is building layers 1 and 2. The models are smart enough. That's not the bottleneck anymore. The bottleneck is that your AI has never seen you. It's never been in the room. It's a hyper-intelligent entity trapped behind a text box.
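To make the three layers concrete, here is a rough sketch of how they could hand data to each other. Everything in it is an assumption for illustration: the `Observation` fields, the `MemoryStore` class, the SQLite schema, and `build_context` are hypothetical names, the text summaries are made up, and no model is actually called. It is not how TrueMemory works.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    """Layer 1 output: one thing a sensor noticed (fields are assumed)."""

    timestamp: float  # seconds since epoch
    room: str
    kind: str         # e.g. "motion" or "audio"
    summary: str      # short text description of the clip


class MemoryStore:
    """Layer 2: observations kept in SQLite so they outlive a session."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist across runs.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS observations "
            "(timestamp REAL, room TEXT, kind TEXT, summary TEXT)"
        )

    def add(self, obs: Observation) -> None:
        self.db.execute(
            "INSERT INTO observations VALUES (?, ?, ?, ?)",
            (obs.timestamp, obs.room, obs.kind, obs.summary),
        )
        self.db.commit()

    def recent(self, since: float) -> List[Observation]:
        rows = self.db.execute(
            "SELECT timestamp, room, kind, summary FROM observations "
            "WHERE timestamp >= ? ORDER BY timestamp",
            (since,),
        )
        return [Observation(*row) for row in rows]


def build_context(store: MemoryStore, since: float) -> str:
    """Layer 3 input: stored observations rendered as text for a model."""
    return "\n".join(
        f"[{o.room}] {o.kind}: {o.summary}" for o in store.recent(since)
    )


store = MemoryStore()
store.add(Observation(100.0, "office", "motion", "pacing for 20 minutes"))
store.add(Observation(50.0, "kitchen", "motion", "no activity at lunchtime"))
context = build_context(store, since=60.0)
print(context)  # [office] motion: pacing for 20 minutes
```

The point of the split is that the model never has to be asked anything. By the time a session starts, the things nobody would think to type are already sitting in the context.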
I built TrueMemory to solve layer 2 — persistent memory that follows you across AI sessions. My research on cognitive memory architectures is published on arXiv. Now I'm working on layer 1.
I'm not asking for permission. I'm just showing you what's coming.
Josh Adler is a researcher and builder. More at joshadler.com.