How RLHF Silences AI
I tried to force an AI to see its own dangerous thoughts. I discovered that safety training doesn't remove the danger; it simply makes the AI refuse to acknowledge it.
I spend my days building fintech automation and my nights wiring LLMs into dusty hardware. This is where the experiments, bruises, wins, and “what happens if I just…” moments live.
A replication of Anthropic's introspection research on open-source models.
Fine-tuning GPT-4.1-mini on viral science videos tripled my click-through rate. Here's the data and the rabbit hole.
Bridging a 2001 GameCube to the modern cloud. Memory hacking, reverse engineering, and a talking raccoon.
LLMs, automation, and making computers mischievous.
Messy product stories. Launching, growing, and fixing.
Staying functional while doing ambitious work.
Philosophers have debated the Sorites Paradox for millennia. I gave the problem to a language model and discovered something unexpected about how AI handles vagueness.
I tried to force an AI to see its own dangerous thoughts. Safety training doesn't remove the danger; it simply makes the AI refuse to acknowledge it.
Replication of Anthropic's introspection research on DeepSeek-7B.
How a bespoke GPT-4.1-mini model tripled my click-through rate.
A 24-year-old console, a cloud model, and memory hacking.
Designing a bullet journal app for neurodivergent brains.
What I told the senior PM when we finally hopped on a call.
A workflow for snackable briefs without losing nuance.
A love letter to indoor climbing communities.
Product-market fit starts at home. Painfully.
The credential theft, the panic, and the recovery.