I vibe-coded an AI Radio Station from my phone, and it actually worked
I’ve always had a thing for making music.
Back in school, I even coded a custom DJ deck from my dorm room because the options I had felt limiting (Winamp still deserves respect).
Fast forward to today, and I had one question: what if a radio station could be orchestrated end-to-end with AI?
So I built DJ Rey: an AI-driven radio station, built directly from my phone.

What “AI-driven” actually means here
This is not just a playlist with a chatbot on top. It’s a coordinated media system with multiple live components:
- AI host transitions between tracks, with timing and style variation
- AI track generation for both station programming and live request fulfillment
- Caller inserts with voice ducking, loudness normalization, and seamless playout
- Request-to-track automation that can turn a caller request into a queued song
- Scheduled news blocks with a dedicated reporter voice
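To make the first two bullets concrete, here's a minimal sketch of how a playout queue could interleave AI host links between tracks, with style variation so back-to-back transitions don't sound identical. The names (`Segment`, `host_transition`, `build_playout`) are illustrative, not the project's actual code:

```python
import random
from collections import deque
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str   # "track", "host", "caller", or "news"
    text: str

def host_transition(prev_title: str, next_title: str) -> Segment:
    # Rotate host styles so consecutive links vary in tone and timing.
    style = random.choice(["energetic", "laid-back", "deadpan"])
    line = f"[{style}] That was {prev_title} -- up next, {next_title}."
    return Segment("host", line)

def build_playout(titles: list[str]) -> deque[Segment]:
    """Interleave an AI host link between each pair of queued tracks."""
    queue: deque[Segment] = deque()
    for prev, nxt in zip(titles, titles[1:]):
        queue.append(Segment("track", prev))
        queue.append(host_transition(prev, nxt))
    if titles:
        queue.append(Segment("track", titles[-1]))
    return queue
```

In a real station loop, caller inserts and scheduled news blocks would be spliced into the same queue as additional `Segment` kinds.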
The technical architecture
Under the hood, the station runs as a modular real-time pipeline:
- Icecast for stream serving
- FFmpeg for playout, crossfades, ducking, and mastering chain logic
- Cloudflare Tunnel for secure exposure of station interfaces
- ElevenLabs for call-agent voice interactions and post-call events
- Custom Node/Python orchestration for queue management, overlays, and generation workflows
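As a sketch of the ducking step, here's one way the orchestration layer could build an FFmpeg command that lowers the music bed whenever a caller's voice is present (`sidechaincompress`) and then normalizes the mix (`loudnorm`). The file names and parameter values are illustrative assumptions, not the station's actual settings:

```python
def ducking_cmd(music: str, voice: str, out: str) -> list[str]:
    """Build an ffmpeg command that ducks `music` under `voice`.

    The voice input is split: one copy drives the sidechain compressor,
    the other is mixed back over the ducked bed. loudnorm then brings
    the result to a broadcast-ish loudness target.
    """
    filtergraph = (
        "[1:a]asplit=2[sc][vo];"
        "[0:a][sc]sidechaincompress="
        "threshold=0.05:ratio=8:attack=20:release=400[duck];"
        "[duck][vo]amix=inputs=2:duration=longest[mix];"
        "[mix]loudnorm=I=-16:TP=-1.5:LRA=11[out]"
    )
    return [
        "ffmpeg", "-i", music, "-i", voice,
        "-filter_complex", filtergraph,
        "-map", "[out]", out,
    ]
```

The same pattern extends to crossfades (`acrossfade`) between tracks; in practice these commands would be spawned and supervised by the orchestration layer.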
Human in the loop (on purpose)
One of the most important decisions: keep real people in the experience.
Callers can dial in, get actual airtime, and request songs. Those requests can be interpreted from structured fields or conversation context and converted into generation prompts; once the generated tracks are ready, they're queued for airplay.
That blend of automation + human input gives the station personality instead of making it feel synthetic.
Why building through WhatsApp changed the workflow
A big reason this moved so fast: I used OpenClaw directly through WhatsApp.
The interaction model felt natural and productive: I could describe intent in plain language, iterate quickly, and keep momentum without constant context switching across tools.
For implementation depth and orchestration, I used OpenAI’s GPT-5.3 Codex. It handled surprisingly complex system tasks while still feeling intuitive from a chat-first workflow.
What this shows
To me, the bigger takeaway is this: media products are becoming programmable systems.
You can now compose generation, voice, scheduling, control, and event ingestion into something that feels like a real station format, not a demo script.
This project felt like closing a loop: dorm-room DJ coding energy, rebuilt with AI-native infrastructure.
Try DJ Rey
Want to hear it in action?