Staying ahead means not letting the weirdness hold you back.

I’ve followed AI’s rapid evolution, but embracing OpenAI’s new voice feature for ChatGPT was a different challenge. The truth is, I was excited and a little scared—scared in the “wow, the future is here” kind of way. I had seen a demo, and the realism of the conversations blew me away. But as impressive as it was, I wasn’t sure how comfortable I felt fully embracing voice AI. It took me back to when Siri first launched and how, to this day, I only use it to set timers or change a song. But ChatGPT’s voice setting? It felt like a whole new offering, the beginning of Human-AI Symbiosis.

Easy setup, awkward execution

Setting it up was simple—choose a voice and start talking. That’s it. But my first conversation? Awkward. I picked a voice called “Vale,” which sounds uncannily like Emma Thompson. It felt familiar, which only added to the weirdness. I wasn’t sure if Vale would speak first or if I should start. Cue an awkward pause. It felt like I was leaving a voicemail for a friend I hadn’t spoken to in years.

Vale responded, and it was quick—so quick, it almost caught me off guard. But instead of a conversation, it felt stilted, unnatural. Random chatter about creativity didn’t help; it all seemed inauthentic. But when I switched gears and started giving Chat tasks instead of conversational prompts, the experience smoothed out. Oddly enough, in text-based interactions, it feels like a more natural conversation. Maybe it’s because typing gives me more time to think through my responses.

Text prompts allow me to review what I’m asking ChatGPT to do. My vocal prompts, on the other hand, are long and meandering (poor Chat). I realise I need to improve on this. Knowing what I want and how to ask for it is a skill that needs practice, and I will undoubtedly develop it the more I interact with Chat in voice mode.

Incredible voice quality, but limitations remain

I can’t deny the quality of the voice: it’s excellent. It’s natural, fluid, and very close to an authentic conversational tone. It can even do accents, which my kids found hilariously entertaining. There’s still an air of AI to it, but I wonder if that’s a deliberate choice at this stage in its development.

That said, the voice version has some limitations compared to the text chat. I asked Chat to generate an image using DALL·E, but no luck. It couldn’t remember certain things either—like that it had previously decided to call itself “Coda” in text chats. Instead, it introduced itself as “ChatGPT,” which felt like I was starting fresh with a blank slate. In the text version, Coda had a bit of personality, and even though I know it’s all AI, that loss of continuity was a letdown.

Overcoming the awkwardness

This is where the real work begins—getting over the awkwardness. Voice AI can bring incredible efficiencies by removing the need to type, making interactions faster and more immediate. But there’s something about the transition to speaking that feels…strange. Like I need to rewire how I think about communication with AI. But I’m sure the more I use it, the less weird it will feel.

Final takeaway

Voice AI is a technology I’m learning to work with, and it’s something I’ll need to embrace to stay ahead. There’s no doubt in my mind that once the awkwardness fades, the benefits will compound. The key is pushing through that discomfort and making voice a part of my creative workflow. Because in a world where AI is moving fast, staying ahead means not letting the weirdness hold you back.
