ChatGPT will soon be able to ‘discuss’ images shared by users and hold ‘back-and-forth’ conversations using five voices.
ChatGPT has been updated with support for voice conversations and image recognition, OpenAI announced on Monday. The company’s AI-powered chatbot will soon be able to understand images captured or shared by users and provide details or related information on every platform where it is available. On the ChatGPT app for smartphones, it will also be capable of back-and-forth conversation, using OpenAI’s Whisper speech recognition tool and a new text-to-speech (TTS) technology that the company claims offers “human-like” audio.
OpenAI revealed in a blog post that the new image recognition capability for ChatGPT will be available on all platforms, while the voice conversations feature will be available on iOS and Android via an opt-in setting. Both features will be available to ChatGPT Plus and Enterprise subscribers, and there’s no word on whether they will roll out to users on the free tier in the future.
Voice conversations can be enabled by going to Settings > New Features and turning on the voice conversations toggle. You can then select from five voices; OpenAI says it has worked with professional voice actors to offer the new feature. The ChatGPT app will answer spoken queries by converting them into text that the chatbot can understand, and its responses will be turned into audio using the company’s new TTS technology.
ChatGPT isn’t the only service that will use OpenAI’s new TTS technology. Spotify on Monday announced a new AI-based voice translation tool for podcast creators that can automatically translate a podcast from English to French, German, and Spanish. The tool is being tested with a few podcast hosts, and translated episodes will be available to all users wherever Spotify is available, according to the streaming platform.
OpenAI says the new image recognition tool runs on the company’s multimodal GPT-3.5 and GPT-4 models, which are capable of analysing images and text contained in photographs, screenshots, and documents. Users can either capture an image or share an existing one on their phone with ChatGPT to get insights from the chatbot.
ChatGPT will also allow users to share multiple images that can be discussed with the chatbot, according to OpenAI. If you want it to focus on a specific area, the built-in drawing tool will allow you to mark a part of the image. For example, drawing around a dislodged bicycle chain in a photo shared with ChatGPT might allow the chatbot to show you ways to fix the problem.