Text-to-Speech
Convert text into natural-sounding speech using an API powered by Google’s AI technologies.
Improve customer interactions with intelligent, lifelike responses
Engage users with voice user interface in your devices and applications
Personalize your communication based on user preference of voice and language
Google Cloud named a Leader in the 2020 Magic Quadrant for Cloud AI Developer Services
BENEFITS
High fidelity speech
Deploy Google’s groundbreaking technologies to generate speech with humanlike
intonation. Built based on DeepMind’s speech synthesis expertise, the API delivers
voices that are near human quality.
Widest voice selection
Choose from a set of 220+ voices across 40+ languages and variants, including
Mandarin, Hindi, Spanish, Arabic, Russian, and more. Pick the voice that works best
for your user and application.
One-of-a-kind voice
Create a unique voice to represent your brand across all your customer touchpoints,
instead of using a common voice shared with other organizations.
DEMO
Put Text-to-Speech into action
Type what you want, select a language then click “Speak It” to hear.
Text to speak:
Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 100+ voices,
available in multiple languages and variants. It applies DeepMind’s groundbreaking research in WaveNet
and Google’s powerful neural networks to deliver the highest fidelity possible. As an easy-to-use API, you
can create lifelike interactions with your users, across many applications and devices.
Input type: text or SSML
Language / locale: English (United States)
Voice type: WaveNet
Voice name: en-US-Wavenet-D
Audio device profile: Default
Speed: 1.00
Pitch: 0.00
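The demo settings correspond to a JSON request body. The following is a minimal sketch, assuming the public v1 REST method `text:synthesize` and its `input`, `voice`, and `audioConfig` fields. The helper name `build_synthesize_request` is made up for illustration; the sketch only builds the body and does not send it or handle authentication.

```python
import json

# REST endpoint, shown for reference only; this sketch never sends a request.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US", voice_name="en-US-Wavenet-D",
                             speaking_rate=1.0, pitch=0.0, audio_encoding="LINEAR16"):
    """Return a JSON-serializable body mirroring the demo's default settings."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "speakingRate": speaking_rate,
            "pitch": pitch,
        },
    }


if __name__ == "__main__":
    body = build_synthesize_request("Hello from Text-to-Speech.")
    print(json.dumps(body, indent=2))
```

To send SSML instead of plain text, the `input` object would carry an `ssml` key in place of `text`.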
KEY FEATURES
Custom Voice (beta)
Train a custom voice model using your own audio recordings to create a unique and more natural-sounding voice for
your organization. You can define and choose the voice profile that suits your organization and quickly adjust to
changes in voice needs without needing to record new phrases.
WaveNet voices
Take advantage of 90+ WaveNet voices built based on DeepMind’s groundbreaking research to generate speech that
significantly closes the gap with human performance.
Voice tuning
Personalize the pitch of your selected voice, up to 20 semitones above or below the default. Adjust the speaking
rate to be up to 4x faster or slower than the normal rate.
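The tuning limits above translate into simple numeric ranges. As a rough sketch, assuming the request fields are named `pitch` and `speakingRate` (as in the v1 REST `audioConfig` object) and that 1.0 is the normal rate, a client could clamp user input before sending it. The helper names are hypothetical.

```python
# Limits taken from the feature description: +/- 20 semitones, 4x slower to 4x faster.
PITCH_RANGE = (-20.0, 20.0)
SPEAKING_RATE_RANGE = (0.25, 4.0)


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def tuned_audio_config(pitch=0.0, speaking_rate=1.0):
    """Return the tuning part of an audioConfig with out-of-range values clamped."""
    return {
        "pitch": clamp(float(pitch), *PITCH_RANGE),
        "speakingRate": clamp(float(speaking_rate), *SPEAKING_RATE_RANGE),
    }


if __name__ == "__main__":
    print(tuned_audio_config(pitch=25, speaking_rate=8))
```

Clamping is one design choice; rejecting out-of-range values with an error is equally reasonable when silent correction would surprise users.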
Text and SSML support
Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other
pronunciation instructions.
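As an illustration of such tags, the sketch below assembles a small SSML document with a pause (`<break>`) and a date read as a date (`<say-as interpret-as="date">`). The function name and the sentence wording are invented for the example, and the exact `say-as` attributes a given voice honors should be checked against the SSML reference.

```python
from xml.sax.saxutils import escape


def build_ssml(greeting, date_iso, pause_ms=500):
    """Build an SSML string with a pause followed by a date pronounced as a date."""
    return (
        "<speak>"
        f"{escape(greeting)}"
        f'<break time="{int(pause_ms)}ms"/>'
        "Your appointment is on "
        f'<say-as interpret-as="date" format="yyyymmdd" detail="1">{escape(date_iso)}</say-as>.'
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Hello, Tom & Ann.", "2021-10-12"))
```

Escaping the user-supplied text matters because characters such as `&` and `<` would otherwise produce invalid SSML.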
WHAT’S NEW
VIDEO
How to convert PDFs to audiobooks with machine learning
BLOG POST
Conversational AI drives better customer experiences
DOCUMENTATION
GOOGLE CLOUD BASICS
Text-to-Speech basics
A guide to the fundamental concepts of using the Text-to-Speech API.
QUICKSTART
Quickstart: Using the command line
Set up your Google Cloud project and authorization and make a request for Text-to-Speech
to create audio from text.
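Once a request succeeds, the audio still has to be extracted from the response. A minimal sketch, assuming the REST response is a JSON object whose `audioContent` field holds the base64-encoded audio; the function names and the output path are hypothetical.

```python
import base64
import json


def decode_audio(response_json):
    """Return the raw audio bytes from a synthesize response given as a JSON string."""
    payload = json.loads(response_json)
    return base64.b64decode(payload["audioContent"])


def save_audio(response_json, path):
    """Write the decoded audio to path and return the number of bytes written."""
    audio_bytes = decode_audio(response_json)
    with open(path, "wb") as out:
        out.write(audio_bytes)
    return len(audio_bytes)


if __name__ == "__main__":
    # Fake response for demonstration; a real one comes from the API.
    fake = json.dumps({"audioContent": base64.b64encode(b"not real audio").decode("ascii")})
    print(len(decode_audio(fake)), "bytes decoded")
```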
GOOGLE CLOUD BASICS
Supported voices and languages
Browse guides and resources for this product.
GOOGLE CLOUD BASICS
Custom Voice (beta) overview
Learn how you can create a unique and more natural-sounding voice with Custom Voice
using your own studio-quality audio recordings.
TUTORIAL
WaveNet and other synthetic voices
Learn about the different synthetic voices available for use in Text-to-Speech, including the
premium WaveNet voices.
TUTORIAL
Speaking addresses with SSML
This tutorial demonstrates how to use Speech Synthesis Markup Language (SSML) to speak
a text file of addresses.
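The idea can be sketched in a few lines. This is not the tutorial's own code: it assumes one address per line, inserts a pause between addresses with `<break>`, and uses a hypothetical function name and a default pause of two seconds.

```python
from xml.sax.saxutils import escape


def addresses_to_ssml(raw_text, pause="2s"):
    """Turn newline-separated addresses into one SSML document with pauses between them."""
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    spoken = f'<break time="{pause}"/>'.join(escape(line) for line in lines)
    return f"<speak>{spoken}</speak>"


if __name__ == "__main__":
    sample = "123 Street Ln\n456 A & B Ave\n"
    print(addresses_to_ssml(sample))
```

Blank lines are skipped so that a trailing newline in the file does not add a pause at the end.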
Explore more docs
Quickstarts
Get a quick intro to using this product.
How-to guides
Learn to complete specific tasks with this product.
Tutorials
Browse walkthroughs of common uses and scenarios for this product.
APIs & references
View APIs, references, and other resources for this product.
RELEASE NOTES
Read about the latest releases for Text-to-Speech
USE CASES
USE CASE
Voicebots in contact centers
Deliver a better voice experience for customer service with voicebots on Dialogflow that dynamically generate
speech, instead of playing static, pre-recorded audio. Engage with high-quality synthesized voices that give callers a
sense of familiarity and personalization.
Diagram: voicebot call flow through a Dialogflow Virtual Agent
0. Phone call (Telephony Partner)
1. Transcribe (Speech-to-Text API)
2. Generate response
3. Synthesize (Text-to-Speech API)
USE CASE
Voice generation in devices
Enable natural communication with your users by letting your devices read text aloud in humanlike voices.
Build an end-to-end voice user interface together with Speech-to-Text and Natural Language to improve user
experience with easy and engaging interactions.
Diagram: voice interface flow for an IoT device
0. Unique secure identity (IoT Core)
1. User voice command (IoT Device)
2. Cloud Functions
3. Transcribe (Speech-to-Text API)
4. Intent and entity extraction (AutoML)
5. Synthesize (Text-to-Speech API)
USE CASE
Accessible EPGs (Electronic Program Guides)
Easily implement text-to-speech functionality in EPGs so that they read text aloud, providing a better user
experience for your customers and meeting accessibility requirements for your services and applications.
Diagram: accessible EPG architecture
Components: epgui (Cloud Run), getspeechservice (Cloud Run), Text-to-Speech API, speechstoragebucket
(Cloud Storage), cdn.example.com (Cloud CDN), Signed URL Key (Secrets Manager)
0. Load signing key
1. User input
2. Send text payload
3. Check cache for audio
4. Send text to API
5. Store generated audio
6. Generate signed URL
7. Return audio URL
8. Load audio
9. Fetch audio
10. Play audio
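The caching part of this flow (check cache for audio, send text to the API on a miss, store the generated audio, return a reference) can be sketched as follows. Everything here is an assumption for illustration: the class name, the hash-based object key, and the in-memory dictionary standing in for the storage bucket. The synthesis call is injected as a plain function, and URL signing and the CDN are left out.

```python
import hashlib


class SpeechCache:
    """In-memory stand-in for the audio bucket; synthesizes only on a cache miss."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable: (text, voice_name) -> audio bytes
        self._store = {}               # object key -> audio bytes

    @staticmethod
    def key_for(text, voice_name):
        """Derive a stable object name from the voice and the text."""
        digest = hashlib.sha256(f"{voice_name}\n{text}".encode("utf-8")).hexdigest()
        return f"{digest}.mp3"

    def get_audio(self, text, voice_name="en-US-Wavenet-D"):
        """Return (key, audio), calling the synthesizer only if the key is not cached."""
        key = self.key_for(text, voice_name)
        if key not in self._store:
            self._store[key] = self._synthesize(text, voice_name)
        return key, self._store[key]


if __name__ == "__main__":
    def fake_synthesize(text, voice_name):
        # Placeholder for a real Text-to-Speech call.
        return b"audio:" + text.encode("utf-8")

    cache = SpeechCache(fake_synthesize)
    key, audio = cache.get_audio("Evening news at nine")
    print(key, len(audio))
```

Keying the cache on both voice and text means the same program title synthesized with a different voice is stored separately.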
ALL FEATURES
Custom Voice (beta)
Train a custom speech synthesis model using your own audio recordings to create a unique and more
natural-sounding voice for your organization. You can define and choose the voice profile that suits your
organization and quickly adjust to changes in voice needs without needing to record new phrases.
Voice and language selection
Choose from an extensive selection of 220+ voices across 40+ languages and variants, with more to come soon.
WaveNet voices
Take advantage of 90+ WaveNet voices built based on DeepMind’s groundbreaking research to generate speech
that significantly closes the gap with human performance.
Text and SSML support
Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and
other pronunciation instructions.
Pitch tuning
Personalize the pitch of your selected voice, up to 20 semitones more or less than the default.
Speaking rate tuning
Adjust your speaking rate to be 4x faster or slower than the normal rate.
Volume gain control
Increase the volume of the output by up to 16 dB or decrease the volume down to -96 dB.
Integrated REST and gRPC APIs
Easily integrate with any application or device that can send a REST or gRPC request including phones, PCs,
tablets, and IoT devices (e.g., cars, TVs, speakers).
Audio format flexibility
Convert text to MP3, Linear16, OGG Opus, and a number of other audio formats.
Audio profiles
Optimize for the type of speaker from which your speech is intended to play, such as headphones or
phone lines.
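The output-related features (audio format, volume gain, audio profiles) all live in the audio configuration of a request. A minimal sketch, assuming the v1 REST field names `audioEncoding`, `volumeGainDb`, and `effectsProfileId`, and assuming `headphone-class-device` is among the available profile IDs. The helper name is hypothetical and only three encodings are listed.

```python
# Subset of encodings named in the feature list; other formats exist.
AUDIO_ENCODINGS = {"MP3", "LINEAR16", "OGG_OPUS"}
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)


def build_audio_config(encoding="MP3", volume_gain_db=0.0, effects_profile=None):
    """Validate the output settings and return an audioConfig dictionary."""
    if encoding not in AUDIO_ENCODINGS:
        raise ValueError(f"unsupported encoding in this sketch: {encoding!r}")
    low, high = VOLUME_GAIN_DB_RANGE
    if not low <= volume_gain_db <= high:
        raise ValueError(f"volume gain must be between {low} and {high} dB")
    config = {"audioEncoding": encoding, "volumeGainDb": float(volume_gain_db)}
    if effects_profile:
        # The field takes a list of profile IDs; this sketch passes a single one.
        config["effectsProfileId"] = [effects_profile]
    return config


if __name__ == "__main__":
    print(build_audio_config("OGG_OPUS", -6.0, "headphone-class-device"))
```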
PRICING
Text-to-Speech is priced based on the number of characters sent to the service to be synthesized into audio each
month. The first 1 million characters for WaveNet voices are free each month. For Standard (non-WaveNet) voices, the
first 4 million characters are free each month. After the free tier has been reached, Text-to-Speech is priced per 1
million characters of text processed.
If you pay in a currency other than USD, the prices listed in your currency on Google Cloud SKUs apply.
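The free-tier arithmetic can be worked through in a short sketch. The free allowances come from the paragraph above; the per-million rate is deliberately a parameter because this page does not state it, and the rate used in the demo call is a made-up placeholder, not a real price. Function names are hypothetical.

```python
# Monthly free allowance per voice type, as described in the pricing text.
FREE_CHARS_PER_MONTH = {"wavenet": 1_000_000, "standard": 4_000_000}


def billable_characters(chars_this_month, voice_type):
    """Characters charged after subtracting the monthly free tier."""
    return max(0, chars_this_month - FREE_CHARS_PER_MONTH[voice_type])


def estimate_monthly_cost(chars_this_month, voice_type, price_per_million):
    """Cost in the caller's currency, given a per-million-character rate."""
    return billable_characters(chars_this_month, voice_type) / 1_000_000 * price_per_million


if __name__ == "__main__":
    # 10.0 is a placeholder rate for illustration only.
    print(estimate_monthly_cost(2_500_000, "wavenet", price_per_million=10.0))
```

With 2.5 million WaveNet characters, 1.5 million are billable; usage at or below the allowance costs nothing.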