Adjust speaking rate (SSML) via cURL POST
Vidyasagar SC Machupalli
Posted on November 9, 2020
Stack Overflow is an ocean for learning and exploring. How? Try answering a question and you will understand :)
A few weeks ago, I saw this question on Stack Overflow:
How to adjust the speaking rate in Watson Text-to-Speech using cURL POST?
I had built an Android chatbot that uses Speech to Text and Text to Speech as add-ons, but I had never paid attention to the speaking rate or to the Speech Synthesis Markup Language (SSML).
So, what are speaking rate and SSML?
According to Virtualspeech,
Speaking rate is often expressed in words per minute (wpm). To calculate this value, you’ll need to record yourself talking for a few minutes and then add up the number of words in your speech. Divide the total number of words by the number of minutes your speech took.
Speaking rate (wpm) = total words / number of minutes
And according to the IBM Cloud docs,
The Speech Synthesis Markup Language (SSML) is an XML-based markup language that provides annotations of text for speech-synthesis applications. It is a recommendation of the W3C Voice-Browser Working Group that has been adopted as the standard markup language for speech synthesis by the VoiceXML 2.0 specification. SSML provides developers of speech applications with a standard way to control aspects of the synthesis process by enabling them to specify pronunciation, volume, pitch, speed, and other attributes via markup.
OK, I understood the terms.
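To make the words-per-minute formula concrete, here is a minimal shell sketch. The transcript and the duration are made-up values chosen only for illustration; they do not come from the question or from any real recording.

```shell
# Hypothetical transcript and duration, used only to illustrate the formula.
transcript="This is the first sentence of the paragraph. Here is another sentence. Finally, this is the last sentence."
seconds=6

# wc -w counts whitespace-separated words; tr strips the padding some systems add.
words=$(printf '%s' "$transcript" | wc -w | tr -d ' ')

# Speaking rate (wpm) = total words / number of minutes
#                     = words * 60 / seconds (integer division)
wpm=$(( words * 60 / seconds ))

echo "words=$words wpm=$wpm"
```

With these made-up numbers, 18 words spoken in 6 seconds works out to 180 wpm.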
What's the answer to the Stack Overflow question?
Here's a working example with a POST call:
curl -X POST -u "apikey:{API_KEY}" \
--header "Accept: audio/wav" \
--header "Content-Type: application/json" \
--data '{"text": "<p><s><prosody rate=\"+50%\">This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"}' \
--output result.wav \
"{URL}/v1/synthesize" -v
On a Windows command prompt (cmd), create a JSON file named input.json with the command below:
echo {"text": "<p><s><prosody rate='+50%'>This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"} > input.json
Then run cURL, which writes the synthesized audio to result.wav:
curl -X POST -u "apikey:{API_KEY}" ^
--header "Accept: audio/wav" ^
--header "Content-Type: application/json" ^
--data @input.json ^
--output result.wav ^
"{URL}/v1/synthesize" -v
For the sentence in the actual question, replace the JSON above with the following:
{"text":"<prosody rate='fast'>Adult capybaras are one meter long.</prosody>"}
Here are some useful links I followed to create the code sample above; they will help you understand the SSML attributes. Also, check the limitations of <prosody> described in the links below.