Text to Speech + Image — A Talkie in JS
Kostia Palchyk
Posted on July 7, 2020
In the previous part we created a website where users can generate GIF animations using Emoji, a domain-specific language (DSL), and a Canvas. In this post we'll upgrade our animations to talkies!
Intro
I thought that it'd be funny to create animations where Emoji can talk. I already had Emoji moving around and displaying phrases as text. Obviously it was missing sound. In this article I'll show you how I added it!
tl;dr: try this animation
⚠️ warning: contains sound!
Text-to-Speech
I accidentally stumbled upon the "Text To Speech In 3 Lines Of JavaScript" article (thanks, @asaoluelijah!), and those "3 lines" quickly migrated into my project.
Of course, the "3 lines" turned out to be 80. But I'll get to that later.
Text-to-Speech is part of the browser's Web Speech API, which lets us both read text out loud (speech synthesis) and recognize speech (speech recognition).
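The speech-synthesis half really is tiny. Below is a minimal sketch that assumes a browser implementing the Web Speech API. `speechSynthesis` and `SpeechSynthesisUtterance` are the real browser globals. The `say` helper and its extra parameters are hypothetical; the parameters exist only so the function can be tried with fakes outside a browser.

```javascript
// Hypothetical helper: queue one utterance on the browser's speech synthesizer.
// `synth` and `Utterance` default to the browser globals; they are parameters
// only so the function can also be exercised with fakes outside a browser.
function say(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  const utterance = new Utterance(text); // the constructor sets utterance.text
  synth.speak(utterance);                // adds it to the queue; speech starts when it reaches the front
  return utterance;
}
```

In a browser console, `say('Hello!')` should read the word out loud.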
But before we go further with adding Text-to-Speech to the animation, I need to show you how I rendered the animation in the first place.
Animation and RxJS
After parsing the DSL and rendering it to a canvas (see part I), I had an array of frames:

```javascript
[ { image: 'http://.../0.png'
  , phrases: [ 'Hello!' ]
  , duration: 1000
  }
, { image: 'http://.../1.png'
  , phrases: [ 'Hi!' ]
  , duration: 1000
  }
]
```
Each frame had a rendered `image`, the `phrases` within it, and a frame `duration`.
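Frames are played one after another, so each `duration` counts from the moment the previous frame ends, not from the start of the animation. The small helper below makes that explicit; `frameTimeline` is a hypothetical name and is not code from the article. It computes each frame's start offset and the total loop length, before any speech is involved.

```javascript
// Hypothetical helper (not from the article): for frames played one after
// another, compute when each frame starts and how long the whole loop lasts.
function frameTimeline(frames) {
  let offset = 0;
  const starts = frames.map(frame => {
    const start = offset;
    offset += frame.duration; // the next frame starts once this one's duration has elapsed
    return start;
  });
  return { starts, total: offset };
}

const frames = [
  { image: 'http://.../0.png', phrases: ['Hello!'], duration: 1000 },
  { image: 'http://.../1.png', phrases: ['Hi!'], duration: 1000 },
];
console.log(frameTimeline(frames)); // { starts: [ 0, 1000 ], total: 2000 }
```

Once phrases are spoken aloud, a frame may stay on screen longer than its `duration`, so these numbers are only a lower bound.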
To show the animation I used a React component with an RxJS stream inside. The component uses a `useEffect` hook to create an RxJS Observable and subscribe to it:

- The `from` function iterates over the rendered `frames` array.
- `delayWhen` delays each frame by `frame.duration`.
- `map` turns each frame into a new `<img />` element.

Looping the animation is as easy as adding a `repeat()` operator.

Note that the subscription has to be cancelled at some point, especially with the endless `repeat()`: the component might be destroyed, or the `frames` might change. So the function passed to the `useEffect` hook needs to return a teardown callback. In this case I unsubscribe from the animation observable, which terminates the flow.
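Here is a rough, dependency-free sketch of the same show, wait, repeat and tear-down idea. It uses timers instead of RxJS operators and is not the article's component. `playFrames` and its options are hypothetical names. The returned `stop()` plays the role that `unsubscribe()` plays in the `useEffect` cleanup.

```javascript
// Hypothetical, dependency-free sketch of the animation loop (timers instead of RxJS):
// render a frame, wait frame.duration, move on; start over at the end when `loop` is true.
// Returns stop(), which plays the role of unsubscribe() in a useEffect cleanup.
// `schedule`/`cancel` default to setTimeout/clearTimeout and are injectable for testing.
function playFrames(
  frames,
  render,
  { loop = true, schedule = setTimeout, cancel = clearTimeout } = {}
) {
  let index = 0;
  let timerId = null;
  let stopped = false;

  function step() {
    if (stopped) return;
    if (index >= frames.length) {
      if (!loop || frames.length === 0) return; // done, like completing without repeat()
      index = 0;                                // start over, like repeat()
    }
    const frame = frames[index++];
    render(frame);                              // like map(frame => <img ... />)
    timerId = schedule(step, frame.duration);   // hold this frame for its duration
  }

  step();
  return function stop() {
    stopped = true;
    if (timerId !== null) cancel(timerId);
  };
}
```

In a React component, the returned `stop` can be handed straight back as the effect's cleanup. For example, `useEffect(() => playFrames(frames, frame => setImage(frame.image)), [frames]);`, where `setImage` is a hypothetical state setter.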
With that covered, we can now discuss the Text-to-Speech!
Text-to-Speech and RxJS
Now I needed to pronounce the text using the Speech API, but the `frame.duration` delay I used wouldn't work: I had to wait until the phrase was spoken and only then switch to the next frame. Also, if the user edits the scenario or navigates away, I need to stop the current synthesis. Happily, RxJS is ideal for such things!
First I needed to create an Observable wrapper around the Speech Synthesis API.
When the utterance ends, the Observable completes, which lets us chain syntheses one after another. And if we unsubscribe from the Observable, the synthesis is stopped.
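Here is one possible core for such a wrapper. It is a sketch built on RxJS's `new Observable(subscribe)` constructor and is not the published package's code. The `speakProducer` name is hypothetical. It builds the subscribe function: the function speaks the text, completes on the utterance's `end` event, and errors on its `error` event. Its returned teardown calls `speechSynthesis.cancel()` if the speech hasn't finished yet.

```javascript
// Hypothetical helper (not the published package's code): returns the subscribe
// function you would pass to RxJS's `new Observable(...)`.
// - completes when the utterance finishes speaking
// - errors if synthesis fails
// - cancels speech if unsubscribed before the utterance ends (teardown)
// `synth` and `Utterance` default to the browser globals; they are parameters
// only so the logic can be exercised with fakes outside a browser.
function speakProducer(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  return subscriber => {
    const utterance = new Utterance(text);
    let finished = false;

    utterance.onend = () => {
      finished = true;
      subscriber.complete();
    };
    utterance.onerror = event => {
      finished = true;
      subscriber.error(event);
    };

    synth.speak(utterance);

    // Teardown: RxJS calls this on unsubscribe and after complete/error.
    return () => {
      utterance.onend = null;
      utterance.onerror = null;
      if (!finished) synth.cancel(); // stop speech that is still queued or playing
    };
  };
}
```

With RxJS this would be used as `const speak = text => new Observable(speakProducer(text));`. One caveat: `speechSynthesis.cancel()` empties the whole utterance queue, not just this one utterance. That is harmless when phrases are chained with `concat`, because only one utterance is queued at a time.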
I've actually decided to publish this Observable wrapper as an npm package. There's a link in the footer 👇!
Now we can safely compose our phrases and be notified when they end:
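```javascript
import { concat } from 'rxjs';
// speak(text) is the Observable wrapper described above

concat(
  speak('Hello'),
  speak('World')
)
  .subscribe({
    complete() { console.log('done'); }
  });
```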
Try this code online at https://stackblitz.com/edit/rxjs-tts?file=index.ts
And to integrate the Text-to-Speech back into our Animation component:
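```jsx
import { from, concat, merge, of, timer, EMPTY } from 'rxjs';
import { concatMap, ignoreElements } from 'rxjs/operators';
// speak(text) is the Text-to-Speech Observable wrapper from above

from(frames).pipe(
  concatMap(frame => {
    // chain all phrases so they are spoken one after another
    const phrases$ = concat(
      EMPTY,
      ...frame.phrases.map(text => speak(text))
    );
    // merge completes only when both inputs complete, so we
    // wait for the phrases to end even if the duration is shorter
    const duration$ = merge(
      phrases$,
      timer(frame.duration)
    );
    // emit the image right away, then hold this frame until
    // duration$ completes; ignoreElements() drops the timer's value
    return merge(
      of(<img src={frame.image} />),
      duration$.pipe(ignoreElements())
    );
  })
)
```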
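For comparison, the same per-frame rule can be written with async/await. The rule is: show the image, speak the phrases one after another, and move on only when both the speech and `frame.duration` are done. This is a hypothetical sketch, not the article's code. `playTalkie`, `render`, `speak` and `sleep` are assumed names. `speak` and `sleep` are assumed to return Promises that resolve when the utterance ends and when the delay elapses, respectively.

```javascript
// Hypothetical async/await analogue of the concatMap + merge pipeline above.
// render(image): shows a frame; speak(text): Promise resolved when speech ends;
// sleep(ms): Promise resolved after ms. All three are assumed, injected callbacks.
async function playTalkie(frames, { render, speak, sleep }) {
  for (const frame of frames) {          // like concatMap: one frame at a time
    render(frame.image);                 // like of(<img ... />): show immediately
    const speakAll = async () => {
      for (const text of frame.phrases) {
        await speak(text);               // like concat(...): phrases in sequence
      }
    };
    // like merge(phrases$, timer(duration)): wait until BOTH have finished,
    // so the frame lasts as long as the longer of the two
    await Promise.all([speakAll(), sleep(frame.duration)]);
  }
}
```

Unlike the RxJS version, this one can't be stopped midway. Making it cancellable would mean threading something like an `AbortSignal` through `speak` and `sleep`. Unsubscribing handles that bookkeeping for free.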
That's it! Now our Emoji can walk and talk!
Turn the volume up and try this "Dancing" animation
And surely try creating your own 🙂
Outro
It was pretty simple, huh?
But there was a hidden trick: previously, the web app was hosted on GitHub Pages and users shared their animations as downloaded GIFs. But a GIF can't contain sound, you know... so I needed another way for users to share animations.
In the next article I'll share the details of how I migrated the create-react-app project to the NextJS/Vercel platform and added MongoDB to it.
Have a question or idea? Please, share your thoughts in the comments!
Thanks for reading this and see you next time!
Links
- Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
- RxJS Text-to-Speech wrapper npm package: npm i rxjs-tts
- My twitter (in case you want to follow 🙂)