Introduction

As developers, we often come across open-source projects on GitHub, and README.md is usually one of the first files we see 🤩. It is the simplest way to understand what a project is about, how to use it, and other related information (a kind of documentation).

Here is how GitHub itself describes a README:

A README is often the first item a visitor will see when visiting your repository. README files typically include information on:

What the project does

Why the project is useful

How users can get started with the project

Where users can get help with your project

Who maintains and contributes to the project

A good README also helps attract contributors. However, READMEs have always been challenging for developers and contributors with low vision or visual impairments: most README content is text, which can be difficult for them to read and understand. So I developed a very simple tool called "ReadmeAloud" that converts the raw text of any public GitHub README file to speech 🎤 and also lets you download it as an MP3 file 🎵

Some Existing Solutions

Many existing solutions already make it easy to convert text into speech. Some of them are:

Microsoft Edge Browser has a built-in feature called Read aloud:

Read aloud highlights each word on the webpage as it's being read. To stop listening, select the Pause button or the X to close Read aloud.

Google Chrome Extension - Read Aloud: A Text to Speech Voice Reader
Word for Microsoft 365 - Read Aloud

So if these tools already exist, what is so special about "ReadmeAloud"?

Well, I agree that most of these tools are great and helpful for everyone. However, I felt a couple of things were missing, such as a way to download the converted speech as a file (say, MP3). Also, most existing tools either require a paid license or need an internet connection to work. That's why I built my own little tool for low-vision developers, focused on the place they need it most: the GitHub README file.

Architecture

Here I focused on the use case rather than on the architecture or technology, which is why the architecture is very simple:
- Azure Front Door
- Azure Web App
- Azure Text-to-Speech (Cognitive Services Speech API)

Source Code:

This is an open-source project, available on GitHub at jayendranarumugam/ReadmeAloud.

Workflow

The user provides a valid GitHub raw README.md URL, e.g. https://raw.githubusercontent.com/jayendranarumugam/DemoSecrets/master/README.md, through Azure Front Door.
When the user clicks the Search button, Azure Front Door securely routes the traffic to the Azure Web App (Blazor), which converts the text from the URL to speech using the Azure Cognitive Services Speech API.
Once the speech is converted successfully, the audio bytes are used to play the audio in the browser and are also offered for download as an MP3 file.
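The workflow assumes the user pastes a raw URL. As a minimal sketch of the fetch step, assuming a hypothetical ReadmeFetcher class that is not part of the ReadmeAloud repo, the web app could accept either a raw URL or a regular github.com "blob" URL, rewrite the latter to its raw.githubusercontent.com form, and download the plain text with HttpClient before calling SynthesizeAudioAsync:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

#nullable enable

// Hypothetical helper (not in the ReadmeAloud repo): normalizes the URL the
// user typed and downloads the README as plain text.
public static class ReadmeFetcher
{
    // Returns the raw.githubusercontent.com URL for a raw or github.com "blob"
    // URL, or null if the input is neither.
    public static string? ToRawUrl(string input)
    {
        if (!Uri.TryCreate(input, UriKind.Absolute, out var uri) || uri.Scheme != Uri.UriSchemeHttps)
            return null;

        // Already a raw URL: use it unchanged
        if (uri.Host == "raw.githubusercontent.com")
            return uri.AbsoluteUri;

        // Expected layout: https://github.com/{owner}/{repo}/blob/{branch}/{path...}
        var parts = uri.AbsolutePath.Trim('/').Split('/');
        if (uri.Host == "github.com" && parts.Length >= 5 && parts[2] == "blob")
        {
            var branchAndPath = string.Join("/", parts, 3, parts.Length - 3);
            return $"https://raw.githubusercontent.com/{parts[0]}/{parts[1]}/{branchAndPath}";
        }

        return null;
    }

    // Downloads the README text that will be passed to SynthesizeAudioAsync
    public static async Task<string> FetchReadmeTextAsync(HttpClient http, string input)
    {
        var rawUrl = ToRawUrl(input)
            ?? throw new ArgumentException("Not a GitHub README URL.", nameof(input));
        return await http.GetStringAsync(rawUrl);
    }
}
```

The blob-to-raw rewrite relies on GitHub's current URL layout (owner/repo/blob/branch/path), so treat it as a convenience rather than a guarantee.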

Code Walkthrough

This was my first time writing server-side Blazor 😁. I mostly reused the default Blazor boilerplate code, modifying it and adding some parts for my project, e.g. SpeechService. I hope I've learned something about Blazor along the way. I also listed some great GitHub repos in the References section below that helped me understand Blazor.

All of the text-to-speech logic uses the Azure Cognitive Services Speech SDK and lives in SpeechService.cs:

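    public async Task<byte[]> SynthesizeAudioAsync(string text)
    {
        // Subscription key and region come from the app configuration
        var speechConfig = SpeechConfig.FromSubscription(
            Configuration["CognitiveAPIKey"], Configuration["CognitiveAPIRegion"]);

        // Request MP3 output so the bytes can be played and downloaded as-is
        speechConfig.SetSpeechSynthesisOutputFormat(
            SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3);

        // A null AudioConfig keeps the audio in memory (result.AudioData)
        // instead of playing it on the server's speaker
        using var synthesizer = new SpeechSynthesizer(speechConfig, null as AudioConfig);
        using var result = await synthesizer.SpeakTextAsync(text);

        if (result.Reason == ResultReason.SynthesizingAudioCompleted)
        {
            return result.AudioData;
        }

        if (result.Reason == ResultReason.Canceled)
        {
            // Report why synthesis was canceled (e.g. invalid key, wrong region)
            var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
            System.Console.Error.WriteLine(
                $"Speech synthesis canceled: {cancellation.Reason}, {cancellation.ErrorDetails}");
        }

        // No audio was produced
        return null;
    }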

result.AudioData holds the converted audio as a byte array (byte[]), which is then played in the browser and offered as an MP3 download through the JavaScript functions downloadFromByteArray, playAudio, and stopAudio in helper.js.
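The post does not show how helper.js receives the bytes. One common option, sketched here with a hypothetical AudioDataUrl helper that is not part of the repo, is to wrap the MP3 bytes in a base64 data URL that an audio element or a download link in the browser can use directly:

```csharp
using System;

// Hypothetical helper (not in the ReadmeAloud repo): turns the MP3 bytes
// returned by SynthesizeAudioAsync into a data URL usable as an <audio> src
// or as the href of a download link.
public static class AudioDataUrl
{
    public static string FromMp3(byte[] audio)
    {
        if (audio == null || audio.Length == 0)
            throw new ArgumentException("No audio data.", nameof(audio));

        // "audio/mpeg" is the registered MIME type for MP3
        return "data:audio/mpeg;base64," + Convert.ToBase64String(audio);
    }
}
```

In a Blazor Server component the resulting string could be handed to a JavaScript function through IJSRuntime.InvokeVoidAsync. Base64 makes the payload about a third larger, so for long READMEs sending the raw bytes may be the better choice.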

Demo

Improvements:

ReadmeAloud is more of a small prototype that many features can easily be built on top of. That being said, the current design has some limitations that we can improve in the future. Some of them are:

The architecture is currently tightly coupled to the server (Blazor). We can make it more scalable by moving the Cognitive Services calls into a separate layer, e.g. an Azure Function.
Currently only public GitHub repos can be converted. We can extend this to private repos by adding authentication.
English is the default language for both the text and the speech conversion. This is easy to improve, since Azure Cognitive Services supports a wide variety of languages.
I'm not a front-end guy 🙈, so feel free to contribute to ReadmeAloud with your creative ideas or UI/UX improvements.
If the README file is too long, the speech conversion takes longer and sometimes even times out, so the tool currently suits small README files. We can improve this with the architecture change discussed above.
We can also make the design more accessible for people with low vision, for example by adding voice input for entering GitHub repo details.
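For the timeout issue with long READMEs, one option besides the architecture change is to split the text into smaller pieces and synthesize them one at a time. A minimal sketch, assuming a hypothetical TextChunker helper and a chunk size you would tune yourself (the post does not name a limit):

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not in the ReadmeAloud repo): splits long README text
// into chunks of at most maxChars, preferring to break after a sentence or a
// line, so each chunk can be sent to the speech service in its own request.
public static class TextChunker
{
    public static List<string> Split(string text, int maxChars)
    {
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var chunks = new List<string>();
        var current = new StringBuilder();

        // Zero-width split after '.', '!', '?' or '\n', keeping the delimiter
        foreach (var piece in Regex.Split(text, @"(?<=[.!?\n])"))
        {
            if (piece.Length == 0) continue;

            // Close the current chunk if this piece would overflow it
            if (current.Length > 0 && current.Length + piece.Length > maxChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }

            // A single piece longer than maxChars is cut into hard slices
            var remaining = piece;
            while (remaining.Length > maxChars)
            {
                chunks.Add(remaining.Substring(0, maxChars));
                remaining = remaining.Substring(maxChars);
            }
            current.Append(remaining);
        }

        if (current.Length > 0) chunks.Add(current.ToString());
        return chunks;
    }
}
```

Each chunk could then go through SynthesizeAudioAsync separately, so no single request has to cover the whole file, and the resulting audio can be played back in order. Because pieces are only split, never trimmed, concatenating the chunks reproduces the original text exactly.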

Conclusion:

The whole idea of ReadmeAloud is to provide an easy and efficient way for everyone, especially visually impaired friends, to understand open-source projects. Although I showcased it with GitHub README URLs, any valid URL that serves plain-text content will work. This is just a small idea, and I hope it reaches its audience 🤗

References:

gpeipman / BlazorDemo

Demo application for my writings about Blazor