How to Integrate OpenAI for Text Generation, Text-to-Speech, and Speech-to-Text in .NET
PeterMilovcik
Posted on November 7, 2024
With the release of OpenAI's latest NuGet package (version 2.0.0), developers can easily integrate AI-driven text generation, text-to-speech (TTS), and speech-to-text (STT) functionalities into their .NET applications. This guide will walk through creating an OpenAI service in .NET that allows you to generate text responses, convert text to audio, and transcribe audio files back to text.
This implementation will use the minimum configuration necessary. For Windows, we’ll also leverage the NAudio package for handling audio playback, as it offers a straightforward solution for recording and playing audio files.
Prerequisites
Before you start integrating OpenAI’s capabilities into your .NET project, make sure you have the following set up:
- Install the OpenAI NuGet Package (Version 2.0.0): Add the latest version of the OpenAI NuGet package to your .NET project:
dotnet add package OpenAI --version 2.0.0
- Install NAudio (for Windows audio handling): If you're working on a Windows machine and need to handle audio recording or playback, add the NAudio NuGet package:
dotnet add package NAudio
- Set the OpenAI API Key: For Windows users, you can set the OPENAI_API_KEY environment variable using the Command Prompt:
setx OPENAI_API_KEY your_openai_api_key
- Note: Run this command in a Command Prompt with administrative privileges for a system-wide setting.
- Restart any open Command Prompt or PowerShell windows after running this command to ensure the new variable is recognized.
- For other platforms (macOS, Linux), you can set the environment variable using:
export OPENAI_API_KEY=your_openai_api_key
- Ensure .NET SDK is Installed: Make sure you have the latest version of the .NET SDK installed. You can check your version using:
dotnet --version
With these prerequisites in place, you are ready to start building AI-enhanced features into your .NET applications!
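Before wiring up the SDK, it can save debugging time to verify that the API key is actually visible to your process. A minimal sanity-check sketch (standard library only, no SDK calls):

```
using System;

class Program
{
    static void Main()
    {
        // Fail fast if the key is missing, rather than getting a 401 from the API later.
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
        if (string.IsNullOrWhiteSpace(apiKey))
        {
            Console.Error.WriteLine("OPENAI_API_KEY is not set. See the setup steps above.");
            Environment.Exit(1);
        }
        Console.WriteLine("OPENAI_API_KEY found; ready to use the OpenAI SDK.");
    }
}
```

Remember that environment variables set with setx are only visible to processes started after the variable was set.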
Step 1: Generating Text Responses
The following OpenAiService class uses OpenAI's chat completions API with the gpt-4o-mini model to generate responses from a given prompt.
using OpenAI.Chat;

public class OpenAiService
{
    private readonly ChatClient _chatClient;

    public OpenAiService()
    {
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
        _chatClient = new ChatClient("gpt-4o-mini", apiKey);
    }

    public async Task<string> GenerateResponseAsync(string prompt)
    {
        var messages = new List<ChatMessage>
        {
            new SystemChatMessage("You are a knowledgeable assistant."),
            new UserChatMessage($"Generate a response based on the prompt:\n\n{prompt}")
        };
        // CompleteChatAsync returns a ClientResult<ChatCompletion>; the implicit
        // conversion to ChatCompletion gives direct access to Content.
        ChatCompletion completion = await _chatClient.CompleteChatAsync(messages);
        return completion.Content[0].Text;
    }
}
In this class:
- The GenerateResponseAsync method takes a prompt and generates a response.
- We initiate a conversation by sending a system message, setting the tone as a "knowledgeable assistant."
- Finally, we pass the prompt to the model and return the generated response.
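For longer responses, you may prefer to stream tokens to the console as they arrive instead of waiting for the full completion. A sketch of a streaming variant, assuming the CompleteChatStreamingAsync method exposed by version 2.0.0 of the SDK:

```
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using OpenAI.Chat;

public class StreamingExample
{
    public static async Task StreamResponseAsync(string prompt)
    {
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
        var chatClient = new ChatClient("gpt-4o-mini", apiKey);
        var messages = new List<ChatMessage>
        {
            new SystemChatMessage("You are a knowledgeable assistant."),
            new UserChatMessage(prompt)
        };
        // Each update carries a fragment of the response as it is generated.
        await foreach (var update in chatClient.CompleteChatStreamingAsync(messages))
        {
            foreach (var part in update.ContentUpdate)
            {
                Console.Write(part.Text);
            }
        }
        Console.WriteLine();
    }
}
```

Streaming noticeably improves perceived latency in console or chat UIs, since the first words appear almost immediately.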
Step 2: Converting Text to Speech
To convert text to speech, we'll use OpenAI's TTS functionality. This TextToSpeechService class converts a given text to an audio file and plays it.
using NAudio.Wave;
using OpenAI.Audio;

public class TextToSpeechService
{
    private readonly AudioClient _audioClient;

    public TextToSpeechService()
    {
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
        _audioClient = new AudioClient("tts-1", apiKey);
    }

    public async Task ConvertTextToSpeechAsync(string text)
    {
        // GenerateSpeechAsync returns the MP3 bytes as BinaryData.
        BinaryData speech = await _audioClient.GenerateSpeechAsync(text, GeneratedSpeechVoice.Onyx);
        using (var stream = File.OpenWrite("output.mp3"))
        {
            speech.ToStream().CopyTo(stream);
        }
        PlayAudio("output.mp3");
    }

    private void PlayAudio(string filePath)
    {
        using (var audioFile = new AudioFileReader(filePath))
        using (var outputDevice = new WaveOutEvent())
        {
            outputDevice.Init(audioFile);
            outputDevice.Play();
            // Play() returns immediately; block until playback finishes so the
            // output device is not disposed mid-playback.
            while (outputDevice.PlaybackState == PlaybackState.Playing)
            {
                Thread.Sleep(100);
            }
        }
    }
}
Key points:
- ConvertTextToSpeechAsync accepts a text string, converts it into speech, and saves it as an MP3 file.
- The PlayAudio method leverages NAudio for playback. It reads the MP3 file and plays it back on your system.
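If you would rather not block a thread while audio plays, NAudio's PlaybackStopped event can be bridged to a Task. A sketch of an async alternative (the method name PlayAudioAsync is ours, not part of NAudio):

```
using System.Threading.Tasks;
using NAudio.Wave;

public static class AudioPlayback
{
    // Completes only when playback has actually finished, so the output
    // device is not disposed while audio is still playing.
    public static async Task PlayAudioAsync(string filePath)
    {
        using var audioFile = new AudioFileReader(filePath);
        using var outputDevice = new WaveOutEvent();
        var tcs = new TaskCompletionSource<bool>();
        outputDevice.PlaybackStopped += (sender, args) => tcs.TrySetResult(true);
        outputDevice.Init(audioFile);
        outputDevice.Play();
        await tcs.Task; // resumes when NAudio raises PlaybackStopped
    }
}
```

This keeps the calling thread free, which matters in UI applications where blocking would freeze the interface.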
Step 3: Transcribing Audio to Text
The following SpeechToTextService class uses OpenAI's Whisper model to transcribe audio files into text. This can be incredibly useful for processing voice input.
using OpenAI.Audio;

public class SpeechToTextService
{
    private readonly AudioClient _audioClient;

    public SpeechToTextService()
    {
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
        _audioClient = new AudioClient("whisper-1", apiKey);
    }

    public async Task<string> TranscribeAudioAsync(string audioFilePath)
    {
        // TranscribeAudioAsync accepts a path to a supported audio file (e.g. WAV, MP3).
        AudioTranscription transcription = await _audioClient.TranscribeAudioAsync(audioFilePath);
        return transcription.Text;
    }
}
This class:
- Accepts an audio file path and transcribes the audio content into text.
- The transcription result is returned as a plain text string.
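Transcription quality can often be improved by giving Whisper hints. A sketch using the AudioTranscriptionOptions overload of the SDK (we assume the Language and Prompt properties; check the SDK's API reference for the exact surface):

```
using System;
using System.Threading.Tasks;
using OpenAI.Audio;

public class TranscriptionWithOptions
{
    public static async Task<string> TranscribeAsync(string audioFilePath)
    {
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
        var audioClient = new AudioClient("whisper-1", apiKey);
        // Hinting the language and domain vocabulary can improve accuracy
        // on short or jargon-heavy recordings.
        var options = new AudioTranscriptionOptions
        {
            Language = "en",
            Prompt = "The recording discusses .NET development and the OpenAI SDK."
        };
        AudioTranscription transcription =
            await audioClient.TranscribeAudioAsync(audioFilePath, options);
        return transcription.Text;
    }
}
```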
Step 4: Recording Audio with NAudio
For applications that need to capture audio from the user, such as for speech-to-text input, you can use the NAudio library to record audio and save it as a .wav file. This is especially useful for Windows-based applications, where NAudio provides a straightforward API for handling audio input.
The StartRecordingAsync method below demonstrates how to record audio from the default microphone, saving it to a specified output file path.
using NAudio.Wave;

public class AudioRecordingService
{
    public async Task StartRecordingAsync(string outputFilePath, CancellationToken cancellationToken)
    {
        var waveFormat = new WaveFormat(44100, 16, 1); // 44.1 kHz, 16-bit, mono
        using (var waveIn = new WaveInEvent { WaveFormat = waveFormat })
        using (var writer = new WaveFileWriter(outputFilePath, waveFormat))
        {
            // Write each captured buffer to the WAV file as it arrives.
            waveIn.DataAvailable += (sender, e) =>
            {
                writer.Write(e.Buffer, 0, e.BytesRecorded);
            };
            waveIn.StartRecording();
            try
            {
                await Task.Delay(Timeout.Infinite, cancellationToken); // Keeps recording until cancellation
            }
            catch (TaskCanceledException)
            {
                waveIn.StopRecording();
            }
        }
    }
}
In this code:
- Initialize Audio Format: We set up the audio format to 44.1 kHz, 16-bit, mono. These settings provide good quality for most voice recordings.
- Create Audio Input and Writer: We use WaveInEvent for capturing audio from the default microphone and WaveFileWriter to write the audio data to a file.
- Handle Data Available Event: As audio data becomes available (captured in chunks), it is written to the file through the writer.
- Start and Stop Recording: Recording starts with StartRecording() and will continue until the provided CancellationToken is canceled, at which point StopRecording() is called to end the recording.
Usage Example
To start recording audio, you can call this method and provide a file path and cancellation token:
var recordingService = new AudioRecordingService();
var cancellationTokenSource = new CancellationTokenSource();
Console.WriteLine("Recording audio. Press any key to stop...");
_ = recordingService.StartRecordingAsync("recording.wav", cancellationTokenSource.Token);
// Wait for a key press to stop recording
Console.ReadKey();
cancellationTokenSource.Cancel();
This example will begin recording audio and save it to recording.wav until a key is pressed, triggering the cancellation of the recording.
With the addition of audio recording using NAudio, you now have a full toolkit for handling text generation, text-to-speech, speech-to-text, and audio recording within your .NET application. This setup provides a complete pipeline for interactive and conversational applications in .NET, enabling voice-based input, audio output, and seamless integration with OpenAI's powerful language models.
Putting It All Together
With these services implemented, you have the foundation for a fully interactive .NET application that can generate text, convert text to speech, transcribe spoken input, and record audio. Here’s an example of how to use all four services in a cohesive application.
var openAiService = new OpenAiService();
var ttsService = new TextToSpeechService();
var sttService = new SpeechToTextService();
var recordingService = new AudioRecordingService();
var cancellationTokenSource = new CancellationTokenSource();
// Step 1: Generate a Text Response
string prompt = "Tell me something interesting about AI.";
string generatedText = await openAiService.GenerateResponseAsync(prompt);
Console.WriteLine("Generated Text: " + generatedText);
// Step 2: Convert Generated Text to Speech
await ttsService.ConvertTextToSpeechAsync(generatedText);
// Step 3: Record Audio Input
Console.WriteLine("Recording audio input. Press any key to stop recording...");
_ = recordingService.StartRecordingAsync("user_recording.wav", cancellationTokenSource.Token);
Console.ReadKey();
cancellationTokenSource.Cancel();
// Step 4: Transcribe Recorded Audio
string transcribedText = await sttService.TranscribeAudioAsync("user_recording.wav");
Console.WriteLine("Transcribed Text: " + transcribedText);
Conclusion
Using OpenAI’s .NET SDK alongside NAudio, you can bring powerful AI capabilities into your .NET applications. This integration covers:
- Text Generation: Generate contextually relevant responses.
- Text-to-Speech: Convert generated text to audio for a more interactive experience.
- Speech-to-Text: Capture and transcribe user input.
- Audio Recording: Enable seamless audio capture for user interactions.
This setup provides a complete, interactive pipeline that can power chatbots, virtual assistants, or any voice-enabled application. By following this guide, you’ll have a solid foundation for enhancing your .NET applications with AI-powered, voice-driven features.
Before You Go...
Did this guide help you level up your .NET skills with OpenAI integration? If so, let’s spread the knowledge! Give it a like, share with fellow devs, or drop a comment below. Every interaction helps boost this content, bringing these tips to more developers. And hey—if it didn’t deliver, no hard feelings; your silence speaks louder than clicks! 😉