Dalu46
Posted on November 19, 2024
According to statistics from The Genius, 60% of job seekers quit because the interview process is too long or complicated, and the average process takes about 23 days.
The traditional recruitment process is time-consuming and prone to bias. Instead of interviewing each candidate individually, you can use an AI avatar to conduct interviews with all candidates simultaneously. This way, you no longer need to wait up to 23 days; the recruitment process can be completed in two to three days.
This article will guide you through building a recruitment AI video avatar that will interview prospective candidates using Simli’s API. It will show how to build an application that can turn job descriptions into interactive interviews, streamlining your hiring process and improving the candidate experience.
Prerequisites
To follow along with this tutorial, make sure you have the following:
- An understanding of JavaScript and React.
- Node and the Node Package Manager installed.
- A Simli account. To get started, create a free Simli account.
- An OpenAI account.
You can find the complete source code on GitHub.
Generate Interview Questions With OpenAI
OpenAI provides APIs that use a large language model trained on large quantities of data to generate text from a prompt. One of these APIs is the chat completions endpoint. By using this endpoint and providing a job description as a prompt, you can generate customized interview questions.
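As a quick illustration (this exact call isn't part of the final app, which uses the realtime API instead), a chat completions request with the official openai Node SDK might look like this; the model name and prompts are example choices:
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Ask the chat completions endpoint to turn a job description into questions
const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini", // example model
  messages: [
    { role: "system", content: "You are a recruiter. Generate three interview questions." },
    { role: "user", content: "Senior React developer: 5+ years of experience, Next.js, REST APIs." },
  ],
});

console.log(completion.choices[0].message.content);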
Once the questions are generated, the next step is to convert them to audio so they can be sent to the SimliClient API, which generates a lip-synced AI avatar to interview the applicant. Simli is an AI video generator that provides a speech-to-video API for creating AI video avatars.
Also, OpenAI provides an audio API with a speech endpoint based on its TTS (text-to-speech) model. This speech endpoint requires three key inputs: the model, the text to be converted into audio, and the voice for audio generation. This functionality will be used to convert the generated questions to audio.
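For illustration, here's a minimal sketch of that speech endpoint using the same openai Node SDK; the tts-1 model, alloy voice, and sample question are example choices, not requirements from this tutorial:
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Convert one generated question into audio
const response = await openai.audio.speech.create({
  model: "tts-1", // example TTS model
  voice: "alloy", // example voice
  input: "Can you walk me through a project you are most proud of?",
});

// The response body is binary audio; write it to disk for inspection
fs.writeFileSync("question.mp3", Buffer.from(await response.arrayBuffer()));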
Here’s How The Application Will Work
- The recruiter enters a job description that is sent to OpenAI’s chat completions API.
- The API returns a list of interview questions as text.
- These questions are then sent to the speech endpoint, where they are converted into audio.
- Finally, the audio is sent to Simli’s API, which generates a realistic, lip-synced video avatar to deliver the questions, creating an interactive and engaging interview experience for candidates.
Set Up Your API Environment
Simli’s endpoint requires an API key, which you can get by creating a Simli account. Once you’ve successfully created an account, you will be redirected to the user profile dashboard, where you can generate your API key and track your API usage.
Click the copy button and save it.
Next, create an OpenAI account to get an API key. Once you create the account, you’ll be redirected to the profile dashboard. On the dashboard, navigate to the API keys section.
In the API keys section, click Create new secret key, give the key a name, and click the Create secret key button.
Your secret key will now be generated. Be sure to copy it and store it in a secure location so you can easily retrieve it later.
Note: The OpenAI real-time API is in beta and only available for paying users.
With your API keys ready, you can build your AI-driven interview experience. The next step is selecting an AI avatar that aligns with your brand and hiring role.
Choosing Your AI Avatar for Recruitment
When selecting an avatar for your AI recruiter, consider the desired brand image and the role you're hiring for. This tutorial will use the 'Franco' avatar, which was randomly chosen. You can explore Simli's library of avatars to find the perfect fit for your needs.
Simli has a create avatar tool that allows users to create custom avatars by uploading images. Consider using this feature if none of the available faces suit your needs.
Once you’ve selected your avatar, let’s bring it to life by building a Next.js application.
Setting Up the Next.js Project
To get started, create a Next.js app by running the following command:
npx create-next-app@latest interview-simli
This command will prompt a few questions about configuring the Next.js application. Answer them to suit your preferences; this tutorial assumes JavaScript (not TypeScript), the src/ directory, and Tailwind CSS.
Next, navigate to the application and install dependencies. Run the following command:
cd interview-simli
npm install simli-client openai github:openai/openai-realtime-api-beta
The SimliClient is a tool for integrating real-time audio and video streaming capabilities into your web applications using WebRTC.
Once the project is set up, run the development server:
npm run dev
Your Next.js project should now be running at http://localhost:3000.
In your project's root directory, create a .env file and store the Simli and OpenAI API key credentials as shown below.
NEXT_PUBLIC_SIMLI_API_KEY="your simli api key"
NEXT_PUBLIC_OPENAI_API_KEY="your openai api key"
Create Real-time Video Interactions With Applicants
In your project, navigate to the src/app folder, create a components folder, and inside it create an Interview.js file. This component will set up the interactive interview interface where users can initiate and respond to interview questions generated by an AI avatar.
First, you need to declare state variables and references that help control and monitor different aspects of the component. To do so, paste the following code snippet:
// src/app/components/Interview.js
import { useCallback, useEffect, useRef, useState } from "react";

// State management
//...
const [isLoading, setIsLoading] = useState(false);
const [isAvatarVisible, setIsAvatarVisible] = useState(false);
const [error, setError] = useState("");
const [isRecording, setIsRecording] = useState(false);
const [userMessage, setUserMessage] = useState("...");
// Refs for various components and states
const videoRef = useRef(null);
const audioRef = useRef(null);
const openAIClientRef = useRef(null);
const audioContextRef = useRef(null);
const streamRef = useRef(null);
const processorRef = useRef(null);
// New refs for managing audio chunk delay
const audioChunkQueueRef = useRef([]);
const isProcessingChunkRef = useRef(false);
//...
This code block declares state variables to track the isLoading, isAvatarVisible, error, isRecording, and userMessage states.
It also creates multiple refs. Let’s look at what each one does:
- videoRef and audioRef provide direct access to the video and audio elements. They are required for configuring the SimliClient.
- audioContextRef and processorRef manage audio processing and encoding, which is critical for capturing and sending audio data.
- audioChunkQueueRef is a buffer for audio chunks, ensuring seamless playback by queuing chunks until they’re ready to be sent. The SimliClient requires audio to be sent in chunks in PCM16 format, at a 16 kHz sample rate, as sketched below.
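For reference, here’s a minimal sketch, separate from the tutorial code, of how Float32 audio samples from the Web Audio API map to PCM16; the startRecording function later in this article performs the same conversion inline:
// Sketch only: convert Float32 samples (range -1..1) to PCM16 (Int16)
function floatTo16BitPCM(float32Array) {
  const pcm16 = new Int16Array(float32Array.length);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i])); // clamp to the valid range
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale to the 16-bit signed range
  }
  return pcm16;
}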
Simli Client Initialization
Now, let’s initialize the Simli client. Paste the following code snippet inside the Interview.js file:
// src/app/components/Interview.js
import { SimliClient } from "simli-client";

// Create a single SimliClient instance for the component to use
const simliClient = new SimliClient();
...
// Initializes the Simli client with the provided configuration
...
const initializeSimliClient = useCallback(() => {
if (videoRef.current && audioRef.current) {
const SimliConfig = {
apiKey: process.env.NEXT_PUBLIC_SIMLI_API_KEY,
faceID: simli_faceid,
handleSilence: true,
maxSessionLength: 60, // in seconds
maxIdleTime: 60, // in seconds
videoRef: videoRef,
audioRef: audioRef,
};
simliClient.Initialize(SimliConfig);
console.log("Simli Client initialized");
}
}, [simli_faceid]);
//...
The code block above initializes and configures a new instance of the SimliClient. Let’s break down each part of the SimliConfig object:
- apiKey: Your Simli API key.
- faceID: The ID of the avatar face that will be rendered in the video stream.
- handleSilence: A boolean indicating whether the client should handle silent moments in the audio stream (e.g., muting or pausing the video if no audio is detected).
- maxSessionLength: The maximum session length, in seconds.
- maxIdleTime: The maximum idle time, in seconds. With the value above, the session disconnects after 60 seconds without activity.
- videoRef and audioRef: References to the video and audio elements where the media streams will be displayed in the browser.
OpenAI Client Initialization
The next step is to initialize the OpenAI client. To do so, paste the following code inside the Interview.js
file:
// src/app/components/Interview.js
import { RealtimeClient } from "@openai/realtime-api-beta";
...
// Initializes the OpenAI client, sets up event listeners, and connects to the API
...
const initializeOpenAIClient = useCallback(async () => {
try {
console.log("Initializing OpenAI client...");
openAIClientRef.current = new RealtimeClient({
apiKey: process.env.NEXT_PUBLIC_OPENAI_API_KEY,
dangerouslyAllowAPIKeyInBrowser: true,
});
await openAIClientRef.current.updateSession({
instructions: initialPrompt,
voice: openai_voice,
turn_detection: { type: "server_vad" },
input_audio_transcription: { model: "whisper-1" },
});
// Set up event listeners
openAIClientRef.current.on(
"conversation.updated",
handleConversationUpdate
);
openAIClientRef.current.on("conversation.interrupted", () => {
interruptConversation();
});
openAIClientRef.current.on(
"input_audio_buffer.speech_stopped",
handleSpeechStopped
);
// openAIClientRef.current.on('response.canceled', handleResponseCanceled);
await openAIClientRef.current.connect();
console.log("OpenAI Client connected successfully");
setIsAvatarVisible(true);
} catch (error) {
console.error("Error initializing OpenAI client:", error);
setError(`Failed to initialize OpenAI client: ${error.message}`);
}
}, [initialPrompt]);
//...
The initializeOpenAIClient function initializes the OpenAI client, which handles the real-time conversation with the applicant. The client is configured with the API key and an initial prompt that welcomes the user to the interview. Event listeners are then added to handle conversation updates, interruptions, and the end of the user’s speech. Once the client connects, isAvatarVisible is set to true, which makes the avatar appear in the user interface.
Note: Setting dangerouslyAllowAPIKeyInBrowser to true is acceptable for development or prototyping, but it exposes your OpenAI API key on the client side. In a production environment, handle API calls on the server by creating a secure Next.js API route, which keeps your key hidden.
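As an illustration of that server-side pattern, here’s a minimal sketch of a Next.js route handler; the app/api/questions/route.js path and the model name are hypothetical choices, not part of the original project:
// app/api/questions/route.js (hypothetical path, sketch only)
import OpenAI from "openai";
import { NextResponse } from "next/server";

// No NEXT_PUBLIC_ prefix, so this variable never reaches the browser
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(request) {
  const { jobDescription } = await request.json();
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // example model
    messages: [
      {
        role: "user",
        content: `Generate three interview questions for this job description: ${jobDescription}`,
      },
    ],
  });
  return NextResponse.json({ questions: completion.choices[0].message.content });
}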
Audio Processing and Sending
You need a function to downsample audio to 16 kHz, the sample rate Simli expects for PCM16 audio. To do so, paste the following code inside the Interview.js file:
// src/app/components/Interview.js
...
// Downsamples audio data from one sample rate to another
...
const downsampleAudio = (
audioData,
inputSampleRate,
outputSampleRate
) => {
if (inputSampleRate === outputSampleRate) {
return audioData;
}
const ratio = inputSampleRate / outputSampleRate;
const newLength = Math.round(audioData.length / ratio);
const result = new Int16Array(newLength);
for (let i = 0; i < newLength; i++) {
const index = Math.round(i * ratio);
result[i] = audioData[index];
}
return result;
};
//...
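As a quick sanity check, here’s how the helper behaves on one second of 24 kHz audio (the values are illustrative):
// One second of (silent) 24 kHz audio
const input = new Int16Array(24000);
const output = downsampleAudio(input, 24000, 16000);
console.log(output.length); // 16000 samples, i.e., one second at 16 kHz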
Sending Audio Data To Simli
To send audio data to the SimliClient for playback, you’ll create a function. To do so, paste the following code snippet:
// src/app/components/Interview.js
...
// Processes the next audio chunk in the queue.
...
const processNextAudioChunk = useCallback(() => {
if (
audioChunkQueueRef.current.length > 0 &&
!isProcessingChunkRef.current
) {
isProcessingChunkRef.current = true;
const audioChunk = audioChunkQueueRef.current.shift();
if (audioChunk) {
const chunkDurationMs = (audioChunk.length / 16000) * 1000; // Calculate chunk duration in milliseconds
// Send audio chunks to Simli immediately
simliClient?.sendAudioData(audioChunk);
console.log(
"Sent audio chunk to Simli. Duration:",
chunkDurationMs.toFixed(2),
"ms"
);
isProcessingChunkRef.current = false;
processNextAudioChunk();
}
}
}, []);
//...
The processNextAudioChunk function checks whether there are any chunks in audioChunkQueueRef. If so, it removes the next chunk from the queue and sends it to Simli for playback. This ensures that only one chunk is sent at a time, giving the user smooth playback without overlapping audio. The function then calls itself recursively to process the next chunk in the queue. The logged chunk duration follows from the 16 kHz sample rate: a 3,200-sample chunk, for example, is (3200 / 16000) × 1000 = 200 ms of audio.
Handle OpenAI Responses
Next, create a function to handle responses from the OpenAI API. Paste the following code:
// src/app/components/Interview.js
...
// Handles conversation updates, including user and assistant messages
...
const handleConversationUpdate = useCallback((event) => {
console.log("Conversation updated:", event);
const { item, delta } = event;
if (item.type === "message" && item.role === "assistant") {
console.log("Assistant message detected");
if (delta && delta.audio) {
const downsampledAudio = downsampleAudio(delta.audio, 24000, 16000);
audioChunkQueueRef.current.push(downsampledAudio);
if (!isProcessingChunkRef.current) {
processNextAudioChunk();
}
}
} else if (item.type === "message" && item.role === "user") {
setUserMessage(item.content[0].transcript);
}
}, []);
// Handles interruptions in the conversation flow.
const interruptConversation = () => {
console.warn("User interrupted the conversation");
simliClient?.ClearBuffer();
openAIClientRef.current?.cancelResponse("");
};
//...
The code above defines two functions: handleConversationUpdate and interruptConversation. The handleConversationUpdate function first checks whether the message comes from the assistant. If the assistant’s message includes audio data, it uses the downsampleAudio function to convert the audio from a 24,000 Hz sample rate down to 16,000 Hz. The downsampled audio is pushed onto audioChunkQueueRef.current, the reference that stores queued audio chunks. If no chunk is currently being processed, processNextAudioChunk() (declared earlier) is called to start working through the queue.
The interruptConversation function handles interruptions in the conversation. If the user interrupts the interviewer, it clears the SimliClient buffer and cancels the ongoing response from the OpenAI API.
Audio Recording
Next, let’s create functions to handle audio recording when the prospective candidate is talking. Paste the following code:
// src/app/components/Interview.js
...
// Starts audio recording from the user's microphone
...
const startRecording = useCallback(async () => {
if (!audioContextRef.current) {
audioContextRef.current = new AudioContext({ sampleRate: 24000 });
}
try {
console.log("Starting audio recording...");
streamRef.current = await navigator.mediaDevices.getUserMedia({
audio: true,
});
const source = audioContextRef.current.createMediaStreamSource(
streamRef.current
);
processorRef.current = audioContextRef.current.createScriptProcessor(
2048,
1,
1
);
processorRef.current.onaudioprocess = (e) => {
const inputData = e.inputBuffer.getChannelData(0);
const audioData = new Int16Array(inputData.length);
for (let i = 0; i < inputData.length; i++) {
// Clamp to [-1, 1] and scale to the 16-bit signed range
const sample = Math.max(-1, Math.min(1, inputData[i]));
audioData[i] = Math.floor(sample * 32767);
}
openAIClientRef.current?.appendInputAudio(audioData);
};
source.connect(processorRef.current);
processorRef.current.connect(audioContextRef.current.destination);
setIsRecording(true);
console.log("Audio recording started");
} catch (err) {
console.error("Error accessing microphone:", err);
setError("Error accessing microphone. Please check your permissions.");
}
}, []);
// Stops audio recording from the user's microphone
const stopRecording = useCallback(() => {
if (processorRef.current) {
processorRef.current.disconnect();
processorRef.current = null;
}
if (streamRef.current) {
streamRef.current.getTracks().forEach((track) => track.stop());
streamRef.current = null;
}
setIsRecording(false);
console.log("Audio recording stopped");
}, []);
//...
Here’s what each function does:
- startRecording: Requests microphone access, creates an audio context, and streams audio data. The captured audio is clamped, converted to 16-bit PCM for compatibility, and appended to the OpenAI client for processing (see the note on ScriptProcessorNode below).
- stopRecording: Disconnects the audio processor and stops the microphone stream.
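One design note: createScriptProcessor is deprecated in modern browsers, though it still works and keeps this tutorial simple. If you'd rather use the AudioWorklet API instead, a rough sketch could look like this; the /recorder-processor.js module path is hypothetical:
// recorder-processor.js (hypothetical file served from /public, sketch only)
class RecorderProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channel = inputs[0][0]; // first channel of the first input
    if (channel) this.port.postMessage(channel.slice(0)); // copy samples to the main thread
    return true; // keep the processor alive
  }
}
registerProcessor("recorder-processor", RecorderProcessor);

// In startRecording, instead of createScriptProcessor:
// await audioContextRef.current.audioWorklet.addModule("/recorder-processor.js");
// const workletNode = new AudioWorkletNode(audioContextRef.current, "recorder-processor");
// workletNode.port.onmessage = (e) => {
//   // e.data is a Float32Array; convert it to Int16 PCM as above, then pass it to
//   // openAIClientRef.current?.appendInputAudio(...)
// };
// source.connect(workletNode);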
Interaction Start and Stop
Next, add handlers that start and stop the interview session. Paste the following code:
// src/app/components/Interview.js
...
// Handles starting the interaction
...
const handleStart = useCallback(async () => {
setIsLoading(true);
setError("");
try {
await simliClient?.start();
await initializeOpenAIClient();
} catch (error) {
console.error("Error starting interaction:", error);
setError(`Error starting interaction: ${error.message}`);
} finally {
setIsAvatarVisible(true);
setIsLoading(false);
}
}, [initializeOpenAIClient]);
// Handles stopping the interaction, cleaning up resources and resetting states.
const handleStop = useCallback(() => {
console.log("Stopping interaction...");
setIsLoading(false);
setError("");
stopRecording();
setIsAvatarVisible(false);
simliClient?.close();
openAIClientRef.current?.disconnect();
if (audioContextRef.current) {
audioContextRef.current.close();
}
onClose();
console.log("Interaction stopped");
}, [stopRecording]);
In the code above, the handleStart function initializes the interaction by starting the necessary clients and preparing the interface for recording. Calling the simliClient?.start() method makes the SimliClient initiate a WebRTC handshake to negotiate a connection between the client and Simli's server.
The handleStop function stops the interaction by calling the simliClient?.close() method, cleaning up resources such as client connections, and resetting the loading and avatar visibility states.
Component Mount and Cleanup
Finally, for this component, we need to initialize the simliClient when the component mounts. Paste the following code:
// src/app/components/Interview.js
...
// Effect to initialize Simli client once the component mounts and clean up resources on unmount
...
useEffect(() => {
initializeSimliClient();
if (simliClient) {
simliClient?.on("connected", () => {
console.log("SimliClient connected");
const audioData = new Uint8Array(6000).fill(0);
simliClient?.sendAudioData(audioData);
console.log("Sent initial audio data");
startRecording();
});
simliClient?.on("disconnected", () => {
console.log("SimliClient disconnected");
});
}
return () => {
try {
simliClient?.close();
openAIClientRef.current?.disconnect();
if (audioContextRef.current) {
audioContextRef.current.close();
}
} catch {}
};
}, [initializeSimliClient]);
This useEffect hook initializes the simliClient when the component mounts and sets up event listeners for its connect and disconnect events. On connection, it sends a short buffer of silent audio to keep the connection alive and then starts recording. The cleanup function, triggered when the component unmounts, closes the simliClient, disconnects the OpenAI client, and closes the audio context.
Next, navigate to the src/app/page.js file and paste the following code:
// src/app/page.js
...
// configure the avatar and display the home page
...
"use client";
import React, { useState } from "react";
import Interview from "./components/Interview";
const Demo = () => {
const [jobDescription, setJobDescription] = useState("");
const avatar = {
name: "Frank",
openai_voice: "alloy",
simli_faceid: "5514e24d-6086-46a3-ace4-6a7264e5cb7c",
initialPrompt: `Your name is Frank, an interviewer hiring for a specific role. You are looking for a candidate whose expertise aligns closely with the following job description: ${jobDescription}. Please generate three interview questions that assess key qualifications and relevant experience. Begin by introducing yourself and asking the interviewee to share a bit about their background.`,
};
return (
<div className="bg-black min-h-screen flex flex-col items-center font-abc-repro font-normal text-sm text-white p-8">
<div className="flex flex-col items-center mt-4">
<label htmlFor="job-description" className="font-bold mb-2">
Add Job Description
</label>
<textarea
id="job-description"
placeholder="Enter job description, e.g., Responsibilities, Requirements"
value={jobDescription}
onChange={(e) => setJobDescription(e.target.value)}
className="p-2 border border-gray-300 rounded-md w-80 h-24 resize-none mb-4 text-black"
/>
</div>
<div className="flex flex-col items-center gap-6 bg-effect15White p-6 pb-[40px] rounded-xl w-full">
<div>
<Interview
openai_voice={avatar.openai_voice}
simli_faceid={avatar.simli_faceid}
initialPrompt={avatar.initialPrompt}
/>
</div>
</div>
</div>
);
};
export default Demo;
This code captures the job description entered by the recruiter and interpolates it into the avatar’s initial prompt. The prompt, the face ID, and the OpenAI voice are then passed as props to the Interview component.
The Final Result
To test the app, input a job description into the text area. The OpenAI API uses this prompt to generate three questions for the avatar to ask the applicant.
Conclusion
By leveraging SimliClient and OpenAI, this guide addressed the problem of time-consuming recruitment by building an application that turns static job descriptions into dynamic, interactive interview videos. Together, these tools can automate the initial candidate screening and remove much of the friction of the traditional recruitment process.
Simli’s API comes with a free plan. Sign up on Simli today to get started.