Automating Faceless Shorts Videos for YouTube and TikTok Using OpenAI and ElevenLabs
Alex
Posted on October 8, 2024
Creating short videos for YouTube and TikTok can be time-consuming, but with the right tools, you can automate the process. In this guide, we’ll show you how to use OpenAI, ElevenLabs, and MoviePy to automatically generate faceless videos from a script—no camera or microphone required.
Let’s break it down step-by-step.
1. Setting Up the APIs
You’ll need API keys for OpenAI (for generating images) and ElevenLabs (for voiceovers). Get these from their respective websites.
import openai
from elevenlabs import ElevenLabs
openai.api_key = "your_openai_api_key"
elevenlabs_client = ElevenLabs(api_key="your_elevenlabs_api_key")
Replace "your_openai_api_key"
and "your_elevenlabs_api_key"
with your actual API keys.
2. Prepare the Script
Your video content starts with a script. For example, here’s a quick one about Dogecoin:
story_script = """
Dogecoin began as a joke in 2013, inspired by the popular 'Doge' meme. It eventually evolved into a legitimate cryptocurrency with support from figures like Elon Musk.
"""
This script will be used to generate images and voiceovers for each sentence.
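One simple way to break the script into sentences is a regular-expression split; this is a minimal sketch (a sentence tokenizer such as NLTK would also work), and the sentences name is reused in later steps:
import re

# Split the script on sentence-ending punctuation followed by whitespace
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", story_script.strip()) if s.strip()]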
3. Generate Images with OpenAI’s DALL-E
For each sentence in your script, we’ll generate a corresponding image. Here’s how you can do it:
import base64

def generate_image_from_text(sentence, context, idx):
    # Include both the sentence and the overall story context in the prompt
    prompt = f"Generate an image that describes: {sentence}. Context: {context}"
    response = openai.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1792",
        response_format="b64_json"
    )
    # Decode the base64-encoded image data and save it to disk
    image_filename = f"images/image_{idx}.jpg"
    with open(image_filename, "wb") as f:
        f.write(base64.b64decode(response.data[0].b64_json))
    return image_filename
This function takes each sentence and generates an image that best matches the description.
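For example, you can call it once per sentence and keep the resulting file paths for later steps. This is a minimal sketch: the sentences list comes from the splitting step above, and image_files is just an illustrative name.
import os

os.makedirs("images", exist_ok=True)

# One image per sentence, with the full script passed as shared context
image_files = [
    generate_image_from_text(sentence, story_script, idx)
    for idx, sentence in enumerate(sentences)
]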
4. Create Voiceovers with ElevenLabs
Next, we’ll generate voiceovers for each sentence using ElevenLabs.
from elevenlabs import VoiceSettings

def generate_audio_from_text(sentence, idx):
    # Request speech generation from ElevenLabs
    audio = elevenlabs_client.text_to_speech.convert(
        voice_id="pqHfZKP75CvOlQylNhV4",
        model_id="eleven_multilingual_v2",
        text=sentence,
        voice_settings=VoiceSettings(stability=0.2, similarity_boost=0.8)
    )
    # The response is a stream of byte chunks; write them to an MP3 file
    audio_filename = f"audio/audio_{idx}.mp3"
    with open(audio_filename, "wb") as f:
        for chunk in audio:
            f.write(chunk)
    return audio_filename
This converts each sentence into a corresponding voiceover file.
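As with the images, loop over the sentences and collect the audio paths (again, audio_files is just an illustrative name):
import os

os.makedirs("audio", exist_ok=True)

# One voiceover clip per sentence
audio_files = [
    generate_audio_from_text(sentence, idx)
    for idx, sentence in enumerate(sentences)
]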
5. Sync Images and Audio Using MoviePy
Now, we’ll combine the images and audio into video clips. Here’s how:
from moviepy.editor import ImageClip, AudioFileClip

video_clips = []
for image_path, audio_path in zip(image_files, audio_files):
    audio_clip = AudioFileClip(audio_path)
    image_clip = ImageClip(image_path, duration=audio_clip.duration)
    image_clip = image_clip.set_audio(audio_clip)
    video_clips.append(image_clip.set_fps(30))
Each image will be displayed for the duration of its associated audio.
6. Add Video Effects
To make the video more engaging, we’ll apply effects like zoom and fade. Here’s a basic zoom-in effect:
def apply_zoom_in_center(image_clip, duration):
    # Gradually scale the clip up over time (about 4% larger per second)
    return image_clip.resize(lambda t: 1 + 0.04 * t)
These effects keep the visuals dynamic and interesting without too much effort.
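A fade can be added with MoviePy's built-in fade effects. Here's a minimal sketch; the helper name and the half-second duration are just examples:
from moviepy.video.fx.all import fadein, fadeout

def apply_fade(image_clip, fade_duration=0.5):
    # Fade in at the start and fade out at the end of the clip
    return fadeout(fadein(image_clip, fade_duration), fade_duration)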
7. Assemble the Final Video
Once all the clips are ready, we’ll concatenate them into a single video.
from moviepy.editor import concatenate_videoclips

# output_video_path is the destination file for the assembled MP4
final_video = concatenate_videoclips(video_clips, method="compose")
final_video.write_videofile(output_video_path, codec="libx264", audio_codec="aac", fps=30)
This outputs your final video, ready for upload.
8. Add Captions (Optional)
Captions make videos more accessible. We use Captacity to automatically add them.
import captacity

captacity.add_captions(
    video_file=output_video_path,
    output_file="captioned_video.mp4",
    font_size=130,
    font_color="yellow"
)
9. Add Background Music
To finish, we’ll add background music to the video. The music is downloaded and synced with the video’s length.
from moviepy.editor import AudioFileClip, CompositeAudioClip

# Trim the music to the video's length and lower its volume under the narration
background_music = AudioFileClip(music_filename).subclip(0, final_video.duration).volumex(0.2)
narration_audio = final_video.audio.volumex(1.5)
combined_audio = CompositeAudioClip([narration_audio, background_music])
final_video = final_video.set_audio(combined_audio)
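Since set_audio returns a new clip, export the video one more time to get the finished file; the filename below is just an example:
final_video.write_videofile(
    "final_with_music.mp4",
    codec="libx264",
    audio_codec="aac",
    fps=30
)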
See the GitHub Project for this post!
This process powers our Faceless Shorts Video service on Robopost, where we generate short-form videos automatically. By leveraging OpenAI for visuals and ElevenLabs for narration, we’ve created an efficient, scalable system for producing content without manual editing.
Now, you can create high-quality, faceless videos for YouTube or TikTok without spending hours in front of a camera. This approach works for educational videos, storytelling, or viral content—whatever suits your needs.