🦄 Memoire: Create Narrated Videos with AI in Minutes!

This is a submission for The Pinata Challenge

✒️ Introduction

Creating captivating videos with engaging narratives can be time-consuming and complex, and the result may still look unprofessional. Ever tried outsourcing a narration or voiceover? Get ready to cough up a good amount of money for that. What if AI could simplify the process and make it cheaper too?

Meet Memoire, an AI-powered tool designed to create narrated videos in minutes. Whether you're a content creator, a marketer, or just someone who loves sharing stories, Memoire is here to transform your ideas into stunning videos effortlessly.

In this article, I'll walk you through Memoire, showcasing its features, the challenges faced during development, and the exciting possibilities it offers.

🔐 Key Features

1/ Full-Featured Authentication: Memoire uses an authentication system powered by NextAuth, so only verified users can access the app. The system includes beautifully designed emails for account verification and password resets.

2/ Upload Media and Generate Descriptions: You can upload your photos, and Memoire will generate accurate and engaging descriptions for them. If the description is missing important context, you can easily add your input and regenerate a more fitting description.

3/ Media Transitions: Elevate your video storytelling with Memoire's diverse media transitions, offering options like "fade," "wipeleft," "slideup," and more. These transitions provide a professional touch, ensuring smooth and visually appealing scene changes in your videos.

4/ Sortable Media List: Uploading photos in batches can sometimes lead to an unpredictable order of completion. With Memoire, you can easily drag and drop media boxes to arrange them in the order you prefer.

5/ AI Script Generation: Memoire uses Google's Gemini 1.5 Pro model to generate scripts for your videos. This ensures high-quality, contextually relevant scripts that enhance your video narratives.

6/ AI Audio Generation with Selectable Voices: Powered by OpenAI's TTS-1 model, Memoire offers customizable voices for your narrations. Choose from Echo, Alloy, Fable, Onyx, Nova, and Shimmer to find the perfect voice for your project.

7/ Project Settings: Customize your project by adding a description, which helps the AI generate better scripts. You can also change your project's aspect ratio and frame rate to suit your needs.

8/ In-Browser Output Generation: Memoire uses Remotion to generate video previews directly in your browser. Although the preview has some minor differences from the final output, fixes are underway to improve it.

9/ AI Music Generation: Memoire leverages Meta's MusicGen model to generate background music for your videos. This feature is still a work in progress and is not available for public testing yet.

10/ AI-Powered Subtitle Generation: Using OpenAI's Whisper model, Memoire can generate subtitles for your videos. This feature is also in development and will be available soon.

🛠️ Tech Stack

FrontEnd: TypeScript, Next.js, DND Kit
BackEnd: Next.js API Routes, Server Actions, Prisma
Styling: Tailwind CSS, shadcn/ui components
File Storage: Pinata
Rate Limit: Upstash
Authentication: NextAuth
AI Models: Google's Gemini 1.5 Pro, OpenAI's TTS-1, Meta's MusicGen, OpenAI's Whisper
In-Browser Preview: Remotion

🦄 How I Used Pinata

I had fun trying out a few things with Pinata! Here they are:

1/ Multi-File Upload Component (w/ Progress Tracking) (MediaPane.tsx):
I used Pinata's raw API endpoint to build a multi-file upload component with real-time progress tracking. This approach offers more control and a better user experience than the SDK, which has no built-in way to track upload progress.

Key Features:

Direct upload to Pinata using axios
JWT-based authentication for secure uploads
Real-time upload progress tracking

Here's how it works:

a. Fetch JWT for authentication:

const keyRequest = await fetch('/api/key');
const keyData = await keyRequest.json() as { JWT: string };

b. Prepare and send the upload request:

const UPLOAD_ENDPOINT = `https://uploads.pinata.cloud/v3/files`;
const formData = new FormData();
formData.append(`file`, addedFileState.file);

const { data: uploadResponse }: AxiosResponse<{ data: PinataUploadResponse }> = await axios.post(UPLOAD_ENDPOINT, formData, {
    headers: {
        Authorization: `Bearer ${keyData.JWT}`
    },
    onUploadProgress: async (progressEvent) => {
        if (progressEvent.total) {
            const percentComplete = (progressEvent.loaded / progressEvent.total) * 100;
            updateFileProgress(addedFileState.key, percentComplete);
        }
    }
});

c. Track upload progress:

onUploadProgress: async (progressEvent) => {
    if (progressEvent.total) {
        const percentComplete = (progressEvent.loaded / progressEvent.total) * 100;
        updateFileProgress(addedFileState.key, percentComplete);
    }
}

d. Handle the upload response and prepare metadata:

await new Promise(resolve => setTimeout(resolve, 1000));
updateFileProgress(addedFileState.key, 'COMPLETE');

const data = addedFileState.type === 'PHOTO'
    ? await getPhotoDimensions(addedFileState.preview)
    : await getVideoDimensions(addedFileState.preview);

const metadata = { ...data, cid: uploadResponse.data.cid, type: addedFileState.type };

This implementation allows for a seamless upload experience with visual feedback, enhancing user interaction during the potentially time-consuming process of uploading media files.
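
To illustrate the progress bookkeeping behind steps b through d, here is a minimal sketch of how the per-file state could be kept. The names `ProgressMap`, `toPercent`, and `applyFileProgress` are hypothetical and not taken from MediaPane.tsx; the real `updateFileProgress` in the repo may well be implemented differently.

```typescript
// Hypothetical shape of the per-file progress state (an assumption, not the repo's type).
type UploadProgress = number | 'COMPLETE';
type ProgressMap = Record<string, UploadProgress>;

// Convert loaded/total byte counts into a whole percentage clamped to 0..100.
// Returns undefined when the total is unknown, mirroring the `if (progressEvent.total)` guard.
function toPercent(loaded: number, total?: number): number | undefined {
    if (!total || total <= 0) return undefined;
    return Math.min(100, Math.max(0, Math.round((loaded / total) * 100)));
}

// Return a new map with one file's progress updated.
// A file that is already 'COMPLETE' is never downgraded by a late progress event.
function applyFileProgress(state: ProgressMap, key: string, progress: UploadProgress): ProgressMap {
    if (state[key] === 'COMPLETE') return state;
    return { ...state, [key]: progress };
}

// Usage: one progress event followed by completion.
let progressState: ProgressMap = {};
const percent = toPercent(512, 2048);
if (percent !== undefined) {
    progressState = applyFileProgress(progressState, 'photo-1', percent);
}
progressState = applyFileProgress(progressState, 'photo-1', 'COMPLETE');
```

Keeping the update as a pure function makes it easy to plug into a React state setter, and the `COMPLETE` guard avoids a stale progress event overwriting a finished upload.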

2/ Custom Image Component (PinataImage.tsx):

A custom PinataImage component was created to efficiently handle image retrieval, caching, and display. This component optimizes performance by reducing unnecessary network requests and leveraging browser storage.

Key Features:

Local caching using IndexedDB
Signed URL generation for secure access
Lazy loading and skeleton placeholders

Here's a breakdown of its functionality:

a. Check for cached images:

const cachedImage = await db.images.where({ cid, width, height }).first();
if (cachedImage) {
    setImageUrl(URL.createObjectURL(cachedImage.blob));
    return;
}

b. Generate signed URL for secure access:

const params = new URLSearchParams({
    cid,
    width: width?.toString() || '',
    height: height?.toString() || '',
    expires
});

const response = await fetch(`/api/getSignedUrl?${params}`);
if (!response.ok) {
    throw new Error('Failed to fetch signed URL');
}

const data = await response.json() as { url: string };

c. Fetch and cache the image:

const imageResponse = await fetch(`/api/getImage?url=${encodeURIComponent(data.url)}`);
if (!imageResponse.ok) {
    throw new Error('Failed to fetch image');
}

const blob = await imageResponse.blob();
const objectUrl = URL.createObjectURL(blob);
setImageUrl(objectUrl);

await db.images.put({ cid, width: Number(width), height: Number(height), blob });

d. Render the image or a skeleton placeholder:

const renderedImage = useMemo(() => {
    if (imageUrl) {
        return (
            <Image
                src={imageUrl}
                unoptimized={!!src}
                width={Number(width)}
                height={Number(height)}
                alt={alt}
                className={className}
                crossOrigin='anonymous'
                {...props}
            />
        );
    } else {
        return (
            <Skeleton className={className} />
        );
    }
}, [imageUrl, width, height, src, alt, className, props]);

This component ensures efficient loading and display of images stored on Pinata, improving the overall performance and user experience of Memoire.
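
The cache-first flow in steps a through c can be reduced to a small helper. This is only a sketch under stated assumptions: a plain `Map` stands in for the IndexedDB table, and `ImageKey`, `toCacheKey`, and `getOrLoad` are hypothetical names that do not appear in PinataImage.tsx.

```typescript
// Mirrors the { cid, width, height } lookup used against the image table.
type ImageKey = { cid: string; width: number; height: number };

// Build one string key per CID and size, so each rendered size is cached separately.
const toCacheKey = ({ cid, width, height }: ImageKey): string => `${cid}:${width}x${height}`;

// Return the cached value when present; otherwise call `load` once and store the result.
// `cache` is an in-memory stand-in for IndexedDB (an assumption made to keep this runnable).
async function getOrLoad<T>(
    cache: Map<string, T>,
    key: ImageKey,
    load: () => Promise<T>
): Promise<T> {
    const cacheKey = toCacheKey(key);
    if (cache.has(cacheKey)) {
        return cache.get(cacheKey) as T;
    }
    const value = await load();
    cache.set(cacheKey, value);
    return value;
}
```

In the component, `load` would correspond to requesting the signed URL and then fetching the image blob, so a repeat render of the same CID and size never reaches the network.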

3/ Media Management and Retrieval (VideoPreview.tsx):

In addition to uploading and displaying images, Pinata is used for storing and retrieving various types of media, including audio and video files. This is evident in the VideoPreview component:

a. Retrieve media files using their CIDs:

const getMediaUrl = useCallback(async (cid: string, projectId: string, type: 'media' | 'audio'): Promise<string> => {
    try {
        if (typeof window === 'undefined') {
            return '';
        }

        const table = type === 'media' ? db.media : db.audio;
        let item = await table.where({ cid }).first();
        if (item) {
            return URL.createObjectURL(item.file);
        }

        const response = await fetch(`/api/getFile?cid=${encodeURIComponent(cid)}`);
        if (!response.ok) {
            throw new Error(`HTTP error! status: ${response.status}`);
        }

        const blob = await response.blob();

        await table.put({
            cid,
            file: blob,
            projectId
        });

        return URL.createObjectURL(blob);
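    } catch (error) {
        // Surface the failure in the console; an empty URL is still returned
        console.error('Error fetching media file :>>', error);
        return '';
    }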
}, []);

b. Load audio files for narration:

const loadAudio = useCallback(async () => {
    if (narration?.audioCid) {
        const audioUrl = await getMediaUrl(narration.audioCid, project.id, 'audio');
        setLoadedAudioUrl(audioUrl);
        setNarration({ audioUrl });
    }
    // eslint-disable-next-line react-hooks/exhaustive-deps
}, [narration?.audioCid, project.id, getMediaUrl]);

c. Load and sort media items:

const loadMediaItems = useCallback(async () => {
    try {
        const loadedItems = await Promise.all(
            mediaItems.map(async (media) => ({
                ...media,
                url: await getMediaUrl(media.cid, project.id, 'media')
            }))
        );

        const sortedMediaItems = [...loadedItems].sort((first, next) =>
            project.mediaOrder.indexOf(first.id) - project.mediaOrder.indexOf(next.id)
        );

        // Compare sortedMediaItems with loadedMediaItems
        const hasChanged = loadedMediaItems.length === 0 ||
            sortedMediaItems.length !== loadedMediaItems.length ||
            sortedMediaItems.some((item, index) => {
                const loadedItem = loadedMediaItems[index];
                return !loadedItem ||
                    item.duration !== loadedItem.duration ||
                    item.transition !== loadedItem.transition;
            });

        if (hasChanged) {
            setLoadedMediaItems(sortedMediaItems);
        }

        await loadAudio();
    } catch (error) {
        console.error('Error loading media items :>>', error);
    }
}, [mediaItems, loadedMediaItems, getMediaUrl, project.id, project.mediaOrder, loadAudio]);

This comprehensive approach to media management allows for efficient storage, retrieval, and playback of various media types within Memoire.
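
One detail of the sort in step c is worth a sketch. `indexOf` returns -1 for an id that is missing from `project.mediaOrder`, which places that item first, and it rescans the array on every comparison. The variant below is an alternative I am suggesting, not the code in the repo: it builds a position lookup once and sends unknown ids to the end. The helper name `sortByOrder` is hypothetical.

```typescript
// Sort a copy of `items` so it follows `order`.
// Items whose id is not listed in `order` go last, keeping their relative order
// (Array.prototype.sort is stable in modern JavaScript engines).
function sortByOrder<T extends { id: string }>(items: T[], order: string[]): T[] {
    // Build the id -> position lookup once instead of calling indexOf per comparison.
    const position = new Map<string, number>(
        order.map((id, index): [string, number] => [id, index])
    );
    const rank = (item: T): number => position.get(item.id) ?? Number.MAX_SAFE_INTEGER;
    return [...items].sort((first, next) => rank(first) - rank(next));
}

// Usage: 'x' is not part of the saved order, so it ends up last.
const exampleItems = [{ id: 'c' }, { id: 'a' }, { id: 'x' }, { id: 'b' }];
const exampleSorted = sortByOrder(exampleItems, ['a', 'b', 'c']);
```

Like the original, this sorts a copy rather than mutating the loaded items, so it can be swapped in without touching the change-detection logic.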

💪 Challenges Faced

1/ Pinata Integration: Working with Pinata was an intriguing experience. Their JavaScript SDK for uploading files presented a challenge: it lacked a built-in method for tracking upload progress, which was crucial for my project to provide users with real-time feedback. Determined to find a solution, I dove into their documentation and discovered that I could use the API directly to achieve this.

Also, instead of following the conventional approach of prefetching signed URLs, I opted for a different route. I made API calls directly from the front end and cached the responses using IndexedDB. This innovative strategy allowed me to load each file only once, significantly minimizing the number of API calls to Pinata and ultimately saving on credits 😬. It was a rewarding challenge that pushed me to think creatively and efficiently!

2/ AI Integration: Integrating AI services for narration and script generation was a significant challenge. Ensuring that the AI produces high-quality output required extensive testing and fine-tuning. I also ran into rate limits while I was testing aggressively.

3/ User Experience: Creating an intuitive and user-friendly interface was crucial. I spent a considerable amount of time designing and iterating on the UI to ensure it meets users' needs while being aesthetically pleasing. This was a lot tougher for me because I didn't have the time to bring in a designer to work with me ;(.

📸 Screenshots

🔗 Project Link

Link: https://dub.sh/MemoireDemo

💻 Code Repository

Link: https://git.new/MemoireRepo

⚠ Known Issues

1/ Narration audio not syncing up with video.
2/ Video preview component flickers unnecessarily on first load.

✨ Conclusion

Memoire is designed to simplify video creation. By harnessing the power of AI, I've made it possible to produce high-quality narrated videos in minutes for dirt cheap. Whether you're looking to create content for social media, marketing campaigns, or personal projects, Memoire has you covered.

I'm excited to see what you'll create with Memoire. Feel free to share your feedback and let me know how I can improve. Stay tuned for more updates and features!