Building a Chat-GPT Web Clone: Adding File Upload and Speech Recognition [Next.js 13 App Router]

ervinsungkono

Ervin Cahyadinata Sungkono

Posted on July 18, 2023

Project Idea 💭

Recently, I built a Chat-GPT web clone with the same features as the official Chat-GPT website. Simply copying the existing site was not much of a challenge, though, so I decided to extend the project with something new.

While using the official Chat-GPT website, I noticed a few missing features that would have been useful. For example:

  • File upload, so that we can send documents to the bot (right now GPT-4 can do this, but GPT-3.5 still can't).
  • Speech recognition, so that we can talk to the bot instead of typing on the keyboard.

That is why I decided to add these two features to my Chat-GPT Web Clone project.

Adding New Features ⭐

For the file upload feature, I used the npm package pdf-parse. For the speech recognition feature, I used react-speech-recognition, @speechly/speech-recognition-polyfill, and regenerator-runtime.

npm install pdf-parse react-speech-recognition @speechly/speech-recognition-polyfill regenerator-runtime

File Upload

First, to build the file upload, I created the API endpoint that parses .pdf files. In the Next.js 13 App Router a route handler has to live in a file named route.js, so the endpoint /api/pdf-to-text maps to /app/api/pdf-to-text/route.js. The code for the API goes inside that file:

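import { NextResponse } from "next/server"
// Import the inner module directly instead of the package root
import pdf from 'pdf-parse/lib/pdf-parse'

// Handles POST /api/pdf-to-text: reads the uploaded PDF and returns its parsed content
export async function POST(request){
    try {
        const formData = await request.formData()
        const pdfFile = formData.get('pdfFile')
        // Reject requests that do not carry a file under the "pdfFile" field
        if (!pdfFile || typeof pdfFile === 'string') {
            return NextResponse.json({ error: 'No PDF file was uploaded' }, { status: 400 })
        }
        const buffer = Buffer.from(await pdfFile.arrayBuffer())
        const parsedPdf = await pdf(buffer)
        return NextResponse.json(parsedPdf)
    } catch (error) {
        return NextResponse.json({ error: error.message }, { status: 400 })
    }
}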

When a POST request reaches /api/pdf-to-text, this handler reads the submitted PDF from the form data and parses it with the pdf-parse library. The response is a JSON object with the following properties:

{
    numpages: integer,
    numrender: integer,
    info: [Object],
    metadata: [Object],
    version: string,
    text: string,  
}
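
Of these properties, the chat input mostly needs text. As a minimal sketch (the helper name pickPdfText is hypothetical and not part of the original project), the parsed object could be reduced to the fields that matter, with safe defaults when something is missing:

```javascript
// Hypothetical helper: keep only the fields the chat input needs.
// Falls back to an empty string / zero pages when the input is missing or malformed.
function pickPdfText(parsed) {
    const text = typeof parsed?.text === "string" ? parsed.text.trim() : ""
    const pages = Number.isInteger(parsed?.numpages) ? parsed.numpages : 0
    return { text, pages }
}

// Example with an object shaped like the response above
const sample = { numpages: 3, numrender: 3, info: {}, metadata: {}, version: "1.10.100", text: "\n\nHello PDF\n" }
console.log(pickPdfText(sample))
```

Trimming here also removes the leading and trailing blank lines that extracted PDF text often carries.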

Next, I created a helper function in /app/lib/api.js that calls the API I just created.

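// Sends the selected PDF to /api/pdf-to-text and resolves to the parsed result.
// On any failure it resolves to an object with an empty `text`, so callers can always read `.text`.
export const getParsedPdf = async(formData) => {
    try {
        const response = await fetch("/api/pdf-to-text",{
            method: "POST",
            body: formData
        })
        if(!response.ok) return { text: "" }
        return await response.json()
    } catch (err) {
        console.error(err)
        return { text: "" }
    }
}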
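
The field name is what ties the client and the route together: whatever is appended under "pdfFile" on the client is what formData.get('pdfFile') returns on the server. The sketch below illustrates this with the standard FormData and Blob globals (available in browsers and, as far as I know, in Node.js 18+). The helper name buildPdfFormData and the default file name are assumptions made for illustration only.

```javascript
// Hypothetical helper: wrap PDF bytes in a FormData under the "pdfFile" field,
// the same field name the route reads with formData.get('pdfFile').
function buildPdfFormData(bytes, fileName = "upload.pdf") {
    const formData = new FormData()
    const blob = new Blob([bytes], { type: "application/pdf" })
    formData.append("pdfFile", blob, fileName)
    return formData
}

// The five bytes below spell "%PDF-", the start of a PDF file
const exampleForm = buildPdfFormData(new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]))
const entry = exampleForm.get("pdfFile")
console.log(entry.name, entry.type, entry.size)
```

The resulting object can be passed straight to getParsedPdf as the request body.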

After that, inside my InputField component, I created a button whose click handler opens the file picker.

InputField.js

"use client"
...
import { AiFillFileText as TxtIcon } from "react-icons/ai"
...

export default function InputField({ name, value, setValue, placeholder, autoFocus = false, disabled = false, disableInput = false }){
    const uploadTxt = () => {
        const fileInput = document.createElement('input')
        fileInput.type = 'file'
        fileInput.accept = '.txt, .pdf'
        fileInput.multiple = true
        fileInput.formEnctype = "multipart/form-data"
        fileInput.onchange = _ => {
            const files =  Array.from(fileInput.files)
            files.forEach(async(file, index)=> {
                if(file.type === "application/pdf") {
                    const formData = new FormData()
                    formData.append("pdfFile", file)
                    // Fall back to an empty string if parsing failed and no text came back
                    const extractedText = await getParsedPdf(formData).then(res => res?.text ?? "")
                    setValue(prevValue => `${prevValue}${index > 0 ? "\n" : ""}${extractedText.trim()}`)
                }
                else{
                    const reader = new FileReader()
                    reader.onload = () => {
                        setValue(prevValue => `${prevValue}${index > 0 ? "\n" : ""}${reader.result.trim()}`)
                    }
                    if(file) reader.readAsText(file)
                }
            })
        }
        fileInput.click()
    }
    return(
    ...
        <button type="button" className="p-2 hover:bg-gray-300 dark:hover:bg-gray-900 rounded-md text-gray-600 dark:text-gray-400 transition-colors duration-200" onClick={uploadTxt}>
            <TxtIcon size={16}/>
        </button>
    ...
   )
}

In the code above, the uploadTxt function runs when the button is clicked. It opens the file picker and waits for the user to select one or more files. I set the accept attribute to .txt and .pdf only, to ensure that every uploaded file can be converted to raw text.

After receiving a file, the function checks its type. If it is a PDF, the file is parsed with getParsedPdf and the text property is read from the response. Otherwise, the file is read with the FileReader method readAsText.
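
One thing to keep in mind when several files are selected: each file is processed asynchronously, so the texts are appended in the order the reads finish, which may differ from the order the files were picked. The following is a minimal sketch of one way to keep the selection order, assuming an extractText function that stands in for the PDF and plain-text branches above. Both function names and the fake files are hypothetical.

```javascript
// Join already-extracted texts in the order given, skipping empty results.
function joinExtractedTexts(texts) {
    return texts.map(t => (t ?? "").trim()).filter(t => t.length > 0).join("\n")
}

// Hypothetical wrapper: `extractText` stands in for the PDF / plain-text branches.
async function extractAllInOrder(files, extractText) {
    // Promise.all keeps results in input order, no matter which promise settles first.
    const texts = await Promise.all(files.map(file => extractText(file)))
    return joinExtractedTexts(texts)
}

// Usage with fake "files" whose extraction finishes out of order
const fakeFiles = [
    { name: "a.pdf", delay: 30, text: " first " },
    { name: "b.txt", delay: 5, text: "second" }
]
const fakeExtract = file => new Promise(resolve => setTimeout(() => resolve(file.text), file.delay))
extractAllInOrder(fakeFiles, fakeExtract).then(result => console.log(result))
```

With this approach the input value would be updated once, after all files are done, instead of once per file.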

Speech Recognition

Now for the speech recognition. I created a component called Dictaphone.js and added this code:

Dictaphone.js

"use client"
import { useEffect } from 'react'
import 'regenerator-runtime/runtime'
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'
import { createSpeechlySpeechRecognition } from '@speechly/speech-recognition-polyfill'

import { BsFillMicFill } from "react-icons/bs"

const speechlyAppId = process.env.NEXT_PUBLIC_SPEECHLY_APP_ID
if(speechlyAppId){
  const SpeechlySpeechRecognition = createSpeechlySpeechRecognition(speechlyAppId)
  SpeechRecognition.applyPolyfill(SpeechlySpeechRecognition)
}

export default function Dictaphone({ setPreviewSpeech, handleSpeechEnd }){
  const {
    transcript,
    resetTranscript,
    listening,
    browserSupportsSpeechRecognition,
    isMicrophoneAvailable
  } = useSpeechRecognition()

  useEffect(() => {
    if(transcript.length > 0){
      setPreviewSpeech(transcript)
    }
    if(!listening) {
      handleSpeechEnd()
      resetTranscript()
    }
  }, [listening, transcript]);

  const startListening = () => SpeechRecognition.startListening({ continuous: true, language: 'en-US' });

  if (!browserSupportsSpeechRecognition) return null

  return (
    <button 
        type='button' 
        disabled={!isMicrophoneAvailable} id='mic-btn'
        className={`${(listening && isMicrophoneAvailable) ? "bg-green text-white" : "bg-white dark:bg-gray-700 text-gray-600 dark:text-gray-400 hover:bg-gray-300 dark:hover:bg-gray-900 border-black/10 dark:border-white/10 disabled:bg-gray-200 dark:disabled:bg-gray-800"} disabled:text-gray-600/40 dark:disabled:text-gray-400/40 border p-3 shadow-md rounded-full transition-colors duration-200`}
        onTouchStart={startListening}
        onMouseDown={startListening}
        onTouchEnd={SpeechRecognition.stopListening}
        onMouseUp={SpeechRecognition.stopListening}
    >
        <BsFillMicFill/>
    </button>
  )
}

To use the Speechly polyfill, I created an environment variable called NEXT_PUBLIC_SPEECHLY_APP_ID that holds the app ID from Speechly. Then, I followed the react-speech-recognition documentation to set up the package.

I connected the transcript to the text input of the InputField component from earlier. While the mic button is held down, the Dictaphone listens continuously (it won't stop until the button is released). When the Dictaphone stops listening, we call handleSpeechEnd and then resetTranscript to set the transcript back to an empty string.
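
The parent component that passes setPreviewSpeech and handleSpeechEnd is not shown in this article, so the following is only a sketch of how the two callbacks might combine the live transcript with text that is already in the input. The helper previewText and the variable names are assumptions for illustration.

```javascript
// Hypothetical helper: build the text shown in the input while the user is speaking.
// `committed` is what was already typed or uploaded; `transcript` is the live speech result.
function previewText(committed, transcript) {
    const spoken = transcript.trim()
    if (spoken.length === 0) return committed
    return committed.length > 0 ? `${committed} ${spoken}` : spoken
}

// Simulated session: the transcript grows while the mic button is held down
let committed = "Summarize this:"
let preview = previewText(committed, "the uploaded")
preview = previewText(committed, "the uploaded document")

// When listening stops, the preview becomes the new committed value
committed = preview
console.log(committed)
```

Keeping the committed text separate from the preview means the growing transcript replaces the previous preview instead of being appended to it on every update.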

Challenges 🎯

Some of the challenges I had while making these features were:

  1. Problems related to the pdf-parse library. Because pdf-parse is an old, unmaintained library, there were some issues I had to handle before finally getting a working PDF-to-text function.
  2. Limited browser support for speech recognition. react-speech-recognition relies on the browser's Web Speech API, which many browsers do not fully support. I solved this by adding the Speechly polyfill, which extends support to browsers other than Chrome.
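
For the second challenge, the Dictaphone component already relies on browserSupportsSpeechRecognition from the library. To illustrate what native support means, here is a rough sketch of a manual check. The function name is hypothetical, and in a real page the argument would be window.

```javascript
// Hypothetical helper: report whether a global object exposes a native
// Web Speech API recognizer, either unprefixed or with the webkit prefix.
function hasNativeSpeechRecognition(globalObject) {
    if (!globalObject) return false
    return typeof globalObject.SpeechRecognition === "function"
        || typeof globalObject.webkitSpeechRecognition === "function"
}

// Plain objects stand in for `window` here
const chromeLike = { webkitSpeechRecognition: function () {} }
const unsupported = {}
console.log(hasNativeSpeechRecognition(chromeLike))   // true
console.log(hasNativeSpeechRecognition(unsupported))  // false
```

When a check like this fails, a polyfill such as Speechly's is what makes the microphone button usable.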

Conclusion

And that is how I added the file upload and speech recognition features to my Chat-GPT Web Clone project. You can check out the full code in the GitHub repo or view the demo website.

Thank you for taking the time to read this article, and feel free to leave a comment if you have any questions or feedback.
