Searching pdf files - Coding a Google custom search engine (gcse) component in React

mjoycemilburn

MartinJ

Posted on September 14, 2022

Introduction

Most large organisations will hold huge archives of pdf documents. Searching for information in these represents a major challenge.

In fact, I've no idea how you might set about this under your own steam. Fortunately, Google's custom search engine (gcse) facility makes the task a five-minute job.

The gcse is a little-known but quite amazingly useful piece of Google magic - one that should be at the top of the toolbox for anyone who has responsibility for document archive management. Basically it allows you to target the entire might of the Google search engine at a specified site, or at a specified folder within that site's url. And had you realised that Google searches handle pdf files as easily as rendered web pages?

The first step in setting up the procedure is to use the Programmable Search Engine Homepage to specify your search target - see the "Create a search engine" section of Google's Getting started with Programmable Search Engine support page.

For this, all you need is a Google account. The instructions referenced above will then enable you to use the Google Programmable Search Engine Console to register a search engine, named with a tag that you choose yourself and keyed on a unique Search Engine Id supplied by Google.

Coding a GCSE reference in a webapp

To use your gcse in a conventional JavaScript webapp, all you need is the following tiny packet of HTML, where mySearchengineId stands for the Search Engine Id supplied by Google:

<script async src="https://cse.google.com/cse.js?cx=mySearchengineId"></script>
<div class="gcse-search"></div>

The effect of this will be to display a text-search input field and a search icon button.
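A `src` attribute in static HTML can't be assembled by concatenating a variable, so if your Search Engine Id lives in a JavaScript variable, one option is to build the url and inject the script element from code. Here's a minimal sketch of that idea. The helper names `buildCseUrl` and `injectCseScript` are my own inventions for illustration (they're not part of Google's API), and the `doc` parameter is only there so that the function can be exercised outside a browser.

```javascript
// Hypothetical helpers for illustration - not part of Google's API.
const CSE_BASE_URL = "https://cse.google.com/cse.js";

// Build the loader url for a given Search Engine Id,
// escaping the id so that it is safe inside a query string.
function buildCseUrl(searchEngineId) {
    if (!searchEngineId) {
        throw new Error("A Search Engine Id is required");
    }
    return CSE_BASE_URL + "?cx=" + encodeURIComponent(searchEngineId);
}

// Create the script element and append it to <head>.
// `doc` defaults to the browser's document when there is one.
function injectCseScript(searchEngineId, doc = globalThis.document) {
    const script = doc.createElement("script");
    script.async = true;
    script.src = buildCseUrl(searchEngineId);
    doc.head.appendChild(script);
    return script;
}
```

In a page that already contains the `<div class="gcse-search"></div>` element, a call such as `injectCseScript(mySearchengineId)` should then have the same effect as the static script tag.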

Submitting a search specification will then typically present the results in a popup window (depending on your choice of gcse layout).

This arrangement has worked flawlessly for me in the past - the only time I ever encountered a problem was when <td> and <th> styles in my webapp's stylesheets collided with Google's use of these elements in its script. This was easily fixed by qualifying my styles with a classname.

But I was initially stumped when I wanted to use a gcse in a React webapp.

Information on the web on this is patchy but eventually I hit on a sandbox registered by khrismuc at React cascading select.

This uses a React useEffect hook to reach into the DOM and invoke Google's cse script. All I had to do in my particular case (where I was searching a folder of newsletter pdfs for my application) was to create myself a component as follows:

import React, { useEffect } from "react";

function NewsletterGcseSearch() {
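    useEffect(() => {
        // After the component mounts, add Google's cse script to the page.
        // The script looks for the "gcse-search" div rendered below and
        // fills it with the search box.
        const script = document.createElement("script");
        script.src = "https://cse.google.com/cse.js?cx=00111 .... 6146:di0ylihvlxu";
        document.head.append(script);
    }, []);

    return (
        <div className="gcse-search"></div>
    );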
}

export { NewsletterGcseSearch };

Here, obviously, I've obscured my Search Engine Id, but I'm sure you'll get the idea. All I then had to do was to import the component into my webapp and render it as <NewsletterGcseSearch/>
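One thing to be aware of: an effect with an empty dependency array runs again each time the component is mounted, and React 18's Strict Mode deliberately mounts components twice in development, so the cse script can end up being appended more than once. Here's a minimal sketch of a guard against that. The helper name `ensureCseScript` and the element id `gcse-loader` are hypothetical names of my own, and the `doc` parameter exists only so the function can be tried outside a browser.

```javascript
// Hypothetical id used to recognise a script element added earlier.
const CSE_SCRIPT_ELEMENT_ID = "gcse-loader";

// Append the cse script only if an element with our id is not already present.
// Returns the script element in either case.
function ensureCseScript(src, doc = globalThis.document) {
    const existing = doc.getElementById(CSE_SCRIPT_ELEMENT_ID);
    if (existing) {
        return existing; // already added by an earlier mount
    }
    const script = doc.createElement("script");
    script.id = CSE_SCRIPT_ELEMENT_ID;
    script.src = src;
    doc.head.appendChild(script);
    return script;
}

// Inside the component, the effect body would then reduce to something like:
//     useEffect(() => { ensureCseScript(myCseScriptUrl); }, []);
// where myCseScriptUrl is a placeholder for your own cse.js url.
```

The trade-off is that an already-loaded script doesn't run again on a later remount, so I'd expect the search box to need re-rendering by other means in that situation. I haven't covered that here, so treat this as a starting point rather than a finished solution.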

Thank you Google (and khrismuc) - much obliged.
