Ashiqur Rahman
Posted on July 23, 2022
Have you ever found yourself in a situation where you have to make a huge React.js app SEO-friendly, but you don't have much time on your hands to migrate the app to a framework like Next.js or Gatsby.js that supports server-side rendering?
The solution I'm going to discuss today can help you server-side render your React application in just a few minutes, using a tool like Selenium and a web server.
NOTE: There seems to be common misinformation on the internet that the React-Helmet library is the SEO solution for React. But IT'S NOT, at least not all on its own. It's important to understand that React Helmet uses JavaScript to insert the <meta> tags into the DOM. However, when the Google or Facebook bot comes to crawl your website, it doesn't execute JavaScript. Therefore, the page the bots see when they visit your website does not contain the <meta> tags, and the bots can't learn much about your site. Another library that works the same way as React-Helmet is React-Meta-Tags. We will still need a library like this, but it will only work after we implement the ideas discussed further down this post.
In my case, the REST API that the React front end was consuming was built using Python, so I'm going to be using the Python Selenium package. But you can use this idea regardless of which backend technology your project uses. Another thing I want to mention is that my React app was being served by an Nginx web server. But again, you should be able to apply the idea with whatever web server you are using; it basically just requires updating its config.
Solution
Step 1: Update React App webserver config
As mentioned earlier, the React app I was working on was being served through Nginx. Here's what I changed in the existing Nginx config:
location / {
    set $server_side_render 0;
    set $server_side_render_host_path api.mydomain.com/v1/ssr/;

    # Only turn on SSR for known crawler user agents
    if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
        set $server_side_render 1;
    }

    # ...and only for the routes we actually want to be SEO-friendly
    if ($uri !~ "^(.*)/(product|category|other_endpoints_i_want_to_be_seo_friendly)(.*)") {
        set $server_side_render 0;
    }

    # Delegate bot requests to the backend SSR endpoint
    if ($server_side_render = 1) {
        proxy_pass https://$server_side_render_host_path$request_uri;
    }

    try_files $uri $uri/ /index.html;
}
The idea behind the change is that we want to detect when one of the bots from a popular site like Facebook or Google visits our site, and delegate those requests to a specific endpoint on our backend API. We are calling this endpoint api.mydomain.com/v1/ssr/. You might be wondering: why only send the bots to this endpoint? Why not send everyone? I would not recommend it because it would obviously be very slow for an actual user of your website to go through all this just to receive a response. Luckily, the Google bot and the other bots have generous enough timeouts, so the extra rendering time is still acceptable for them even though it would feel slow to real users. If you want to serve server-side-rendered HTML to all of your users, you should consider migrating to a framework like Next.js or Gatsby.js. But that is going to take a good amount of time too if your React app is large, which is exactly why I think the approach I am discussing in this post is relevant.
Step 2: Add backend API /ssr/ endpoint
Now that we have routed the bots to this endpoint, we need to serve them JavaScript-rendered HTML for their request_uri. This is where Selenium comes in; we can use it to render the HTML on the backend. Here's how it works:
from bs4 import BeautifulSoup
from django.http import HttpResponse
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


def render_html(request, path):
    if request.method != 'GET':
        return HttpResponse(status=405)

    # Rebuild the frontend URL: skip past 'ssr' (3 letters) so everything
    # after it (e.g. /product/123) is appended to the site's domain
    idx = request.get_full_path().find('ssr/')
    url = f"https://www.mydomain.com{request.get_full_path()[idx + 3:]}"

    chrome_options = Options()
    chrome_options.add_argument("--disable-extensions")
    chrome_options.add_argument("--enable-javascript")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument('--ignore-ssl-errors=yes')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--disable-web-security')
    chrome_options.add_argument('--enable-logging=stderr --v=1')
    # chrome_options.add_experimental_option('w3c', False)
    chrome_options.add_argument('user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36')

    d = DesiredCapabilities.CHROME
    d['goog:loggingPrefs'] = {'browser': 'ALL'}

    # Let headless Chrome load the page and execute the React app's JavaScript
    driver = webdriver.Chrome(chrome_options=chrome_options, desired_capabilities=d)
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    for entry in driver.get_log('browser'):
        print('Selenium-Log:', entry)

    meta = soup.html.findAll('meta')
    for item in meta:
        print(item)

    driver.quit()

    # Return the fully rendered HTML (including the injected <meta> tags) to the bot
    return HttpResponse(soup.html, content_type="text/html")
We use the Chrome webdriver with the --enable-javascript option to get a JavaScript-rendered HTML string of the website. This HTML string will contain the appropriate <meta> tags added by libraries like React-Helmet. Thus, we are sending server-side-rendered HTML to the bots that visit our site.
Step 3: Add appropriate tags to the React pages
Now, we can use a library like React-Helmet or React-Meta-Tags to inject the tags for each page.
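As a rough sketch, a product page using React-Helmet might set its tags like this (the component name, props, and field names here are just placeholders for illustration, not code from my actual app):

import React from 'react';
import { Helmet } from 'react-helmet';

// Hypothetical product page; the fields come from your own API/props.
function ProductPage({ product }) {
  return (
    <div>
      <Helmet>
        <title>{product.name} | MyDomain</title>
        <meta name="description" content={product.shortDescription} />
        <meta property="og:title" content={product.name} />
        <meta property="og:description" content={product.shortDescription} />
        <meta property="og:image" content={product.imageUrl} />
      </Helmet>
      {/* ...rest of the page... */}
    </div>
  );
}

export default ProductPage;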
Step 4: Testing
We can test whether the system we designed is working using a tool like the Facebook Sharing Debugger to check what the Facebook bot sees when it hits one of the SSR-enabled endpoints on our website.
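Another quick sanity check is to request an SSR-enabled page while pretending to be a bot and look for the <meta> tags in the raw response. This is just a sketch, assuming the requests and beautifulsoup4 packages are installed and using the placeholder domain and /product/ path from the Nginx config above:

import requests
from bs4 import BeautifulSoup

# Pretend to be the Google bot so Nginx proxies us to the /ssr/ endpoint
headers = {"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}

resp = requests.get("https://www.mydomain.com/product/some-product", headers=headers, timeout=60)
soup = BeautifulSoup(resp.text, "html.parser")

# If SSR is working, the <meta> tags injected by React-Helmet should already be in the HTML
for tag in soup.find_all("meta"):
    print(tag)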
Voila! We have successfully tricked the bots into seeing server-side-rendered HTML of our site containing the appropriate <meta> tags used for SEO and SMO.
BTW, consider caching these server-side-rendered HTML responses to make the bots 🤖 🤖 🤖 even happier xD
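If your backend is Django like mine, one way to do this (a minimal sketch, assuming Django's cache framework is configured, e.g. with the local-memory cache or Redis) is to wrap the render_html view from above and key the cache on the requested path, so Selenium only runs on a cache miss:

from django.core.cache import cache
from django.http import HttpResponse

def render_html_cached(request, path):
    # Hypothetical wrapper around the render_html view defined earlier
    cache_key = f"ssr:{request.get_full_path()}"
    html = cache.get(cache_key)
    if html is None:
        response = render_html(request, path)
        html = response.content
        cache.set(cache_key, html, timeout=60 * 60 * 24)  # keep for 24 hours
    return HttpResponse(html, content_type="text/html")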