Open Graph Protocol Analyzer in Python

max236

Mageshwaran

Posted on October 19, 2020

What is Open Graph protocol used for?

The Open Graph protocol enables any web page to become a rich object in a social graph. For instance, this is used on Facebook to allow any web page to have the same functionality as any other object on Facebook.

The Open Graph protocol was first introduced by Facebook to allow integration between Facebook, its user data, and a website. Adding Open Graph meta tags to your website helps social networks crawl the data in the web page, and lets you choose which elements of the page are shown when someone shares it.

You may have seen this when sharing a web link on a social network, for example a Twitter card, a Facebook link share, or a WhatsApp link card.
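To make this concrete, Open Graph tags are ordinary meta elements in the page head, each with a property attribute and a content attribute. The snippet below is a minimal sketch that uses only Python's standard library; the sample page, its URLs, and the class name OGTagCollector are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page head; the URLs and values are invented for illustration.
SAMPLE_HTML = """
<html>
<head>
  <title>Example Article</title>
  <meta property="og:title" content="Example Article" />
  <meta property="og:type" content="article" />
  <meta property="og:url" content="https://example.com/article" />
  <meta property="og:image" content="https://example.com/cover.png" />
  <meta name="viewport" content="width=device-width" />
</head>
<body></body>
</html>
"""


class OGTagCollector(HTMLParser):
    """Collect property/content pairs from og:* meta tags."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            self.og[prop] = attributes.get("content")


collector = OGTagCollector()
collector.feed(SAMPLE_HTML)
for prop, content in collector.og.items():
    print(f"{prop} -> {content}")
```

The viewport tag is skipped because it has no og: property; only the four Open Graph tags are collected.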

To learn more about Open Graph protocol
Online testing tool Open Graph Tester

We are going to build a simple Open Graph protocol analyzer that fetches the OGP data from websites, using the Python libraries BeautifulSoup and requests.

Install beautifulsoup4 and requests

pip install requests
pip install beautifulsoup4

The website link used here is only for learning purposes.

import requests
from bs4 import BeautifulSoup
url = "https://www.udemy.com/course/learn-flutter-dart-to-build-ios-android-apps/"
r = requests.get(url=url)
# Create a BeautifulSoup object
soup = BeautifulSoup(r.text, 'html.parser')

The code above fetches the website content and loads it into a BeautifulSoup object, from which we can extract the data.
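The fetch step can fail (network errors, timeouts, 4xx/5xx responses), so it may be worth guarding. The following is a minimal sketch, not part of the original code: the helper name fetch_html and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML as text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raise an HTTPError for 4xx/5xx responses
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        print(f"Could not fetch {url}: {error}")
        return None
    return response.text
```

When the result is not None, it can be passed to BeautifulSoup(html, 'html.parser') in the same way as r.text.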

# a web page can have many meta tags

# return the first meta tag
soup.find("meta")

# return all meta tags
soup.find_all("meta")

# return the first meta tag whose property is og:title
soup.find("meta", property="og:title")
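To see what these calls return without hitting the network, here is a small sketch on an inline HTML string. The markup and the variable names (sample_html, sample_soup) are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration.
sample_html = """
<head>
  <meta charset="utf-8">
  <meta property="og:title" content="Example Article">
  <meta property="og:image" content="https://example.com/a.png">
</head>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns a list-like ResultSet of Tag objects
all_meta = sample_soup.find_all("meta")
print(len(all_meta))

# find returns a single Tag, whose attributes are read with .get()
title_tag = sample_soup.find("meta", property="og:title")
print(title_tag.get("content"))

# find returns None when nothing matches, so guard before calling .get()
audio_tag = sample_soup.find("meta", property="og:audio")
audio = audio_tag.get("content") if audio_tag is not None else None
print(audio)
```

The None check matters in practice: many pages omit optional properties, and calling .get() on None raises an AttributeError.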

soup.find_all("meta") finds all the meta tags and returns them as a list. To pick out individual tags, iterate over that list and use if statements to filter on each tag's property attribute.

# data holder
data = {
    "tag": {},
    "ogp": {}
}
# find all the meta tags in the web page
for i in soup.find_all("meta"):
    # extract individual tag with the property value
    if i.get("property", None) == "og:title":
        data["tag"]["title"] = i
        data["ogp"]["title"] = i.get("content", None)
    if i.get("property", None) == "og:url":
        data["tag"]["url"] = i
        data["ogp"]["url"] = i.get("content", None)
    if i.get("property", None) == "og:description":
        data["tag"]["description"] = i
        data["ogp"]["description"] = i.get("content", None)
    if i.get("property", None) == "og:image":
        data["tag"]["image"] = i
        data["ogp"]["image"] = i.get("content", None)
    if i.get("property", None) == "og:type":
        data["tag"]["type"] = i
        data["ogp"]["type"] = i.get("content", None)
    if i.get("property", None) == "og:site_name":
        data["tag"]["site_name"] = i
        data["ogp"]["site_name"] = i.get("content", None)
    if i.get("property", None) == "og:locale":
        data["tag"]["locale"] = i
        data["ogp"]["locale"] = i.get("content", None)
print(data)
Output:
{'tag': {'title': <meta content="Flutter &amp; Dart - The Complete Guide [2020 Edition]" property="og:title"/>, 'url': <meta content="https://www.udemy.com/course/learn-flutter-dart-to-build-ios-android-apps/" property="og:url"/>, 'description': <meta content="A Complete Guide to the Flutter SDK &amp; Flutter Framework for building native iOS and Android apps" property="og:description"/>, 'image': <meta content="https://img-a.udemycdn.com/course/480x270/1708340_7108_4.jpg?mTkNpG_o5Wh0tcZgEWDnLLfndz7BG87EWBPuhbZij4iaIzFjeWC9AwmBEt4sTy0ioCD3r8w-Wtzfac00nfnb-TGMYVhafN8EXUpihTvhffAbcaEuTbQgRQvPORm5i1bX" property="og:image"/>, 'type': <meta content="udemy_com:course" property="og:type"/>, 'site_name': <meta content="Udemy" property="og:site_name"/>, 'locale': <meta content="en_US" property="og:locale"/>}, 'ogp': {'title': 'Flutter & Dart - The Complete Guide [2020 Edition]', 'url': 'https://www.udemy.com/course/learn-flutter-dart-to-build-ios-android-apps/', 'description': 'A Complete Guide to the Flutter SDK & Flutter Framework for building native iOS and Android apps', 'image': 'https://img-a.udemycdn.com/course/480x270/1708340_7108_4.jpg?mTkNpG_o5Wh0tcZgEWDnLLfndz7BG87EWBPuhbZij4iaIzFjeWC9AwmBEt4sTy0ioCD3r8w-Wtzfac00nfnb-TGMYVhafN8EXUpihTvhffAbcaEuTbQgRQvPORm5i1bX', 'type': 'udemy_com:course', 'site_name': 'Udemy', 'locale': 'en_US'}}

This works, but OGP has more properties, such as og:audio, og:video, og:determiner, and og:locale:alternate. For more detail, see https://ogp.me/#optional .

Instead of explicitly listing each OGP property, check whether the property attribute starts with the og prefix and skip the tag otherwise. Store the values in a Python dictionary called data.

data = {
    "tag": {},
    "ogp": {}
}
for i in soup.find_all("meta"):
    prop = i.get("property")
    # keep only properties in the og namespace, e.g. og:title
    if prop is not None and prop.split(":")[0] == "og":
        data["tag"][prop] = i
        data["ogp"][prop] = i.get("content")
print(data)
Output:
{'tag': {'og:title': <meta content="Flutter &amp; Dart - The Complete Guide [2020 Edition]" property="og:title"/>, 'og:url': <meta content="https://www.udemy.com/course/learn-flutter-dart-to-build-ios-android-apps/" property="og:url"/>, 'og:description': <meta content="A Complete Guide to the Flutter SDK &amp; Flutter Framework for building native iOS and Android apps" property="og:description"/>, 'og:image': <meta content="https://img-a.udemycdn.com/course/480x270/1708340_7108_4.jpg?mTkNpG_o5Wh0tcZgEWDnLLfndz7BG87EWBPuhbZij4iaIzFjeWC9AwmBEt4sTy0ioCD3r8w-Wtzfac00nfnb-TGMYVhafN8EXUpihTvhffAbcaEuTbQgRQvPORm5i1bX" property="og:image"/>, 'og:type': <meta content="udemy_com:course" property="og:type"/>, 'og:site_name': <meta content="Udemy" property="og:site_name"/>, 'og:locale': <meta content="en_US" property="og:locale"/>}, 'ogp': {'og:title': 'Flutter & Dart - The Complete Guide [2020 Edition]', 'og:url': 'https://www.udemy.com/course/learn-flutter-dart-to-build-ios-android-apps/', 'og:description': 'A Complete Guide to the Flutter SDK & Flutter Framework for building native iOS and Android apps', 'og:image': 'https://img-a.udemycdn.com/course/480x270/1708340_7108_4.jpg?mTkNpG_o5Wh0tcZgEWDnLLfndz7BG87EWBPuhbZij4iaIzFjeWC9AwmBEt4sTy0ioCD3r8w-Wtzfac00nfnb-TGMYVhafN8EXUpihTvhffAbcaEuTbQgRQvPORm5i1bX', 'og:type': 'udemy_com:course', 'og:site_name': 'Udemy', 'og:locale': 'en_US'}}
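One caveat with the dictionary approach: each property name is a key, so if a page repeats a property, only the last value survives. The Open Graph specification allows repeated tags (for example, several og:image tags) to express multiple values. The following is a minimal sketch of one way to keep them all; the function name extract_ogp and the sample markup are hypothetical, not part of the original code.

```python
from bs4 import BeautifulSoup


def extract_ogp(html):
    """Map each og:* property to the list of its content values."""
    soup = BeautifulSoup(html, "html.parser")
    ogp = {}
    for tag in soup.find_all("meta"):
        prop = tag.get("property")
        # Skip meta tags without a property or outside the og: namespace
        if prop is None or not prop.startswith("og:"):
            continue
        # A repeated property appends to the list instead of overwriting
        ogp.setdefault(prop, []).append(tag.get("content"))
    return ogp


# Hypothetical markup with a repeated og:image, invented for illustration.
sample_html = """
<head>
  <meta property="og:title" content="Example Article">
  <meta property="og:image" content="https://example.com/a.png">
  <meta property="og:image" content="https://example.com/b.png">
  <meta name="description" content="Not an Open Graph tag">
</head>
"""
print(extract_ogp(sample_html))
```

Storing every value as a list keeps the return type uniform, at the cost of a one-element list for properties that appear only once.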

After this blog post is submitted, Dev will generate Open Graph tags for this page. You can check this by viewing the page source.

https://dev.to/magesh236/open-graph-protocol-analyzer-4dk0

**Conclusion:** This is a web scraping technique, so use it with caution. Not every website allows its content to be scraped with a plain request; in that case, use a tool like Selenium to render the website, then take the rendered page content and pass it to the web scraping tool.
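As a rough sketch of the Selenium route, assuming Selenium 4 (pip install selenium) and a local Chrome installation: the helper name render_page is hypothetical, and the headless flag is one possible configuration rather than a requirement.

```python
def render_page(url):
    """Return the HTML of a page after headless Chrome has rendered it."""
    # Imported here so the module loads even if Selenium is not installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails
        driver.quit()
```

The returned HTML can be passed to BeautifulSoup(html, 'html.parser') in place of r.text, and the rest of the analyzer stays the same.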
