🔥🤖 Build an AI-Powered Discord Bot to Recommend HackerNews Posts using OpenAI, Novu and Agenta 🚀


Mahmoud Mabrouk

Posted on September 6, 2023


TL;DR

In this tutorial, you'll learn how to create an AI agent that alerts you about relevant Hacker News posts tailored to your interests. The agent sends Discord notifications whenever a post matches your criteria.

We'll write the code in Python, use BeautifulSoup for web scraping, OpenAI with Agenta for building the AI, and Novu for the Discord notifications.

Goal? No more endless scrolling on Hacker News. Let the AI bring worthy posts to you!

No more Procrastination


Agenta: The Open-source LLM app builder 🤖

A bit about us: Agenta is an end-to-end open-source LLM app builder. It enables you to quickly build, experiment, evaluate, and deploy LLM apps as APIs. You can use it by writing code in Langchain or any other framework, or directly from the UI.


Star us on GitHub


Here's the plan 📝

We will write a script that does the following:

  • Scrape the first five Hacker News pages for post titles using Python and BeautifulSoup.
  • Use Agenta and GPT-3.5 to categorize posts based on your interests.
  • Send the interesting posts to your Discord channel.

Setting things up:

To get started, let's create a project folder and run Poetry. If you aren't familiar with Poetry, you should check it out: it manages virtual environments and dependencies for you, which is much easier than handling them by hand.

mkdir hnbot; cd hnbot/
poetry init

This command will guide you through creating your pyproject.toml config.


Would you like to define your main dependencies interactively? (yes/no) [yes] yes

Package to add or search for (leave blank to skip): novu
Add a package (leave blank to skip): beautifulsoup4


Do you confirm generation? (yes/no) [yes] yes

Follow the prompts to set up your project. Don't forget to install novu and beautifulsoup4.

Now let's create the folder for our package and initialize the Poetry environment:

% mkdir hn_bot
% poetry shell
(hn-bot-py3.9) % poetry install

Now, we have a local environment where:

  1. All our requirements are installed.
  2. We have a Python package called hn_bot that is available on our Python path.

This means if we have multiple files in our library, we can import them using import hn_bot.module_name.

Scraping Hacker News Posts

Scraping the Hacker News page is straightforward since it does not use any complicated JavaScript. The pages are located at https://news.ycombinator.com/news?p=<page number>.

To find the titles and links on the page, we just need to open the web browser and access the dev console. Once there, we can check if there are any elements we can use to locate the titles and links. Luckily, it seems that every post is a span with the class "titleline."

Firefox Dev Console

We can use this to extract information from a single page. Let's write a function that extracts titles and links from Hacker News.

# hn_scraper.py
from typing import Dict, List

import requests
from bs4 import BeautifulSoup

def scrape_page(page_number: int) -> List[Dict[str, str]]:
    """Return the titles and links of the posts on one Hacker News page."""
    response = requests.get(f"https://news.ycombinator.com/news?p={page_number}")
    soup = BeautifulSoup(response.text, "html.parser")

    articles = []

    # Every post title on the page lives in a <span class="titleline">.
    for article_tag in soup.find_all(name="span", class_="titleline"):
        title = article_tag.getText()
        link = article_tag.find("a")["href"]
        articles.append({"title": title, "link": link})

    return articles

We can test it by adding print(scrape_page(1)) at the end of the script and running it in the shell:

 % python hn_scraper.py
(['Linux Network Performance Parameters Explained (github.com/leandromoreira)', 'Double Commander – Changes in version 1.1.0 (github.com/doublecmd)', 'If You’ve Got a New Car, It’s a Data Privacy Nightmare (gizmodo.com)', 'Ask HN: I’m an FCC Commissioner proposing regulation of IoT security updates', 'Gcsfuse: A user-space file system for interacting with Google Cloud S


Congratulations! 🎉 Now we have a script that scrapes post titles and links from Hacker News.


Creating the AI Agent 🤖

Now that we have a list of posts, we need to use OpenAI's GPT models to classify whether each one is relevant to the user's interests. For this, we are going to use Agenta.

Agenta allows you to create LLM apps from code or from the UI. Since our LLM app today is quite simple, we will create it from the UI.

You can self-host Agenta (see the local installation docs: https://docs.agenta.ai/installation/local-installation/local-installation) or use the cloud-hosted demo. To get started quickly, we'll do the latter.

Let's go to demo.agenta.ai and log in.

First, let's create a new app by clicking on "Create New App".

Create New App

Then we select start from template

Start From Template

And use a single prompt template

Single Prompt Template

Doing some Prompt Engineering In Agenta 🪄 ✨

Now we have a playground for creating the app.

First, let's add the inputs for our application. In this case, we will be using "title" for the Hacker News title and "interests" for the user's interests.

Adding Inputs

Next, we need to do a little prompt engineering. We are using gpt-3.5-turbo (the cheapest OpenAI model), which takes two messages: a system message and a user (human) message. The system message guides the language model to answer in a certain way, while the human message carries the parameters of the task.

In this case, I wrote a simple system prompt that ensures the answer is either "True" or "False." For the human prompt, I simply asked the model to classify the post. Note that we use the usual f-string format to inject the inputs we added into the prompt.

The Prompts
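For reference, here is roughly what those two prompts look like as templates (the exact wording is yours to tweak; {title} and {interests} are the inputs we defined above):

# The system prompt constrains the model to a binary answer; the human prompt
# injects our two inputs using the usual f-string-style placeholders.
prompt_system = "You are an expert in classification. You answer only with True or False."

prompt_human = (
    "Classify whether this hackernews post is interesting for someone "
    "with the following interests:\n"
    "Hacker news post title: {title}\n"
    "Interests: {interests}"
)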

Now we can test the application with some examples of Hacker News titles:

Testing the Prompt

Agenta provides tools to systematically evaluate applications and optimize prompts, parameters, and workflows (useful when building something more complex with embeddings and retrieval-augmented generation). In this case, however, such evaluation is unnecessary: the app is very simple, and gpt-3.5 solves the classification problem with minimal effort.

Let's save our changes

Saving Changes

Then we deploy the application as an API.

For this, we jump to the Endpoints menu and copy the code snippet into our project.

Endpoint Menu

Wrapping it up 🌯

Wrapping it up

Now, we can create a function based on this code snippet.

# llm_classifier.py
import requests
import json

def classify_post(title: str, interests: str) -> bool:
    """Ask the deployed Agenta app whether a post matches the given interests."""
    url = "https://demo.agenta.ai/64f1d1aefeebd024bbdb1ea4/hn_bot/v1/generate"
    params = {
        "inputs": {
            "title": title,
            "interests": interests
        },
        "temperature": 0,
        "model": "gpt-3.5-turbo",
        "maximum_length": 100,
        "prompt_system": "You are an expert in classification. You answer only with True or False.",
        "prompt_human": "Classify whether this hackernews post is interesting for someone with the following interests:\nHacker news post title: {title}\nInterests: {interests}",
        "stop_sequence": "\n",
        "top_p": 1,
        "frequence_penalty": 0,
        "presence_penalty": 0
    }

    response = requests.post(url, json=params)

    data = response.json()

    # The app is prompted to answer only "True" or "False"; bool(data) would be
    # True for any non-empty response, so check the returned text instead.
    return "true" in str(data).lower()
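A quick sanity check, assuming the app above is deployed and the endpoint URL is reachable (the title here is made up):

# quick_test.py -- hypothetical one-off check, not part of the bot itself
from hn_bot.llm_classifier import classify_post

print(classify_post(
    "Show HN: An open-source platform for LLMOps",
    "LLMs, LLMOps, Python, AI, startups",
))  # should print True for interests like these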

Sending a Discord message 🎮

First, we need to create a new channel in Discord.

Creating a new channel in Discord

Next, we need to create a webhook for the channel and copy its URL.

Getting the Webhook of the channel

Now we need to set up the integration in Novu. For this, we go to the Integration Store, click on “Add a provider”, select Discord, and don't forget to activate it!

Add a provider in Novu

Last, we need to create a workflow that triggers the message to be sent to our Discord channel. We will add a {{content}} variable to the message, which we will later inject from the code.

Create a Workflow in Novu

Write the messaging function

Now it's time to write the function that will trigger the workflow:

# novu_bot.py
from novu.config import NovuConfig
from novu.api import EventApi
from novu.api.subscriber import SubscriberApi
from novu.dto.subscriber import SubscriberDto

NovuConfig().configure("https://api.novu.co", "YOUR_API_KEY")
webhook_url = "..."  # the webhook URL we got from Discord

def send_message(msg):
    your_subscriber_id = "123"  # Replace this with a unique user ID.

    # Define a subscriber instance
    subscriber = SubscriberDto(
        subscriber_id=your_subscriber_id,
        email="abc@gmail.com",
        first_name="John",
        last_name="Doe"
    )

    # Create (or upsert) the subscriber, then attach the Discord webhook so Novu
    # knows where to deliver (credential field name per the Novu Discord provider docs).
    SubscriberApi().create(subscriber)
    SubscriberApi().credentials(
        subscriber_id=your_subscriber_id,
        provider_id="discord",
        credentials={"webhookUrl": webhook_url},
    )

    EventApi().trigger(
        name="slackbot",  # The trigger ID of the workflow. It can be found on the workflow page.
        recipients=your_subscriber_id,
        payload={"content": msg},  # Injected into the {{content}} variable of the workflow.
    )
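To check the Novu and Discord wiring in isolation before plugging in the scraper and classifier, you can run a one-off call (assuming the API key and webhook URL above are filled in):

# A hypothetical smoke test for the notification path.
from hn_bot import novu_bot

novu_bot.send_message("Hello from the HN bot! 👋")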

Putting everything together

Now we're ready to assemble all the elements to get our AI assistant running.

Let's create an app.py file in which we first call the scraper, then the LLM classifier, and finally send a message with the interesting posts.

# app.py
from hn_bot import hn_scraper, llm_classifier, novu_bot

interests = "LLMs, LLMOps, Python, Infrastructure, Tennis, MLOps, Data science, AI, startups, Computational Biology"

def main():
    novu_bot.send_message("Interesting posts at HackerNews:\n")

    # Scrape the front page, classify each post, and forward the relevant ones.
    posts = hn_scraper.scrape_page(1)
    for post in posts:
        if llm_classifier.classify_post(post["title"], interests):
            novu_bot.send_message(f"{post['title']}\n{post['link']}")

if __name__ == "__main__":
    main()

Et voilà, the interesting posts now show up in our Discord channel.

The result

Finally, let's schedule this to run every hour ⏰

We want the script to check for new posts every hour. The simplest way is an infinite loop with a one-hour sleep; if you prefer, you can use the schedule library instead (see the sketch after the code below). We also keep a list of titles we've already sent, so the same post isn't posted to Discord twice.

# app.py
from time import sleep

from hn_bot import hn_scraper, llm_classifier, novu_bot

interests = "LLM, LLMOps, MLOps, Data science, AI, startups"

# Titles we have already sent, so we don't notify about the same post twice.
done_post_titles = []


def main():
    novu_bot.send_message("Interesting posts at HackerNews:\n")

    posts = []
    for page_number in range(1, 6):  # scrape the first five pages
        posts += hn_scraper.scrape_page(page_number)

    for post in posts:
        title = post["title"]
        url = post["link"]
        if llm_classifier.classify_post(title, interests) and title not in done_post_titles:
            done_post_titles.append(title)
            novu_bot.send_message(f"{title}\n{url}")


if __name__ == "__main__":
    while True:
        main()
        sleep(3600)  # wait one hour between runs
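If you'd rather use the schedule library than a bare sleep loop, you can replace the block at the bottom of app.py with something like this sketch (after adding the dependency, e.g. with poetry add schedule):

import time

import schedule

# ... keep the imports, interests, done_post_titles and main() from above ...

if __name__ == "__main__":
    main()                          # run once at startup
    schedule.every().hour.do(main)  # then re-run main() every hour

    while True:
        schedule.run_pending()
        time.sleep(60)              # check the schedule once a minute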

Congratulations on making it this far! 🎉 You've now got an automated AI assistant keeping an eye on Hacker News for you.


Summary 📜

In this tutorial, we've built an AI-powered assistant to keep you in the loop with relevant Hacker News posts. You should have learned:

  • How to use BeautifulSoup to scrape Hacker News
  • How to create a single-prompt LLM app using Agenta and OpenAI's GPT-3.5
  • How to send notifications to Discord using Novu

You can check out the full code at https://github.com/Agenta-AI/blog/tree/main/hackernews-bot

Thanks for reading!


Star us on GitHub
