Save time reading Hacker News comments using Converse API

I share how Amazon Q Developer helped me create summaries of Hacker News items using the Amazon Bedrock Converse API

Like a lot of folk who work in tech, I spend (probably too much) time reviewing the latest news on various sites, including Hacker News. I love reading the comments that folk take the time to leave, although sometimes it can be hard work. I find it worth it though, as there are often great insights and points of view in those comments (amongst some of the not so good stuff!). Occasionally a topic will get a lot of engagement, and it can be tough going through all the comments.

I decided to see if I could quickly put something together using generative AI to summarise all the comments and save me time. I had been looking for a reason to try out the new Amazon Bedrock Converse API, and had fifteen minutes to spare. In this blog post, I share how I put some code together with the help of Amazon Q Developer to create summaries of Hacker News items using the Amazon Bedrock Converse API. I first write some code using Amazon Q Developer to interact with the Hacker News API, and then use the Amazon Bedrock Converse API to summarise the comments for me.

You can find the code created in this GitHub repo

Consolidating all comments

The first task I had to do was find out how to consolidate all the comments. Luckily, Algolia Search's API enables developers to access Hacker News data programmatically using a REST API. I tried this in my browser and was able to get exactly what I was looking for.

This is a sample of the JSON payload that the Hacker News API returns. I can get Amazon Q Developer to help generate some code to make sense of this data.

{"exhaustive":{"nbHits":true,"typo":true},"exhaustiveNbHits":true,"exhaustiveTypo":true,"hits":[{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"sbeaks"},"comment_text":{"matchLevel":"none","matchedWords":[],"value":"Are we going to end up with rererecapture where to post you need something on your phone/laptop measuring typing speed and scanning your face with an IR cam? Or a checkmark showing typed out by hand? Wouldn't get rid of ai content but may slow down bots posting it."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Open source AI is the path forward"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/"}},"_tags":["comment","author_sbeaks","story_41046773"],"author":"sbeaks","comment_text":"Are we going to end up with rererecapture where to post you need something on your phone&#x2F;laptop measuring typing speed and scanning your face with an IR cam? Or a checkmark showing typed out by hand? Wouldn&#x27;t get rid of ai content but may slow down bots posting it.","created_at":"2024-07-24T10:48:04Z","created_at_i":1721818084,"objectID":"41055592","parent_id":41054918,"story_id":41046773,"story_title":"Open source AI is the path forward","story_url":"https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/","updated_at":"2024-07-24T10:52:28Z"}],"hitsPerPage":1,"nbHits":703,"nbPages":703,"page":0,"params":"tags=comment%2Cstory_41046773&hitsPerPage=1&advancedSyntax=true&analyticsTags=backend","processingTimeMS":1,"processingTimingsMS":{"_request":{"roundTrip":13}},"query":"","serverTimeMS":1}

I ask Amazon Q Developer the following:

Amazon Q Developer : "I have the following json data payload that is returned when running against an API. Provide sample code in Python that will allow me to export all iterations of the the comment_text to the console, separated by a comma. Here is the sample json"

Amazon Q Developer provides me code that extracts exactly what I want - perfect. It is not complex, but it has taken me a lot less time to put this together than if I had been figuring out the JSON payload by myself.

import json

# Sample JSON data (paste the payload above between the triple quotes)
json_data = r"""<copy of the above data>"""

# Parse the JSON string into a Python dictionary
data = json.loads(json_data)

# Extract the comment_text values and print them separated by a comma
comment_texts = [hit["comment_text"] for hit in data["hits"]]
print(", ".join(comment_texts))
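One detail visible in the sample payload is that comment_text arrives HTML-escaped (for example &#x2F; and &#x27;), and comments may also contain simple HTML tags. The following is a minimal sketch, not part of the original script, of how the text could be cleaned with the standard library before it is summarised. The function name clean_comment and the tag-stripping regex are assumptions made for illustration, and the regex is a rough heuristic rather than a full HTML parser.

```python
import html
import re

# Matches simple HTML tags such as <p> or <a href="...">; a rough heuristic only
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_comment(raw: str) -> str:
    """Strip simple HTML tags, then decode entities such as &#x2F; and &#x27;."""
    # Remove tags first so that escaped angle brackets in the text are not mistaken for tags
    without_tags = TAG_PATTERN.sub(" ", raw)
    decoded = html.unescape(without_tags)
    # Collapse the runs of whitespace left behind by removed tags
    return " ".join(decoded.split())


sample = "phone&#x2F;laptop<p>Wouldn&#x27;t get rid of ai content"
print(clean_comment(sample))
```

Applying this to each entry in comment_texts before joining them would give the model plain text rather than escaped markup.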

What I really want though, is to be able to access the API directly, so I ask Amazon Q Developer to update the code:

Amazon Q Developer : "can you update the code so that it gets the json data from the following URL - https://hn.algolia.com/api/v1/search_by_date?tags=comment,story_41046773&hitsPerPage=1"

Quicker than I can type, Amazon Q Developer updates my code for me:

import json
import requests

# URL to fetch JSON data
url = "https://hn.algolia.com/api/v1/search_by_date?tags=comment,story_41046773&hitsPerPage=1"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Load the JSON data
    data = response.json()

    # Extract the comment_text values and print them separated by a comma
    comment_texts = [hit["comment_text"] for hit in data["hits"]]
    print(", ".join(comment_texts))
else:
    print(f"Error: {response.status_code}")

This is great, but brings back just one item, and in reality there will be more than one comment. So I need to update the code to handle this. I also update the URL so that it brings back just two items (comments) - once I have got this working I will change this again so it grabs all items.

Amazon Q Developer : "can you updated the code so that each comment is separated by a new line break"

And my script now looks like this:

import json
import requests

# URL to fetch JSON data
url = "https://hn.algolia.com/api/v1/search_by_date?tags=comment,story_41046773&hitsPerPage=2"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Load the JSON data
    data = response.json()

    # Extract the comment_text values and print them separated by a new line
    comment_texts = [hit["comment_text"] for hit in data["hits"]]
    print("\n".join(comment_texts))
else:
    print(f"Error: {response.status_code}")
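Because the hitsPerPage value in the URL keeps changing by hand, one option is to build the query string from parameters instead. This is a small sketch under that assumption, using only the standard library; the helper name build_comments_url is hypothetical and not part of the code Amazon Q Developer produced.

```python
from urllib.parse import urlencode

BASE_URL = "https://hn.algolia.com/api/v1/search_by_date"


def build_comments_url(story_id: str, hits_per_page: int = 2) -> str:
    """Build the search_by_date URL for the comments on one story."""
    query = urlencode({
        "tags": f"comment,story_{story_id}",  # urlencode escapes the comma as %2C
        "hitsPerPage": hits_per_page,
    })
    return f"{BASE_URL}?{query}"


print(build_comments_url("41046773"))
```

The escaped form (tags=comment%2Cstory_41046773) matches what the API echoes back in the params field of the sample payload.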

The last thing I want to do is to output this to a file, so yup, you guessed it, let's ask Amazon Q Developer.

Amazon Q Developer : "if i wanted to output to a file, how would i update the code"

The code now runs and creates a text file with the output:

import json
import requests

# URL to fetch JSON data
url = "https://hn.algolia.com/api/v1/search_by_date?tags=comment,story_41046773&hitsPerPage=2"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Load the JSON data
    data = response.json()

    # Extract the comment_text values
    comment_texts = [hit["comment_text"] for hit in data["hits"]]

    # Open a file for writing
    with open("comments.txt", "w") as file:
        # Write each comment to the file, separated by a new line
        file.write("\n".join(comment_texts))

    print("Comments written to comments.txt")
else:
    print(f"Error: {response.status_code}")

When I run this, I now have the comments in a nice file called comments.txt, with each comment on a new line. I next update the URL in the script to return 1000 entries, and re-run it. When I use wc to check my new file, I can see I have quite a few comments in there (757 lines and 42K words)

wc comments.txt                         

 757   42864  283516 comments.txt
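The sample payload earlier includes page and nbPages fields, which suggests that results are paged. If a story has more comments than a single request returns, a loop over pages could collect them all. The sketch below assumes the API accepts a page query parameter; the function collect_comments and the injected fetch_page callable are hypothetical names, and the fake pages exist only so the example runs without network access.

```python
from typing import Callable, Dict, List


def collect_comments(fetch_page: Callable[[int], Dict], max_pages: int = 50) -> List[str]:
    """Call fetch_page(0), fetch_page(1), ... until nbPages is reached."""
    comments: List[str] = []
    page = 0
    while page < max_pages:
        data = fetch_page(page)
        # Skip hits that have no comment text
        comments.extend(
            hit["comment_text"] for hit in data.get("hits", []) if hit.get("comment_text")
        )
        page += 1
        if page >= data.get("nbPages", 0):
            break
    return comments


# Hand-written stand-in pages shaped like the API payload, for illustration only
fake_pages = [
    {"hits": [{"comment_text": "first"}, {"comment_text": "second"}], "nbPages": 2},
    {"hits": [{"comment_text": "third"}], "nbPages": 2},
]
print(collect_comments(lambda page: fake_pages[page]))
```

In the real script, fetch_page could wrap requests.get with the page number added to the query string and return response.json().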

Updating to work on any News item

At the moment the code takes in a hard-coded URL of the Hacker News item I am interested in. Let's update the code so that I can pass this in as a parameter.

Amazon Q Developer : "Can you update the code so that it takes in a command line option for the story_ in the URL"

The resulting code now checks for and expects you to provide a story ID, which you can easily find from the URL of any news item you are interested in.

import json
import requests
import sys

# Check if a command-line argument is provided
if len(sys.argv) < 2:
    print("Usage: python script.py story_id")
    sys.exit(1)

# Get the story_id from the command-line argument
story_id = sys.argv[1]

# URL to fetch JSON data
url = f"https://hn.algolia.com/api/v1/search_by_date?tags=comment,story_{story_id}&hitsPerPage=1000"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Load the JSON data
    data = response.json()

    # Extract the comment_text values
    comment_texts = [hit["comment_text"] for hit in data["hits"]]

    # Open a file for writing
    with open("comments.txt", "w") as file:
        # Write each comment to the file, separated by a new line
        file.write("\n".join(comment_texts))

    print("Comments written to comments.txt")
else:
    print(f"Error: {response.status_code}")
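Since the story ID comes from the URL of the news item, a small helper could accept the full URL instead of the bare ID. This is a sketch only, assuming item URLs take the form https://news.ycombinator.com/item?id=<number>; the function name story_id_from_url is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def story_id_from_url(item_url: str) -> str:
    """Return the numeric id query parameter from a Hacker News item URL."""
    query = parse_qs(urlparse(item_url).query)
    ids = query.get("id")
    if not ids or not ids[0].isdigit():
        raise ValueError(f"No numeric story id found in {item_url!r}")
    return ids[0]


print(story_id_from_url("https://news.ycombinator.com/item?id=41046773"))
```

The returned value could then be used in place of sys.argv[1] when building the API URL.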

The first part is done, now for the fun part - summarising the captured comments, to help us save some time.

Summarising the comments using the Amazon Bedrock Converse API

A few months ago, we launched a new API to work with Amazon Bedrock, called the Amazon Bedrock Converse API. The Amazon Bedrock Converse API provides a consistent way to access large language models (LLMs) using Amazon Bedrock. You can read more about this in the post, Getting started with the Amazon Bedrock Converse API. This looks like a good approach.

After enabling access to a model in Amazon Bedrock (I decide to try out Anthropic Claude Sonnet) in us-east-1, I update my local .aws profile so that it points to the us-east-1 region (see the Errors section at the end for why). I check to make sure that my current federated user has access to the Amazon Bedrock service (as I am using AWS IAM Identity Center, I check the permission sets) and I am now good to go. The nice thing about using the Amazon Bedrock Converse API is that I can play around with different model providers just by adjusting the ID in the code, and leave everything else the same. That is pretty awesome in my book.

I decide rather than keep it with the other script, I will create a new one. I start off with creating a new file (summarise_comments.py) and then copying the example from the blog post. I then ask Amazon Q Developer to help me tweak it.

Amazon Q Developer : "I have the following code. Can you show me how to update it so that it reads from a file called comments.txt and then summarises the text. Here is the code "

It provides the following updated code.

import os
import boto3
import json

# Load the contents of the comments.txt file
with open('comments.txt', 'r') as file:
    file_contents = file.read()

# Create a Bedrock client from the session so that the explicit region is used
session = boto3.Session(region_name='us-east-1')
bedrock = session.client('bedrock-runtime')

# Define the model ID
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

message_list = []

initial_message = {
    "role": "user",
    "content": [
        { "text": "Summarise the text and provide the top three to five main talking points. Ignore any comments that use bad language, or profanities" },
        { "document": {
            "format":"txt",
            "name":"comments",
            "source":{"bytes":file_contents}
        }} 
    ],
}

message_list.append(initial_message)

response = bedrock.converse(
    modelId=model_id,
    messages=message_list,
    inferenceConfig={
        "maxTokens": 4000,
        "temperature": 0
    },
)

response_message = response['output']['message']
print(json.dumps(response_message['content'][0]['text'], indent=4))

When I run this, I get the following output.

"Here are the main talking points from the text, summarized:\n\n1. Meta (Facebook) has released its Llama 3.1 language model, including versions with 405 billion, 70 billion, and 8 billion parameters, as open source AI models. This makes frontier-level AI models freely available to most people and organizations, except Meta's major competitors.\n\n2. Meta argues that open sourcing AI models promotes safety, as it allows wider scrutiny and prevents a small number of companies from monopolizing the technology. However, there is debate around whether the Llama models truly qualify as \"open source\" given the restrictions in the license and lack of access to training data.\n\n3. Meta's motivations for open sourcing the models are seen as both promoting the public good and undercutting competitors like OpenAI, Microsoft, and Google, which have been monetizing their AI models through APIs and cloud services.\n\n4. The release of the open Llama models is expected to significantly impact the AI industry, potentially disrupting the business models of closed AI providers and accelerating the development and adoption of open AI systems.\n\n5. There are concerns around the potential misuse of open AI models, as well as questions about the practicality of truly open sourcing large language models given the challenges of sharing training data and the computational resources required for training."

I am pretty happy with this, and it has saved me a ton of time reading through all those comments.
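Because the Converse request has the same shape whichever model is called, trying a different model can be reduced to changing one argument. The sketch below illustrates that idea under some assumptions: the helper name summarise is hypothetical, the comments are passed inline as text rather than as a document block, the second model ID is an assumption that would need access enabled in your account, and FakeClient is a stand-in so the example runs offline. Printing the returned string directly, rather than through json.dumps, also shows the summary with real line breaks instead of \n.

```python
PROMPT = "Summarise the text and provide the top three to five main talking points."


def summarise(client, model_id: str, text: str) -> str:
    """Send the text to the given model through the Converse API and return the reply text."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": f"{PROMPT}\n\n{text}"}]}],
        inferenceConfig={"maxTokens": 4000, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"]


class FakeClient:
    """Stand-in for a bedrock-runtime client, for illustration only."""

    def converse(self, modelId, messages, inferenceConfig):
        reply = {"role": "assistant", "content": [{"text": f"summary from {modelId}"}]}
        return {"output": {"message": reply}}


MODEL_IDS = [
    "anthropic.claude-3-sonnet-20240229-v1:0",  # the model used in this post
    "anthropic.claude-3-haiku-20240307-v1:0",   # assumption: another model you have enabled
]

for model_id in MODEL_IDS:
    print(summarise(FakeClient(), model_id, "comment one\ncomment two"))
```

With a real client created from a boto3 session in place of FakeClient(), the same loop would return one summary per model for comparison.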

Conclusion

In this short blog post I shared how I was able to quickly go from idea to working code with the help of Amazon Q Developer and the Amazon Bedrock Converse API. I was happy with the output, and I think this is going to be useful. One thing I am not sure about, however, is whether I am going to miss out on the insights and points of view that I used to find, and that made wading through those comments worthwhile. I am not sure what the answer is there, but where my head is at the moment is that if a summary looks interesting, it might encourage me to dive into the comments, so I am not necessarily in a worse situation. Something to think about anyway.

The code and the approach could be improved. One idea I had was to schedule this to run against the top 25 items on Hacker News and automate the generation of summaries which could be sent to me by email every morning. Stay tuned: if I do make any further changes, I will update the code repo.
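As a starting point for that idea, the list of stories to summarise could come from the same Algolia API. The sketch below assumes the search endpoint with tags=front_page returns the current front page stories and that each hit carries objectID and title fields; the function name front_page_stories and the injected fetch_json callable are hypothetical, and the fake fetcher exists only so the example runs without network access.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: front page stories are available through the front_page tag
FRONT_PAGE_URL = "https://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=25"


def front_page_stories(fetch_json: Callable[[str], Dict]) -> List[Tuple[str, str]]:
    """Return (story_id, title) pairs for the stories returned by the API."""
    data = fetch_json(FRONT_PAGE_URL)
    return [(hit["objectID"], hit.get("title", "")) for hit in data.get("hits", [])]


def fake_fetch(url: str) -> Dict:
    """Hand-written stand-in for an HTTP call, for illustration only."""
    return {"hits": [{"objectID": "41046773", "title": "Open source AI is the path forward"}]}


for story_id, title in front_page_stories(fake_fetch):
    print(story_id, title)
```

Each story ID could then be fed to the comment download script and the summariser in turn, keeping the daily request limit mentioned below in mind.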

If you want to try something similar, always check with the API provider (review their terms and conditions, or if they have any specific API guidelines). I did this before writing this post, and it looks like I am good to go (aside from the fact that there is a limit of 10K requests per day, which for the purpose of what I need it for, is perfectly fine).

That's all folks, thanks for tuning in.

If you want to learn more, check out the Amazon Q Developer Dev Centre which contains lots of resources to help get you started. You can also get more Amazon Q Developer related content by checking out the dedicated Amazon Q space, and if you are looking for more hands on content, take a look at my interactive tutorial that helps you get started with Amazon Q Developer.

Errors

When I was trying to use the Converse API I ran into a couple of errors, so I am sharing them here as it might help folk out.

Old boto3 installed

AttributeError: 'BedrockRuntime' object has no attribute 'converse'

I got this error when I was trying to run a quick hello_world example of the Converse API. I was using boto3 1.28.70, which I checked using print(f"Boto3 version: {boto3.__version__}"). I had to pip uninstall boto3 and then re-install it so that I was using the latest version (I am now using 1.34.147). That resolved this error.
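A quick way to catch this before making a call is to check whether the client object exposes a converse method at all. This is a minimal sketch; the function name supports_converse is hypothetical, and the two small classes are stand-ins for an old and a new client so the example runs without boto3 installed.

```python
def supports_converse(client) -> bool:
    """Return True when the client object exposes a callable converse method."""
    return callable(getattr(client, "converse", None))


class OldClient:
    """Stand-in for a client built by a boto3 release that predates the Converse API."""


class NewClient:
    """Stand-in for a client built by a recent boto3 release."""

    def converse(self, **kwargs):
        return {}


print(supports_converse(OldClient()))
print(supports_converse(NewClient()))
```

With boto3 installed, passing a real bedrock-runtime client to supports_converse and printing boto3.__version__ alongside the result would show whether an upgrade is needed.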

AWS region not picked up

Again whilst trying to run the hello_world example, I got this error.

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation: Your account is not authorized to invoke this API operation.

My immediate thought was that I had forgotten to enable model access from the Amazon Bedrock console. But looking at the region (us-east-1) for the Model I was using (anthropic.claude-3-sonnet-20240229-v1:0) it all looked good, and I can do stuff in the text playground.

I added the following debug code to my hello_world example:

import logging

import boto3

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)  # Set the desired logging level (DEBUG for more verbose logging)

session = boto3.Session(region_name='us-east-1')

# Stream botocore's debug output, including the region used to resolve the endpoint
boto3.set_stream_logger('', logging.DEBUG)

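And despite me explicitly setting the region to us-east-1 on the session, the request was using the region from my local profile (eu-west-1 in the log below) rather than the info in the code. A likely explanation is that the client was created with boto3.client('bedrock-runtime'), which uses boto3's default session and so ignores the region set on a separately created Session object.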

2024-07-24 13:30:59,591 botocore.hooks [DEBUG] Event before-parameter-build.bedrock-runtime.Converse: calling handler <function generate_idempotent_uuid at 0x103a543a0>
DEBUG:botocore.regions:Calling endpoint provider with parameters: {'Region': 'eu-west-1', 'UseDualStack': False, 'UseFIPS': False}

After updating my region in my local profile to us-east-1, the code worked fine.

Ricardo Sueiras
