Building a WNBA Analytics Dashboard with Streamlit, LangChain, and Cloudflare Workers AI

lizziepika

Lizzie Siegle

Posted on October 18, 2024

In this tutorial, we'll walk through building an interactive WNBA (Women's National Basketball Association) analytics dashboard. It combines data visualization, AI-generated insights, and a chatbot interface for exploring WNBA player statistics and team information.

Project Overview

This WNBA analytics dashboard offers the following features:

  • Player statistics visualization
  • Team comparisons
  • Interactive map showing WNBA team locations
  • AI-powered chatbot for WNBA-related queries
  • Data filtering and sorting capabilities
  • Responsive design for various screen sizes

We'll be using the following technologies:

  • Python
  • Streamlit for the web application framework
  • LangChain for building the prompt and chat chain around the language model
  • Cloudflare Workers AI for hosted LLM inference
  • Pandas for data manipulation
  • Plotly for interactive charts
  • Folium for map visualizations

Setting Up the Environment

First, let's set up our development environment:

  1. Create a new Python virtual environment
python3 -m venv venv                                                            
source venv/bin/activate
  2. Install the required packages: download the requirements.txt file from the GitHub repo and run pip install -r requirements.txt. Then add the following imports at the top of your Python file.
import base64
from dotenv import load_dotenv
import json
import os
import requests
import time
import webcolors

import streamlit as st
import folium
from streamlit_folium import st_folium

from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut, GeocoderServiceError

from langchain.memory import ConversationBufferMemory
from langchain_community.llms.cloudflare_workersai import CloudflareWorkersAI
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnablePassthrough

import numpy as np
import pandas as pd
from pathlib import Path

import plotly.graph_objects as go
  3. Set up a Cloudflare account and obtain the credentials for Workers AI. When you log in to the dashboard, your account ID is the string of characters that follows https://dash.cloudflare.com/ in the URL. To get a Workers AI auth token, click AI in the left-hand sidebar, then the blue Use REST API button, then Create a Workers AI API Token.

Add them to a .env file as CF_ACCOUNT_ID and CF_AUTH_TOKEN, and load them by adding these lines beneath the import statements.

load_dotenv()

# Cloudflare Workers AI setup
ACCOUNT_ID = os.getenv('CF_ACCOUNT_ID')
AUTH_TOKEN = os.getenv('CF_AUTH_TOKEN')
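
Because os.getenv returns None when a variable is missing, a typo in the .env file would only surface later as a failed API call. The following is a minimal sketch of a fail-fast check; require_env is a hypothetical helper introduced for illustration and is not part of the original app.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise a clear error.

    Hypothetical helper: the original app reads the variables with
    os.getenv directly and does not validate them.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable: {name}. Add it to your .env file."
        )
    return value
```

After load_dotenv() has run, you could then write ACCOUNT_ID = require_env("CF_ACCOUNT_ID") in place of the plain os.getenv call.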

Configuring the Streamlit Page

Let's set up the basic Streamlit page configuration and add some custom CSS for styling:

st.set_page_config(page_title="WNBA Player Analytics Dashboard, AI Insights, && AI Assistant", page_icon="🏀", layout="wide")

# Custom CSS (truncated for brevity)
st.markdown("""
<style>
    .hover-link {
        color: #1E90FF;
        text-decoration: none;
        transition: color 0.3s ease;
    }
    .hover-link:hover {
        color: #FF4500;
        text-decoration: underline;
    }
    /* ... more custom CSS -> see https://github.com/elizabethsiegle/wnba-analytics-dash-ai-insights ... */
</style>
""", unsafe_allow_html=True)

Data Collection and Preprocessing

Create a function to fetch and preprocess WNBA player data:

def fetch_player_data(season):
    url = f"https://www.basketball-reference.com/wnba/years/{season}_per_game.html"
    dataframes = pd.read_html(url, header=0)
    df = dataframes[0]
    df = df[df.G != 'G'].fillna(0)  # Remove header rows and fill NaNs
    df = df.drop(['G'], axis=1)

    # Convert percentage columns to float
    percentage_columns = ['FG%', '3P%', '2P%', 'eFG%', 'FT%']
    for col in percentage_columns:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col].astype(str).str.rstrip('%'), errors='coerce') / 100

    # Convert other numeric columns to float
    numeric_columns = ['Age', 'GS', 'MP', 'FG', 'FGA', '3P', '3PA', '2P', '2PA', 'FT', 'FTA', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS']
    for col in numeric_columns:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')

    # Ensure 'Player' and 'Team' columns are strings
    df['Player'] = df['Player'].astype(str)
    df['Team'] = df['Team'].astype(str)

    # Handle any remaining problematic columns
    for col in df.columns:
        if df[col].dtype == object:
            df[col] = df[col].astype(str)

    return df

# Usage
selected_season = st.sidebar.selectbox('Season', list(reversed(range(1997, 2025))))
player_data = fetch_player_data(selected_season)

This function fetches data from basketball-reference.com for a given season and cleans it for our use: it drops the repeated header rows, fills missing values, and converts the statistic columns to numeric types.
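
To see the two key cleaning steps in isolation, here is a small sketch on synthetic data. The player names and values are made up for illustration, and the order of operations is simplified compared with fetch_player_data (missing values are filled after the numeric conversion rather than before).

```python
import pandas as pd

# Synthetic rows that mimic a scraped table: the middle row is a repeated header.
raw = pd.DataFrame({
    "Player": ["Player One", "Player", "Player Two"],
    "Team": ["AAA", "Team", "BBB"],
    "G": ["40", "G", "38"],
    "PTS": ["19.2", "PTS", "n/a"],
})

# Keep only real data rows; .copy() avoids pandas' chained-assignment warning.
clean = raw[raw["G"] != "G"].copy()

# errors="coerce" turns unparseable strings such as "n/a" into NaN,
# which fillna(0) then replaces with 0.
clean["PTS"] = pd.to_numeric(clean["PTS"], errors="coerce").fillna(0)

print(clean)
```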

Building the User Interface

Now, let's set up the basic structure of our Streamlit app and sidebar controls:

# st.set_page_config() was already called in the page configuration step, so it is not repeated here
st.title("WNBA Player Analytics Dashboard")

# Sidebar for filters
st.sidebar.header('Filter Options')
st.sidebar.success("Filter players by season, team, and position to explore the data.")
# Season selector (the same widget as in the usage lines above; keep only one copy in your file)
selected_season = st.sidebar.selectbox('Season', list(reversed(range(1997, 2025))))

# Fetch the data first, because the team list below is built from it
player_data = fetch_player_data(selected_season)

# Team and position selection
teams = sorted([team for team in player_data.Team.unique() if team != 'TOT'])
selected_teams = st.sidebar.multiselect('Team', teams, teams)

positions = ['C', 'F', 'G', 'F-G', 'C-F']
selected_positions = st.sidebar.multiselect('Position', positions, positions)

# Display the raw data
st.write(player_data)

This creates a simple dashboard with season, team, and position selectors in the sidebar and displays the raw data for the chosen season.
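
The team and position selections still have to be applied to the data, and the chatbot code later in this tutorial expects a filtered_data DataFrame. One way to build it is sketched below on synthetic rows. It assumes the position column is named 'Pos', which this tutorial does not show, so check the column names of your own DataFrame.

```python
import pandas as pd

# Synthetic stand-in for player_data; 'Pos' is an assumed column name.
player_data = pd.DataFrame({
    "Player": ["Player One", "Player Two", "Player Three"],
    "Team": ["AAA", "BBB", "AAA"],
    "Pos": ["G", "F", "C"],
    "PTS": [19.2, 7.5, 11.0],
})
selected_teams = ["AAA"]            # would come from the Team multiselect
selected_positions = ["G", "F"]     # would come from the Position multiselect

# Keep rows whose team AND position are both among the selections.
mask = (
    player_data["Team"].isin(selected_teams)
    & player_data["Pos"].isin(selected_positions)
)
filtered_data = player_data[mask]

print(filtered_data)
```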

Adding Data Visualization

This section relies on two pieces that are not reproduced here: a helper function that cleans percentage data (converting string percentages to floats and handling various data types) and a dictionary that maps user-friendly names to column names in the dataset.
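
The actual helper and dictionary live in the GitHub repo. As a rough sketch of what they might look like, the names clean_percentage and STAT_COLUMNS, the labels, and the choice of 0.0 as the fallback value are all assumptions made for illustration.

```python
def clean_percentage(value, default=0.0):
    """Convert values such as '45.1%', '.451', 0.451 or None to a float.

    Strings ending in '%' are divided by 100; anything that cannot be
    parsed falls back to `default`. (Hypothetical helper.)
    """
    if value is None:
        return default
    if isinstance(value, str):
        text = value.strip()
        is_percent = text.endswith("%")
        text = text.rstrip("%").strip()
        try:
            number = float(text)
        except ValueError:
            return default
        return number / 100 if is_percent else number
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# User-friendly labels mapped to dataset column names (labels are illustrative).
STAT_COLUMNS = {
    "Points per game": "PTS",
    "Assists per game": "AST",
    "Rebounds per game": "TRB",
    "Steals per game": "STL",
    "Blocks per game": "BLK",
    "Field goal %": "FG%",
}
```

With a mapping like this, the dropdown can show the readable labels while the chart code looks up the column with STAT_COLUMNS[selected_label].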

We create a 2x2 grid layout for organizing dashboard components.

The chart section (col1):

  • allows users to select a statistic and chart type (Pie or Bar).
  • filters data based on a minimum value slider.
  • creates and displays a chart of the top 5 players for the selected statistic.

The stats section (col2):

  • displays quick stats like average, highest player, and number of players shown.
  • shows a styled table of top players for the selected statistic.

Key features include interactive elements (dropdowns, radio buttons, slider) for user customization and dynamic chart creation based on user selection.
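
The data logic behind the chart and stats sections can be sketched without any Streamlit code. The example below uses synthetic values; min_value stands in for the slider, and computing the average over all players that pass the filter is an assumption about how the quick stats are defined.

```python
import pandas as pd

# Synthetic stand-in for the filtered player data.
player_data = pd.DataFrame({
    "Player": [f"Player {i}" for i in range(1, 8)],
    "PTS": [19.2, 7.5, 11.0, 22.4, 3.1, 15.8, 9.9],
})
stat = "PTS"
min_value = 5.0  # would come from the minimum value slider

# Apply the slider threshold, then take the five highest values.
eligible = player_data[player_data[stat] >= min_value]
top_players = eligible.nlargest(5, stat)

quick_stats = {
    "average": round(eligible[stat].mean(), 2),
    "highest_player": top_players.iloc[0]["Player"],
    "players_shown": len(top_players),
}

print(top_players)
print(quick_stats)
```

The top_players frame is what a Plotly pie or bar chart would be built from, and quick_stats holds the numbers shown next to it.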

The AI insights section (col3) adds an AI-powered feature to the dashboard:

  • It creates a button that, when clicked, generates AI insights about the chart displayed above.
  • When the button is clicked, it checks whether the necessary data is available in the session state.
  • If data is available, it creates a DataFrame of the top 5 players and their stats.
  • It then calls the generate_insights() function (which you'd need to implement using a language model like GPT-3 or GPT-4) to analyze this data.
  • The generated insights are displayed to the user.
  • A warning reminds users that the insights are AI-generated and should be verified.

This feature demonstrates how to integrate AI capabilities into a data dashboard, providing users with quick, automated analysis of the visualized data.
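
Since generate_insights() is left to the reader, here is one possible shape for it. Everything in this sketch is an assumption: the signature, the prompt wording, and the ask_model parameter, which is any callable that takes a prompt string and returns text (for example, the invoke method of a LangChain LLM object).

```python
def build_insight_prompt(stat_label, top_players):
    """Format (player, value) pairs into a prompt for the language model."""
    lines = [
        f"{rank}. {name}: {value}"
        for rank, (name, value) in enumerate(top_players, start=1)
    ]
    return (
        f"Here are the top WNBA players by {stat_label}:\n"
        + "\n".join(lines)
        + "\nSummarize the most notable patterns in two or three sentences."
    )


def generate_insights(stat_label, top_players, ask_model):
    """Return model-written insights, or a fallback message if the call fails."""
    if not top_players:
        return "No data available to analyze."
    prompt = build_insight_prompt(stat_label, top_players)
    try:
        return str(ask_model(prompt)).strip()
    except Exception as exc:  # show the failure in the UI instead of crashing
        return f"Could not generate insights: {exc}"
```

Keeping the model behind a plain callable makes the function easy to try out with a stub before wiring it to a real model.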

Implementing Player Comparison

Add a section for comparing two players:

    st.markdown('<div class="player-comparison-section">', unsafe_allow_html=True)
    st.subheader("🏀 Player Comparison (players must have played in the same season)")

    # Allow users to select players to compare
    players = player_data['Player'].unique()

    # Find the indices of Caitlin Clark and Angel Reese
    caitlin_index = players.tolist().index('Caitlin Clark') if 'Caitlin Clark' in players else 0
    angel_index = players.tolist().index('Angel Reese') if 'Angel Reese' in players else 0

    player1 = st.selectbox("Select first player", players,index=caitlin_index, key='player1')
    player2 = st.selectbox("Select second player", players, index=angel_index, key='player2')

    def normalize(value, min_value, max_value):
        try:
            value = float(value)
            return 100 * (value - min_value) / (max_value - min_value) if max_value > min_value else 50
        except (ValueError, TypeError):
            return 0  # or some default value for non-numeric entries

    if player1 and player2:
        # Get data for selected players
        stats1 = player_data[player_data['Player'] == player1].iloc[0]
        stats2 = player_data[player_data['Player'] == player2].iloc[0]

        # Select stats to compare
        stats_to_compare = ['PTS', 'AST', 'TRB', 'STL', 'BLK', 'FG%', '3P%', 'FT%']
        # Convert columns to numeric, replacing non-numeric values with NaN
        for stat in stats_to_compare:
            player_data[stat] = pd.to_numeric(player_data[stat], errors='coerce')

        normalized_stats = {}
        for stat in stats_to_compare:
            min_val = player_data[stat].min()
            max_val = player_data[stat].max()
            normalized_stats[stat] = [
                normalize(stats1[stat], min_val, max_val),
                normalize(stats2[stat], min_val, max_val)
            ]

        # Create a radar chart
        fig = go.Figure()

        fig.add_trace(go.Scatterpolar(
            r=[normalized_stats[stat][0] for stat in stats_to_compare],
            theta=stats_to_compare,
            fill='toself',
            name=player1
        ))
        fig.add_trace(go.Scatterpolar(
            r=[normalized_stats[stat][1] for stat in stats_to_compare],
            theta=stats_to_compare,
            fill='toself',
            name=player2
        ))

        fig.update_layout(
            polar=dict(radialaxis=dict(visible=True, range=[0, 100])),
            showlegend=True,
            legend=dict(
                font=dict(size=16),  # Increase font size
                itemsizing='constant',  # Make legend items a constant size
                itemwidth=30,  # Adjust item width
                yanchor="top",  # Anchor to the top
                y=0.99,  # Position at the top
                xanchor="right",  # Anchor to the right
                x=0.99,  # Position at the right
                bgcolor="rgba(255, 255, 255, 0.5)",  # Semi-transparent background
                bordercolor="Black",  # Border color
                borderwidth=2,  # Border width
            ),
            title=dict(
                text=f"{player1} vs {player2} Comparison",
                font=dict(size=24)  # Increase title font size
            ),
            width=700,  # Adjust as needed
            height=700  # Adjust as needed
        )

        # Render the radar chart in the app
        st.plotly_chart(fig)

This creates a radar chart comparing two selected players across multiple statistics.
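
The radar chart works because every statistic is min-max scaled to the same 0 to 100 range before plotting, so points and percentages can share one axis. The worked example below restates normalize so the snippet runs on its own; the league minimum and maximum are made-up numbers.

```python
def normalize(value, min_value, max_value):
    """Min-max scale a value to the 0-100 range used by the radar chart."""
    try:
        value = float(value)
    except (ValueError, TypeError):
        return 0  # non-numeric entries are plotted at the centre
    if max_value <= min_value:
        return 50  # every player has the same value, so use the midpoint
    return 100 * (value - min_value) / (max_value - min_value)


# Hypothetical league-wide range for points per game.
league_min, league_max = 2.0, 22.0
for pts in (2.0, 12.0, 22.0):
    print(pts, "->", normalize(pts, league_min, league_max))
# 2.0 -> 0.0
# 12.0 -> 50.0
# 22.0 -> 100.0
```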

Adding an Interactive Map

Let's add a map showing WNBA team locations:

@st.cache_data
def create_wnba_map():
    # WNBA teams, their locations, and home page URLs
    wnba_teams = {
        'Atlanta Dream': ('Atlanta, GA', 'https://dream.wnba.com/'),
        'Chicago Sky': ('Chicago, IL', 'https://sky.wnba.com/'),
        'Connecticut Sun': ('Uncasville, CT', 'https://sun.wnba.com/'),
        'Dallas Wings': ('Arlington, TX', 'https://wings.wnba.com/'),
        'Indiana Fever': ('Indianapolis, IN', 'https://fever.wnba.com/'),
        'Las Vegas Aces': ('Las Vegas, NV', 'https://aces.wnba.com/'),
        'Los Angeles Sparks': ('Los Angeles, CA', 'https://sparks.wnba.com/'),
        'Minnesota Lynx': ('Minneapolis, MN', 'https://lynx.wnba.com/'),
        'New York Liberty': ('Brooklyn, NY', 'https://liberty.wnba.com/'),
        'Phoenix Mercury': ('Phoenix, AZ', 'https://mercury.wnba.com/'),
        'Seattle Storm': ('Seattle, WA', 'https://storm.wnba.com/'),
        'Washington Mystics': ('Washington, D.C.', 'https://mystics.wnba.com/')
    }

    # Create a map centered on the United States
    m = folium.Map(location=[39.8283, -98.5795], zoom_start=4)

    # Geocoding to get coordinates
    geolocator = Nominatim(user_agent="wnba_app")
    # Team name to abbreviation mapping
    team_abbr = {
        'Atlanta Dream': 'ATL', 'Chicago Sky': 'CHI', 'Connecticut Sun': 'CON',
        'Dallas Wings': 'DAL', 'Indiana Fever': 'IND', 'Las Vegas Aces': 'LVA',
        'Los Angeles Sparks': 'LAS', 'Minnesota Lynx': 'MIN', 'New York Liberty': 'NYL',
        'Phoenix Mercury': 'PHO', 'Seattle Storm': 'SEA', 'Washington Mystics': 'WAS'
    }

    # Add markers for each team
    for team, (city, url) in wnba_teams.items():
        try:
            location = geocode_with_retry(geolocator, city)
            if location is None:
                # Use fallback coordinates if geocoding fails
                lat, lon = fallback_coordinates[team]
            else:
                lat, lon = location.latitude, location.longitude
            # Get team abbreviation and color
            abbr = team_abbr.get(team, 'ATL')  # Default to ATL if not found
            hex_color = team_colors.get(abbr, '#000000')  # Default to black if color not found
            rgb = webcolors.hex_to_rgb(hex_color)
            closest_folium_color = closest_color(rgb)

            # Create popup HTML with team info and link
            popup_html = f"""
            <b>{team}</b><br>
            {city}<br>
            <a href="{url}" target="_blank">Visit Team Website</a>
            """

            folium.Marker(
                [lat, lon],
                popup=folium.Popup(popup_html, max_width=300),
                tooltip=team,
                icon=folium.Icon(color=closest_folium_color, icon='basketball', prefix='fa')
            ).add_to(m)
        except Exception as e:
            st.warning(f"Couldn't add marker for {team}: {str(e)}")

    return m

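This creates an interactive map with a marker for each WNBA team; pass the returned map to st_folium to display it in the app. The function relies on the geocode_with_retry and closest_color helpers and on the fallback_coordinates and team_colors dictionaries, which are defined elsewhere in the app.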
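
For readers following along without the repo, here is a hedged sketch of the two helper functions. The RGB values are rough approximations of a few folium icon colors chosen for illustration, and the retry_on, max_retries, and delay_seconds parameters are assumptions. In the dashboard you would likely pass retry_on=(GeocoderTimedOut, GeocoderServiceError) using the geopy exceptions imported earlier.

```python
import time

# Approximate RGB values for some folium.Icon color names (illustrative only).
FOLIUM_COLORS = {
    "red": (214, 62, 42),
    "darkred": (162, 51, 54),
    "orange": (246, 151, 48),
    "green": (114, 176, 38),
    "darkgreen": (114, 130, 36),
    "blue": (56, 170, 221),
    "darkblue": (0, 103, 163),
    "cadetblue": (67, 105, 120),
    "purple": (210, 82, 185),
    "gray": (87, 87, 87),
    "black": (48, 48, 48),
}


def closest_color(rgb):
    """Return the folium color name nearest to an (r, g, b) triple."""
    red, green, blue = rgb

    def distance(name):
        r, g, b = FOLIUM_COLORS[name]
        return (red - r) ** 2 + (green - g) ** 2 + (blue - b) ** 2

    return min(FOLIUM_COLORS, key=distance)


def geocode_with_retry(geolocator, city, max_retries=3, delay_seconds=1.0,
                       retry_on=(Exception,)):
    """Call geolocator.geocode(city), retrying on failure.

    Returns None if every attempt fails, so the caller can fall back to
    hard-coded coordinates.
    """
    for attempt in range(max_retries):
        try:
            return geolocator.geocode(city)
        except retry_on:
            if attempt < max_retries - 1:
                time.sleep(delay_seconds)  # brief pause before the next attempt
    return None
```

Returning None instead of raising matches how create_wnba_map checks the result before using fallback_coordinates.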

Adding a Chatbot Interface

Finally, let's add a chatbot for answering WNBA-related questions:

    st.markdown('<div class="chatbot-section">', unsafe_allow_html=True)
    st.subheader("🏀 Chat💬 w/ WNBA AI Assistant powered by LangChain && Cloudflare Workers AI🤖")

    # Add a loading message
    chat_loading = st.empty()
    chat_loading.info("Chat is initializing... This may take a few moments.")

    # Initialize the LLM and conversation chain
    @st.cache_resource
    def initialize_chat(filtered_data: pd.DataFrame):
        llm = CloudflareWorkersAI(
            account_id=ACCOUNT_ID,
            api_token=AUTH_TOKEN,
            model="@cf/meta/llama-2-7b-chat-int8"
        )
        # Convert filtered_data to a string representation
        data_context = filtered_data.to_string()

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a knowledgeable assistant specializing in WNBA statistics, players, and teams. 
            Provide accurate and helpful information about the WNBA.
            Here's the current WNBA data you have access to:
            {data_context}
            Use this data to answer questions, but don't mention the data directly unless asked."""),
            ("human", "{input}"),
            ("ai", "{agent_scratchpad}")
        ])

        memory = ConversationBufferMemory(return_messages=True, output_key="agent_scratchpad")
        def get_chat_history(inputs):
            return memory.chat_memory.messages

        chain = (
            RunnablePassthrough.assign(
                agent_scratchpad=get_chat_history,
                data_context=lambda _: data_context[:100] + "..." # Truncate for brevity 
            )
            | prompt
            | llm
        )

        return chain, memory, data_context

    # Initialize the chat
    chain, memory, data_context = initialize_chat(filtered_data)

    # Remove the loading message
    chat_loading.empty()

    # Initialize chat history
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Display chat messages from history on rerun
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # React to user input
    if user_input := st.chat_input("Ask me anything about WNBA stats, players, or teams!"):
        # Display user message in chat message container
        st.chat_message("user").markdown(user_input)
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": user_input})

        # Get AI response
        with st.spinner("Thinking..."):
            try:
                response = chain.invoke({
                    "input": user_input,
                    "data_context": data_context
                })

                if not response or response.strip() == "":
                    response = "I apologize, but I couldn't generate a response. This could be due to an issue with the AI model or the input. Please try asking your question in a different way or try again later."
            except Exception as e:
                response = f"An error occurred: {str(e)}"
                st.error(f"Debug: Error details: {e}")

        # After getting the response from the model
        if isinstance(response, list) and len(response) > 0 and hasattr(response[0], 'content'):
            response_text = response[0].content
        elif isinstance(response, dict) and 'content' in response:
            response_text = response['content']
        elif isinstance(response, str):
            response_text = response
        else:
            response_text = str(response)


        # Display assistant response in chat message container
        with st.chat_message("assistant"):
            st.markdown(response_text)
        # Add assistant response to chat history
        st.session_state.messages.append({"role": "assistant", "content": response_text})

        # Update memory
        memory.chat_memory.add_user_message(user_input)
        memory.chat_memory.add_ai_message(response_text)


        # Add some styling to make the chat interface look better
        st.markdown("""
        <style>
        .stChatFloatingInputContainer {
            bottom: 20px;
            background-color: #f0f2f6;
            padding: 10px;
            border-radius: 10px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
        }
        </style>
        """, unsafe_allow_html=True)

This creates a chatbot interface that can answer questions about WNBA statistics using the provided data.
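
The chain of isinstance checks that follows the model call can be factored into a small function, which keeps the chat loop shorter and makes the logic easy to test on its own. The name extract_response_text is hypothetical; the branches mirror the ones in the code above.

```python
def extract_response_text(response):
    """Return plain text from the different shapes a model response can take."""
    # A list of message-like objects: use the first message's content.
    if isinstance(response, list) and response and hasattr(response[0], "content"):
        return response[0].content
    # A dict with a 'content' key.
    if isinstance(response, dict) and "content" in response:
        return response["content"]
    # Already a string, or anything else converted to one.
    return response if isinstance(response, str) else str(response)
```

In the chat loop, response_text = extract_response_text(response) would then replace the if/elif block.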

Conclusion

We've now built a WNBA analytics dashboard with data visualization, AI insights, and a chatbot interface. The project shows how data analysis and a hosted language model can be combined into an interactive tool for sports analytics.

The complete code can be found on GitHub: https://github.com/elizabethsiegle/wnba-analytics-dash-ai-insights

Happy coding, and enjoy exploring WNBA statistics with your new analytics dashboard!
