Lifelike Toolkit Devblog #2: Mining for Diamonds

Brand New Discord Channel! Join to keep track of our progress
Lifelike Github Repo
Cool Overprotective Dad AI Visual Novel Demo
In Progress AI Interrogation Demo - The Dinner Party Case
Mustafa on Twitter

Hey everyone, it's been a while. Super sorry for dropping this 2 weeks late, life kind of got in the way. After rewriting how long it has been since the supposed drop date 3-4 times, I'm really hoping this is the one. As an international student who just finished his junior year in university, I live a "nomadic" lifestyle. I move every so often, which gets in the way of my work. As of writing this lengthy introduction, I do not have access to the Internet, which, interestingly, does give me quite a perspective on how reliant I have become on it. Unfortunately, my housemates and I did not plan this out very well, so it will be like this for another few days. I suppose it does give me extra motivation to focus all my attention on Lifelike (but I'm not 100% sure what coding without documentation and StackOverflow is going to look like). Nonetheless, I am not the only thing that ceased to function properly without access to the Internet. It was quite a rude awakening this morning when I tried to tune my guitar with a phone app that I have used for the past 5 years, only to find out that the app requires the Internet. For the moment though, I am going through Breath of the Wild again, and the game still feels as good as it did years ago (but some of the magic is gone since I know pretty much everything already).

I don't suppose a Breath of the Wild review is what you are here for, so let's look at what we have done so far. First and foremost, we've finally packaged the toolkit! You can now install the toolkit by running pip install git+https://github.com/lifelike-toolkit/lifelike.git. Mustafa did a great job explaining how to use the toolkit in the repository itself. Meanwhile, a guide/documentation on the Sequence Manager will come out soon. Currently, with Mustafa's Brain component, you can set up a chatbot, or have multiple NPCs talk and interact with each other. For both use cases, you can set up background contexts for the characters, as well as the main context of their conversation. On the other hand, my Sequence Tree is more or less done, but I'm still doing some testing and game development work myself so that I can streamline the Sequence Manager. In my opinion, actually trying to make a game out of it helps me find a good usage flow for the toolkit. Of course, thanks to the Human-Computer Interface course I took, I know that I am not the user, but to be completely honest, it's pretty difficult to get other people to try this out. For now, this is a great option.

As of right now though, the Sequence Manager controls an abstracted representation of a Sequence Tree. In this tree, each node is a Sequence Event, which stores the corresponding game state or any additional information, while every edge is a Path Embedding, which stores an embedding derived from user input. In Cool Overprotective Dad, the Path Embedding is a sentiment embedding that a sentiment analysis model produces from user input. With this design, there are quite a lot of things you can use it for.
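To make the node-and-edge idea concrete, here is a minimal sketch in plain Python. It is not the actual Lifelike API: the names SequenceEvent, add_path and next_event are made up for illustration, the 3-number vectors stand in for real sentiment embeddings, and picking the edge by cosine similarity is an assumption about how "closest path" could be decided.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SequenceEvent:
    """A node: holds game state or extra information for one story beat."""
    name: str
    state: dict = field(default_factory=dict)
    # Outgoing edges stored as (path embedding, next event) pairs.
    paths: list = field(default_factory=list)

    def add_path(self, embedding, target):
        self.paths.append((list(embedding), target))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def next_event(current, input_embedding):
    """Follow the edge whose embedding is closest to the player's input."""
    if not current.paths:
        return current  # leaf node: there is nowhere further to go
    _, target = max(
        current.paths,
        key=lambda path: cosine_similarity(path[0], input_embedding),
    )
    return target


# Toy vectors in the order (negative, neutral, positive). Hypothetical values.
start = SequenceEvent("start")
warm = SequenceEvent("warm_welcome", {"door_open": True})
cold = SequenceEvent("cold_shoulder", {"door_open": False})
start.add_path([0.0, 0.1, 0.9], warm)
start.add_path([0.9, 0.1, 0.0], cold)

reached = next_event(start, [0.1, 0.2, 0.7])
print(reached.name, reached.state)
```

The player input here leans positive, so the walk ends up on the warm_welcome event. Swapping the toy vectors for the output of a real model should not change the shape of the code.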

If you're new here (and if you're a frequent reader, I really, really appreciate that), Lifelike is designed to work like an improv drama session. Unlike most AI games that have come out so far, Lifelike keeps the human touch when it comes to story. In terms of the improv drama metaphor, you can imagine the Brain as the actors, while the Sequence Manager is the Director telling the actors where they should push the story. Since I love talking about Skyrim here, I will also explain it in terms of the "Get into Windhelm" scenario that I used in the first blog. The developer can use the Sequence Manager to define the options for getting into Windhelm: making the guard like you, or explaining to him that you support their cause. When the player approaches the guard and makes the guard like them, the Sequence Manager will tell the Brain that the "Guard opens gate to Windhelm" event has been reached via the "Guard likes Player" path. The Sequence Manager then gives the Brain the "Guard will now open the gates to Windhelm" context, causing the Brain to behave accordingly.

It does not end there. I recently finished the code for an AI-driven Visual Novel (although there's no Visual right now) named Cool Overprotective Dad. I made this demo to demonstrate the use of the Sequence Manager, as the game was made exclusively using this tool. Before I start peeling off the layers of buzzwords, I want to give a special shout-out to my roommate and great friend, Connor Killingbeck, for writing a banger of a story specifically for the game. In the game, the story progresses based on how you talk to characters. Specifically, the game's narrative relies on the emotion you convey, which leads to a Disco Elysium-esque story (I will not claim it will be as good as Disco Elysium though). For example, in the encounter with the Bouncer in Chapter 1, he will ask the player how they are feeling that night. Here, if the player responds with excitement, the bouncer will immediately let them through. Conversely, if the player comes across as negative, the game ends as the player cannot enter the club. However, an interesting option we included is the Neutral path (no emotion conveyed by the player), where the bouncer will ask the player for ID before they are let into the club. To make this work, each event is something that happens in the game as the result of the player's input, while each path is a sentiment embedding. Every time the player speaks, the game will process the sentiment in the input, get the embedding, and determine which of the pre-determined events is the most likely to happen. This is all done using the Sequence Manager. If you guys want to try the game out (only Chapter 1 is out at the moment, Connor is finishing the rest), it will take some work to set up (follow the guide on the repository) but trust me, it's totally worth it. If you don't have that kind of time though, here's a video of me playing through Chapter 1 (I did not get to all the endings in this video, maybe you can):

Now that's really awesome, but how does it perform? Surprisingly well, although it struggles if the player's input relies on some pre-established context. This can be attributed to the sentiment analysis model, as it is only meant for sentences that can be taken out of context. It was simply a quick one that I found on the Internet. I also made some weird design choices in this project, as I was unsure of what Lifelike was going to look like with Mustafa's work. Originally, the whole thing was supposed to be a browser-only game running on ONNX, but we quickly found out that it was not feasible unless I wanted to translate core functionalities to JavaScript. So, for now, the inference is done client-side, but the game itself is stored server-side in a vectordb. Outside of that, I also included an embedding tuner that allows the developer to input a list of prompts and get the average embedding back. This helped greatly, as I was able to program events a lot more easily. The whole process took a few hours. In the future, I want to go back to this project, write a whole guide on how I managed to do it for a Lifelike tutorial, as well as clean up the game + make a simple terminal game in Python.
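For the curious, the embedding tuner idea boils down to something like the sketch below. This is not the tuner's real code: average_embedding is a name I made up for this example, and toy_embed is a hypothetical word-counting stand-in for the actual sentiment model, returning vectors in the order (negative, neutral, positive).

```python
def average_embedding(prompts, embed):
    """Embed every prompt and return the element-wise mean vector."""
    vectors = [embed(prompt) for prompt in prompts]
    if not vectors:
        raise ValueError("need at least one prompt")
    dim = len(vectors[0])
    if any(len(vector) != dim for vector in vectors):
        raise ValueError("embeddings differ in length")
    return [sum(vector[i] for vector in vectors) / len(vectors) for i in range(dim)]


def toy_embed(text):
    """Hypothetical stand-in for a sentiment model: counts cue words."""
    words = text.lower().replace("!", "").replace(".", "").replace(",", "").split()
    negative = sum(word in {"hate", "awful", "tired"} for word in words)
    positive = sum(word in {"love", "great", "excited"} for word in words)
    neutral = 1 if negative == 0 and positive == 0 else 0
    return [float(negative), float(neutral), float(positive)]


# Several ways a player might sound excited; the average becomes the path embedding.
excited_path = average_embedding(
    [
        "I am so excited!",
        "I love this place, it is great",
        "Can't wait, but I'm a bit tired",
    ],
    toy_embed,
)
print(excited_path)
```

Averaging over several phrasings means one odd prompt (like the "tired" one above) only nudges the path embedding instead of defining it.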

If you are interested in what the game looked like a few weeks ago, below is a video of an extremely early beta for a showcase.

So, what are we working on right now? As of writing this little blog, we are working on a brand new demo that showcases the entire toolkit in action (Cool Overprotective Dad only has an AI-driven story, the characters do not really respond to contextual clues). This demo (tentatively called Interrogation for now) involves you, a detective, trying to get a suspect to admit guilt. It's pretty much inspired by Ace Attorney and L.A. Noire (as I talked about in the previous blog). Recently, I have also started to think about the Brooklyn Nine-Nine episode "The Box", where Jake had to interrogate a very intelligent murderer and try to get a confession. I really want to create a game that can show the type of emotional manipulation that Jake used to get an admission of guilt at the end of the episode. In this game, the player is given a PDF case file (very similar to how Keep Talking and Nobody Explodes works) that details crime scene evidence, witnesses' testimony, CCTV descriptions and other clues. Using this information, the player has to ask the suspect questions, and tell them things that will push them to either intentionally or accidentally admit guilt.

But what's different from Cool Overprotective Dad? This game is not a visual novel that is written entirely by a human. Instead, it's 50-50 this time around. The story beats will still need to be written by a human, so we don't lose that human touch and artistry. However, the dialogue itself is completely written by AI, allowing dynamic dialogue that actually responds to what the player has said. The game will be built using both the Sequence Manager and the Brain of the Lifelike toolkit to make both of these true. The way I want to build it is to first build a Sequence Tree with the story that I want, determining the personality, emotional response, as well as game events that correspond to what the player says. The Brain, on the other hand, will handle how the suspect responds verbally given the current story event. Essentially, think of an improv film director who's telling the actors (the Brain) how to push the story with a script (the Sequence Manager). Unlike Cool Overprotective Dad though, the Sequence Manager will not run on sentiment analysis here, but rather on context embedding, possibly using LLaMA. The Sequence Manager will also be changed slightly to work more as a knowledge encyclopedia for the game. For example, an event is a topic, which will include information about the topic itself, as well as how the suspect should respond to a question/statement about it. This is a super simplified view of the actual code, but I will make sure there is a guide to recreate the game, so some dev out there (you? maybe) can make their own scenarios.

If that is too tough for you to visualize, here's a little play-by-play of an example case in the game. Imagine the suspect is being interrogated for stealing their neighbour's Amazon packages. The case file provided will include a description of a person wearing a Maple Leafs hoodie stealing the package, eyewitness testimony of the package being stolen at 8 am on the previous Thursday, as well as an eyewitness seeing the suspect jogging in the neighbourhood at 9 am, also on the previous Thursday. The Sequence Manager will essentially store these topics as events, and provide the corresponding path embeddings (produced by LLaMA) required to reach them. In the Maple Leafs topic, the suspect will be instructed to be very excited to talk about the hockey team and how much of a fan they are. However, when asked about 8 am on the previous Thursday, they will lie and say they were not in the neighbourhood until 10 am on that day. If the player has read the case file, they will realize that an eyewitness saw the suspect at 9 am, so the player can call them out on it. The suspect should then feel intimidated into an admission if the player successfully invokes that topic (even passive-aggressively). Each of these "context clues" will be stored in the Sequence Manager. As the game proceeds, a new component, named the Game Director, will take the user's input, embed the context and see which topic the player is trying to invoke. The topic's context clues are then passed to the Brain via the Game Director, generating the suspect's response.
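Here's a rough sketch of that flow in plain Python, just to show the moving parts. None of this is the real Lifelike code: Topic, GameDirector, toy_embed and fake_brain are hypothetical names, the word-counting embedding stands in for a real context embedding model, fake_brain stands in for the LLM-backed Brain, and the similarity threshold with its vague fallback answer is my own assumption for inputs that match no topic.

```python
import math
import re
from dataclasses import dataclass

VOCAB = ("leafs", "hoodie", "8", "9", "jogging", "thursday")


def toy_embed(text):
    """Hypothetical stand-in for a context embedding model: counts vocabulary words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Topic:
    name: str
    embedding: list    # path embedding required to reach this topic
    context_clue: str  # how the suspect should respond; handed to the Brain


def fake_brain(context_clue, player_input):
    """Stand-in for the Brain; a real one would generate dialogue with an LLM."""
    return f"(suspect, following: {context_clue})"


class GameDirector:
    def __init__(self, topics, embed, brain, threshold=0.5):
        self.topics = topics
        self.embed = embed
        self.brain = brain
        self.threshold = threshold  # assumed cut-off for "no topic invoked"

    def respond(self, player_input):
        vector = self.embed(player_input)
        scored = [(cosine_similarity(t.embedding, vector), t) for t in self.topics]
        score, topic = max(scored, key=lambda pair: pair[0])
        if score < self.threshold:
            # Assumed fallback: nothing in the case file was clearly brought up.
            return None, self.brain("Stay vague and give nothing away.", player_input)
        return topic.name, self.brain(topic.context_clue, player_input)


topics = [
    Topic("maple_leafs", toy_embed("Maple Leafs hoodie"),
          "Be very excited about the hockey team."),
    Topic("thursday_8am", toy_embed("8 am Thursday"),
          "Lie: you were not in the neighbourhood until 10 am."),
    Topic("jogging_9am", toy_embed("jogging 9 am Thursday"),
          "You were seen at 9 am; get nervous and slip toward admitting it."),
]
director = GameDirector(topics, toy_embed, fake_brain)

print(director.respond("Where were you at 8 am last Thursday?"))
print(director.respond("A witness saw you jogging at 9 am"))
```

The Game Director itself only routes: it decides which topic was invoked and hands that topic's context clue to whatever plays the Brain.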

Right now, I am still in the process of experimenting with what the model we chose is capable of with the toolkit. It's pretty incredible how far AI has come to be able to generate these sorts of responses. There are some issues sticking out though. The responses take quite a while to generate, even on my beefy desktop. It took even longer on Mustafa's laptop. Hopefully, when the game actually comes out, we'll have found a way to handle this little issue. Furthermore, every once in a while, the agent gets a little insane.

For context, the above snippet was taken while I was experimenting with Lifelike's Brain tool. In this scenario, the smiley face icon is a detective who's investigating the murder of a woman, while the robot icon, the suspect and the victim's brother, is trying to hide the fact that he's guilty. The Detective is trying to get a confession out of the Suspect, which leads to this hilarious conversation. If you want to try to do the same thing, here's the code snippet that I used (courtesy of Mustafa):

"""This module demonstrates how to make a chatbot with multiple characters (terminal prompts for setup, Streamlit for the chat)."""

import streamlit as st
import streamlit_chat
from langchain.llms import LlamaCpp
from lifelike import brain

# Initialize LLM
llm = LlamaCpp(model_path='ggml-model-q4_0.bin')

# Initialize Characters
characters = brain.Characters('characters.json')

name1 = input("Enter name of character 1: ")
name2 = input("Enter name of character 2: ")
background1 = input(f"Enter background of {name1}: ")
background2 = input(f"Enter background of {name2}: ")
CONTEXT = input("Enter context: ")
first_speaker = input("Enter first speaker: ")
first_utterance = input("Enter first utterance: ")

characters.add(name1, background1)
characters.add(name2, background2)

# Initialize Conversations
conversations = brain.Conversations('conversations.json', characters, llm)

conversations.new(CONTEXT, {name1, name2})
conversations.append(CONTEXT, first_speaker, first_utterance)

st.set_page_config(
    page_title="Lifelike brain demo",
    page_icon="🧠",
)

st.header("Lifelike brain demo")


# Start chatbot
counter = 0
while True:
    if counter % 2 == 0:
        streamlit_chat.message(conversations.get(CONTEXT)['log'][-1][1])
    else:
        streamlit_chat.message(conversations.get(CONTEXT)['log'][-1][1], is_user=True)
    counter += 1
    last_speaker = conversations.get(CONTEXT)['log'][-1][0]
    out = conversations.generate(CONTEXT, {last_speaker})

To get it to work, you need to run pip install git+https://github.com/lifelike-toolkit/lifelike.git streamlit streamlit_chat to install lifelike and streamlit, and you also need the llama-cpp model itself, which might be difficult to get. However, if you have the technical know-how, you should be able to replace it with the LLM of your choice without many issues. We'll make sure to streamline this process down the line.

As for the development of Cool Overprotective Dad, I have passed it off to a web developer I know who is infinitely more skilled and knowledgeable at designing a frontend than I am. Connor is also on vacation, writing the rest of the game in the meantime, so hopefully you can expect the full game within the next few weeks.

This blog has taken quite a bit of time to write, and I suspect it might take a while to read as well. So thank you so much for sticking with me and Mustafa as we work on one of the most ambitious projects for a bunch of undergrads. I really hope that the interrogation demo and Cool Overprotective Dad can be released as soon as possible. There are some difficulties with Cool Overprotective Dad's release, so for now, if you want to play it, use the guide in its repository. Otherwise, if you want to follow our development, follow Mustafa on Twitter and join our Discord channel! Again, thank you so much for reading, and I will (hopefully) see you next week!


Khoa Nguyen
