Switch Game or Not?
Mark
Posted on December 20, 2018
I'm a big fan of the podcast CAGCast, which is all about video games. A few months back they invented a game called 'Switch Game or Not?' where one host would inject fake Nintendo Switch games into that week's list of new releases. Why is this fun? Because so many Switch game names are ridiculous. 'Almightree: The Last Dreamer', 'Hunter's Legacy: Purrfect Edition' and 'Oxyjet' are all examples of games that came out last week. It's so easy to slip in ridiculous game names and not have it be obvious what's real and what's fake.
I've fiddled around with using Markov chains to generate new text (words, tweets, etc.) and found it to be pretty quick and easy. Markov chains are a way of moving between states stochastically (randomly). Now, the way they're normally set up, there's no memory. So, if you had a Markov chain for the weather you might have three states: Sunny, Cloudy and Rainy. Each state (say Cloudy) would have a probability of moving to a new state (Sunny or Rainy) or staying the same, and the total probability would be 1 (because everything either changes or stays the same, guaranteed). In this case, it doesn't matter if the previous state was Rainy or Sunny; the fact that it is currently Cloudy is all that's needed to determine the next state.
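To make that weather example concrete, here's a minimal sketch of a memoryless chain in Python. The transition probabilities below are made up purely for illustration.

import random

# Hypothetical transition probabilities: each current state maps to the
# chance of moving to each next state (each row sums to 1).
weather_chain = {
    "Sunny":  {"Sunny": 0.6, "Cloudy": 0.3, "Rainy": 0.1},
    "Cloudy": {"Sunny": 0.3, "Cloudy": 0.4, "Rainy": 0.3},
    "Rainy":  {"Sunny": 0.2, "Cloudy": 0.4, "Rainy": 0.4},
}

def next_state(current):
    # Only the current state matters -- no memory of how we got here.
    options = weather_chain[current]
    return random.choices(list(options.keys()), weights=list(options.values()))[0]

state = "Cloudy"
forecast = []
for _ in range(7):
    state = next_state(state)
    forecast.append(state)
print(forecast)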
When using Markov chains on text, you could think of this as generating a sequence of tokens (letters, words, etc.) where the current token informs the probability of new tokens. If we built a Markov chain on the set of words ['hi', 'hello', 'help'] and tokenized on letters, then when we're in the state of 'h' there would be a 33% chance of moving to 'i', a 67% chance of moving to 'e' and a 0% chance of moving to any other letter (or staying the same). For the state of 'l' there would be a 33% chance of moving to 'o', a 33% chance of moving to 'p' and a 33% chance of staying 'l'. For a more complex vocabulary this will get a lot more complex. The other thing to consider is adding a starting character which will generate the first letter and a terminating character so we know when the word is done. So we might pad our words to look like ['*hi`', '*hello`', '*help`']; this way we can start a new word from the state of '*' and as soon as we generate a '`' we know we're done.
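As a sanity check on those percentages, here's a small sketch that counts letter-to-letter transitions in the padded vocabulary (the variable names are my own, not from the code later in the post).

from collections import Counter, defaultdict

words = ['*hi`', '*hello`', '*help`']

# For each letter, count which letter follows it across the whole vocabulary.
transitions = defaultdict(Counter)
for w in words:
    for a, b in zip(w, w[1:]):
        transitions[a][b] += 1

for state, counts in transitions.items():
    total = sum(counts.values())
    print(state, {nxt: round(c / total, 2) for nxt, c in counts.items()})
# 'h' -> {'i': 0.33, 'e': 0.67} and 'l' -> {'l': 0.33, 'o': 0.33, 'p': 0.33}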
I mentioned that Markov chains don't have a memory, so working off of a single-letter state, even in our simple vocabulary we could chain together a lot of 'l's, which makes for things that don't look like words. So we can build in a memory by having a sliding window of characters. Instead of '*help`' having the six states '*', 'h', 'e', 'l', 'p', and '`', we could pad the word as '**help``' and use two characters to predict the next single letter: '**', '*h', 'he', 'el', 'lp', 'p`', '``' (I increased the padding to reflect the size of the window). This helps ensure that 'll' will always move to a letter other than 'l'.
Given a vocabulary, then, we need a way to identify every sliding window and the token that follows it. Traditionally this is stored in a matrix, but in our case we have so many possible current states and so many possible next states (many of which will be empty) that it may not be efficient to store it like that. In Python, I used a dictionary with current states as the keys and the next tokens stored as lists.
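For the toy vocabulary above with a two-character window, that dictionary-of-lists looks something like this (written out by hand to show the shape of the structure, before any probabilities are calculated):

# Raw next-character lists keyed by two-character window, built from the
# double-padded toy words '**hi``', '**hello``' and '**help``'.
next_letter = {
    '**': ['h', 'h', 'h'],   # every word in this vocabulary starts with 'h'
    '*h': ['i', 'e', 'e'],
    'hi': ['`'],
    'he': ['l', 'l'],
    'el': ['l', 'p'],
    'll': ['o'],
    'lo': ['`'],
    'lp': ['`'],
    'i`': ['`'],
    'o`': ['`'],
    'p`': ['`'],
}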
In the code below hopefully you can follow my process. I take a list of Nintendo Switch game names (I built mine on the complete list of game names) and, for each, pad it with my 'memory' window of '*' at the beginning and '`' at the end (these characters are rarely used in titles, and the few '*' that do appear I stripped out). Then I walk through each title, breaking it into windows, and append the character that follows each window to a dictionary using that window as the key. When this is done, I have a raw list of next characters for each window, which contains a lot of repeats. So, I calculate the cumulative probability of each next token (to support randomly picking the next character). In the end I have a nice JSON-type structure of a dict of dicts, which I write out to a file.
from collections import Counter
from functools import reduce
import re
import json

def parse_name(name, n, nl):
    # Pad the title with n start markers ('*') and n end markers ('`'),
    # then record, for every n-character window, the character that follows it.
    name = "*" * n + name + "`" * n
    l = len(name) - n
    for i in range(l):
        try:
            nl[name[i:(i + n)]].append(name[i + n])
        except KeyError:
            nl[name[i:(i + n)]] = [name[i + n]]
    return nl

def calc_percent(n):
    # Turn a raw list of next characters into cumulative probabilities,
    # e.g. ['a', 'a', 'b'] -> {'a': 0.667, 'b': 1.0}.
    ct = dict(Counter(n))
    p = [v / sum(ct.values()) for v in ct.values()]
    cp = reduce(lambda c, x: c + [c[-1] + x], p, [0])[1:]
    ky = list(ct.keys())
    output = {}
    for ir in range(len(cp)):
        output[ky[ir]] = cp[ir]
    return output

def gen_markov(known, n):
    # Build the window -> cumulative-probability lookup and write it to disk.
    next_letter = {}
    for k in known:
        k = re.sub(r"[\*\(\)]", "", k)  # strip the rare '*', '(' and ')' characters
        next_letter = parse_name(k, n, next_letter)
    for i in next_letter.keys():
        next_letter[i] = calc_percent(next_letter[i])
    with open("markov_probabilities_{}.json".format(str(n)), 'w') as outfile:
        json.dump(next_letter, outfile, indent=4, sort_keys=True)
    return 0
known_games = ["SEGA AGES Phantasy Star","RollerCoaster Tycoon Adventures","ACA NEOGEO METAL SLUG 5","Hyperide: Vector Raid","Peace, Death! Complete Edition","InkyPen","Almightree: The Last Dreamer","TRYBIT LOGIC","Firewatch","Rain World","Nippon Marathon","Omensight: Definitive Edition","Atari Flashback Classics","Donut County","Venture Towns","Lazy Galaxy: Rebel Story","City Builder","Oxyjet","Knights of Pen and Paper Bundle","RAZED","MIND: Path to Thalamus","Race Arcade","Blue Rider","Clue: The Classic Mystery Game","Hunter's Legacy: Purrfect Edition","Party Arcade","Starman","Mahjong Solitaire Refresh","Sheltered","Knights of Pen & Paper 2 Deluxiest Edition","Big Bash Boom","GRIS","Arcade Archives ATHENA"]
_ = gen_markov(known_games, 4)
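For a sense of the shape of the file this writes, here's a purely made-up fragment (the windows and numbers below are illustrative, not actual output): each key is a 4-character window, and each value maps possible next characters to their cumulative probabilities.

# Illustrative (made-up) fragment of markov_probabilities_4.json:
# 4-character windows mapping to cumulative probabilities of the next character.
{
    "***S": {"E": 0.4, "t": 0.7, "h": 1.0},
    "Mari": {"n": 0.25, "o": 1.0},
    "tion": {" ": 0.6, ":": 0.8, "`": 1.0}
}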
Now that I have the data necessary to run a Markov process, I need a way of generating new names. Load in the JSON file we saved off earlier, generate our starting state ('****' here) and use a while loop to keep going until we hit the terminating state. The advantage of having a dict of dicts is that we can quickly pull out all possible next states (we've removed all the 0's), generate a single random number and then pull out the key for the first value that's greater than the random number.
import json
import random

def choose_letter(s, p):
    # Draw a random number and return the first next character whose
    # cumulative probability exceeds it.
    r = random.random()
    p = p[s]  # cumulative probabilities for the current window
    i = min([i for i, x in enumerate([r < l for l in list(p.values())]) if x])
    return list(p.keys())[i]

def gen_title(title, n, p):
    # Keep appending characters until the last n characters are all
    # terminators ('`'), then strip the padding off both ends.
    keep_going = True
    while keep_going:
        if title[-n:] == "`" * n:
            keep_going = False
        else:
            title = title + choose_letter(title[-n:], p)
    return title[n:-n]

memory_length = 4
with open('markov_probabilities_{}.json'.format(str(memory_length))) as jsonfile:
    probs = json.load(jsonfile)

start = '*' * memory_length
out = gen_title(start, memory_length, probs)
print(out)
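To churn out a batch of candidates instead of a single name, a simple loop on top of the same functions does the trick (just a usage sketch, not part of the original script):

# Generate several candidate titles from the loaded probabilities.
for _ in range(10):
    print(gen_title('*' * memory_length, memory_length, probs))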
Now, there's a lot of other stuff that needs to be done to help improve this. For example, we don't want our process to produce an actual game name, or even something too close. We don't want absurdly long game names either, or too much punctuation. So, I implemented some checks for that, which you can see in my GitHub repo.
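The real checks live in the repo, but a rough sketch of the kind of filtering described above might look like this (the helper name and thresholds are my own guesses, not the repo's code):

import string

def looks_usable(candidate, known, max_length=60, max_punct=4):
    # Reject real titles and near-duplicates (substring matches either way).
    if any(candidate in k or k in candidate for k in known):
        return False
    # Reject absurdly long names.
    if len(candidate) > max_length:
        return False
    # Reject names with too much punctuation.
    punct_count = sum(1 for c in candidate if c in string.punctuation)
    return punct_count <= max_punct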
For the final implementation I threw together a website using node.js, Express, Angular (1.6, I know...), and MySQL on some sweet Digital Ocean hardware. I even followed a tutorial and 'dockerized' my node.js app (another post on that later maybe). The whole project is on GitHub where I'm not yet using feature branches or any CI/CD stuff, so please don't judge too hard. Also, I'd appreciate feedback, especially on the web dev stuff.
Overall, it was a fun experience. I shared the app with the hosts and they loved it (now I'm hoping for a shoutout on the show), so we'll see if it has any legs. It's always great to go from idea to execution in just a few days. The most frustrating part about it, for me, was the web dev part. There are some bits of node.js and express that I get for making APIs, but not all of it. And Angular is always a frustrating experience for me, but since web dev isn't my day job or even my side gig, it's hard to justify learning something else right now.