Felix Hildén
Posted on September 6, 2020
What if your playlists were ordered by track similarity? No Taylor Swift followed by death metal, we can rearrange tracks in terms of loudness, mood, tempo and other factors. Naturally, we could also create many playlists that each contain only one type of music, but where's the fun in that? Let's get down to some data analysis with Python and Spotify. Their Web API contains lots of neat features, including the ability to analyse songs and manipulate a user's playlists.
First, to get us up and running, we'll need access to the API. This is done by authorising our application to have access to our own Spotify account. If you want to follow along, we are using a client library for the API, Tekore, to make interacting with it a lot easier. For the sake of brevity, I won't explain details about the API here, but more information can be found in Tekore's documentation.
import tekore as tk
# API credentials
client_id = 'your_client_id'
client_secret = 'your_client_secret'
redirect_uri = 'https://example.com/callback'
conf = (client_id, client_secret, redirect_uri)
token = tk.prompt_for_user_token(*conf, scope=tk.scope.every)
Once we've retrieved the token, we are free to use the API. So let's start by pulling tracks of a playlist and analysing them. You can swap in your own playlists of course. The ID of a playlist can be retrieved by right-clicking a playlist and choosing "share > Spotify URI" and deleting everything else before the actual ID.
playlist = '37i9dQZEVXbMDoHDwVN2tF' # Global Top 50
spotify = tk.Spotify(token, max_limits_on=True, chunked_on=True)
items = spotify.playlist_items(playlist)
tracks = [t.track for t in spotify.all_items(items)]
analysis = spotify.tracks_audio_features([t.id for t in tracks])
analysis
contains audio features for each track in the playlist. They look like this:
AudioFeatures with fields:
acousticness = 0.0194
danceability = 0.935
duration_ms = 187541
energy = 0.454
instrumentalness = 0
key = 1
liveness = 0.0824
loudness = -7.509
mode = 1
speechiness = 0.375
tempo = 133.073
time_signature = 4
valence = 0.357
Now we can get to the good stuff. Let's first choose the most important aspects of songs to be considered similar. You can choose your own and see how the results change. For me, duration, key, liveness, mode and time signature are second-class citizens when compared to the other features. Let's select everything else and convert the features into a NumPy array for easy numerical operations.
import numpy as np
cols = [
'acousticness', 'danceability', 'energy', 'instrumentalness',
'loudness', 'speechiness', 'tempo', 'valence'
]
features = [tuple(getattr(a, col) for col in cols) for a in analysis]
data = np.array(features)
It is important to note that while other features chosen above are between 0 and 1, loudness and tempo have much larger scales. Normalising them keeps things balanced when choosing the song ordering. We'll also use Principal Component Analysis to bring the problem dimensionality down to two for easy visualisations.
from sklearn.decomposition import PCA
for i, col in enumerate(cols):
if col in ('loudness', 'tempo'):
minimum = data[:, i].min()
data[:, i] = (data[:, i] - minimum) / (data[:, i].max() - minimum)
pca = PCA(n_components=2)
data = pca.fit_transform(data)
So... How can we represent and solve the problem of transitioning from song to another smoothly? Well, it's in the title already, so I'll cut to the chase. Since we have this data available, we can represent each track with its distance to each other track. And how can we most efficiently transition through each track? By minimising the total distance "travelled" once each song is played. Sounds a lot like the Travelling Salesman Problem. It's a hard problem, but for small numbers of tracks we can do it quite effectively by using an approximation algorithm, two-opt for example. I'll provide my implementation here, though figuring it out is a good excercise.
def two_opt(
distances: np.ndarray,
tolerance: float = 0,
verbose: bool = False
) -> np.ndarray:
n_nodes = distances.shape[0]
current_route = np.concatenate((np.arange(n_nodes, dtype=int), [0]))
candidate = np.zeros(n_nodes + 1, dtype=int)
best_cost = np.sum(distances[current_route[:-1], current_route[1:]])
cost = np.inf
while True:
for i in range(1, n_nodes):
for k in range(i + 1, n_nodes + 1):
candidate[:i] = current_route[:i]
candidate[i:k] = current_route[k-1:i-1:-1]
candidate[k:] = current_route[k:]
cost = np.sum(distances[candidate[:-1], candidate[1:]])
if cost + tolerance < best_cost:
break
if cost + tolerance < best_cost:
current_route[:] = candidate
best_cost = cost
if verbose:
print(f'2-opt: node={i}/{n_nodes}, cost={best_cost}')
break
else:
break
return current_route
Now we can use it to solve the optimal route, or at least our best guess at the optimum. First we'll construct the distance matrix, then feed it to the algorithm. If you have a large playlist, you can set the threshold
argument to reject the smallest improvements to the route to hopefully speed things up a bit.
from scipy.spatial.distance import pdist, squareform
distances = squareform(pdist(data))
route = two_opt(distances, verbose=True)
Visualising results is important. Let's see how our algorithm performed. We'll plot the PCA-reduced data and the chosen path along with song names.
from matplotlib import pyplot as plt
plt.figure()
plt.plot(data[route, 0], data[route, 1], 'r')
plt.scatter(data[:, 0], data[:, 1], s=10, c='k')
for i, t in enumerate(tracks):
plt.text(data[i, 0], data[i, 1], t.name, fontsize=6)
plt.xticks([])
plt.yticks([])
plt.show()
And it seems indeed that the playlist was ordered successfully:
I don't know much about these songs, but running the algorithm for my own playlists convinces me that it is doing the right thing! Finally, we'd of course want to listen to this version of the playlist. Let's create one that has the new song order.
# Leave the longest distance between the last and first songs
route_dist = distances[route[1:], route[:-1]]
worst = np.argmax(route_dist)
new_route = np.concatenate((route[worst + 1:], route[1:worst + 1]))
# Reorder tracks
reordered = [tracks[i].id for i in new_route]
uris = [tk.to_uri('track', i) for i in reordered]
# Create playlist
me = spotify.current_user()
pl = spotify.playlist_create(me.id, 'Reordered playlist')
spotify.playlist_add(pl.id, uris)
That's it, I hope you enjoyed following along and had some fun!
Posted on September 6, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.