Reinforcement Learning with Video Streaming
Simon Zeng
Posted on May 20, 2020
First, I wanted to include a quick disclaimer that I worked on this project with my teammates and am in no way solely responsible for this project – it was very much a team effort!
Since I'm a new grad (Spring 2020), this was written for the GitHub Graduation program. Thanks to them for setting it all up and for giving me the motivation to reflect on some projects!
Background Information
In video streaming, there is a client (such as a user's computer) and a video server (where the video being streamed resides). For streaming to occur, the client periodically requests video chunks, one at a time, from the video server. Each chunk contains a few seconds of video and can be requested at a particular bitrate/quality (like 240p, 480p, 1080p, etc). These chunks gradually fill up a buffer, which is continually drained by the client as the user watches the video. It's important to note that network conditions dictate how fast video chunks arrive in the buffer and that higher bitrates lead to longer wait times.
However, since the buffer is filled and drained at different rates, it can become empty. When that happens, the user experiences rebuffering time (the spinning symbol we all hate).
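To make the buffer behavior concrete, here is a minimal sketch of the fill/drain dynamics described above. It is a toy model and not the simulator from our project: the function name, the fixed bandwidth, and the 4-second chunk length are all assumptions chosen for illustration.

```python
def simulate_buffer(chunk_sizes_mbit, bandwidth_mbps, chunk_seconds=4.0):
    """Toy model of a playback buffer (all parameters are illustrative).

    Returns (final_buffer_seconds, total_rebuffer_seconds).
    """
    buffer_s = 0.0    # seconds of video currently in the buffer
    rebuffer_s = 0.0  # total time spent stalled with an empty buffer

    for size in chunk_sizes_mbit:
        # Bigger chunks (higher bitrates) take longer to download.
        download_s = size / bandwidth_mbps

        # Playback drains the buffer while the chunk is downloading.
        if download_s > buffer_s:
            # The buffer runs dry before the chunk arrives: the user stalls.
            # The wait for the very first chunk is counted as a stall too.
            rebuffer_s += download_s - buffer_s
            buffer_s = 0.0
        else:
            buffer_s -= download_s

        # The finished chunk adds its duration to the buffer.
        buffer_s += chunk_seconds

    return buffer_s, rebuffer_s
```

With a 2 Mbps link, three 4 Mbit chunks only stall during the initial download, while three 16 Mbit chunks stall on every request because each one takes longer to download than it takes to play.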
The difficulty is that a good user experience means balancing several goals at once: maintaining a high video quality, minimizing rebuffering time and keeping the video quality steady (we don't want constant jumps in quality). This is where adaptive bitrate (ABR) algorithms come in: an algorithm decides which bitrate to request for each video chunk depending on the current network conditions.
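One common way to turn these three goals into a single number is a weighted score: reward bitrate, penalize rebuffering, and penalize quality switches. The sketch below shows that general shape. The function name and the penalty weights are placeholders for illustration, not the values used in our experiments.

```python
def qoe(bitrates_mbps, rebuffer_seconds,
        rebuffer_penalty=4.0, smoothness_penalty=1.0):
    """Score a playback session (weights are illustrative assumptions).

    bitrates_mbps:    bitrate chosen for each chunk
    rebuffer_seconds: stall time incurred while fetching each chunk
    """
    # Reward: higher bitrates mean higher video quality.
    quality = sum(bitrates_mbps)

    # Penalty: every second of rebuffering hurts the experience.
    stalls = rebuffer_penalty * sum(rebuffer_seconds)

    # Penalty: jumps in quality between consecutive chunks.
    switches = smoothness_penalty * sum(
        abs(nxt - cur) for cur, nxt in zip(bitrates_mbps, bitrates_mbps[1:])
    )

    return quality - stalls - switches
```

Under these weights, a steady low-bitrate session can outscore one that briefly jumps to a higher bitrate but stalls and switches quality to get there.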
ABR algorithms were the focus of our project, namely trying to use reinforcement learning to create a robust algorithm.
Demo Link
Below is a slide deck that was used in our final presentation. There are also video demonstrations on slides 23 and 24.
https://drive.google.com/file/d/1V86wm34yqRwKV_gTDjC7BHpaLGrNExlI/view?usp=sharing
Link to Code
https://github.com/blueleafysky/CloudFinalProject
How We Built It
Luckily for us, this application had been explored in the past by researchers at MIT, with their paper being published in 2017 (http://web.mit.edu/pensieve/). They also included a very robust reinforcement learning framework through which we were able to train our algorithm, so all the credit to them!
However, we wanted to look at this in a more experimental sense: what type of parameters would lead to the most robust ABR algorithm trained by the framework? In what type of situations might one prefer an RL algorithm over the more standard ABR algorithms? As a result, we trained a variety of models over multiple nights and selected the one whose behavior was consistent and indicative of the algorithm learning how to properly select bitrates to maintain a positive user experience (more about this experimental procedure can be found in the slide deck).
Once we had this, we built a video player using JavaScript, dash.js and a Python Flask server. This video player was able to display a video while also allowing the user to select a bitrate strategy, and it displayed playback metrics (like bitrate and buffer levels) for the selected strategy. The Python Flask server connected our RL model to the video player, so that the player could ask the model which bitrate to request next.
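As a rough idea of how a Flask server can sit between a player and a model, here is a minimal sketch. It is not the code from our repository: the `/next_bitrate` route, the JSON field names, the bitrate ladder, and the `choose_bitrate_index` stand-in (a simple throughput rule instead of the trained RL policy) are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical bitrate ladder in kbps (an assumption, not our project's config).
BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]


def choose_bitrate_index(state):
    """Stand-in for the trained RL policy.

    A real server would feed the player's state into the model here.
    This placeholder just picks the highest bitrate that does not
    exceed the measured throughput.
    """
    throughput = state.get("throughput_kbps", 0)
    best = 0
    for i, rate in enumerate(BITRATES_KBPS):
        if rate <= throughput:
            best = i
    return best


@app.route("/next_bitrate", methods=["POST"])
def next_bitrate():
    # The player posts its current state (e.g. throughput) as JSON.
    state = request.get_json(silent=True) or {}
    idx = choose_bitrate_index(state)
    return jsonify({"index": idx, "bitrate_kbps": BITRATES_KBPS[idx]})
```

In a setup like this, the player would post its latest measurements before each chunk request and use the returned index to pick the quality level.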
Ultimately, our experiments showed that reinforcement learning gave very consistent results across many different videos and network conditions. However, MPC (model predictive control), a popular non-RL algorithm, achieved a better user experience on average, albeit with much larger variance. This makes it plausible that RL is preferable when many different types of videos and network conditions are expected, whereas MPC might be preferable for better performance when the videos and network conditions are more predictable.
Additional Thoughts / Feelings / Stories
I primarily chose to write about this project because it's fresh in my mind and because it really pushed me outside my comfort zone. For me in particular (not my teammates), my expertise has always been more in machine learning applications (especially NLP) in a research/experimental setting and in frontend app development, so the Cloud Computing course that I built this project for (and the project itself) really pushed me to further my networks and systems knowledge. As a result, this was a very eye-opening experience: I learned more about the possible applications of machine learning, expanded my skillset and knowledge in an area I consider my weakness, and applied my research knowledge. I'm keen to revisit this project to try out some more experiments and to fancy up the front end (we ran out of time in the semester 😢), but I'm still proud of it nonetheless!