Spotify + AI = Tune Genie
JOOJO DONTOH
Posted on May 12, 2024
Inspiration
Even though I enjoy music, I don't listen to it as frequently as most people do. I also tend not to explore new music extensively because I prefer avoiding disappointing tracks. In contrast, my girlfriend is a true music enthusiast who listens to a wide variety of genres from around the world on a regular basis. She played Snooze by SZA in the background during one of our evening conversations, and I must say, I thoroughly enjoyed the song. After replaying the song continuously for about a week, I decided to search for similar tracks. I found it frustrating to look up songs individually and then add them one by one to a playlist. This process seemed unnecessarily tedious. As a software engineer, I began to think about how to simplify the entire process of getting song suggestions, searching for songs, and creating playlists. This is how Tune Genie was born.
Requirement clarification
Functional requirements
Songlist Suggestions
- Users should be able to provide minimal input, such as country, genre, mood, activity, and year, to receive song list suggestions.
- Users should be able to view the generated list of suggested songs.
Playlists
- Users should be able to import the suggested song list into a music service and save it as a playlist.
- Users should be able to view all playlists they have created.
- Users should have direct access to their playlists within the music service through the app.
- Users should be able to preview the songs in any created playlist.
Packages
- Users can view various available packages within the app.
- Packages are tiered based on price, offering different levels of service.
- Each package specifies the number of playlists a user can create each month and the maximum number of songs per playlist.
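As a rough illustration of the package rules above (not the app's actual code), the limits can be pictured as a small lookup table plus a quota check. The tier names, prices, and limits below are made-up placeholders.

```javascript
// Hypothetical package tiers: names, prices, and limits are illustrative only.
const PACKAGES = new Map([
  ["trial", { priceUsd: 0, playlistsPerMonth: 2, maxSongsPerPlaylist: 10 }],
  ["basic", { priceUsd: 3, playlistsPerMonth: 10, maxSongsPerPlaylist: 25 }],
  ["premium", { priceUsd: 8, playlistsPerMonth: 50, maxSongsPerPlaylist: 50 }],
]);

// Decides whether a user on `packageId` may create one more playlist this
// month containing `songCount` songs, given how many they already created.
function canCreatePlaylist(packageId, playlistsCreatedThisMonth, songCount) {
  const pkg = PACKAGES.get(packageId);
  if (!pkg) {
    return { allowed: false, reason: "unknown package" };
  }
  if (playlistsCreatedThisMonth >= pkg.playlistsPerMonth) {
    return { allowed: false, reason: "monthly playlist limit reached" };
  }
  if (songCount > pkg.maxSongsPerPlaylist) {
    return { allowed: false, reason: "too many songs for this package" };
  }
  return { allowed: true, reason: null };
}
```

Returning a reason alongside the decision lets the frontend explain why a request was refused and point the user to a higher tier.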
Subscriptions
- Users can subscribe to services by choosing from the available packages within the app.
- The application should manage user subscriptions and automatic billing (auto-debits).
Non-Functional Requirements
- Music Search Efficiency: Searches within the application should be fast, providing users with prompt results.
- Songlist Suggestions: Suggestions should appear random yet strictly adhere to user inputs such as country, genre, mood, activity, and year.
- User-Application Integration: Interaction between the user and the application should be seamless and efficient, with minimal latency.
- Page Load Speed: Pages should load quickly to ensure a smooth user experience.
- Traffic Management: The system must efficiently handle large volumes of requests to core APIs without degradation in performance.
- External Service Protection: Rate limiting and caching should be implemented to protect external services from excessive calls.
- Data Security: Third-party user data must be securely handled and protected against unauthorized access.
- Token and Permission Security: Third-party tokens and permissions should adhere to the principle of least privilege, ensuring that the application only has access necessary for functionality.
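One common way to meet the External Service Protection requirement is a token bucket placed in front of each outbound API client. The sketch below is a generic, in-memory version of that technique; the class name and parameters are assumptions for illustration, and it does not describe how Tune Genie itself was built.

```javascript
// Token bucket: holds up to `capacity` tokens and refills at `refillPerSecond`.
// Each outbound call to an external service spends one token.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock (milliseconds), handy for tests
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now();
  }

  // Returns true if the call may proceed, false if it should be rejected or queued.
  tryRemoveToken() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Add the tokens earned since the last call, never exceeding capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

An in-memory bucket only protects a single process. With several Cloud Run instances running, the counter would need to live in a shared store for the limit to hold globally.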
High level design
The high-level design outlines the core components of the application, as illustrated in the diagram below. Users interact with the application through the frontend interface, which supports various actions such as logging in, searching, and viewing playlists.
The frontend communicates with multiple backend APIs to execute logic tailored to different user scenarios. The backend facilitates numerous interactions with external services, including:
- Federated Login: Integrations with services like Spotify to manage user authentication.
- Music Service Provision: Connections to music services such as Spotify to fetch and manage music content.
- Payment Processing: Integration with PayPal to handle transactions related to packages and subscriptions.
- Song list Suggestions: Utilization of a large language model (LLM) to generate music suggestions based on user input.
- Data Management: A well-structured data store to maintain persistent state information for users.
These components work in tandem to ensure a seamless and efficient user experience across the application.
Tools
Node
I opted for the JavaScript ecosystem, using Node.js and Koa.js for the backend framework, largely due to my familiarity and comfort with these technologies. My extensive experience with Node.js in numerous projects has given me a solid understanding of its architecture and capabilities. Node.js is particularly well-suited for handling I/O-intensive operations, making it an ideal choice for this application, which requires frequent interactions with multiple external services. The efficiency and ease of building with Node.js further reinforce its suitability for managing the demands of this service.
React
I selected React.js, specifically Next.js, as the frontend framework for this application primarily for its server-side rendering capabilities and support for page caching. These features are critical to meeting the non-functional requirements related to enhancing user experience, such as fast page loads and efficient data handling.
Also, my familiarity with React.js and Next.js significantly influenced this choice. These technologies streamline the development process, making it easier to build and maintain the application due to their flexible ecosystem and comprehensive documentation.
I was faced with the decision of whether to develop a native application or a web application. I opted for a hybrid approach by choosing Next-pwa, which is built on Workbox. This tool allows for a native-like experience by enabling users to bookmark the web application on their smartphones as if it were a native app, while also providing a full-screen view without the browser controls and panels. This solution bridges the gap between web and native applications, combining the accessibility of web technology with the immersive user experience of a native app.
Github
I use GitHub because it's easy to use and readily available. I particularly like its workflow functionality (GitHub Actions), which manages CI/CD for me, although most other repository hosts offer something similar.
GCP
Currently, my primary cloud provider is Google Cloud Platform (GCP), chosen mainly due to my familiarity with its services and controls. Within GCP, I have selected Cloud Run to host both my frontend and backend. Cloud Run is particularly well-suited for my needs as it allows me to specify the minimum number of instances and scales horizontally to accommodate varying loads. This level of horizontal scaling control is adequate for my requirements, as I do not need the finer granularity of management that Kubernetes or a dedicated virtual machine setup might offer in a microservices architecture.
Paypal
I selected PayPal as our payment processor due to its widespread popularity and comprehensive documentation. PayPal offers an easy-to-use abstraction layer for managing payments, packages, and subscriptions, which significantly simplifies implementation. In addition, like many payment systems, PayPal utilizes the saga pattern to manage long-running payment processes. This is achieved through the use of webhooks, which allow our application to respond appropriately to various payment events, ensuring smooth and reliable transaction handling.
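To show what responding to payment events can look like, here is a minimal sketch of a handler that folds webhook events into a stored subscription record. The event type strings are PayPal's subscription webhook names; the record shape, the status values, and the function name are assumptions for illustration. Signature verification and persistence are left out.

```javascript
// Maps PayPal webhook event types to the status this sketch stores.
// The status strings on the right are hypothetical names.
const STATUS_BY_EVENT = new Map([
  ["BILLING.SUBSCRIPTION.ACTIVATED", "active"],
  ["BILLING.SUBSCRIPTION.SUSPENDED", "suspended"],
  ["BILLING.SUBSCRIPTION.CANCELLED", "cancelled"],
  ["BILLING.SUBSCRIPTION.EXPIRED", "expired"],
]);

// Applies one webhook event to a subscription record and returns the new record.
// Webhooks can be delivered more than once, so already-seen event IDs are ignored.
function applyWebhookEvent(subscription, event) {
  if (subscription.seenEventIds.includes(event.id)) {
    return subscription; // duplicate delivery: nothing to do
  }
  // Events without a status mapping (for example PAYMENT.SALE.COMPLETED)
  // are recorded as seen but leave the status unchanged.
  const nextStatus = STATUS_BY_EVENT.get(event.event_type) ?? subscription.status;
  return {
    ...subscription,
    status: nextStatus,
    seenEventIds: [...subscription.seenEventIds, event.id],
  };
}
```

Keeping the handler a pure function makes duplicate or out-of-order deliveries easier to reason about and to test.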
ChatGPT
For the development of Tune Genie, a crucial component was integrating a large language model (LLM) to enhance interaction capabilities and provide dynamic content responses. After evaluating various options, ChatGPT, specifically its underlying model gpt-3.5-turbo, was selected for this purpose.
- Model Choice: GPT-3.5 Turbo: This model excels in delivering high-quality outputs at a faster pace than its predecessors, such as text-davinci-003, aligning with the speed requirements outlined in our non-functional requirements. It is particularly well-suited for Tune Genie's specific needs, which include generating a diverse song list tailored to user preferences. Among the three models evaluated, including text-davinci-003 and Ada, GPT-3.5 Turbo not only yielded the best results but also efficiently formatted these results into JSON objects. This capability to process moderately complex queries quickly ensures that user interactions are both seamless and precise, enhancing the overall user experience.
Mitigating Drawbacks and Limitations:
Integrating GPT-3.5 Turbo into the Tune Genie application provides a good example of how understanding specific use-case requirements can mitigate some common drawbacks associated with using advanced AI models. Here’s how the specific context of Tune Genie aligns with the limitations of GPT-3.5 Turbo and how it addresses them:
- Privacy Concerns: In the context of Tune Genie, where queries do not need to be private and are primarily related to generating music suggestions, the typical data privacy issues associated with sending data to an external AI service are less concerning. This alleviates major privacy worries as the data involved is non-sensitive and user-specific confidential data is not being processed.
- General-Purpose Model Suitability: The use of GPT-3.5 Turbo as a general-purpose model is particularly advantageous for Tune Genie. Since the application's core functionality, generating music suggestions, is not heavily specialized, the model's general capabilities are sufficient and effective. This avoids the need for specialized AI models, which can be more complex and costly to develop and maintain.
- Cost-to-Efficiency Ratio: For Tune Genie, the cost-to-efficiency ratio of employing GPT-3.5 Turbo is highly advantageous. Despite the inherent costs associated with this model, its value is clearly demonstrated through the efficiency and quality of its outputs, especially with respect to the speed and relevance of music suggestions. This alignment ensures that it remains a cost-effective solution for the application's requirements, effectively balancing performance with operational expenses. Also, the subscription packages available to users not only extend greater access to GPT-3.5 Turbo but also help the Tune Genie team manage these costs more effectively.
- Handling of Rate Limits and Vendor Lock-In: While GPT-3.5 Turbo is subject to rate limits, Tune Genie has implemented well thought out caching strategies to effectively manage these constraints. By caching commonly requested data and responses, the application reduces the number of API calls necessary, thereby minimizing the impact of rate limits. This approach not only ensures smoother user experiences during high-demand periods but also optimizes API usage to stay within the operational capacity that GPT-3.5 Turbo supports. Additionally, the use of caching as part of the application’s architecture contributes significantly to managing operational efficiency. It reduces the dependency on real-time API calls for every user interaction, thus not only conserving resources but also enhancing performance by providing faster response times. This strategic use of technology underscores a thoughtful approach to mitigating the risks associated with vendor lock-in and the limitations imposed by rate limits, ensuring that Tune Genie can deliver a consistent and reliable service to its users.
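The caching idea described above can be sketched as a small time-to-live cache keyed on the normalized user inputs. This is an assumption-laden illustration rather than Tune Genie's actual strategy: `TtlCache`, `cacheKey`, `getSuggestions`, and the `fetchSongs` callback (a stand-in for the real LLM request) are all hypothetical names.

```javascript
// Builds a stable cache key so that equivalent inputs share one entry.
function cacheKey(input) {
  return ["country", "genre", "mood", "activity", "year"]
    .map((field) => String(input[field] ?? "").trim().toLowerCase())
    .join("|");
}

// In-memory cache whose entries expire `ttlMs` milliseconds after being set.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Returns a cached song list when one exists; otherwise calls `fetchSongs`
// and stores its result for later requests with the same inputs.
async function getSuggestions(cache, input, fetchSongs) {
  const key = cacheKey(input);
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const songs = await fetchSongs(input);
  cache.set(key, songs);
  return songs;
}
```

A cached list is identical for everyone who sends the same inputs, which pulls against the requirement that suggestions appear random. One option would be to cache a larger pool and sample from it, but that trade-off is outside this sketch.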
Music service
The Tune Genie team plans to begin by integrating with Spotify and later expand to include additional music services such as Apple Music. The choice to start with Spotify was driven primarily by the team's familiarity with the platform and the relative ease of implementation. Spotify offers great APIs for music search, music preview, and playlist creation, which align well with Tune Genie's functionalities. In addition, incorporating Spotify for federated login significantly simplifies the user onboarding process, enhancing the overall user experience by streamlining access and interaction.
Expanding high level design
The diagram below details the interaction between all core components of the system.
Registration and authentication ceremony
The image above explains the registration and authentication ceremony between the different layers as well as the external service (Spotify). Tune Genie leverages Spotify's federated login and also extracts vital user information such as Spotify IDs, package types, and emails. This information, along with the tokens provided by Spotify, is used to perform actions in Spotify on behalf of the user.
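As a hedged sketch of the first step in that ceremony, the function below builds the URL that sends a user to Spotify's authorization page for the authorization code flow. The scope list is an assumption about the smallest set this kind of app would need (profile, email, and playlist modification), in line with the least-privilege requirement; the function name and parameter names are illustrative.

```javascript
// Spotify's authorization endpoint for the OAuth 2.0 authorization code flow.
const SPOTIFY_AUTHORIZE_URL = "https://accounts.spotify.com/authorize";

// Assumed minimal scope set: read the user's email and profile, and create
// or modify playlists on their behalf. Nothing else is requested.
const SCOPES = [
  "user-read-email",
  "user-read-private",
  "playlist-modify-public",
  "playlist-modify-private",
];

// Builds the URL the frontend redirects to when the user clicks "log in".
// `state` should be a random value that is checked again on the callback.
function buildAuthorizeUrl({ clientId, redirectUri, state }) {
  const url = new URL(SPOTIFY_AUTHORIZE_URL);
  url.search = new URLSearchParams({
    client_id: clientId,
    response_type: "code",
    redirect_uri: redirectUri,
    scope: SCOPES.join(" "),
    state,
  }).toString();
  return url.toString();
}
```

After the user approves, Spotify redirects back with a `code` parameter, which the backend exchanges for access and refresh tokens.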
Core functionality
The image above explains the core functionality, or functional requirements, of Tune Genie. It provides a guided sequence of user actions from input entry to playlist generation. It also sheds light on the system's integration with ChatGPT and the involvement of Spotify's playlist widget.
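The step between user input and playlist creation can be sketched as two small functions: one that turns the user's selections into a prompt asking for JSON, and one that parses the model's reply defensively. The prompt wording, function names, and the `title`/`artist` fields are hypothetical; this is not the prompt or schema Tune Genie used.

```javascript
// Turns the user's selections into a prompt that asks for JSON only.
function buildPrompt({ country, genre, mood, activity, year }, songCount) {
  return [
    `Suggest ${songCount} songs as a JSON array.`,
    'Each item must have the string fields "title" and "artist".',
    `Constraints: country=${country}, genre=${genre}, mood=${mood}, activity=${activity}, year=${year}.`,
    "Reply with the JSON array only, no extra text.",
  ].join("\n");
}

// Parses the model's reply. Models sometimes wrap JSON in extra prose, so only
// the span from the first "[" to the last "]" is parsed, and any item that
// lacks a string title or artist is dropped.
function parseSongList(reply) {
  const start = reply.indexOf("[");
  const end = reply.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  let parsed;
  try {
    parsed = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return []; // malformed JSON: treat as "no suggestions"
  }
  if (!Array.isArray(parsed)) return [];
  return parsed
    .filter((s) => s && typeof s.title === "string" && typeof s.artist === "string")
    .map((s) => ({ title: s.title.trim(), artist: s.artist.trim() }));
}
```

Each parsed title and artist pair would then be looked up in the music service's search API before being added to a playlist, since a language model can suggest songs that do not exist.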
Payment and subscription
The image above explains the relationship between Tune Genie and the chosen payment and subscription service. Although Tune Genie offers a number of trial song list generation attempts, it also provides a subscription model for users who decide to use the service beyond those trial attempts. Tune Genie not only leverages PayPal's payment gateway to execute one-time payments, it also uses PayPal's subscription model to track subscription statuses and collect recurring payments from users.
Why it never shipped
Spotify offers a developer platform where engineers can register their application to obtain a client ID and client secret. However, these credentials do not grant unrestricted access to user accounts. New applications run in a sandbox environment restricted to a limited number of users, and each user's email must be whitelisted before the application can obtain tokens for them through Spotify's federated login. To increase the allowed quota, Spotify requires developers to apply for an extension, justifying the use of its service within the application. My initial application was criticized for errors in importing artist album artwork. Although I addressed these issues, my subsequent application was met with a warning against using Spotify's content with any large language models (LLMs) or AI tools. This effectively meant the whole application would have to be rebuilt on top of another platform.
Lessons
While I am proud to have single-handedly built this entire application, some thorough research about Spotify's policies regarding AI tools would have better set my expectations before I began this engineering journey.
During the development process, I gained valuable experience building a Progressive Web App (PWA) and explored various enhancements to improve its performance. Optimizing the front end to be lightweight, fast, and fluid involved a strategic combination of managing configurations, implementing effective caching, and minimizing backend requests to only those necessary.
I had the opportunity to work with GitHub Actions, which proved to be a seamless and enjoyable tool, especially when integrated with Google Cloud Platform's Cloud Build.
A completely new challenge for me was implementing PayPal's payment and subscription models. Although it was somewhat complicated, I am grateful for the experience of integrating complex payment processes and record-keeping systems into the application.
Conclusion
In conclusion, developing Tune Genie was an exciting mix of passion and technology aimed at making music discovery easier. This project tackled a common issue for music lovers and used somewhat advanced technology to create a smooth and engaging experience. The app combines different requirements to be efficient, quick, and user-friendly. Even though there were challenges with Spotify and using AI models, Tune Genie sets a meaningful precedent for similar apps. Tune Genie shows that combining music with LLMs opens up exciting ways to enhance how we find and enjoy music. See you in the next one 😁