Towards an OpenGL Music Visualiser
a little Fe2O3.nH2O-y
Posted on September 4, 2019
Computer graphics are amazing. I adore the demoscene aesthetic and one day want to get to the point where I'm not just making something functional, but something genuinely expressive.
My side project at the moment is OpenGL, with the aim of creating a 3D demo which responds to music in real time, inspired by the visuals at Tame Impala and deadmau5 gigs, which add a whole other dimension to the show.
Step one: find a way to create a basic demo. This turns out to be so insanely complex it's laughable - even starting out with a cube in C++ using OpenGL took hours, because it's not a cube, it's 12 individual triangles with their own vertex and fragment shader instances, buffers and bindings. And the camera, hah! Exactly what is the difference between model and world space, and how do I know whether I need an orthographic or perspective camera? Let alone trying to find the right dimensions in this x,y,z space I'm trying to imagine - with every rebuild taking ~10 seconds. In the end I gave up on C++ because of my lack of basic understanding of how the APIs worked and, really, of core GPU architecture. It's not like you can dynamically pass around object references and write to multiple buffer arrays simultaneously like I'd expected; you have to do everything procedurally when interacting with the GPU - which is fine, but I found myself spending so long looking up the correct way to manage these things that it was aggravating. This is meant to be creative.
I'd come across three.js a few times, and always thought it was amazing that you could get that kind of performance from WebGL, so I gravitated towards it. We still have the same core concepts - cameras, geometries, materials, meshes, scenes - but now it's all in a familiar language and instantly interpreted. This move took me from "damn, that point is a little bit out" to "looking good, what do I want it to do next?" - which is to be expected, as I'd just gone up several layers in the stack. The end result is psedge.github.io/demo - a real working web 'demo'.
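For anyone wondering what "several layers up the stack" looks like, this is roughly the starting point (a minimal sketch of the standard three.js boilerplate, not the actual demo code - the sizes, colour and camera position are placeholders):

```javascript
import * as THREE from 'three';

// The three core objects every three.js app needs.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, window.innerWidth / window.innerHeight, 0.1, 1000);
camera.position.z = 5; // back off so the cube is in view

const renderer = new THREE.WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// A cube is three lines here, instead of hand-built vertex buffers and shaders.
const cube = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshBasicMaterial({ color: 0x00ff00 })
);
scene.add(cube);

function animate() {
  requestAnimationFrame(animate);
  cube.rotation.y += 0.01; // naive per-frame rotation - see the animation-speed caveat below
  renderer.render(scene, camera);
}
animate();
```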
Getting to that point wasn't as easy as I make out. The main sticking points for me:
- Frame rates: it turns out different shaders have different performance impacts, and even though I'm only dealing with ~100 meshes at a time, using anything except MeshBasicMaterial slowed renders down to a potato. This was solved simply by reducing either the material quality or the number of objects in view at any one time, and by changing from SpotLights to a DirectionalLight (much less intensive to calculate).
- Animation speeds: I still don't understand what role clocks play in requesting animation frames / renders. Let's say I have an animation that rotates a cube 90 degrees - I change the mesh rotation and request a frame, waiting until 1000/{FRAME_CAP}ms has passed if it hasn't already. However, when FPS < FRAME_CAP the animation is going to take ages - e.g. if we can only get 30FPS, the animation will take twice as long, which is unsuitable for time-sensitive animations. I think the solution is to make the rotation a function of the time elapsed since the animation started and the desired total duration. My problem is that I understand the issue, and know it must be a really common question, but don't know how to ask it. edit: Yep, this is the correct way - see the sketch below.
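For posterity, here's what that looks like, continuing the cube sketch above (a sketch using three.js's THREE.Clock; the 90-degree target and 2-second duration are made-up numbers):

```javascript
const clock = new THREE.Clock(); // starts timing on the first getElapsedTime() call

const DURATION = 2;         // seconds the animation should take, regardless of FPS
const TARGET = Math.PI / 2; // 90 degrees in radians

function animate() {
  requestAnimationFrame(animate);
  // Progress is a function of wall-clock time, not frame count,
  // so 30FPS and 60FPS runs both finish after DURATION seconds.
  const t = Math.min(clock.getElapsedTime() / DURATION, 1);
  cube.rotation.y = t * TARGET;
  renderer.render(scene, camera);
}
animate();
```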
Step two: get audio. I've looked at PulseAudio for a previous project, but that was only for controlling audio levels on an Ubuntu machine over WiFi, not for taking audio input. At this point it's going to be necessary to re-evaluate the usefulness of three.js and WebGL, because the HTML5 Audio APIs might be a limiting factor. I've come across examples which request an audio context and start listening in a Web Worker, passing frequency data back to the graphics thread, but the lack of stackoverflow/reddit posts makes me hesitant that this is going to work out of the box. I'm not sure whether to do this on a local server and use WebSockets to communicate with three.js, or to move the graphics to a proper client program as well.
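For reference, the browser-only route would look something like this (a sketch of the standard Web Audio API calls on the main thread rather than the Web Worker variant - whether this holds up under real load is exactly what I'm unsure about):

```javascript
const audioCtx = new AudioContext();

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  // Route the mic through an AnalyserNode, which does the FFT for us.
  const source = audioCtx.createMediaStreamSource(stream);
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 2048; // gives analyser.frequencyBinCount = 1024 bins
  source.connect(analyser);

  const bins = new Uint8Array(analyser.frequencyBinCount);
  function poll() {
    analyser.getByteFrequencyData(bins); // amplitude 0-255 per frequency bin
    // ...hand `bins` to the render loop here...
    requestAnimationFrame(poll);
  }
  poll();
});
```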
Step three: analyse audio. This is where my complete lack of signal processing knowledge comes into full view. I understand that, from a technical standpoint, I need to record the audio from the mic for a set time (the sample window) then run a Fast Fourier Transform on it to get the amplitudes of different frequencies. I have no idea what kind of values I'm going to get from that, or how to choose frequency bands - whether those should be fixed or dynamic, etc. There are apparently plenty of libraries available in most languages for this, some specifically for audio analysis - but more research is needed.
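To make the output concrete: an AnalyserNode hands back one byte (0-255) per bin, with the bins spread linearly from 0Hz up to half the sample rate (the Nyquist frequency). One naive, fixed-band approach might look like this - the cut-off frequencies here are arbitrary placeholders, not recommendations:

```javascript
// Average the FFT bins falling inside [lowHz, highHz).
function bandAverage(bins, sampleRate, lowHz, highHz) {
  const hzPerBin = (sampleRate / 2) / bins.length; // linear spread up to Nyquist
  const lo = Math.floor(lowHz / hzPerBin);
  const hi = Math.min(Math.ceil(highHz / hzPerBin), bins.length);
  let sum = 0;
  for (let i = lo; i < hi; i++) sum += bins[i];
  return sum / Math.max(hi - lo, 1); // average amplitude, still 0-255
}

// Arbitrary fixed bands; dynamic banding stays an open question.
const bass = bandAverage(bins, audioCtx.sampleRate, 20, 250);
const mids = bandAverage(bins, audioCtx.sampleRate, 250, 4000);
const treble = bandAverage(bins, audioCtx.sampleRate, 4000, 16000);
```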
Step four: take the analysed audio and use it in the demo. At this point the functional requirements are complete - we have the component parts of a working visualiser, we just need to decide what to do with it. The problem is that, apparently, for the warm "realtime" feeling we need to achieve a sub-10ms delay from audio heard to frames being drawn. At our likely 60FPS cap a new frame only lands every ~16.7ms, so the frame interval alone blows the budget, leaving our server and WS connection a generous -6.7ms for everything /s. The way gig visuals probably get around this is to analyse and pre-render video ahead of time, then sync on play.
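If the browser-only route does survive, wiring it all together is pleasingly small - something like this inside the render loop, reusing the analyser and bandAverage sketches above (bass pulsing a cube's scale is just the first thing I'd try, not the plan):

```javascript
function animate() {
  requestAnimationFrame(animate);
  analyser.getByteFrequencyData(bins);

  // Map bass energy (0-255) onto a 1x-2x scale pulse.
  const bass = bandAverage(bins, audioCtx.sampleRate, 20, 250);
  const scale = 1 + bass / 255;
  cube.scale.set(scale, scale, scale);

  renderer.render(scene, camera);
}
```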
Needless to say, I've really started to appreciate the games I play a lot more. I had no idea how complex this world was, and just how much prerequisite knowledge is required across so many areas - not just soft framework knowledge but hard maths/mechanics understanding. I always used to see Game Design as a meta-topic on old phpBB forums and wonder why it was so common to see a dedicated section for it - now I know. Also, I've decided that I enjoy using three.js at least for the rapid prototyping stage of visual development, even if it then requires porting to C++ for performance reasons - for me it's the intermediate stage between mental visualisation and productionising.