Read on for the complete notes/transcript of the talk (in hopefully more coherent form than the talk itself – next time I promise to spend less time on the flashy demo and more time figuring out exactly what I’m going to say…)
Right now everyone’s jolly excited about the HTML5 audio element. At last we have a standards-compliant way to drop audio clips into web pages that avoids all the gunk with external plugins and replaces it with a simple tag (well, once you’ve fought through the details of which browsers support MP3 versus Ogg anyway). It has a comprehensive API to handle all the details of buffering, programmatically pausing, playing and skipping and so on – but one thing it stops short of is being able to generate the audio data on the fly, within the browser.
Why would you want that? Well, I can’t say why you’d want it, but I can tell you what I’m hoping to do with it: I’m involved in the demo scene, a community of programmers, artists and musicians who create digital art – something roughly like music videos, but with visuals generated in real time – and I’m working on a forthcoming website that will showcase those productions. This community originates from the days of the Commodore 64, when people cracked games and added little intro animations to promote themselves, which became more and more elaborate as rival groups tried to outdo each other, until they evolved into full-scale artistic creations and the game cracking side of things took a back seat. And among the artefacts to come out of this community is a hell of a lot of music – we’re talking hundreds of thousands of tracks, all preserved in the native formats of the Commodore 64, and the Amiga, and all sorts of other things. And it would be quite neat to be able to play all of these from within my website.
I’m using this site as an excuse to play around with cool technologies, and at first I figured that this was an ideal job for Amazon EC2 – set up a bunch of instances churning away in the background converting these files to MP3. However, when I started to learn about ways to generate audio in the browser, it made sense to take advantage of that and save myself a whole lot of up-front processing (not to mention bandwidth).
At the forefront of this new development is the Mozilla Audio Data API, available in the latest nightly builds of Firefox. This extends the HTML5 audio API with a few new methods, the central ones being mozSetup – which allows you to initialise an empty audio stream with a specified sample rate and number of channels – and mozWriteAudio, which lets you pass in an array of floats representing some sample data to add to that stream. Like all good up-and-coming browser innovations, we can reasonably assume that once the Mozilla developers have got this API stable enough they’re going to submit it to WHATWG for inclusion in the HTML5 spec – but for the moment, it has somewhat limited adoption. There is a remedy for that though…
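In its simplest form, driving the API looks something like the sketch below. The `mozSetup`/`mozWriteAudio` calls are the real (experimental) Firefox API; the sine-wave helper and variable names are my own, and the whole thing is guarded so it degrades to a no-op where the API doesn’t exist:

```javascript
// Generate `seconds` worth of a sine wave as an array of floats in [-1, 1]
function sineSamples(freq, sampleRate, seconds) {
  var samples = [];
  var total = Math.floor(sampleRate * seconds);
  for (var i = 0; i < total; i++) {
    samples.push(Math.sin(2 * Math.PI * freq * i / sampleRate));
  }
  return samples;
}

var sampleRate = 44100;
var samples = sineSamples(440, sampleRate, 1);  // one second of concert A

// Feed it to Firefox's experimental Audio Data API, if it's available
if (typeof Audio !== 'undefined') {
  var output = new Audio();
  if (output.mozSetup) {
    output.mozSetup(1, sampleRate);   // initialise a mono stream at 44.1kHz
    output.mozWriteAudio(samples);    // append sample data to the stream
  }
}
```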
A fine example of what can already be done is Ben Firshman’s JSNES, a JavaScript NES emulator, featuring a whole host of advanced optimisations and new features, including audio support.
To achieve this, he created the dynamicaudio.js library, which sits on top of Mozilla’s Audio Data API, but also provides an invisible Flash widget for other browsers to fall back upon. Armed with this work, I was able to build the first step towards my goal: jsmodplayer, a player for the MOD music format originally introduced on the Amiga. It’s a rather messy format to implement, with every man and his dog coming up with their own custom extensions to it in a very ad-hoc way, but at its heart it consists of a set of uncompressed wave samples (typically a second or two in length), and a script detailing when to trigger them and at what pitch. Put enough of those trigger events together, throw in some effects like volume control and pitch slide, and you have a song.
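Those trigger events boil down to a bit of arithmetic: a MOD file stores each note’s pitch as an Amiga hardware ‘period’, and the sample playback rate falls out of the PAL Amiga’s master clock. Here’s a sketch of the conversion, plus a crude nearest-neighbour resampler (the formula is the standard Amiga one; the function names and resampling approach are mine, not necessarily what jsmodplayer does internally):

```javascript
var PAL_CLOCK = 7093789.2;  // PAL Amiga master clock, in Hz

// Convert a MOD note period to a sample playback frequency in Hz
function periodToFrequency(period) {
  return PAL_CLOCK / (period * 2);
}

// Play back a stored sample at a given period by stepping through it
// at the right rate, with simple nearest-neighbour lookup (looping the
// sample when we run off the end)
function resample(sampleData, period, outputRate, outputLength) {
  var step = periodToFrequency(period) / outputRate;
  var out = [], pos = 0;
  for (var i = 0; i < outputLength; i++) {
    out.push(sampleData[Math.floor(pos) % sampleData.length]);
    pos += step;
  }
  return out;
}

// Period 428 is the conventional middle C in ProTracker MODs
var freq = periodToFrequency(428);  // ≈ 8287 Hz
```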
This does depend on us being able to generate the audio up-front before starting playback, so it’s arguably not truly ‘real time’ – something like an emulator, which is generating audio in response to user interaction, couldn’t really do this. It’s good enough for our straightforward audio player, though.
For a while there was an unfortunate flaw in this plan: at the time, Chrome didn’t support WAV as an audio format. The bug tracker ticket relating to this featured some rather flimsy arguments defending that decision, such as “if we support WAV, people will start widely serving audio across the web as uncompressed WAV files” (um… just like everyone on the internet is using BMP files, which are supported by all major browsers, right?). Given the tendency of bug tickets to wander off onto unrelated subjects, it’s hard to tell what the eventual conclusion was – but if I’m reading it right, we can happily use WAVs as of Chrome 7.
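Happily, WAV makes the up-front approach pleasantly simple: a 16-bit mono file is just a 44-byte header stuck in front of the raw sample data. A sketch of building one from an array of floats (the header layout is the standard RIFF/WAVE one; the function name and byte-packing helpers are mine):

```javascript
// Build a minimal 16-bit mono WAV file (as an array of byte values)
// from float samples in [-1, 1], following the standard RIFF/WAVE layout
function buildWav(samples, sampleRate) {
  var dataSize = samples.length * 2;
  var bytes = [];

  function str(s) { for (var i = 0; i < s.length; i++) bytes.push(s.charCodeAt(i)); }
  function u32(n) { bytes.push(n & 0xff, (n >> 8) & 0xff, (n >> 16) & 0xff, (n >> 24) & 0xff); }
  function u16(n) { bytes.push(n & 0xff, (n >> 8) & 0xff); }

  str('RIFF'); u32(36 + dataSize); str('WAVE');
  str('fmt '); u32(16);        // fmt chunk length
  u16(1);                      // PCM encoding
  u16(1);                      // mono
  u32(sampleRate);
  u32(sampleRate * 2);         // byte rate = sampleRate * channels * bytesPerSample
  u16(2);                      // block align
  u16(16);                     // bits per sample
  str('data'); u32(dataSize);
  for (var i = 0; i < samples.length; i++) {
    var s = Math.max(-1, Math.min(1, samples[i]));  // clip to range
    u16(Math.round(s * 32767) & 0xffff);
  }
  return bytes;  // ready to base64-encode into a data: URI, or serve as-is
}
```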
(At this point, someone rightly mentioned that Internet Explorer imposes a 32K limit on URLs, so that won’t get you much of a wave file – especially with base64 or URL encoding to contend with. As such, all we can really do is hope that IE users will tend to have the Flash option available. Personally, I think it makes a refreshing change that we can even consider IE in our cutting-edge browser hacks once again…)
One project of mine where I really should have seen the value of generating the audio up-front was Fake Plastic Cubes, a demo I wrote this summer to play around with the size reduction tricks I’d seen coming out of the 10K Apart and JS1k contests, and to try and come up with something audiovisual in as small a size as possible. On the audio side, that meant ditching samples entirely, and doing the synthesis from first principles, building the sound up from plain sine waves – but in a classic case of project management fail, I spent a week building a really wonderful synthesiser framework and rushed everything else at the last minute, meaning that I had no chance to actually make something nice on top of it. As a result, it’s totally unpolished, and the animation keeps stuttering for a split second while it generates the next chunk of audio. It’s only a tiny fraction of the available processor time, but it’s enough to be very, very noticeable. I really should have just generated the entire audio track on startup and then kicked off the visuals – but there was no time for that, or to play around with web workers, which would seem to be another potential way to generate audio as a background process…
A MIDI file consists of a simple header, followed by a list of tracks – each of which is a list of timestamped events, which are usually note on/off events but could be a tempo change, change of instrument or various other things. The timestamps are actually a bit weird: you’d expect them to be given in something like microseconds, but they’re actually expressed as a number of ‘ticks’, where the MIDI file specifies a particular number of ticks per beat, and the tempo of the song is given as a number of microseconds per beat, which can change over time just to add further confusion. Ultimately it probably does make sense, because it means you can accurately use any rational number (within reason) as a tempo, and that’s handy for professional MIDI equipment that has to keep exact time for extended periods – it’s just an initial hurdle that you have to get over. Once you’ve got that in place, a MIDI synthesiser boils down to a set of generator functions that can emit audio waves for as long as you tell them to, and a main loop which picks events off the queue, running the generators until it’s time to process the next event.
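Once you see it, the tick arithmetic is a one-liner. A sketch (the function name is mine; the underlying fact – that MIDI tempo events store microseconds per beat – is from the Standard MIDI File spec):

```javascript
// Convert a delta time in MIDI ticks to seconds, given the ticks-per-beat
// value from the file header and the current tempo (from the most recent
// Set Tempo event, expressed in microseconds per beat)
function ticksToSeconds(ticks, ticksPerBeat, microsecondsPerBeat) {
  return (ticks / ticksPerBeat) * (microsecondsPerBeat / 1000000);
}

// At 120bpm (500,000 microseconds per beat), with 480 ticks per beat,
// one beat's worth of ticks comes out as half a second:
var t = ticksToSeconds(480, 480, 500000);  // 0.5
```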
For this first release, the generated sounds are not particularly interesting – just plain sine waves with a bit of attack/decay volume control applied – but now that we’ve got the initial framework in place it should be relatively straightforward to add more diverse sounds, and the synth engine should hopefully be flexible enough to support effects like harmonics and reverb.
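That sort of attack/decay shaping is only a few lines on top of a plain sine generator – something like this sketch (my own illustration, not the actual code from this release):

```javascript
// A sine generator with a linear attack/decay volume envelope applied.
// All durations are in seconds; attack and decay must be greater than zero.
function envelopedSine(freq, sampleRate, duration, attack, decay) {
  var total = Math.floor(sampleRate * duration);
  var out = [];
  for (var i = 0; i < total; i++) {
    var t = i / sampleRate;
    var amp = 1;
    if (t < attack) {
      amp = t / attack;               // ramp up from silence
    } else if (t > duration - decay) {
      amp = (duration - t) / decay;   // ramp back down to silence
    }
    out.push(amp * Math.sin(2 * Math.PI * freq * t));
  }
  return out;
}
```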
Finally, as a glimpse of what’s in store for in-browser audio creation in the future, keep an eye on the Mozilla Rainbow project, which provides APIs for capturing audio and video from microphones / webcams. For the last few years I’ve been taking part in February Album Writing Month, a song writing community where collaborations over the internet play a major part – perhaps it won’t be too long until we’re doing that over a Google-Docs-style online equivalent of Audacity / GarageBand…?