jasmid – MIDI synthesis with Javascript and HTML5 audio

The executive summary: At last weekend’s Barcamp London 8, I presented a talk entitled “Realtime audio generation for the web (because there’s not enough MIDI on webpages these days)”. In it, I went over the current options for generating audio within the browser, and presented my latest hack in that direction, jasmid: a Javascript app that can read standard MIDI files, render them to wave audio (with, at present, some very simple waveforms) and play them directly from the browser, completely independently of your OS’s MIDI support.

Read on for the complete notes/transcript of the talk (in hopefully more coherent form than the talk itself – next time I promise to spend less time on the flashy demo and more time figuring out exactly what I’m going to say…)

Right now everyone’s jolly excited about the HTML5 audio element. At last we have a standards-compliant way to drop audio clips into web pages that avoids all the gunk with external plugins and replaces it with a simple tag (well, once you’ve fought through the details of which browsers support MP3 versus Ogg anyway). It has a comprehensive API to handle all the details of buffering, programmatically pausing, playing and skipping and so on – but one thing it stops short of is being able to generate the audio data on the fly, within the browser.

Why would you want that? Well, I can’t say why you’d want it, but I can tell you what I’m hoping to do with it: I’m involved in the demo scene, a community of programmers, artists and musicians who create digital art – something roughly like music videos, but with visuals generated in real time – and I’m working on a forthcoming website that will showcase those productions. This community originates from the days of the Commodore 64, when people cracked games and added little intro animations to promote themselves, which became more and more elaborate as rival groups tried to outdo each other, until they evolved into full-scale artistic creations and the game cracking side of things took a back seat. And among the artefacts to come out of this community is a hell of a lot of music – we’re talking hundreds of thousands of tracks, all preserved in the native formats of the Commodore 64, and the Amiga, and all sorts of other things. And it would be quite neat to be able to play all of these from within my website.

I’m using this site as an excuse to play around with cool technologies, and at first I figured that this was an ideal job for Amazon EC2 – set up a bunch of instances churning away in the background converting these files to MP3. However, when I started to learn about ways to generate audio in the browser, it made sense to take advantage of that and save myself a whole lot of up-front processing (not to mention bandwidth).

At the forefront of this new development is the Mozilla Audio Data API, available in the latest nightly builds of Firefox. This extends the HTML5 audio API with a few new methods, the central ones being mozSetup – which allows you to initialise an empty audio stream with a specified sample rate and number of channels – and mozWriteAudio, which lets you pass in an array of floats representing some sample data to add to that stream. Like all good up-and-coming browser innovations, we can reasonably assume that once the Mozilla developers have got this API stable enough they’re going to submit it to WHATWG for inclusion in the HTML5 spec – but for the moment, it has somewhat limited adoption. There is a remedy for that though…
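As a rough sketch of how those two calls fit together (the mozSetup/mozWriteAudio calls only exist in Firefox builds with the Audio Data API, so they’re guarded here; the sample-generating part is plain Javascript, and the function name is mine, not part of the API):

```javascript
// Generate 'count' samples of a sine wave as floats in the range -1..1.
function makeSineSamples(frequency, sampleRate, count) {
  var samples = new Array(count);
  for (var i = 0; i < count; i++) {
    samples[i] = Math.sin(2 * Math.PI * frequency * i / sampleRate);
  }
  return samples;
}

var SAMPLE_RATE = 44100;
var samples = makeSineSamples(440, SAMPLE_RATE, SAMPLE_RATE / 10); // 0.1s of A440

// Only in Firefox nightlies with the Audio Data API:
if (typeof Audio !== 'undefined' && Audio.prototype.mozSetup) {
  var output = new Audio();
  output.mozSetup(1, SAMPLE_RATE);  // one channel at 44.1kHz
  output.mozWriteAudio(samples);    // append floats to the stream
}
```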

At my last Barcamp London, two years ago now, I gave the first public showing of JSSpeccy, my ZX Spectrum emulator written in Javascript, which has been something of a runaway internet hit – and I’ve been somewhat surprised by the number of people taking it seriously, rather than as the crazy pointless hack I built it as (not least, the guy who ripped it off and sold it on the App Store as the first ever iPhone Spectrum emulator, despite it running at about 30% speed *cough*). But one guy who’s picked up the concept of emulation in Javascript and taken it much further than I’d ever dreamed possible is Ben Firshman, who created JSNES, the Javascript ~~SNES~~ NES emulator, featuring a whole host of advanced optimisations and new features, including audio support.

To achieve this, he created the dynamicaudio.js library, which sits on top of Mozilla’s Audio Data API, but also provides an invisible Flash widget for other browsers to fall back upon. Armed with this work, I was able to build the first step towards my goal: jsmodplayer, a player for the MOD music format originally introduced on the Amiga. It’s a rather messy format to implement, with every man and his dog coming up with their own custom extensions to it in a very ad-hoc way, but at its heart it consists of a set of uncompressed wave samples (typically a second or two in length), and a script detailing when to trigger them and at what pitch. Put enough of those trigger events together, throw in some effects like volume control and pitch slide, and you have a song.
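The heart of any MOD player is the bit that replays a stored sample at a pitch determined by the note: you step through the sample data at a rate proportional to the desired frequency, wrapping around for looped samples. A minimal illustration of the idea (names are mine, not jsmodplayer’s actual API):

```javascript
// Resample a stored wave by stepping through it at 'step' samples
// per output sample: step > 1 plays higher/faster, step < 1 lower/slower.
// Nearest-neighbour resampling, wrapping at the end (MOD samples often loop).
function mixSample(sampleData, step, outLength) {
  var out = new Array(outLength);
  var pos = 0;
  for (var i = 0; i < outLength; i++) {
    out[i] = sampleData[Math.floor(pos) % sampleData.length];
    pos += step;
  }
  return out;
}

// A tiny 8-sample 'wave'; stepping through it twice as fast
// plays it an octave up.
var sample = [0, 0.5, 1, 0.5, 0, -0.5, -1, -0.5];
var octaveUp = mixSample(sample, 2, 4);
```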

Now, as we all know from the Apple versus Adobe tiff, Flash is not exactly universal across the platforms we care about – so we haven’t covered all of our bases yet. However, for certain applications, there’s a possible third path (albeit one that I haven’t properly investigated yet), using another recent browser addition: data: URIs. Unlike typical URLs, which point to some external location that contains the data we want, a data: URI embeds that data directly, as a string of base64 or URL-encoded data. This means that we could generate a string containing a valid WAV file from Javascript (or indeed an MP3 or Ogg file, although generating those from Javascript is a little bit hardcore), and use that as the source of an <audio> element.
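Generating a WAV from Javascript is less hardcore than it sounds: it’s a 44-byte RIFF header followed by raw little-endian PCM. A sketch of the idea, producing a data: URI from an array of floats (the helper names are illustrative, not jasmid’s actual code):

```javascript
// Wrap float samples (-1..1) in a minimal 16-bit mono WAV file and
// base64-encode the result as a data: URI.
function wavDataURI(samples, sampleRate) {
  var dataSize = samples.length * 2;
  var bytes = [];
  function str(s) { for (var i = 0; i < s.length; i++) bytes.push(s.charCodeAt(i)); }
  function u32(n) { bytes.push(n & 0xff, (n >> 8) & 0xff, (n >> 16) & 0xff, (n >> 24) & 0xff); }
  function u16(n) { bytes.push(n & 0xff, (n >> 8) & 0xff); }

  str('RIFF'); u32(36 + dataSize); str('WAVE');
  str('fmt '); u32(16);       // fmt chunk length
  u16(1);                     // PCM
  u16(1);                     // mono
  u32(sampleRate);
  u32(sampleRate * 2);        // byte rate
  u16(2);                     // block align
  u16(16);                    // bits per sample
  str('data'); u32(dataSize);
  for (var i = 0; i < samples.length; i++) {
    var s = Math.round(Math.max(-1, Math.min(1, samples[i])) * 0x7fff);
    u16(s < 0 ? s + 0x10000 : s);  // two's complement, little-endian
  }

  var binary = '';
  for (var j = 0; j < bytes.length; j++) binary += String.fromCharCode(bytes[j]);
  var b64 = (typeof btoa === 'function')
    ? btoa(binary)
    : Buffer.from(bytes).toString('base64');
  return 'data:audio/wav;base64,' + b64;
}
```

In the browser, the result can be handed straight to an audio element: `new Audio(wavDataURI(samples, 44100)).play();`.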

This does depend on us being able to generate the audio up-front before starting playback, so it’s arguably not truly ‘real time’ – something like an emulator, which is generating audio in response to user interaction, couldn’t really do this. It’s good enough for our straightforward audio player, though.

For a while there was an unfortunate flaw in this plan: Chrome didn’t support WAV as an audio format. The bug tracker ticket relating to this featured some rather flimsy arguments defending this decision, such as “if we support WAV, people will start widely serving audio across the web as uncompressed WAV files” (um… just like everyone on the internet is using BMP files, which are supported by all major browsers, right?). Given the tendency of bug tickets to wander off onto unrelated subjects, it’s hard to tell what the eventual conclusion was – but if I’m reading it right, we can happily use WAVs as of Chrome 7.

(At this point, someone rightly mentioned that Internet Explorer imposes a 32K limit on URLs, so that won’t get you much of a wave file – especially with base64 or URL encoding to contend with. As such, all we can really do is hope that IE users will tend to have the Flash option available. Personally, I think it makes a refreshing change that we can even consider IE in our cutting-edge browser hacks once again…)

In fact, there’s another mechanism for feeding data into URLs dynamically, which I’d all but forgotten until I started preparing this talk: if you have a ‘javascript:’ URL which returns a string when executed, that string will be used as the data. This trick was most prominently used in Wolfenstein 5K (long before the wider world caught on to the joys of Javascript size coding contests…), which constructed its display as an XBM image, an obscure text-based format. Before canvas came along, I spent many happy hours trying to replicate that trick with GIF images, discovering along the way that Internet Explorer didn’t like strings containing zero bytes, which sent me down the garden path of constructing valid GIFs containing no zeros. Ah, happy days. In short, I don’t even know if this works at all with <audio>, and even if it does, it’s almost certainly as much of a dead end as it was back in 2004…

One project of mine where I really should have seen the value of generating the audio up-front was Fake Plastic Cubes, a demo I wrote this summer to play around with the size reduction tricks I’d seen coming out of the 10K Apart and JS1k contests, and to try and come up with something audiovisual in as small a size as possible. On the audio side, that meant ditching samples entirely, and doing the synthesis from first principles, building the sound up from plain sine waves – but in a classic case of project management fail, I spent a week building a really wonderful synthesiser framework and rushed everything else at the last minute, meaning that I had no chance to actually make something nice on top of it. As a result, it’s totally unpolished, and the animation keeps stuttering for a split second while it generates the next chunk of audio. It’s only a tiny fraction of the available processor time, but it’s enough to be very, very noticeable. I really should have just generated the entire audio track on startup and then kicked off the visuals – but there was no time for that, or to play around with web workers which would seem to be another potential way to generate audio as a background process…

I also didn’t have time to compose a decent soundtrack, or experiment with the synth enough to come up with interesting sounds, or implement a vaguely sensible way to enter musical notes (as a dodgy workaround, I remapped the names of the notes in the scale to their positions on a QWERTY keyboard and prodded out a melody from those). Fortunately, I didn’t have to let a good routine go to waste: a week or so back, I encountered Sergi Mansilla’s jsmidi project, which provides an easy way to create standard MIDI files from Javascript. However, it turns out that MIDI support in browsers has more or less stagnated – while wave audio goes from strength to strength, MIDI is still stuck in the world of OS-specific plugins – so there was a clear opportunity for some Javascript synthesiser love there. And so I’ve come up with jasmid, a JS library for reading MIDI files and playing them back through its own audio synthesis engine.

A MIDI file consists of a simple header, followed by a list of tracks – each of which is a list of timestamped events, which are usually note on/off events but could be a tempo change, change of instrument or various other things. The timestamps are actually a bit weird: you’d expect them to be given in something like microseconds, but they’re actually expressed as a number of ‘ticks’, where the MIDI file specifies a particular number of ticks per beat, and the tempo of the song is given as a number of microseconds per beat, which can change over time just to add further confusion. Ultimately it probably does make sense, because it means you can accurately use any rational number (within reason) as a tempo, and that’s handy for professional MIDI equipment that has to keep exact time for extended periods – it’s just an initial hurdle that you have to get over. Once you’ve got that in place, a MIDI synthesiser boils down to a set of generator functions that can emit audio waves for as long as you tell them to, and a main loop which picks events off the queue, running the generators until it’s time to process the next event.
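The tick arithmetic is a one-liner once you see it laid out. As a sketch (the function name is mine; in a standard MIDI file the tempo meta event carries microseconds per beat, so 500000 corresponds to 120bpm):

```javascript
// Convert a MIDI delta-time in ticks to seconds, given the file's
// ticks-per-beat and the current tempo in microseconds per beat.
function ticksToSeconds(ticks, ticksPerBeat, microsecondsPerBeat) {
  return ticks * (microsecondsPerBeat / ticksPerBeat) / 1000000;
}

// At 96 ticks per beat and 120bpm (500000us per beat),
// one beat of 96 ticks lasts about half a second.
var oneBeat = ticksToSeconds(96, 96, 500000);
```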

For this first release, the generated sounds are not particularly interesting – just plain sine waves with a bit of attack/decay volume control applied – but now that we’ve got the initial framework in place it should be relatively straightforward to add more diverse sounds, and the synth engine should hopefully be flexible enough to support effects like harmonics and reverb.
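In the spirit of that first release – though this is an illustrative sketch, not jasmid’s actual SineGenerator code – a sine voice with a simple linear attack/decay envelope looks something like this:

```javascript
// Generate one note as a sine wave shaped by a linear attack/decay
// envelope: ramp up over 'attack' seconds, hold at full volume, then
// ramp down over the final 'decay' seconds of the note.
function envelopedSine(frequency, sampleRate, noteLength, attack, decay) {
  var count = Math.floor(noteLength * sampleRate);
  var out = new Array(count);
  for (var i = 0; i < count; i++) {
    var t = i / sampleRate;
    var amp = 1;
    if (t < attack) {
      amp = t / attack;                    // ramp up
    } else if (t > noteLength - decay) {
      amp = (noteLength - t) / decay;      // ramp down
    }
    out[i] = amp * Math.sin(2 * Math.PI * frequency * t);
  }
  return out;
}

// Half a second of A440 with a 50ms attack and 100ms decay.
var note = envelopedSine(440, 44100, 0.5, 0.05, 0.1);
```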

Finally, as a glimpse of what’s in store for in-browser audio creation in the future, keep an eye on the Mozilla Rainbow project, which provides APIs for capturing audio and video from microphones / webcams. For the last few years I’ve been taking part in February Album Writing Month, a song writing community where collaborations over the internet play a major part – perhaps it won’t be too long until we’re doing that over a Google-Docs-style online equivalent of Audacity / GarageBand…?

12 Responses to “jasmid – MIDI synthesis with Javascript and HTML5 audio”

  1. vermecou says:

    “This community originates from the days of the Commodore 64, when people cracked games and added little intro animations to promote themselves, which became more and more elaborate as rival groups tried to outdo each other, until they evolved into full-scale artistic creations and the game cracking side of things took a back seat. And among the artefacts to come out of this community is a hell of a lot of music – we’re talking hundreds of thousands of tracks, all preserved in the native formats of the Commodore 64, and the Amiga, and all sorts of other things. And it would be quite neat to be able to play all of these from within my website.”
    Where else can I read about this?

  2. Danielku15 says:

    Awesome start! I was looking for such a library for my online music notation renderer alphaTab (http://www.alphatab.net).

    Currently you are generating the waves yourself (SineGenerator). Do you plan to implement wavetable synthesis, which would allow people to create their own soundbanks?

  3. matt says:

    Hi vermecou – sorry, just realised I never replied to this…

    demoscene.info is probably the best introduction to the topic, although it’s slightly out of date now. The easiest way to check out the productions themselves is through demoscene.tv and Capped, although purists would say that you need to run the original executables for the full experience (whether that’s on PC, C64 or Javascript).

  4. Detail, detail, but JSNES is emulating the NES, not the SNES.

  5. matt says:

    Doh. When I gave the talk I actually checked with Ben which one it was, and I could have sworn he said SNES. I guess I just have a memory like… one of those things with holes.

  6. [...] in the browserArtikel auf mudcu.be jasmid – MIDI synthesis with Javascript and HTML5 audio Artikel auf http://matt.west.co.tt, Download auf [...]

  7. [...] can make your own custom player. You can visualize audio. You can generate audio on the fly. And these are just some of the early [...]

  8. [...] too (which is available for download together with the pdf). Credits to Matt Westcott’s brilliant jasmad library, we can interpret and play a midi file in the browser simply with javascript without any real midi [...]

  9. [...] here Garmin communicator did some interesting this with communication Mozilla Audio Data API jasmid MIDI synthesis WebRTC getUserMedia Device element discussion Implementing device WebRTC Group LUFA Node.js and [...]

  10. I’ve tried downloading the current content of the git repo to my hard drive and tried running the code in the latest firefox, chrome and internet explorer. I heard… nothing. Since I probably have done something stupid, I thought I would drop you a line with some request for help… What have I done wrong?

  11. [...] by Matt Westcott/Gasmid. It started with his fake plastic cubes demo, so if you follow his blog posts, you might get to know a little more about its history and the demoscene (which nvidia calls [...]

  12. freeforal ltousez says:

    it needs an auto loop – or is there already one?
