February 2012

Volume 27 Number 02

Building HTML5 Applications - Practical Cross-Browser HTML5 Audio and Video

By John Dyer | February 2012

When the HTML5 audio and video tags were first introduced, codec and browser incompatibilities made them difficult to use and unrealistic to deploy on large-scale Web sites. The tags were great for companies writing experimental code or doing cross-browser media development, but the HTML5 media API was too unreliable for general use.

Today, things are different. Browsers and JavaScript libraries have matured to the point where you can—and should—use HTML5 media as the default for any projects that will display audio and video content. Even retrofitting existing Flash and Silverlight video content for HTML5 playback has become fairly simple. In this article, I’ll explore the benefits of using HTML5 for media playback, show some sample code, address some major issues that developers face and present solutions to those problems.

Benefits of HTML5 Media

The advantage of using HTML5 for media is that you can leverage your HTML, CSS and JavaScript skills rather than learning Flash or Silverlight. If you can create buttons in HTML and control them with JavaScript, you already know all you need to develop HTML5 media. HTML5 media has built-in support for captions and subtitles using the new track element, and proposals for additional features—such as direct byte access for video manipulation—are already being considered.

Moreover, media that uses HTML5 video and audio performs better than media played through plug-ins such as Flash or Silverlight, resulting in longer battery life and smoother playback. In addition, mobile devices running Windows Phone 7.5, Apple iOS and Android (as well as the Metro-style browser in Windows 8) support media playback only through HTML5 video and audio. Some Android devices include Flash, but Adobe has recently discontinued its mobile Flash efforts, which means that HTML5 will be the only way to play media on mobile devices in the future.

Simple HTML5 Playback and Controls

In the days of using Flash or Silverlight for video playback, the simplest possible markup to display a video (video.mp4 in this case) would have looked something like this:

<object width="640" height="360" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="https://fpdownload.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=8,0,0,0">
  <param name="src" value="player.swf?file=video.mp4">
  <embed src="player.swf?file=video.mp4" width="640"
    height="360"></embed>
</object>

Compare those nested object, param and embed tags with this HTML5 video tag used to play the same h.264-encoded video:

<video src="video.mp4" controls></video>

It’s much simpler—just plain HTML that needs very little explanation. When the browser has downloaded enough of a video to determine its native height and width, it resizes the video accordingly. But, just as with img tags, it’s always best to specify the height and width attributes so that the page doesn’t need to reflow. You can also use the style attribute to specify px or % width and height values (I’ll use both in the examples that follow).

The one attribute I added is controls. This tells the browser to display its own built-in playback control bar, which usually includes a play/pause toggle, a progress indicator and volume controls. Controls is a Boolean attribute, which means it doesn’t need to have a value assigned to it. For a more XHTML-like syntax you could write controls="controls", but the browser always considers controls to be false if it’s not present and true if it is present or assigned a value.

HTML5 Media Attributes and Child Source Tags

The audio and video elements introduce several new attributes that determine how the browser will present the media content to the end user.

  • src specifies a single media file for playback (for multiple sources with different codecs, please see the discussion below).
  • poster is a URL to an image that will be displayed before a user presses Play (video only).
  • preload determines how and when the browser will load the media file using three possible values: none means the video will not download until the user initiates playback; metadata tells the browser to download just enough data to determine the height, width and duration of the media; auto lets the browser decide how much of the video to start downloading before the user initiates playback. 
  • autoplay is a Boolean attribute used to start a video as soon as the page loads (mobile devices often ignore this to preserve bandwidth).
  • loop is a Boolean attribute that causes a video to start over when it reaches the end.
  • muted is a Boolean attribute specifying whether the video should start muted.
  • controls is a Boolean attribute indicating whether the browser should display its own controls.
  • width and height tell a video to display at a certain size (video only; can’t be a percentage).

Timed Text Tracks

Browsers are also beginning to implement the track element, which provides subtitles, closed captions, translations and commentaries to videos. Here’s a video element with a child track element:

<video id="video1" width="640" height="360" preload="none" controls>
  <track src="subtitles.vtt" srclang="en" kind="subtitles" label="English subtitles">
</video>

In this example, I’ve used four of the track element’s five possible attributes:

  • src is a link to either a Web Video Timed Text (WebVTT) file or a Timed Text Markup Language (TTML) file.
  • srclang is the language of the text track (such as en, es or ar).
  • kind indicates the type of text content: subtitles, captions, descriptions, chapters or metadata.
  • label holds the text displayed to a user choosing a track.
  • default is a Boolean attribute that determines the startup track element.

WebVTT is a simple text-based format that begins with a single-line declaration (WEBVTT FILE) and then lists start and end times separated by the --> characters, followed by the text to display between the two times. Here’s a simple WebVTT file that will display two lines of text at two different time intervals:

WEBVTT FILE

00:00:02.500 --> 00:00:05.100
This is the first line of text to display.

00:00:09.100 --> 00:00:12.700
This line will appear later in the video.
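
To make the format concrete, here is a minimal sketch of how a script might turn a WebVTT file like the one above into cue objects and look up the cue for a given playback time. The function names parseTimestamp, parseWebVtt and findActiveCue are hypothetical, and the parser handles only the simple cue layout shown here (it ignores cue settings and styling), so treat it as an illustration rather than how any browser or library implements the format.

```javascript
// Convert "hh:mm:ss.fff" or "mm:ss.fff" into a number of seconds
function parseTimestamp(stamp) {
  var parts = stamp.split(":"),
    seconds = parseFloat(parts.pop()),
    minutes = parseInt(parts.pop(), 10),
    hours = parts.length ? parseInt(parts.pop(), 10) : 0;

  return hours * 3600 + minutes * 60 + seconds;
}

// Split the file into blank-line-separated blocks and keep
// every block that contains a "start --> end" timing line
function parseWebVtt(text) {
  var blocks = text.replace(/\r\n?/g, "\n").split(/\n{2,}/),
    cues = [];

  blocks.forEach(function(block) {
    var lines = block.split("\n"), match, i;

    for (i = 0; i < lines.length; i++) {
      match = /^(\S+)\s+-->\s+(\S+)/.exec(lines[i]);
      if (match) {
        cues.push({
          start: parseTimestamp(match[1]),
          end: parseTimestamp(match[2]),
          // Everything after the timing line is the cue text
          text: lines.slice(i + 1).join("\n").trim()
        });
        break;
      }
    }
  });

  return cues;
}

// Return the cue that should be visible at the given time, or null
function findActiveCue(cues, time) {
  for (var i = 0; i < cues.length; i++) {
    if (time >= cues[i].start && time < cues[i].end) {
      return cues[i];
    }
  }
  return null;
}
```

In a page, a timeupdate listener could pass video.currentTime to findActiveCue and write the returned text into an element positioned over the video.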

As of this writing, only Internet Explorer 10 Platform Preview and Chrome 19 support the track element, but other browsers are expected to do so soon. Some of the JavaScript libraries I discuss later add support for the track element to browsers that have not yet implemented it, but there’s also a standalone track library called captionator.js (captionatorjs.com) that parses track tags, reads WebVTT and TTML (as well as SRT and YouTube SBV) files and provides a UI for browsers that don’t yet have native support.

Scripting HTML5 Media

Earlier, I used the controls attribute to tell the browser to display its native controls on top of the video or audio tags. The problem with this is that each browser has a different set of controls that are unlikely to match your Web site’s design. To create a consistent experience, you can remove the browser’s controls and then add custom buttons to the page that you control with JavaScript. You can also add event listeners to track the state of the video or audio playback. In the following example, I’ve removed the controls attribute and added markup underneath the video to serve as a basic control bar:

<video id="video1" style="width:640px; height:360px" src="video.mp4"> </video> 
<div>
  <input type="button" id="video1-play" value="Play" />
  <input type="button" id="video1-mute" value="Mute" />
  <span id="video1-current">00:00</span>
  <span id="video1-duration">00:00</span>
</div>

The simple JavaScript in Figure 1 will control video playback and show the current time in the video, and will create the complete working player depicted in Figure 2 (rendered in Internet Explorer 9). (Note that in HTML5, the type="text/javascript" attribute is not required on the script tag.)

Figure 1 Controlling Video Playback

<script>
// Wrap the code in a function to protect the namespace
(function() {
// Find the DOM objects
var video = document.getElementById("video1"),
  playBtn = document.getElementById("video1-play"),
  muteBtn = document.getElementById("video1-mute"),
  current = document.getElementById("video1-current"),
  duration = document.getElementById("video1-duration");

// Toggle the play/pause state
playBtn.addEventListener("click", function() {
  if (video.paused || video.ended) {
    video.play();
    playBtn.value = "Pause";
  } else {
    video.pause();
    playBtn.value = "Play";
  }
}, false);

// Toggle the mute state
muteBtn.addEventListener("click", function() {
  if (video.muted) {
    video.muted = false;
    muteBtn.value = "Mute";
  } else {
    video.muted = true;
    muteBtn.value = "Unmute";
  }
}, false);

// Show the duration when it becomes available
video.addEventListener("loadedmetadata", function() {
  duration.innerHTML = formatTime(video.duration);
}, false);

// Update the current time
video.addEventListener("timeupdate", function() {
  current.innerHTML = formatTime(video.currentTime);
}, false);

function formatTime(time) {
  var 
    minutes = Math.floor(time / 60) % 60,
    seconds = Math.floor(time % 60);

  return   (minutes < 10 ? '0' + minutes : minutes) + ':' +
           (seconds < 10 ? '0' + seconds : seconds);
}

})();
</script>

Figure 2 A Working Video Player That Shows the Time

The code in Figure 1 introduces the play and pause methods, the timeupdate and loadedmetadata events, and the paused, ended, currentTime and duration properties. The full HTML5 media API (https://www.w3.org/TR/html-markup/video.html) includes much more that can be used to build a full-fledged media player. In addition to the HTML5 media tag attributes listed earlier, HTML5 media objects have other properties accessible only via JavaScript:

  • currentSrc describes the media file the browser is currently playing when source tags are used.
  • videoHeight and videoWidth indicate the native dimensions of the video.
  • volume specifies a value between 0 and 1 to indicate the volume. (Mobile devices ignore this in favor of hardware volume controls.)
  • currentTime indicates the current playback position in seconds.
  • duration is the total time in seconds of the media file.
  • buffered is a TimeRanges object indicating which portions of the media file have been downloaded.
  • playbackRate is the speed at which the video is played back (default: 1). Change this number to go faster (1.5) or slower (0.5).
  • ended indicates whether the video has reached the end.
  • paused is true at startup (unless autoplay starts the video), false during playback and true again whenever the video is paused.
  • seeking indicates the browser is trying to download and move to a new position.

HTML5 media objects also include the following methods for scripting:

  • play attempts to load and play the video.
  • pause halts a currently playing video.
  • canPlayType(type) detects which codecs a browser supports. If you send a type such as video/mp4, the browser will answer with probably, maybe, no or a blank string.
  • load is called to load the new video if you change the src attribute.
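
A short sketch of canPlayType and load working together: given a list of candidate files, pick the first one the browser reports it may be able to play, then point the element at it. The helper names pickPlayableSource and switchSource and the shape of the candidate objects are assumptions made for this example.

```javascript
// Return the first candidate the media element says it may be able to play
function pickPlayableSource(media, candidates) {
  var i, answer;

  for (i = 0; i < candidates.length; i++) {
    answer = media.canPlayType(candidates[i].type);
    // "probably" and "maybe" count as playable; "" and "no" do not
    if (answer === "probably" || answer === "maybe") {
      return candidates[i];
    }
  }
  return null;
}

// Point the element at a new file and ask the browser to load it
function switchSource(media, candidate) {
  media.src = candidate.src;
  media.load();
}

// Browser usage (commented out so the helpers can also run outside a page):
// var video = document.getElementById("video1");
// var choice = pickPlayableSource(video, [
//   { src: "video.mp4", type: "video/mp4" },
//   { src: "video.webm", type: "video/webm" }
// ]);
// if (choice) { switchSource(video, choice); }
```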

The HTML5 media spec provides 21 events; here are some of the most common ones:

  • loadedmetadata fires when the duration and dimensions are known.
  • loadeddata fires when the browser can play at the current position.
  • play fires when the video is no longer paused or ended.
  • playing fires when playback has started after pausing, buffering or seeking.
  • pause fires when the video has been paused.
  • ended fires when the end of the video is reached.
  • progress fires when more of the media file has been downloaded.
  • seeking fires when the browser has started seeking.
  • seeked fires when the browser has finished seeking.
  • timeupdate fires as the media resource is playing.
  • volumechange fires when muted or volume properties have changed.
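
One quick way to get a feel for when these events fire is to log them. The sketch below attaches a listener for each event name in a list and reports the event name together with the current playback position. The helper name watchMediaEvents and its report callback are hypothetical.

```javascript
// Attach one listener per event name and report each event as it fires
function watchMediaEvents(media, eventNames, report) {
  eventNames.forEach(function(name) {
    media.addEventListener(name, function() {
      report(name, media.currentTime);
    }, false);
  });
}

// Browser usage (commented out so the helper can also run outside a page):
// watchMediaEvents(
//   document.getElementById("video1"),
//   ["loadedmetadata", "play", "playing", "pause", "seeking", "seeked", "ended"],
//   function(name, time) { console.log(name + " at " + time + "s"); }
// );
```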

These properties, methods and events are powerful tools for presenting users with a rich media experience, all driven by HTML, CSS and JavaScript. In the basic example in Figure 1, I first create variables for all of the elements on the page:

// Find the DOM objects
var video = document.getElementById("video1"),
  playBtn = document.getElementById("video1-play"),
  muteBtn = document.getElementById("video1-mute"),
  current = document.getElementById("video1-current"),
  duration = document.getElementById("video1-duration");

Then I add a click event to my buttons to control media playback. Here I toggle the play and pause state of the video and change the label on the button:

// Toggle the play/pause state
playBtn.addEventListener("click", function() {
  if (video.paused || video.ended) {
      video.play();
      playBtn.value = "Pause";
  } else {
      video.pause();
      playBtn.value = "Play";
  }
}, false);

Finally, I add events to the media object to track its current state. Here, I listen for the timeupdate event and update the control bar to the current time of the playhead, formatting the seconds to a minutes:seconds style:

// Update the current time
video.addEventListener("timeupdate", function() {
  current.innerHTML = formatTime(video.currentTime);
}, false);
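
The formatTime function in Figure 1 drops the hours from longer videos, and duration is NaN until the loadedmetadata event fires. A possible variant that covers both cases is sketched here. The name formatLongTime and the choice of 00:00 as the placeholder are assumptions for this example, not part of the original player.

```javascript
// Format seconds as mm:ss, or hh:mm:ss once the time reaches an hour
function formatLongTime(time) {
  // NaN, Infinity and negative values get a placeholder
  if (!isFinite(time) || time < 0) {
    return "00:00";
  }

  var hours = Math.floor(time / 3600),
    minutes = Math.floor(time / 60) % 60,
    seconds = Math.floor(time % 60);

  // Left-pad single digits with a zero
  function pad(n) {
    return n < 10 ? "0" + n : "" + n;
  }

  return (hours > 0 ? pad(hours) + ":" : "") +
    pad(minutes) + ":" + pad(seconds);
}
```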

Issues with HTML5 Media

Unfortunately, getting HTML5 media to work across all browsers and devices is not quite as simple as in my example. I’ve already mentioned that not all browsers support the track element, and now I’ll address three additional issues that you encounter when using the audio and video tags, along with solutions to overcome them. At the end of the article, I’ll introduce some JavaScript libraries that wrap all of these solutions into single, easily deployable packages.

HTML5 Audio and Video Codec Support The first issue you face when developing with HTML5 media is the inconsistent support for video and audio codecs. My examples work in Internet Explorer 9 and later, Chrome and Safari, but they won’t work in Firefox or Opera because although those browsers support the HTML5 video tag, they don’t support the h.264 codec. Due to patent and licensing concerns, browser vendors have split into two codec camps, and that brings us to the familiar HTML5 Media chart in Figure 3, showing which codecs work with which browsers.

Figure 3 Codec Support in Various Browsers

Video              IE8  IE9+     Chrome  Safari  Mobile  Firefox  Opera
MP4 (h.264/AAC)    no   yes      yes     yes     yes     no       no
WebM (VP8/Vorbis)  no   install  yes     no      no      yes      yes

 

Internet Explorer 9+, Safari, Chrome and mobile devices (iPhone, iPad, Android 2.1+ and Windows Phone 7.5+) all support the h.264 video codec, which is usually placed in an MP4 container. Firefox and Opera, in contrast, support the VP8 video codec, which is placed inside the WebM container. Chrome also supports WebM, and has pledged to remove h.264 support at some point. Internet Explorer 9+ can render WebM if the codec has been installed by the end user. Finally, Firefox, Opera and Chrome also support the Theora codec placed inside an Ogg container, but this has been largely phased out in favor of WebM (unless you need to support Firefox 3.x), so I’ve left it out of the chart and examples for simplicity.

For audio, the browser vendors are again split into two camps, with the first group (Internet Explorer 9, Chrome and Safari) supporting the familiar MP3 format and the second group (Firefox and Opera) supporting the Vorbis codec inside an Ogg container. Many browsers can also play the WAV file format. See Figure 4.

Figure 4 Audio Support in Various Browsers

Audio       IE8  IE9+     Chrome  Safari  Mobile  Firefox  Opera
MP3         no   yes      yes     yes     yes     no       no
Ogg Vorbis  no   install  yes     no      no      yes      yes
WAV         no   no       maybe   yes     yes     yes      yes

 

To deal with these differences, the video and audio tags support multiple child source tags, which lets browsers choose a media file they can play. Each source element has two attributes:

  • src specifies a URL for a media file.
  • type specifies the MIME type and optionally the specific codec of the media file.

To offer both h.264 and VP8 video codecs, you’d use the following markup:

<video id="video1" width="640" height="360">
  <source src="video.mp4" type="video/mp4">
  <source src="video.webm" type="video/webm">
</video>

Note that earlier builds of iOS and Android need the MP4 file to be listed first.

This markup will work on all modern browsers. The JavaScript code will control whichever video the browser decides it can play. For audio, the markup looks like this:

<audio id="audio1">
  <source src="audio.mp3" type="audio/mpeg">
  <source src="audio.oga" type="audio/ogg">
</audio>

If you’re hosting audio or video content on your own server, you must have the correct MIME type for each media file or many HTML5-ready browsers (such as Internet Explorer and Firefox) will not play the media. To add MIME types in IIS 7, go to the Features View, double-click MIME Types, click the Add button in the Actions pane, add the extension (mp4) and MIME type (video/mp4), and then press OK. Then do the same for the other types (webm and video/webm) you plan to use.
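
The same extension-to-MIME-type pairs are needed when you generate source tags from script. The sketch below keeps them in one lookup table. The names MEDIA_MIME_TYPES, mimeTypeFor and buildSourceTag are hypothetical, the table lists only the formats discussed in this article, and it does not replace configuring the types on the server.

```javascript
// Extension-to-MIME-type pairs for the formats discussed in this article
var MEDIA_MIME_TYPES = {
  mp4: "video/mp4",
  webm: "video/webm",
  ogv: "video/ogg",
  mp3: "audio/mpeg",
  oga: "audio/ogg",
  wav: "audio/wav"
};

// Look up the MIME type from a file name or URL, or return null if unknown
function mimeTypeFor(fileName) {
  var match = /\.([a-z0-9]+)(?:[?#].*)?$/i.exec(fileName),
    ext = match ? match[1].toLowerCase() : "";

  return MEDIA_MIME_TYPES.hasOwnProperty(ext) ? MEDIA_MIME_TYPES[ext] : null;
}

// Build the markup for one child source tag
function buildSourceTag(fileName) {
  var type = mimeTypeFor(fileName);

  return '<source src="' + fileName + '"' +
    (type ? ' type="' + type + '"' : '') + '>';
}
```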

Supporting Older Browsers Including two media files (such as MP4 and WebM for video) makes HTML5 media work in all modern browsers. But when older browsers (such as Internet Explorer 8) encounter the video tag, they can’t display the video. They will, however, render the HTML put between the opening <video> and closing </video> tags. The following example includes a message urging users to get a newer browser: 

<video id="video1" width="640" height="360" >
  <source src="video.mp4" type="video/mp4">
  <source src="video.webm" type="video/webm">
  <p>Please update your browser</p>
</video>

To allow visitors with non-HTML5-ready browsers to play the video, you can provide an alternative with Flash embedded that plays the same MP4 you supply for Internet Explorer 9, Safari and Chrome, as shown in Figure 5.

Figure 5 Video Playback with Flash

<video id="video1" width="640" height="360" >
  <source src="video.mp4" type="video/mp4">
  <source src="video.webm" type="video/webm">
  <object width="640" height="360" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="https://fpdownload.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=8,0,0,0">
    <param name="SRC" value="player.swf?file=video.mp4">
    <embed src="player.swf?file=video.mp4" width="640"
      height="360"></embed>
    <p>Please update your browser or install Flash</p>
  </object>
</video>

This markup presents all browsers with some way to play back video. Browsers with neither HTML5 nor Flash will see a message urging them to upgrade. For more information on how and why this nested markup works, see Kroc Camen’s “Video for Everybody” (camendesign.com/code/video_for_everybody).
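
If a script needs to know which branch of that nested markup a browser will use, a common feature test is to create a video element and check whether it exposes canPlayType. The sketch below takes the document object as a parameter so it can be exercised outside a page. The function name supportsHtml5Video is hypothetical.

```javascript
// Return true if the browser implements the HTML5 video element
function supportsHtml5Video(doc) {
  var el = doc.createElement("video");

  // Older browsers create an unknown element with no media API
  return !!(el && typeof el.canPlayType === "function");
}

// Browser usage (commented out so the helper can also run outside a page):
// if (!supportsHtml5Video(document)) {
//   // The Flash object inside the video tag will be rendered instead,
//   // so custom HTML controls should be hidden or disabled
// }
```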

This approach has some drawbacks, however. First, there’s a lot of markup to maintain. Second, you have to encode and store at least two media files. And third, any HTML/JavaScript controls you add to the page will not work with the embedded Flash player. Later, I’ll suggest several JavaScript libraries that can help you overcome these issues, but first, let’s address one final issue.

Full-Screen Support Flash and Silverlight both include a full-screen mode that lets users watch video and other content on their entire screen. You can implement this feature by creating a simple button and tying it to an ActionScript (for Flash) or C# (for Silverlight) full-screen command.

Today’s browsers have a similar full-screen mode that users can trigger with a keyboard or menu command (often F11 or Ctrl+F). But until recently, no equivalent JavaScript API allowed developers to initiate full-screen mode from a button on a page. This meant that HTML5 video could be displayed only in a “full window” that filled the browser window but not the entire screen.

In late 2011, Safari, Chrome and Firefox added support for the W3C proposed FullScreen API, which offers capabilities similar to those in Flash and Silverlight. The Opera team is currently working on implementing it, but the Internet Explorer team has, as of this writing, not yet decided whether it will implement the API. The Metro-style browser in Windows 8 will be full screen by default, but desktop Internet Explorer users will need to enter full-screen mode manually using menu options or the F11 key.

To enter full-screen mode in browsers that support it, you call the requestFullscreen method on the element to be displayed full screen. To exit, you call the exitFullscreen method on the document object. The W3C proposal is still a work in progress, so I won’t go into more detail here, but I am tracking the state of the API on my blog: bit.ly/zlgxUA
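
Because the proposal is still changing, early implementations expose these methods under vendor-prefixed names. The following sketch tries the unprefixed name first and then the WebKit and Mozilla prefixed ones. The helper names are hypothetical, and the list of method names is an assumption based on the early WebKit and Firefox builds, so verify it against the browsers you target.

```javascript
// Return the first method name from the list that exists on the target
function findMethod(target, names) {
  for (var i = 0; i < names.length; i++) {
    if (typeof target[names[i]] === "function") {
      return names[i];
    }
  }
  return null;
}

// Ask the browser to show the element full screen; returns false if unsupported
function enterFullscreen(element) {
  var name = findMethod(element, [
    "requestFullscreen",
    "webkitRequestFullScreen",
    "mozRequestFullScreen"
  ]);

  if (name) {
    element[name]();
    return true;
  }
  return false;
}

// Leave full-screen mode; returns false if unsupported
function exitFullscreen(doc) {
  var name = findMethod(doc, [
    "exitFullscreen",
    "webkitCancelFullScreen",
    "mozCancelFullScreen"
  ]);

  if (name) {
    doc[name]();
    return true;
  }
  return false;
}
```

When enterFullscreen returns false, a player can fall back to the "full window" approach described earlier. Browsers generally honor the request only when it comes from a user action such as a button click.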

HTML5 Video and Audio JavaScript Libraries

A number of developers have created JavaScript libraries that make HTML5 video and audio easier by integrating all of the relevant code into a single package. Some of the best open source libraries are MediaElement.js, jPlayer, VideoJS, Projekktor, Playr and LeanBack. You’ll find a complete list with feature comparison at praegnanz.de/html5video.

All you need to do is provide a video or audio tag and the library will automatically build a set of custom controls, as well as insert a Flash player for browsers that don’t support HTML5 Media. The only problem is that the Flash players many libraries insert don’t always look or function like the HTML5 player. This means that any HTML5 events you add won’t work with the Flash player and any custom CSS you use won’t be visible, either.

In my own work, I was asked to create an HTML5 video player with synchronized slides and transcripts (see online.dts.edu/player for a demo). We had an existing library of more than 3,000 h.264 video files and it was deemed unfeasible to transcode them to WebM for Firefox and Opera. We also needed to support older browsers such as Internet Explorer 8, but a separate Flash fallback wouldn’t work because it wouldn’t respond to events for the slides and transcripts.

To overcome these difficulties, I created MediaElement.js (mediaelementjs.com), one of the players mentioned previously. It’s an open source (MIT/GPLv2) JavaScript library that includes special Flash and Silverlight players that mimic the HTML5 API. Instead of a totally separate Flash player, MediaElement.js uses Flash only to render video and then wraps the video with a JavaScript object that looks just like the HTML5 API. This effectively upgrades all browsers so they can use the video tag and additional codecs not natively supported. To start the player with a single h.264 file using jQuery, you need only the following code:

<video id="video1" width="640" height="360" src="video.mp4" controls></video>
<script>
jQuery(document).ready(function() {
  $("#video1").mediaelementplayer();
});
</script>

For browsers that don’t support the video tag (like Internet Explorer 8) or those that don’t support h.264 (Firefox and Opera), MediaElement.js will insert the Flash (or Silverlight, depending on what the user has installed) shim to “upgrade” those browsers so they gain the HTML5 media properties and events I’ve covered.

For audio support, you can use a single MP3 file:

<audio id="audio1" src="audio.mp3" controls></audio>

Alternatively, you could include a single Ogg/Vorbis file:

<audio id="audio1" src="audio.oga" controls></audio>

Again, for browsers that don’t support the audio tag (Internet Explorer 8) or those that don’t support Ogg/Vorbis (Internet Explorer 9+ and Safari), MediaElement.js will insert a shim to “upgrade” those browsers so they all function as if they supported the codec natively. (Note: Ogg/Vorbis will not be playable on mobile devices.)

MediaElement.js also includes support for the track element, as well as native full-screen mode for browsers that have implemented the JavaScript API. You can add your own HTML5 events or track properties and it will work in every browser and mobile device.

I hope I’ve shown you that despite a few quirks, the HTML5 video and audio elements, especially when paired with the excellent libraries I’ve suggested, are easy to add to existing Web sites and should be the default for any new projects.


John Dyer is the director of Web Development for the Dallas Theological Seminary (dts.edu). He blogs at j.hn.

Thanks to the following technical experts for reviewing this article: John Hrvatin and Brandon Satrom