Page loading profiling tools

Published on 24 August, 2009

In this article, seven distinctly different page loading profiling tools are compared: UA Profiler, Cuzillion, YSlow, Hammerhead, Apache JMeter, Gomez/Keynote/WebMetrics/Pingdom and Jiffy/Episodes. “Profiling” must be interpreted rather broadly: some of the tools cannot measure actual performance, but are useful for gaining insight into page loading performance characteristics.


If you can not measure it, you can not improve it.
— Lord Kelvin

The same applies to page loading performance: if you cannot measure it, you cannot know which parts have the biggest effect and thus deserve your focus. So before doing any real work, we will have to figure out which tools can help us analyze page loading performance. “Profiling” turns out to be a more accurate description than “analyzing”:

In software engineering, performance analysis, more commonly today known as profiling, is the investigation of a program’s behavior using information gathered as the program executes. The usual goal of performance analysis is to determine which sections of a program to optimize — usually either to increase its speed or decrease its memory requirement (or sometimes both).

So a list of tools will be evaluated: UA Profiler, Cuzillion, YSlow, Hammerhead, Apache JMeter, Gomez/Keynote/WebMetrics/Pingdom and Jiffy/Episodes. From this fairly long list, the tools that will be used while improving Drupal’s page loading performance will be picked, based on two factors:

  1. How the tool could help improve Drupal core’s page loading performance.
  2. How the tool could help Drupal site owners to profile their site’s page loading performance.

1. UA Profiler {#ua-profiler}

UA Profiler is a crowd-sourced project for gathering browser performance characteristics (on the number of parallel connections, downloading scripts without blocking, caching, et cetera). The tests run automatically when you navigate to the test page from any browser — this is why it is powered by crowd sourcing.

It is a handy reference to find out which browser supports which features related to page loading performance.

2. Cuzillion {#cuzillion}

Cuzillion was introduced on April 25, 2008 so it is a relatively new tool. Its tag line, “‘cuz there are zillion pages to check” indicates what it is about: there are a lot of possible combinations of stylesheets, scripts and images. Plus they can be external or inline. And each combination has different effects. Finally, to further complicate the situation, all these combinations depend on the browser being used. It should be obvious that without Cuzillion, it is an insane job to figure out how each browser behaves:

Before I would open an editor and build some test pages. Firing up a packet sniffer I would load these pages in different browsers to diagnose what was going on. I was starting my research on advanced techniques for loading scripts without blocking and realized the number of test pages needed to cover all the permutations was in the hundreds. That was the birth of Cuzillion.

Cuzillion is not a tool that helps you analyze any existing web page. Instead, it allows you to analyze any combination of components. That means it is a learning tool. You could also look at it as a browser profiling tool, unlike all the other listed tools, which are page loading profiling tools.

Here is a simple example to achieve a better understanding. How does the following combination of components (in the tag) behave in different browsers?

  1. an image on domain 1 with a 2 second delay
  2. an inline script with a 2 second execution time
  3. an image on domain 1 with a 2 second delay

First you create this setup in Cuzillion (see the attached figure: “The example situation created in Cuzillion”). This generates a unique URL. You can then copy this URL to all browsers you would like to test.

As you can see, Safari and Firefox behave very differently. In Safari (see the attached figure: “The example situation in Safari 3”), the loading of the first image seems to be deferred until the inline script has been executed (the images are displayed when the light purple bars become dark purple). In Firefox (see the attached figure: “The example situation in Firefox 3”), the first image is immediately rendered and after a delay of 2 seconds — indeed the execution time of the inline script — the second image is rendered (the images are displayed when the gray bars stop). Without going into details about this, it should be clear that Cuzillion is a simple, yet powerful tool to learn about browser behavior, which can in turn help to improve the page loading performance.

3. YSlow {#yslow}

YSlow is a Firebug extension (see the attached figure: “YSlow applied to drupal.org”) that can be used to analyze page loading performance through thirteen rules. These were part of the original fourteen rules — of which there are now thirty-four — of “Exceptional Performance”, as developed by the Yahoo! performance team.

YSlow 1.0 can only evaluate these thirteen rules and has a hardcoded grading algorithm. You should also remember that YSlow just checks how well a web page implements these rules. It analyzes the content of your web page (and the headers that were sent with it). For example, it does not test the latency or speed of a CDN, it just checks if you are using one. In fact, because you have to tell YSlow (via Firefox’ about:config) what the domain name of your CDN is, you can even fool YSlow into thinking any site is using a CDN: tricking YSlow into thinking drupal.org is using a CDN is easy (see the attached figures: “The original YSlow analysis” and “The resulting YSlow analysis”).
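To make the difference between checking a rule and measuring performance concrete, here is a minimal Python sketch of a rule-style check. It is not YSlow's actual code: the function name `cdn_rule_score`, the scoring and the example URLs are all assumptions made for illustration. The check only compares component hostnames against a user-supplied list, which is why it can be satisfied without any real CDN being involved.

```python
from urllib.parse import urlparse


def cdn_rule_score(component_urls, cdn_hostnames):
    """Return the fraction of components served from a configured CDN hostname.

    Hypothetical illustration only; this is not YSlow's grading algorithm.
    Nothing here measures latency or throughput.
    """
    if not component_urls:
        return 1.0  # nothing to serve, so nothing violates the rule
    cdn_hostnames = {host.lower() for host in cdn_hostnames}
    on_cdn = sum(
        1 for url in component_urls
        if (urlparse(url).hostname or "") in cdn_hostnames
    )
    return on_cdn / len(component_urls)


# Made-up component URLs, used only for this example.
components = [
    "http://drupal.org/example/script.js",
    "http://drupal.org/example/style.css",
]

# No CDN hostnames configured: the rule fails for every component.
print(cdn_rule_score(components, []))              # 0.0
# Declaring the site's own domain a "CDN" makes the same page pass.
print(cdn_rule_score(components, ["drupal.org"]))  # 1.0
```

The score changes from 0.0 to 1.0 although nothing about the page or its delivery has changed, only the configuration of the checker.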

A second caveat is that some of the rules it analyzes are only relevant to very big web sites. For example, one of the rules (#13, “Configure ETags”) is only relevant if you are using a cluster of web servers. For a more in-depth article on how to deal with YSlow’s evaluation of your web sites, see Jeff Atwood’s “YSlow: Yahoo’s Problems Are Not Your Problems”. YSlow 2.0 aims to be more extensible and customizable: it will allow for community contributions, or even web site specific rules.

Since only YSlow 1.0 is available at the time of writing, I will stick with that. It is a very powerful and helpful tool as it stands, and it will only get better. But remember the two caveats: it only verifies rules (it does not measure real-world performance) and some of the rules may not be relevant for your web site.

4. Hammerhead {#hammerhead}

Hammerhead (see the attached figure: “A sample Hammerhead run”), announced in September 2008, is a Firebug extension that should be used while developing. It measures how long a page takes to load and it can load a page multiple times, to calculate the median and average page load times. Of course, this is a lot less precise than real-world profiling, but it allows you to profile while you are working. That makes it far more effective at preventing page loading performance problems caused by code changes, because you have the test results within seconds or minutes of making those changes!

Of course, you could also use YSlow (see the YSlow section) or FasterFox, but then you have to load the page multiple times yourself (i.e. hammer the server, which is where the name comes from). And you would still have to set up the separate testing conditions for each page load that Hammerhead already sets up for you: empty cache and primed cache, and for the latter there are again two possible situations: disk cache and memory cache, or just disk cache. Memory cache is of course faster than disk cache, which is why that distinction is important. Finally, it supports exporting the resulting data into CSV format, so you could even create some tools to roughly track page loading performance over time.
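As a rough illustration of what such tooling could look like, the following Python sketch summarizes repeated page loads per cache state and serializes the summary as CSV. The column names, the cache state labels and the sample timings are assumptions made for this example; they are not Hammerhead's actual export format.

```python
import csv
import io
import statistics


def summarize(load_times_ms):
    """Return (mean, median) of a list of page load times in milliseconds."""
    return statistics.mean(load_times_ms), statistics.median(load_times_ms)


def to_csv(results):
    """Serialize {cache_state: [times]} to CSV text, one row per cache state."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cache_state", "runs", "mean_ms", "median_ms"])
    for cache_state, times in results.items():
        mean, median = summarize(times)
        writer.writerow([cache_state, len(times), f"{mean:.1f}", f"{median:.1f}"])
    return buffer.getvalue()


# Made-up sample data: five loads per cache state.
results = {
    "empty_cache": [2100, 1950, 2050, 4000, 2000],
    "primed_cache": [600, 580, 620, 610, 590],
}
print(to_csv(results))
```

In the made-up empty cache data, a single slow run of 4000 ms pulls the mean up to 2420 ms while the median stays at 2050 ms, which is one reason to report more than one summary statistic.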

5. Apache JMeter {#apache-jmeter}

Apache JMeter is an application designed to load test functional behavior and measure performance. From the perspective of profiling page loading performance, the relevant features are: loading web pages with and without their components, and measuring the response time of just the HTML or of the HTML and all the components it references.

However, it has several severe limitations:

  • Because it only measures from one location (the location from which it is run), it does not give a good overall picture.
  • It is not an actual browser, so it does not download components referenced from CSS or JS files.
  • Also because it is not an actual browser, it does not behave the same as browsers when it comes to parallel downloads.
  • It requires more setup than Hammerhead (see the Hammerhead section), so it is less likely that a developer will make JMeter part of his workflow.

It can be very useful in case you are doing performance testing (how long does the back-end need to generate certain pages?), load testing (how many concurrent users can the back-end/server setup handle?) and stress testing (how many concurrent users can it handle until errors ensue?). To learn more about load testing Drupal with Apache JMeter, see John Quinn’s “Load test your Drupal application scalability with Apache JMeter” article and part two of that article.

6. Gomez/Keynote/WebMetrics/Pingdom {#gomez-keynote-webmetrics-pingdom}

Gomez, Keynote, WebMetrics and Pingdom are examples of third-party (paid) performance monitoring systems. They have four major disadvantages:

  1. limited number of measurement points
  2. no real-world browsers are used
  3. unsuited for Web 2.0
  4. paid & closed source

6.1 Limited number of measurement points {#limited-number-of-measurement-points}

These services poll your site at regular or irregular intervals. This poses analysis problems: for example, if one of your servers is very slow just at that one moment that any of these services requests a page, you will be told that there is a major issue with your site. But that is not necessarily true: it might be a fluke.

6.2 No real-world browsers {#no-real-world-browsers}

Most, if not all, of these services use their own custom clients (as mentioned in Scott Ruthfield’s Jiffy presentation at Velocity 2008). That implies their results are not a representation of the real-world situation, which means you cannot rely upon these metrics for making decisions: what if a commonly used real-world browser behaves completely differently? Even if the services all used real-world browsers, they would never reflect real-world performance, because each site has different visitors and therefore also a different mix of browsers.

6.3 Unsuited for Web 2.0 {#unsuited-for-web-2-0}

The problem with these services is that they still assume the World Wide Web is the same as it was 10 years ago, when JavaScript was scarce rather than abundant as it is today. They still interpret the onload event as the “end time” for response time measurements. In Web 1.0, that was fine. But as the adoption of AJAX has grown, the onload event has become less and less representative of when the page is ready (i.e. has completely loaded), because the page can continue to load additional components. For some web sites, the “above the fold” section of a web page has been optimized, thereby loading “heavier” content later, below the fold. Thus the “page ready” point in time is shifted from its default.

In both of these cases, the onload event is too optimistic, as explained in Steve Souders’ Episodes white paper.
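The gap between onload and the real “page ready” moment can be expressed as simple arithmetic. The following Python sketch uses a hypothetical helper, `page_ready_ms`, and made-up timings; it only illustrates the reasoning and is not taken from the Episodes white paper.

```python
def page_ready_ms(onload_ms, late_component_end_ms):
    """Return when the page is really ready, in ms since navigation start.

    onload_ms: time at which the onload event fired.
    late_component_end_ms: finish times of components loaded after onload
        (for example via AJAX or deferred below-the-fold content).
    """
    return max([onload_ms, *late_component_end_ms])


# Made-up numbers: onload fires at 1.2 s, but two late requests finish afterwards.
onload = 1200
late = [1900, 2600]

ready = page_ready_ms(onload, late)
print(ready)           # 2600
print(ready - onload)  # 1400 ms that an onload-based measurement would miss
```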

There are two ways to measure Web 2.0 web sites (covered by the Episodes presentation):

  1. manual scripting: identify timing points using scripting tools (Selenium, Keynote’s KITE, et cetera). This approach has a long list of disadvantages: low accuracy, high switching costs, high maintenance costs, synthetic (no real-world measurements).
  2. programmatic scripting: timing points are marked by JavaScript (Jiffy, Gomez Script Recorder, et cetera). This is the preferred approach: it has lower maintenance costs and a higher accuracy because the code for timing is included in the other code and measures real user traffic.
    If we now worked on a shared implementation of this approach, we would not have to reinvent the wheel every time and switching costs would be much lower. See the Jiffy/Episodes section later on.
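The idea behind programmatic scripting, marking points in time from within the page's own code and deriving durations from those marks, can be sketched as follows. This is a language-neutral illustration written in Python, not the API of Jiffy or Episodes: the class `Timeline`, its methods and the mark names are hypothetical. The clock is injectable so that the example produces deterministic numbers.

```python
import time


class Timeline:
    """Collects named timing marks and derives durations between them."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # callable returning seconds
        self._marks = {}
        self.measures = {}

    def mark(self, name):
        """Record the current time under the given name."""
        self._marks[name] = self._clock()

    def measure(self, name, start_mark, end_mark):
        """Store the elapsed milliseconds between two earlier marks."""
        elapsed = self._marks[end_mark] - self._marks[start_mark]
        self.measures[name] = round(elapsed * 1000)
        return self.measures[name]


# A fake clock with made-up timestamps (in seconds) keeps the example repeatable.
ticks = iter([0.0, 0.25, 1.5])
timeline = Timeline(clock=lambda: next(ticks))

timeline.mark("start")
timeline.mark("dom_ready")
timeline.mark("widgets_loaded")

timeline.measure("frontend", "start", "dom_ready")
timeline.measure("page_ready", "start", "widgets_loaded")
print(timeline.measures)  # {'frontend': 250, 'page_ready': 1500}
```

Because the marks live in the same code that does the work, the developer decides what “page ready” means for a given page instead of relying on a single generic event.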

6.4 Paid & closed source {#paid-and-closed-source}

The end user is dependent upon the third party service to implement new instrumentations and analyses. It is typical for closed source applications to only implement the most commonly requested features and because of that, the end user may be left out in the cold. There is a high cost for the implementation and also a very high cost when switching to a different third party service.

7. Jiffy/Episodes {#jiffy-episodes}

7.1 Jiffy {#jiffy}

Jiffy (presented at Velocity 2008 by Scott Ruthfield; alternatively, you can view the video of that presentation) is designed to give you real-world information on what is actually happening within the browsers of users that are visiting your site. It shows you how long pages really take to load and how long events that happen while or after your page is loading really take. Especially when you do not control all the components of your web site (e.g. widgets of photo and music web sites, contextual ads or web analytics services), it is important that you can monitor their performance. It overcomes the four major disadvantages that were listed previously:

  1. it can measure every page load if desired
  2. real-world browsers are used, because it is just JavaScript code that runs in the browser
  3. well-suited for Web 2.0, because you can configure it to measure anything
  4. open source

Jiffy consists of several components:

  • Jiffy.js: a library for measuring your pages and reporting measurements
  • Apache configuration: to receive and log measurements via a specific query string syntax
  • Ingestor: parses logs and stores them in a database (currently only supports Oracle XE)
  • Reporting toolset
  • Jiffy Firebug extension (see the attached figure: “The Jiffy Firebug extension”)

Jiffy was built to be used by the WhitePages web site and has been running on that site. At more than 10 million page views per day, it should be clear that Jiffy can scale quite well. It has been released as an open source project, but at the time of writing, the last commit was on July 25, 2008, so the project appears to be dead.

7.2 Episodes {#episodes}

Episodes (also see the accompanying white paper) is very much like Jiffy. There are two differences:

  1. Episodes’ goal is to become an industry standard. This would imply that the aforementioned third party services (Gomez/Keynote/WebMetrics/Pingdom) would take advantage of the instrumentations implemented through Episodes in their analyses.
  2. Most of the implementation is built into browsers (window.postMessage(), addEventListener()), which means there is less code that must be downloaded. (Note: the newest versions of browsers are necessary: Internet Explorer 8, Firefox 3, WebKit Nightlies and Opera 9.5. An additional backwards compatibility JavaScript file must be downloaded for older browsers.)

Steve Souders outlines the goals and vision for Episodes succinctly in these two paragraphs:

The goal is to make Episodes the industrywide solution for measuring web page load times. This is possible because Episodes has benefits for all the stakeholders. Web developers only need to learn and deploy a single framework. Tool developers and web metrics service providers get more accurate timing information by relying on instrumentation inserted by the developer of the web page. Browser developers gain insight into what is happening in the web page by relying on the context relayed by Episodes.

Most importantly, users benefit by the adoption of Episodes. They get a browser that can better inform them of the web page’s status for Web 2.0 apps. Since Episodes is a lighter weight design than other instrumentation frameworks, users get faster pages. As Episodes makes it easier for web developers to shine a light on performance issues, the end result is an Internet experience that is faster for everyone.

A couple of things can be said about the current codebase of Episodes:

  • There are two JavaScript files: episodes.js and episodes-compat.js. The latter is loaded on-the-fly when an older browser is being used that does not support window.postMessage(). These files are operational but have not had wide testing yet.
  • It uses the same query string syntax as Jiffy uses to perform logging, which means Jiffy’s Apache configuration, ingestor and reporting toolset can be reused, at least partially.
  • It has its own Firebug extension (see the attached figure: “The Episodes Firebug extension”).
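Logging measurements through a query string means that the browser requests a small “beacon” URL whose parameters carry the timings, and the server side later parses them out of its access log. The following Python sketch shows that round trip under loudly stated assumptions: the parameter names `page` and `timings`, the `name:milliseconds` pair format and the beacon URL are hypothetical and are not the actual syntax used by Jiffy or Episodes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_beacon_url(base_url, page, measures):
    """Encode {name: milliseconds} into a beacon URL (hypothetical format)."""
    pairs = ",".join(f"{name}:{ms}" for name, ms in measures.items())
    return f"{base_url}?{urlencode({'page': page, 'timings': pairs})}"


def parse_beacon_url(url):
    """Decode a beacon URL back into (page, {name: milliseconds})."""
    query = parse_qs(urlsplit(url).query)
    raw = query.get("timings", [""])[0]
    measures = {}
    for pair in filter(None, raw.split(",")):
        name, ms = pair.split(":")
        measures[name] = int(ms)
    return query["page"][0], measures


# Made-up measurements for a made-up page and beacon endpoint.
url = build_beacon_url(
    "http://example.com/beacon", "/node/1", {"backend": 120, "frontend": 850}
)
print(url)
print(parse_beacon_url(url))
```

This simple pair format assumes that measurement names contain neither `:` nor `,`; a real implementation would need to define how such characters are escaped.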

So, Episodes’ very raison d’être is to achieve a consensus on a JavaScript-based page loading instrumentation toolset. It aims to become an industry standard and is maintained by Steve Souders, who is currently on Google’s payroll to work full-time on all things related to page loading performance (which suggests we might see integration with Google’s Analytics service in the future). Add in the fact that Jiffy has not been updated since its initial release, and it becomes clear that Episodes is the better long-term choice.

8. Conclusion {#conclusion}

There is not a single, “do-it-all” tool that you should use. Instead, you should wisely combine all of the above tools. Use the tool that fits the task at hand.

However, for the scope of this thesis, there is one tool that jumps out: YSlow. It allows you to carefully analyze which things Drupal could be doing better. It is not necessarily meaningful in real-world situations, because, for example, it only checks if you are using a CDN, not how fast that CDN is. But the fact that it tests whether a CDN is being used (or Expires headers, or gzipped components, or …) is enough to find out what can be improved, to maximize the potential performance.
This kind of analysis is exactly what I will perform in the next section.

There is one more tool that jumps out for real, practical use: Episodes. This tool, if properly integrated with Drupal, would be a key asset to Drupal, because it would enable web site owners to track the real-world page loading performance. It would allow module developers to support Episodes. This, in turn, would be a good indicator for a module’s quality and would allow the web site owner/administrator/developer to carefully analyze each aspect of his Drupal web site.
I have created this integration as part of my bachelor thesis, the Episodes module. More on this in a follow-up article.


This is a republished part of my bachelor thesis text, with thanks to Hasselt University for allowing me to republish it. This is section six in the full text.

It’s comparable with YSlow. It’s got slightly different rules and arguably more “in-depth” rules — by that I mean that they’re harder to implement. I’d recommend using both.

terrific summary, wim. if the whole thesis is like this, i think we’d benefit from reading the whole thing, if that is permitted. looking forward to your next articles. also, if any of this code belongs in devel package, i’m open to that with you as co-maintainer.

Thanks for the kind words :) I did try to make both the text and the code as useful as possible — I absolutely despise work that cannot actually be used in reality.

You can find my whole thesis text here. I’ve tried to write a useful text as a whole, but some parts are of course about the implementation details, including the new article on the Episodes implementation.

I’ve created an Episodes module, which I think should stay a separate module? If you’ve got reasons to merge it with Devel, I’m definitely open to that, but I’m not sure if it’s a good fit?

Might not be in scope, but
 do you have any suggestions as to what tool can i as a “real-world” observer (not active developer of the target pages) use to monitor the loading times and bottlenecks of pages off a heavily-used and relatively-public site like wikipedia? I’d like to find out what is going on before i either suggest specific global changes or create local rules in my browsers to handle the issues
 [Solutions spanning more than one browser (preferably all of Fx/IE6/IE8) are very welcome ;-)]

I used to recommend Yottaa for that, but a few months ago they unfortunately changed their service significantly, which makes it a lot less useful as a general-purpose tool. Though even then, it’d only monitor specific pages.

It’s impossible to automatically monitor the real-world performance of every page of a website without having access to the generated HTML. The best you could do is build a spider that browses through a site randomly, and to run such a spider from many locations around the world.

Sorry, i guess i wasn’t very clear earlier. I’m not looking for tracking “every page”, but only what i open up myself; one statically-identified page at a time is quite fine. I’ve looked into Yottaa, but can’t see how it would help. What i need is something like Stephen W. Cote’s Function Monitor, but which wouldn’t require any change to the target page (FunMon needs step 4 “Add [SCRIPT SRC=”[path]FunMon2.js”][/SCRIPT] to the web page.”)

PS Somehow, the email notification uses a wrong counter. It tried to link to /comment/1981 (which doesn’t exist yet, so it just gives a 404), when it should have been 1704 (or 1687 for my own initial comment). Any idea what happened? :-)

Also, what happened to the post times?!? I’m quite sure i didn’t post my “Might not be in scope” question on 22 Nov 2012! More to the point, i’m quite sure your commenting system wouldn’t notify me on 2 Jun 2014 about a reply from 30 Dec 2012
 ^^