Episodes

25 July, 2011

FP-Stream-powered association rule mining over data streams with support for constraints

The last blog post I wrote about my master thesis was on June 1st. The final blog post has been long overdue. To the (very few) readers interested in the technical details, I apologize for the long delay in writing about the last part.
That last blog post was about FP-Growth. This one is about FP-Stream. Whereas FP-Growth can analyze static data sets for patterns, FP-Stream is capable of finding patterns over data streams. FP-Stream relies on FP-Growth for significant parts, but it’s considerably more advanced. So, in essence, this phase only adds the capability to mine over a stream of data. While that may not sound like much, the added complexity of achieving this turns it into a fairly large undertaking.

1 June, 2011

FP-Growth-powered association rule mining with support for constraints

The previous blog post covering my master thesis was about the libraries I wrote for detecting browsers and locations: QBrowsCap and QGeoIP.
On the very day that was published, I reached the first implementation milestone: the code was already finding causes of slow page loads, though not yet over exactly specified periods of time, but rather over each chunk of 4,000 lines read from an Episodes log file. To achieve this, I completed an implementation of the FP-Growth algorithm and then modified it to add support for item constraints.
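As a rough illustration of what mining one chunk with an item constraint means, here is a minimal Python sketch. It is not the thesis implementation and it is not FP-Growth: it counts itemsets by brute force, which only works for tiny inputs. The function name `frequent_itemsets`, the `required_item` parameter and the sample items are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, required_item=None):
    """Naive frequent itemset counting (NOT FP-Growth).

    Enumerates every non-empty subset of every transaction, counts how
    often each subset occurs, and keeps the subsets that occur at least
    min_support times. If required_item is given, only itemsets that
    contain it are returned (a simple item constraint).
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order for tuple keys
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {
        itemset: count
        for itemset, count in counts.items()
        if count >= min_support
        and (required_item is None or required_item in itemset)
    }


# Hypothetical chunk of already-parsed log lines: one set of items per page view.
chunk = [
    {"duration:slow", "browser:IE", "country:BE"},
    {"duration:slow", "browser:IE", "country:US"},
    {"duration:fast", "browser:Firefox", "country:BE"},
    {"duration:slow", "browser:IE", "country:BE"},
]

# Only keep frequent itemsets that involve slow page loads.
result = frequent_itemsets(chunk, min_support=2, required_item="duration:slow")
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

The constraint is applied as a filter after counting, which is the simplest possible approach; an FP-Growth-based implementation would avoid enumerating all subsets in the first place.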

FP-Growth {#FP-Growth}

Thoroughly explaining the FP-Growth algorithm would take us too far afield. Hence, I’ll include only a brief explanation below. For details, I refer to the original paper, “Mining frequent patterns without candidate generation” by J. Han, J. Pei, Y. Yin and R. Mao, which can easily be found through Google Scholar.

1 March, 2011

QBrowsCap & QGeoIP: detecting browsers and locations

In December and January, I continued working on my master thesis, while simultaneously preparing for my exams in January (which I passed without problems).
In a previous blog post, I had indicated that I ran into problems while parsing dates: Qt uses the system locale for this, but on Mac OS X there turned out to be a severe performance problem with that functionality. I solved that by developing QCachingLocale, a class that introduces a caching layer to prevent that performance degradation.

Further parsing {#further-parsing}

Now, parsing the date was of course only one tiny part of the problem: I also had to parse the episodes information embedded in each Episodes log file line (which is trivial), as well as map the IP address to a physical location and an ISP and map the user-agent string to a platform and actual browser.
Finally, we also want to map the episode duration to either duration:slow, duration:acceptable or duration:fast. This is called ‘discretization’: continuous values (in our case: durations) are mapped to discrete values.
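A minimal Python sketch of such a discretization step could look as follows. The function name and both thresholds (1,000 ms and 4,000 ms) are assumptions chosen purely for illustration; they are not the values used in the thesis.

```python
def discretize_duration(duration_ms, fast_below_ms=1000, slow_from_ms=4000):
    """Map a continuous episode duration (in milliseconds) to a discrete label.

    The thresholds are hypothetical defaults, not values from the thesis.
    """
    if duration_ms < fast_below_ms:
        return "duration:fast"
    if duration_ms < slow_from_ms:
        return "duration:acceptable"
    return "duration:slow"


for duration in (250, 1000, 3999, 4000, 12000):
    print(duration, discretize_duration(duration))
```

Keeping the thresholds as parameters makes it easy to tune them per episode type, since what counts as slow for a full page load may differ from what counts as slow for a single script.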

31 December, 2010

Performance Calendar 2010: “WPO Analytics”

This year, Performance Planet did an advent calendar again, just like last year. I was also invited to write an article, and gladly accepted the invitation. I wrote about WPO Analytics, which is what my master thesis is about. It’s quite strange to see your name appear among the big names of Yahoo, Facebook and Google, but at the same time it’s reassuring that my efforts have not been in vain.
The following article is a 1:1 copy of my “WPO Analytics” article for the 2010 Performance Calendar.

Introduction

Web performance monitoring services such as Gomez, Keynote, Webmetrics, Pingdom, Webpagetest (which was also featured in last year’s web performance advent calendar) and recent newcomers such as Yottaa are all examples of synthetic performance monitoring (SPM) tools.

25 August, 2009

Improving Drupal: Episodes integration

In this article, I explain what was required to integrate the Episodes page loading performance monitoring system with Drupal.
Episodes was written by Steve Souders, who is well known for his research on high performance web sites and has authored multiple books on this subject.

The work I am doing as part of my bachelor thesis on improving Drupal’s page loading performance should be practical, not theoretical. It should have a real-world impact.

To ensure that that also happens, I wrote the Episodes module. This module integrates the Episodes framework for timing web pages (see the “Episodes” section in my “Page loading profiling tools” article) with Drupal on several levels — all without modifying Drupal core:

24 August, 2009

Page loading profiling tools

In this article, eight distinctly different page loading profiling tools are compared: UA Profiler, Cuzillion, YSlow, Hammerhead, Apache JMeter, Gomez/Keynote/WebMetrics/Pingdom, Jiffy and Episodes. “Profiling” must be interpreted rather broadly: some of the tools cannot measure actual performance but are useful to gain insight into page loading performance characteristics.

If you can not measure it, you can not improve it.
— Lord Kelvin

The same applies to page loading performance: if you cannot measure it, you cannot know which parts have the biggest effect and thus deserve your focus. So before doing any real work, we will have to figure out which tools can help us analyze page loading performance. “Profiling” turns out to be a more accurate description than “analyzing”:

15 March, 2009

Episodes: Drupal integration & ingestor

In my session at DrupalCon DC, I promised an initial version of the Episodes module by March 15, which is today. I’m glad to be able to announce that I somewhat met that goal.

If you don’t know what it is exactly, I encourage you to read the project description first.

Status {#status}

It’s not yet completely finished: the basic reporting UI must still be written. But you can already look at the results of each individual page through the Firebug add-on (which I didn’t write, it’s already available). See the first screenshot for that. That’s of course much less useful, but it gives you a clear indication of the potential.
However, before I write that reporting UI, I first have to meet deadlines for other courses.
So what’s done already? Here’s an overview:
