data mining

2 January, 2012

The time has finally come.

I’m looking for a job!

After ±5 years of hard work at Hasselt University, I will graduate as a master in computer science next month. I finished my master thesis and courses in June 2011 and completed my internship at Facebook a few weeks ago (on December 16). I’ve received an awesome job offer to work full-time at Facebook.

But my super-duper awesome girlfriend, Anneleen, is studying medicine here in Belgium. If she were to continue studying medicine in the U.S., she’d have to start all over, so that’s not really an option (not to mention the ridiculous costs). This summer, we’ll move in together in a (yet to be found) apartment in Leuven, Belgium.
Also, I just like Europe better than the United States.

I’ve already talked to several companies, months ago and more recently, but since there are so many interesting companies, projects and challenges out there, I decided to write this blog post.

My main interests (and areas of expertise) are:

  • WPO (Web Performance Optimization): making websites faster
  • Drupal
  • data mining

Want to talk to me? Contact me at http://wimleers.com/contact.

25 July, 2011

The last blog post I wrote about my master thesis was on June 1st. The final blog post has been long overdue. To the (very few) readers interested in the technical details, I apologize for the long delay in writing about the last part.
That last blog post was about FP-Growth. This one is about FP-Stream. Whereas FP-Growth can analyze static data sets for patterns, FP-Stream is capable of finding patterns over data streams. FP-Stream relies on FP-Growth for significant parts, but it’s considerably more advanced. So, in essence, this phase only adds the capability to mine over a stream of data. While that may not sound like much, the added complexity of achieving this turns it into a fairly large undertaking.
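To make the difference between static and stream mining a bit more concrete, here is a minimal Python sketch. It is not FP-Stream: the real algorithm keeps a pattern tree with tilted-time windows, while this sketch simply keeps itemset counts for the last few batches and drops the oldest batch when a new one arrives. The class name `SlidingWindowMiner`, the helper `count_itemsets` and the brute-force counting are all assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


def count_itemsets(batch):
    """Count every non-empty itemset that occurs in each transaction.

    Brute force: fine for tiny transactions, exponential in general.
    """
    counts = Counter()
    for transaction in batch:
        items = sorted(set(transaction))
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):
                counts[frozenset(itemset)] += 1
    return counts


class SlidingWindowMiner:
    """Hypothetical simplification: itemset counts over the last N batches."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()      # (counts, number of transactions) per batch
        self.totals = Counter()    # summed counts over the whole window
        self.transactions = 0      # number of transactions in the window

    def add_batch(self, batch):
        batch = list(batch)
        counts = count_itemsets(batch)
        self.window.append((counts, len(batch)))
        self.totals.update(counts)
        self.transactions += len(batch)
        # Expire the oldest batch once the window is full.
        if len(self.window) > self.window_size:
            old_counts, old_size = self.window.popleft()
            self.totals.subtract(old_counts)
            self.transactions -= old_size

    def frequent(self, min_support):
        """Itemsets present in at least `min_support` (a fraction) of the window."""
        threshold = min_support * self.transactions
        return {
            itemset: n
            for itemset, n in self.totals.items()
            if n > 0 and n >= threshold
        }
```

Calling `add_batch` for every new batch and `frequent` whenever results are needed gives patterns for "the recent past" instead of for one fixed data set, which is the capability this phase is about.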

1 June, 2011

The previous blog post covering my master thesis was about the libraries I wrote for detecting browsers and locations: QBrowsCap and QGeoIP.
On the very day that was published, I reached the first implementation milestone: it was already finding causes of slow page loads. It did not yet do so over exactly specified periods of time, but over each chunk of 4,000 lines read from an Episodes log file. To achieve this, I completed an implementation of the FP-Growth algorithm and then modified it to add support for item constraints.
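The idea of mining each chunk of lines under an item constraint can be sketched as follows. This is a rough Python illustration, not the thesis code: it enumerates itemsets by brute force instead of building an FP-tree, and the comma-separated line format, the item names such as `duration:slow`, and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations, islice


def chunks(lines, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parse_line(line):
    """Hypothetical log format: one page view per line, items split by commas."""
    return [item.strip() for item in line.split(",") if item.strip()]


def frequent_itemsets(transactions, min_support, required_item=None):
    """Brute-force frequent itemset mining (NOT FP-Growth).

    Keeps itemsets occurring in at least `min_support` transactions. If
    `required_item` is given, only itemsets containing it are kept, which is
    a very simple form of item constraint.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):
                counts[frozenset(itemset)] += 1
    return {
        itemset: n
        for itemset, n in counts.items()
        if n >= min_support
        and (required_item is None or required_item in itemset)
    }


if __name__ == "__main__":
    log = [
        "duration:slow, browser:IE, country:BE",
        "duration:slow, browser:IE, country:US",
        "duration:fast, browser:Firefox, country:BE",
        "duration:slow, browser:IE, country:BE",
    ]
    # Mine each chunk of 4,000 lines separately, as in the first milestone.
    for chunk in chunks(log, size=4000):
        transactions = [parse_line(line) for line in chunk]
        result = frequent_itemsets(
            transactions, min_support=3, required_item="duration:slow"
        )
        for itemset, n in sorted(result.items(), key=lambda kv: sorted(kv[0])):
            print(sorted(itemset), n)
```

Requiring `duration:slow` in every reported itemset is what makes the output read as "things that frequently occur together with slow page loads" rather than as arbitrary frequent combinations.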

FP-Growth {#FP-Growth}

Thoroughly explaining the FP-Growth algorithm would take us too far afield. Hence, I’ll include a brief explanation below. For details, I refer to the original paper, “Mining frequent patterns without candidate generation” by J. Han, J. Pei, Y. Yin and R. Mao, which is easy to find through Google Scholar.