pwnt.be

Business as Usual

Since we covered inverted files in class this week, I decided to apply some of the material that we discussed to my blog.

This site’s search feature was already powered by an inverted index, but it had a pretty rudimentary way of determining which of the blog posts matched. That is to say, it would just check which of them contained all the words the user had entered.
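That old all-words behaviour is a conjunctive (Boolean AND) query over an inverted index. Here is a minimal Python sketch of the idea. It is not the site's PHP/SQL code. The tokenizer, the in-memory index, and the names `tokenize`, `build_index` and `match_all_words` are all hypothetical and are only there for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts: dict[int, str]) -> dict[str, set[int]]:
    # Inverted index: each term maps to the set of post IDs containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for post_id, text in posts.items():
        for term in tokenize(text):
            index[term].add(post_id)
    return index


def match_all_words(index: dict[str, set[int]], query: str) -> set[int]:
    # Boolean AND: a post matches only if it contains every query word.
    terms = tokenize(query)
    if not terms:
        return set()
    # Intersect starting from the shortest postings list to keep the work small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for plist in postings[1:]:
        result &= plist
    return result


posts = {
    1: "Inverted files in class",
    2: "Google Chrome still sucks",
    3: "Inverted index search",
}
index = build_index(posts)
print(match_all_words(index, "inverted index"))   # {3}
print(match_all_words(index, "inverted chrome"))  # set()
```

The second query shows the weakness. One word that no post shares with the others empties the whole result set, and the posts that do match come back unordered.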

The new version, however, uses the vector-space ranking approach explained in Justin Zobel and Alistair Moffat’s “Inverted Files for Text Search Engines”. You need an ACM account to access that paper, but a quick Google search may well turn up a copy. As of today, a single matching word is enough to get a result, and the most relevant results (should) come first. I haven’t implemented result navigation yet, so for now you’ll only see the 10 most relevant results.
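The gist of ranked, vector-space retrieval can be sketched in Python. This is not the site's PHP code. It assumes a cosine measure of the kind that survey describes:

- document term weight w_dt = 1 + ln f_dt
- query term weight w_qt = ln(1 + N / f_t), where f_t is the number of posts containing term t and N is the number of posts
- each score is accumulated posting by posting and then divided by the document length W_d

Treat the exact weighting, the tokenizer, and the hypothetical `RankedIndex` class as assumptions. The `k=10` default simply mirrors the 10-result limit.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase/alphanumeric tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


class RankedIndex:
    """Hypothetical in-memory index supporting cosine-ranked queries."""

    def __init__(self, posts: dict[int, str]) -> None:
        self.n_docs = len(posts)  # N
        # Postings lists: term -> [(post_id, f_dt), ...]
        self.postings: dict[str, list[tuple[int, int]]] = defaultdict(list)
        # Document lengths: W_d = sqrt(sum over terms of w_dt^2)
        self.doc_length: dict[int, float] = {}
        for post_id, text in posts.items():
            counts = Counter(tokenize(text))
            for term, f_dt in counts.items():
                self.postings[term].append((post_id, f_dt))
            self.doc_length[post_id] = math.sqrt(
                sum((1 + math.log(f)) ** 2 for f in counts.values())
            )

    def search(self, query: str, k: int = 10) -> list[tuple[int, float]]:
        # One accumulator per post sharing at least one term with the query,
        # so a single matching word is enough to produce a result.
        acc: dict[int, float] = defaultdict(float)
        for term in set(tokenize(query)):  # each query word counted once
            plist = self.postings.get(term)
            if not plist:
                continue
            w_qt = math.log(1 + self.n_docs / len(plist))  # f_t = len(plist)
            for post_id, f_dt in plist:
                acc[post_id] += w_qt * (1 + math.log(f_dt))  # w_qt * w_dt
        # Normalise by W_d. W_q is identical for every post, so it is skipped.
        scores = ((d, a / self.doc_length[d]) for d, a in acc.items())
        # Keep only the k best-scoring posts, highest first.
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])


posts = {
    1: "inverted files for text search",
    2: "google chrome still sucks",
    3: "search search search the blog",
    4: "inverted index inverted index",
}
idx = RankedIndex(posts)
print([d for d, _ in idx.search("inverted search")])  # [1, 3, 4]
```

Post 1 wins because it matches both query words while being short. Post 3 repeats one word, and the logarithm damps that repetition. Because scores only need a pass over the postings lists of the query words plus a top-k selection, this style of evaluation maps reasonably well onto a small number of database queries.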

[Image: “The World’s Greatest Detective” by practicalowl. Some rights reserved.]

As usual, all of the PHP code is public. Specifically, you’ll want to look at get_blog_posts_matching_words() in Pwnt_Controller_Blog_Posts. The underlying data manager code may be of interest as well, but I wouldn’t recommend reading it.

All in all, while it’s still a fairly trivial implementation, it gets you pretty decent results within reasonable time and with just a handful of SQL queries. I’m quite pleased with what I accomplished there.

Yes, this is how I spend my Friday nights. You don’t expect to find me at, say, Love Everyone instead, do you?
