Business as Usual
Since we covered inverted files in class this week, I decided to apply some of the material that we discussed to my blog.
This site’s search feature was already powered by an inverted index, but its way of deciding which blog posts matched was pretty rudimentary: it simply returned the posts that contained every word the user had entered.
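For comparison, that old behavior amounts to a conjunctive (Boolean AND) query over an inverted index. The Python sketch below only illustrates the idea and is not the site’s PHP code. The names build_index and match_all are hypothetical, and tokenization is assumed to be naive lowercase whitespace splitting.

```python
from collections import defaultdict


def build_index(posts: dict[int, str]) -> dict[str, set[int]]:
    """Hypothetical helper: map each lowercase word to the ids of posts containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for post_id, text in posts.items():
        for word in text.lower().split():  # naive tokenization (assumption)
            index[word].add(post_id)
    return dict(index)


def match_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the posts containing *every* query word (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    # Intersect starting from the shortest posting list to keep the work small;
    # a word missing from the index yields an empty list, so the result is empty.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for plist in postings[1:]:
        result &= plist
    return result
```

With this scheme, a post that contains all but one of the query words is not returned at all, which is the limitation the ranked version addresses.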
The new version, however, uses the vector-based approach explained in Justin Zobel and Alistair Moffat’s Inverted Files for Text Search Engines. You need an ACM account to access that document, but a simple Google query may turn it up anyway. As of today, a single matching word is enough to get a result, and the most relevant results (should) come first. I haven’t implemented paging through the results yet, so for now you’ll only see the 10 most relevant ones.
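To make the ranked approach concrete, here is a minimal Python sketch of cosine ranking with per-document accumulators over an inverted file. It assumes a common TF-IDF weighting of the kind discussed in that survey: w_qt = ln(1 + N/f_t) for query terms, w_dt = 1 + ln f_dt for document terms, and scores divided by the document length W_d. Whether the blog uses exactly these weights is an assumption, and all function names here are hypothetical, not taken from the site’s PHP code.

```python
import heapq
import math
from collections import Counter, defaultdict


def build_index(posts: dict[int, str]):
    """Hypothetical helper returning (postings, doc_lengths, N).

    postings:    term -> list of (post_id, f_dt), f_dt = count of term in post
    doc_lengths: post_id -> W_d = sqrt(sum over terms of (1 + ln f_dt)^2)
    """
    postings: dict[str, list[tuple[int, int]]] = defaultdict(list)
    squares: dict[int, float] = defaultdict(float)
    for post_id, text in posts.items():
        for term, f_dt in Counter(text.lower().split()).items():
            postings[term].append((post_id, f_dt))
            squares[post_id] += (1 + math.log(f_dt)) ** 2
    doc_lengths = {d: math.sqrt(s) for d, s in squares.items()}
    return dict(postings), doc_lengths, len(posts)


def rank(postings, doc_lengths, n_docs, query, k=10):
    """Return up to k post ids, best first. One matching term is enough."""
    accumulators: dict[int, float] = defaultdict(float)
    for term in set(query.lower().split()):  # query term frequency ignored (assumption)
        plist = postings.get(term)
        if not plist:
            continue
        f_t = len(plist)  # number of posts containing the term
        w_qt = math.log(1 + n_docs / f_t)
        for post_id, f_dt in plist:
            accumulators[post_id] += w_qt * (1 + math.log(f_dt))
    # Normalize by document length; the query length W_q is the same for
    # every post, so dropping it does not change the ranking.
    scored = ((acc / doc_lengths[d], d) for d, acc in accumulators.items())
    return [d for _, d in heapq.nlargest(k, scored)]
```

Only posts that share at least one term with the query ever get an accumulator, so the cost is driven by the posting lists of the query terms rather than by the total number of posts. The default k=10 mirrors the current 10-result limit.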
As usual, all of the PHP code is public. Specifically, you’ll want get_blog_posts_matching_words() in Pwnt_Controller_Blog_Posts. The underlying data manager code may be of interest as well, but I wouldn’t recommend reading it.
All in all, while it’s still a fairly trivial implementation, it gets you pretty decent results within reasonable time and with just a handful of SQL queries. I’m quite pleased with what I accomplished there.
Yes, this is how I spend my Friday nights. You don’t expect to find me at, say, Love Everyone instead, do you?
