Postvorta Mk II: Faster And With More Features!

The Spitfire Mk II was essentially the same as the original model, just with an upgraded Merlin engine. Today I've done something similar to Postvorta, my "intelligent blog search engine".

Those of you who read the initial blog posting I did on Postvorta may remember that underneath it all Postvorta relies on GATE Mímir for indexing and search. Yesterday we (i.e. the GATE group at the University of Sheffield) released new versions of most of the software we develop, including Mímir 4. While I'm not heavily involved with the development of Mímir, I do use it quite substantially at work, and I've been slowly updating all the systems I'm involved with, including Postvorta, to use this new version. Not only is this new version of Mímir faster, it also handles search results in a slightly different way that suits Postvorta better than the old approach. I've also added some extra code to Postvorta to cache more information locally.

Together these changes have resulted in Postvorta returning results an awful lot faster than before, and switching between pages of results is significantly faster too. Of course all these changes are "under the hood" so, just like the Spitfire Mk II, Postvorta should look roughly the same but work much faster. There is one new feature, though, that is worth talking about: result visualization.

When you search a blog using Postvorta it returns a list of relevant documents ordered from most recent to oldest. Combined with the different ways you can search a blog, this ordering is usually the most useful. In some cases, however, especially when a search returns lots of results, it can be difficult to hunt through the posts to find the one you are interested in. To help with this I've started to think about different ways of visualizing the results. Whilst I've had a number of ideas, the first to make it as far as a working, stable implementation is a date distribution graph.

A date distribution graph (in this context anyway) is simply a vertical bar chart showing how the results are distributed by month. The graph, just like the results, works backwards so the most recent month is on the left. The bars of the graph can be clicked on to go to the result page containing the first result for that month. Essentially it allows you to jump to posts from a given month directly without having to page through lots of irrelevant results. Currently, depending upon the number of search results, the graph can take a moment or two to be produced, but this is done asynchronously to the normal page loading to allow you to see the actual results as soon as possible.
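To make the idea concrete, here is a minimal sketch in Python of the two calculations behind such a graph: counting results per month, and working out which page a bar should jump to. It is not Postvorta's actual code. The function names are hypothetical, and the page size of 10 is an assumption, as the real value isn't stated here.

```python
from collections import Counter
from datetime import date

PAGE_SIZE = 10  # assumption: results per page; not stated in the post


def month_histogram(dates):
    """Count results per (year, month), newest month first."""
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items(), reverse=True)


def page_for_month(dates, year, month, page_size=PAGE_SIZE):
    """Return the 1-based page holding the first (most recent) result
    from the given month, or None if no result falls in that month.

    `dates` must already be ordered newest first, as the results are.
    """
    for index, d in enumerate(dates):
        if (d.year, d.month) == (year, month):
            return index // page_size + 1
    return None


# Hypothetical result dates, newest first.
results = [date(2008, 3, 2), date(2008, 3, 1), date(2007, 11, 2)]
print(month_histogram(results))  # [((2008, 3), 2), ((2007, 11), 1)]
print(page_for_month(results, 2007, 11, page_size=2))  # 2
```

Each entry in the histogram becomes one bar, and clicking a bar would simply load the page number returned by `page_for_month`.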

As always I'd be interested in knowing what you think of this new feature. You can play with it either by searching this blog (the search box is just over on the right), or your own blog.

3 comments:

  1. I've tried it out on my Heb in NZ blog and was a bit puzzled. I was not sure I was getting the correct post for the month selected and then I realised (I think this is correct) that the last post on the page shown is the earliest post and the most recent in the selection is the first post. Thus searching for 'sleep' and selecting the far right graph bar (in 2007) reveals the first post 'Midnight and Morning' 02/03/08 and the last post is 'My First Night' 02/11/07. Are my assumptions correct?

    It could be a useful addition to what is one of the most useful Blogging tools for me.

    Replies
    1. When you click on a bar in the graph, it switches to the page within the search results that shows the most recent post from the month the bar represents. So when you searched for 'sleep' there were 37 results, which are split across 4 pages. Clicking on the rightmost bar (which represents November 2007) means "take me to the first page within the results which contains a result from November 2007". In this case that is result number 34, which is on page 4.

      Essentially you could imagine your blog, and the Postvorta search results, as a diary in a book (being filled from the back, as the posts are always presented newest first). The graph then equates to paper dividers letting you quickly jump to the last post made within a specific month.

      Hopefully that description is a little clearer but let me know if something still doesn't make sense or if you think the graph itself needs tweaking.
