Monday, 30 July 2007

Developing the Ideas

In the EBUS5003 lab today Rafael suggested the best way to move on with the project: pick a research paper from the TDT TRECs, try to emulate its results, and then extend the work to my data set and my application.

This seems like a good idea, as I will then at least have a benchmark to aim for. The paper I will use is from CMU and appears in 'Topic Detection and Tracking: Event Based Information Organization'. It is Chapter 5 (Multi-strategy learning for Topic Detection and Tracking Yang

I chose this paper because it covers all the areas that I will deal with and gives a quite detailed history of how the system was constructed. The results and evaluation methodology are also well reported.

Overall, my project will concentrate on three of the tasks involved in TDT, all of which are described in detail in the paper. These tasks are:
  1. First Story Detection
  2. Topic Detection (Clustering)
  3. Tracking

My next step is to obtain a copy of the TDT3 Corpus, which is already annotated, as I do not want to spend too much time annotating data myself. I've emailed Rafael about it, as it seems you have to pay (a lot) for it, which I don't really want to do.

I will also be getting a laptop from Dan tomorrow that I can use for the duration of the project. This will be my working laptop, and I will be able to use it at Vision Bytes. I will be going there more often from next week.

The full text of the article and the way that I will attack the problems will be up tomorrow.

Further, I'm reinstalling Linux at home :-) It's for both EBUS COSC and the thesis. I might even consider putting it on the laptop, but I'm not sure. I think I'll keep that running Vista for the sake of sanity at work.

