Monday 30 July 2007

Things I Need to Write Down

Here are a few things that I've read or come across and need to get down before I forget.
  • Topic detection is a real-time (or near-real-time) clustering problem. The aim is to build clusters of stories in a single pass, given only a small (possibly empty) history for each cluster.
  • Topic detection and first story detection (FSD) go hand in hand. In fact you can use almost identical algorithms, with the only difference being the threshold you use for FSD.
  • It seems that the best results are obtained using mixture models, that is, combining several techniques to produce a single result.
  • The main trade-off in TDT is between false alarm probability and miss probability. A good system tries to minimise both, but they generally pull against each other: a system with a low false alarm rate generally has a higher miss rate. The reason can be seen in the way they generalise.
  • Most of the clustering work seems to use a vector space model, and then kNN clustering techniques on top of it.
  • A main issue that I need to look into is run time, that is, ensuring that the system works within an acceptable time limit while still doing its job.
  • I also need to ensure that this is a technical RESEARCH PROJECT, with some interesting research in it, as well as a system that is business orientated. This will become clearer after I type up the initial system approach and include references. These will then make sure I'm heading in the right technical direction for the project.
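
To make the single-pass clustering and FSD points above a bit more concrete, here is a rough sketch of how I imagine it could work. It is not taken from any particular TDT system: the function names (vectorise, cosine, single_pass), the raw term-frequency vectors (a stand-in for proper TF-IDF weighting), the summed centroids and the threshold values are all my own assumptions for illustration. It uses nearest-centroid assignment rather than kNN, just to keep it short.

```python
import math
from collections import Counter


def vectorise(text):
    """Bag-of-words term-frequency vector (assumed stand-in for TF-IDF)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(stories, threshold):
    """Assign each story to a cluster in arrival order, in one pass.

    Returns (labels, first_story_flags). A story whose best similarity to
    the existing centroids is below `threshold` starts a new cluster and
    is flagged as a first story.
    """
    centroids = []  # one summed term vector per cluster
    labels, first_flags = [], []
    for text in stories:
        vec = vectorise(text)
        best_idx, best_sim = -1, 0.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx == -1 or best_sim < threshold:
            # Nothing similar enough: new topic, so this is a first story.
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
            first_flags.append(True)
        else:
            # Fold the story into the closest existing cluster.
            centroids[best_idx].update(vec)
            labels.append(best_idx)
            first_flags.append(False)
    return labels, first_flags


# Made-up headlines, purely for illustration.
stories = [
    "earthquake hits city centre",
    "city centre earthquake damage grows",
    "election results announced tonight",
    "earthquake rescue teams reach city",
]
labels, first = single_pass(stories, threshold=0.3)
print(labels, first)
```

With a threshold of 0.3 the fourth headline joins the earthquake cluster; raising it to 0.5 makes the same headline start its own cluster and get flagged as a first story. That is the false alarm versus miss trade-off in miniature: a stricter threshold flags more stories as new (more false alarms, fewer misses), a looser one does the opposite.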
