Tuesday, 15 May 2007

Meeting with Rafael

I met with Rafael today to get the project confirmation signed (I still need to hand it in, though). I also got a lot of information about how I need to approach the topic; the summary is below.

The name of the topic seems to be almost finalised (yay, 'cause now I can tell people what I'm doing):

"TV News Topic Detection and Tracking"

I also met Jorge, a PhD student who is also doing some text mining, but in the realm of e-learning, which may be useful later.

Anyway, the project itself is a TDT (Topic Detection and Tracking) topic. So there are two things that I'll need to look into over the next few weeks:
  • Clustering of data (i.e. placing text documents into n-dimensional vector spaces)
  • The detection of new and old stories.
So we have a clustering problem as shown below:
After this we also treat it as a time-evolving problem and look at how news stories arrive over time. We notice that there is a lot of information in the first three days of a story and then not much after that.
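
To make the two pieces a bit more concrete, here is a rough Python sketch of how they might fit together. It is my own illustration, not something from the meeting: each document becomes a word-count vector (one dimension per word), and each incoming story either joins the most similar existing topic or starts a new one. The function names, the cosine similarity measure and the 0.3 threshold are all assumptions chosen for the example; a real TDT system would likely use better term weighting and take the time of each story into account.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Turn a document into a sparse word-count vector (one dimension per word)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def track_stories(documents, threshold=0.3):
    """Assign each document, in arrival order, to a topic index.

    A document joins the most similar existing topic, or starts a new
    topic when no similarity reaches `threshold` (an arbitrary value).
    """
    topics = []  # one merged word-count vector per topic
    labels = []  # topic index assigned to each document
    for text in documents:
        vec = to_vector(text)
        scores = [cosine(vec, topic) for topic in topics]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            topics[best].update(vec)  # old story: fold it into its topic
            labels.append(best)
        else:
            topics.append(Counter(vec))  # new story: start a new topic
            labels.append(len(topics) - 1)
    return labels
```

Calling `track_stories` on a list of headlines in arrival order gives back one topic index per headline, so stories that share most of their words end up under the same index and an unfamiliar story gets a fresh one.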

Some other points that Rafael suggested are:
So I have some direction now! I'll read up on clustering and TDT in the literature over the next week or two and update the blog on those topics.

