Waleed is in charge of the innovations team and also worked on the Story Segmenter, which my system will need to talk to. Here is an overview of the meeting, along with some technical descriptions and a plan of attack.
The system that I need to know about is divided into 4 distinct areas.
- Acquisition System: Takes the video stream, audio stream, and caption text and records them into the database. It does this line by line, so the raw caption text is available in the database. They also save other details about the caption text which may be of use (though currently I do not think they will be).
- Database System: A SQL Server database that simply acts as the repository.
- Story Segmenter: Here the stories that have been recorded can be further segmented (in the case of a news program). This is also where all the metadata is added. The metadata is added manually by a team of reporters who go through the story and add time-in and time-out stamps as well as titles and descriptions (the descriptions are somewhat automated).
- Web Interface (DCP): Displays the information from the database to the user.
- Title information (full text title)
- Timing information (when and how long it is)
- Captions (raw caption data)
- Program information (channel program name)
Further, he said that a lot of the information that I need is already in a Java object (Speech Object), which I can use to get an overall idea of the system and to build something that will be easier to integrate in the long run.
Early next week I'll be meeting with Waleed to get a copy of this object as well as a snapshot of their database, to ensure the data formats are correct and so that I can begin developing with their formats in mind.
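Until I have the real Speech Object, a rough placeholder for the four kinds of information listed above might look like the sketch below. The class name StoryRecord and all of its fields and methods are my own assumptions for illustration; the actual object may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical placeholder, NOT the real Speech Object: names and fields are assumptions. */
class StoryRecord {
    private final String title;              // title information (full text title)
    private final String channel;            // program information
    private final String programName;        // program information
    private final long startMillis;          // timing: when the story starts
    private final long durationMillis;       // timing: how long it is
    private final List<String> captionLines; // raw caption data, one entry per line

    StoryRecord(String title, String channel, String programName,
                long startMillis, long durationMillis, List<String> captionLines) {
        this.title = title;
        this.channel = channel;
        this.programName = programName;
        this.startMillis = startMillis;
        this.durationMillis = durationMillis;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.captionLines = Collections.unmodifiableList(new ArrayList<>(captionLines));
    }

    String getTitle() { return title; }
    String getChannel() { return channel; }
    String getProgramName() { return programName; }
    long getStartMillis() { return startMillis; }
    long getDurationMillis() { return durationMillis; }
    List<String> getCaptionLines() { return captionLines; }

    /** End time derived from the start time and the duration. */
    long getEndMillis() { return startMillis + durationMillis; }
}
```

Keeping my own code behind a small class like this should make it easier to swap in their object once I have seen it.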
The question of when this will run was also raised. Several options were discussed, but the most favourable one was the use of a message or queue bean to tell my system that a new story has been completed and that it should run the classifier over it. Alternatively, it may simply run hourly or so. It may also be beneficial for them to run it on all their existing data, so the computational cost of the Java code is important and needs to be investigated further.
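To get a feel for the queue-driven option, here is a minimal sketch of the idea. It uses an in-memory java.util.concurrent.BlockingQueue as a stand-in for the real message queue, not an actual message bean, and the class name StoryCompletedListener, the STOP sentinel, and the use of a numeric story ID as the message are all assumptions of mine.

```java
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical stand-in for a queue-driven trigger; a real setup would use their messaging system. */
class StoryCompletedListener implements Runnable {
    /** Sentinel ID that tells the listener to stop (an assumption for this sketch). */
    static final long STOP = -1L;

    private final BlockingQueue<Long> queue;     // IDs of stories that have been completed
    private final Consumer<Long> classifier;     // callback that classifies one story by ID

    StoryCompletedListener(BlockingQueue<Long> queue, Consumer<Long> classifier) {
        this.queue = queue;
        this.classifier = classifier;
    }

    @Override
    public void run() {
        try {
            while (true) {
                long storyId = queue.take();     // blocks until a story is announced
                if (storyId == STOP) {
                    return;                      // shut down cleanly on the sentinel
                }
                classifier.accept(storyId);      // run the classifier over the new story
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt and exit
        }
    }
}
```

In use, the listener would run on its own thread (for example new Thread(listener).start()) while the segmenter side puts story IDs on the queue. The hourly option would instead be a scheduled job that queries for stories completed since the last run.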
From here, my aim by Monday is to get something in Java that uses beans to communicate, reads information, populates some objects, and then runs simple statistics on these objects. This will make sure I get beans back in my head, and will also give me some initial stats about the data that I will be using.
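As a starting point for the simple statistics, something like the sketch below could be run over the raw caption lines once they have been read in. The class name CaptionStats, its methods, and the tokenising rule (lowercase, split on anything that is not a letter, digit, or apostrophe) are assumptions of mine, not anything agreed in the meeting.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for first-pass statistics over raw caption lines. */
class CaptionStats {

    /** Counts how often each word occurs across all caption lines, ignoring case. */
    static Map<String, Integer> wordCounts(List<String> captionLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : captionLines) {
            // Split on runs of characters that are not a-z, 0-9, or an apostrophe.
            for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Average number of words per caption line; 0.0 when there are no lines. */
    static double averageWordsPerLine(List<String> captionLines) {
        if (captionLines.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        for (int count : wordCounts(captionLines).values()) {
            total += count;
        }
        return (double) total / captionLines.size();
    }
}
```

For the two lines "THE MARKET ROSE TODAY" and "the market fell", wordCounts gives 2 for both "the" and "market", and averageWordsPerLine gives 3.5. Timing these calls over a large batch of stories would also give a first idea of the computational cost.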