I've been researching one of those “odd project bits” that has been stuck in my head for the past few days. Rather than filling up a notepad with notes and sleeping on it, I decided to write a blog post about it.
Concept
The concept behind the whole thing is pretty easy. Take a general purpose PC with a video capture card in it, and tune it into the general-issue US government channel, C-SPAN. Next, start ripping the closed caption (CC) stream.
Parse and tag the CC stream and load the data into a MySQL (or other, if there's a better choice out there) database. Finally, allow for full-text searching and even email alerting on particular chunks of text as they pop up. So, if the House or Senate happen to go blithering off on something you happen to be concerned about (e.g. health care, taxes, etc.), then you will get a transcript for that chunk of time to review as needed.
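To make the store, search, and alert idea a bit more concrete, here is a minimal sketch. It is only an illustration under some loud assumptions: it swaps MySQL for Python's built-in sqlite3 module with the FTS5 full-text extension (which needs to be compiled into your SQLite build), and the table name `captions`, the column names, and the helper functions are all made up for this example. Actually sending the email is left out; the last function only reports which watched phrases showed up.

```python
import sqlite3


def _as_phrase(phrase):
    # Quote the phrase so FTS5 treats it as one exact phrase, not separate terms.
    return '"' + phrase.replace('"', '""') + '"'


def open_store(path=":memory:"):
    # Hypothetical schema: one row per caption chunk, body is full-text indexed.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS captions "
        "USING fts5(captured_at UNINDEXED, body)"
    )
    return conn


def add_chunk(conn, captured_at, body):
    # captured_at is an ISO-8601 string so plain string comparison sorts by time.
    conn.execute(
        "INSERT INTO captions (captured_at, body) VALUES (?, ?)",
        (captured_at, body),
    )
    conn.commit()


def search(conn, phrase):
    # Return (captured_at, body) for every chunk containing the phrase.
    rows = conn.execute(
        "SELECT captured_at, body FROM captions "
        "WHERE captions MATCH ? ORDER BY captured_at",
        (_as_phrase(phrase),),
    )
    return rows.fetchall()


def phrases_seen_since(conn, watch_phrases, since):
    # Report which watched phrases appear in chunks captured at or after `since`.
    hits = []
    for phrase in watch_phrases:
        row = conn.execute(
            "SELECT 1 FROM captions WHERE captions MATCH ? AND captured_at >= ?",
            (_as_phrase(phrase), since),
        ).fetchone()
        if row is not None:
            hits.append(phrase)
    return hits
```

A cron job or loop could call `add_chunk` for each new chunk of caption text, then pass the result of `phrases_seen_since` to whatever does the emailing. The default FTS5 tokenizer ignores case, which is handy since captions tend to arrive in all caps.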
History
I recall seeing something quite a bit like this a number of years ago on Slashdot, but several searches have turned up nothing actionable. Since that particular project went under (from what I recall), I decided to bring it back in my own way by seeing how far I would be able to get on my own.
While there are companies which provide this service for a fee, there's no way that I would bring myself to pay for something that is as close as a video capture card and some clever coding.
Tools
The only real tool that I'm aware of that would fit the bill is CCExtractor. While I like the idea of coding up my own stream extractor in Python or some other language, all the hard work has already been done by this clever fellow. Gluing the extractor together with parsing and other information-extracting capabilities would be entertaining enough as it is.
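As a rough idea of what the gluing might look like, here is a small sketch that turns SubRip (.srt) subtitle text into timed, flat-text cues. I'm assuming the extractor is set up to write .srt files; running CCExtractor itself is deliberately left out, and the `Cue` type and `parse_srt` function are names invented for this example, not part of any existing tool.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    start_seconds: float
    end_seconds: float
    text: str


# Matches SubRip timestamps such as 00:05:02,250 (hours:minutes:seconds,millis).
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp):
    match = _TIMESTAMP.search(stamp)
    if match is None:
        raise ValueError(f"not a SubRip timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text):
    """Turn SubRip text into a list of cues with flattened, tag-free text."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # not a cue block, skip it
        start_raw, end_raw = lines[timing].split("-->")
        text = " ".join(lines[timing + 1:])
        text = re.sub(r"<[^>]+>", "", text)       # drop markup such as <i>...</i>
        text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
        if text:
            cues.append(Cue(_to_seconds(start_raw), _to_seconds(end_raw), text))
    return cues
```

Feeding it the contents of a caption file gives back a list of `Cue` tuples, which is a convenient shape for tagging or loading into a database later.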
Design Requirements
- Capture CC information in 5-minute segments, then strip and parse the data so that it is flat text.
- Inject the relevant text into a database along with a link to the video footage.
- Implement a full-text search engine on the captured information.
- Write up search and alerting capabilities for the information, including links to the relevant text as well as video footage.
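The first two requirements could be sketched roughly like this: group timed caption lines into 5-minute buckets and attach a video link to each one. Everything here is an assumption for illustration, including the `(start_seconds, text)` input shape, the `build_segments` name, and especially the `#t=` link format, which would depend entirely on how the video footage ends up being stored and served.

```python
from collections import defaultdict

SEGMENT_SECONDS = 5 * 60  # the 5-minute segment length from the requirements


def build_segments(cues, video_base_url):
    """Group (start_seconds, text) pairs into 5-minute flat-text segments."""
    buckets = defaultdict(list)
    for start_seconds, text in cues:
        index = int(start_seconds // SEGMENT_SECONDS)
        buckets[index].append(text)

    segments = []
    for index in sorted(buckets):
        offset = index * SEGMENT_SECONDS
        segments.append({
            "start_seconds": offset,
            "text": " ".join(buckets[index]),
            # Hypothetical link format: base URL plus an offset in seconds.
            "video_url": f"{video_base_url}#t={offset}",
        })
    return segments
```

Each resulting dictionary maps naturally onto one database row: the flat text goes into the full-text index, and the link travels along with it for the search results and alerts.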
Seems like an interesting project to me. My one and only PC is still downstairs at the moment, acting as a DVR to record my shows. Once I get a new TiVo ordered and shipped in, I will be able to swap it out and begin the project in earnest.
So, any input? Heard of any project quite like this before?
[update]
I did run across http://metavid.org while writing this bit. It's possible that this will do exactly what I want. Still kicking things around and researching.
Cheers,
tom