Abstract: We present a novel method for automatically generating video summaries of academic presentations. We base our investigation on a corpus of multimodal academic conference presentations that combines transcripts with paralinguistic multimodal features. We first generate keyword-based summaries from transcripts produced by automatic speech recognition (ASR). Start and end times for each spoken phrase are identified from the ASR transcript, and a value is then created for each phrase. Spoken phrases are then augmented by incorporating scores for ...
Topics: 
Natural language processing
Artificial intelligence
Speech recognition