BIP! Finder - Relational Graph Learning for Grounded Video Description Generation

Year: 2020

Authors: Zhang, Wenqiao; Wang, Xin Eric; Tang, Siliang; Shi, Haizhou; Haocheng; Xiao, Jun; Zhuang, Yueting; William Yang

Venue: Proceedings of the 28th ACM International Conference on Multimedia

Type: Publication

Abstract: Grounded video description (GVD) encourages captioning models to attend to appropriate video regions (e.g., objects) dynamically and generate a description. Such a setting can help explain the decisions of captioning models and prevents the model from hallucinating object words in its description. However, such design mainly focuses on object word generation and thus may ignore fine-grained information and suffer from missing visual concepts. Moreover, relational words (e.g., "jump left or right") are usual spatio-temporal inference results, i...

Topics: Artificial intelligence

DOI: 10.1145/3394171.3413746 (Found 2 versions)

Topic-specific impact indicators

Popularity: This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Citation Count: This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.