Midge: generating image descriptions from computer vision detections. This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date.

Keywords for this software

Anything in here will be replaced on browsers that support the canvas element

References in zbMATH (referenced in 1 article )

Showing result 1 of 1.
Sorted by year (citations)

  1. Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh: VQA: Visual Question Answering (2015) arXiv