UCSC-SOE-23-04: Benchmarking Image Generators on Open-Vocabulary Scene Graphs

Brigit Schroeder, Adam Smith
03/24/2023 04:24 PM
Computational Media
Prompt-driven image generation systems commonly suffer from missing objects, missing attributes, and blended objects. Scene graphs, which explicitly represent objects, their attributes, and the relationships between them, have the potential to address these problems because of their structured nature. However, previous work on scene-graph-to-image generation relied on closed vocabularies, and a small, fixed vocabulary limited the flexibility and richness of the image generators. To overcome this limitation, we propose open-vocabulary scene graphs (OVSGs), which capture the expressive power of free-form text while describing scene structure directly. We introduce new evaluation methods to better understand how existing generators fail on OVSGs, using both qualitative coding and a visual question answering (VQA) quiz to capture common failure scenarios in OVSG-to-image generation (OVSG2IM). We find that all of the systems we evaluated (after adapting them to take OVSG inputs) frequently fail to express details given explicitly in their graph inputs. These results show that existing image generators still struggle with OVSGs and that there is room for improvement in future OVSG2IM systems.
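
To make the idea of an open-vocabulary scene graph concrete, the following is a minimal sketch of one possible representation, together with a way to turn a graph into yes/no quiz questions of the kind a VQA-based evaluation might ask. It is illustrative only: the class names (SceneObject, Relation, SceneGraph), the function quiz_questions, the question templates, and the example graph are all assumptions and are not taken from the report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneObject:
    # Hypothetical structure: the name and attributes are free-form text,
    # not entries from a fixed label set.
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    subject: int      # index into SceneGraph.objects
    predicate: str    # free-form text, e.g. "riding"
    target: int       # index into SceneGraph.objects


@dataclass
class SceneGraph:
    objects: List[SceneObject]
    relations: List[Relation] = field(default_factory=list)


def quiz_questions(graph: SceneGraph) -> List[str]:
    """Derive one yes/no question per object, attribute, and relation.

    The templates are an assumption made for illustration; a "no" answer
    would correspond to a missing object, attribute, or relationship.
    """
    questions = []
    for obj in graph.objects:
        questions.append(f"Is there a {obj.name} in the image?")
        for attr in obj.attributes:
            questions.append(f"Is the {obj.name} {attr}?")
    for rel in graph.relations:
        subj = graph.objects[rel.subject].name
        targ = graph.objects[rel.target].name
        questions.append(f"Is the {subj} {rel.predicate} the {targ}?")
    return questions


# Made-up example graph with open-vocabulary labels.
graph = SceneGraph(
    objects=[
        SceneObject("corgi", ["fluffy"]),
        SceneObject("skateboard", ["neon green"]),
    ],
    relations=[Relation(subject=0, predicate="riding", target=1)],
)

for question in quiz_questions(graph):
    print(question)
```

Because every detail in the graph maps to its own question, a failed question points to the specific object, attribute, or relationship that a generated image left out.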