Audiovisual, 2021

Data sources: Datacite

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

Keywords: Computer Science and Engineering, Artificial Intelligence, Intelligent System, Natural Language Processing

Research data: Audiovisual. 01 Jan 2021. Embargo end date: 25 May 2021. Publisher: Underline Science Inc.

Authors: NAACL 2021 2021; Bansal, Mohit; Hannan, Darryl; Maharana, Adyasha;

doi: 10.48448/54sa-2079

Abstract

Read the paper at the following link: https://www.aclweb.org/anthology/2021.naacl-main.194/

Story visualization is an underexplored task that falls at the intersection of many important research directions in both computer vision and natural language processing. In this task, given a series of natural language captions which compose a story, an agent must generate a sequence of images that correspond to the captions. Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task. However, there is room for improvement of generated images in terms of visual quality, coherence and relevance. We present a number of improvements to prior modeling approaches, including (1) the addition of a dual learning framework that utilizes video captioning to reinforce the semantic alignment between the story and generated images, (2) a copy-transform mechanism for sequentially-consistent story visualization, and (3) MART-based transformers to model complex interactions between frames. We present ablation studies to demonstrate the effect of each of these techniques on the generative power of the model for both individual images and the entire narrative. Furthermore, due to the complexity and generative nature of the task, standard evaluation metrics do not accurately reflect performance. Therefore, we also provide an exploration of evaluation metrics for the model, focused on aspects of the generated frames such as the presence/quality of generated characters, the relevance to captions, and the diversity of the generated images. We also present correlation experiments of our proposed automated metrics with human evaluations.

Impact by BIP!

	citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Related to Research communities

Digital Humanities and Cultural Heritage

Knowmad Institut