arxiv:2211.12981

Improving Visual-textual Sentiment Analysis by Fusing Expert Features

Published on Nov 23, 2022

Authors:

Hanjia Lyu ,

Abstract

Visual-textual sentiment analysis aims to predict sentiment from a paired image and text. The main challenge is learning visual features that are effective for sentiment prediction, since input images are often very diverse. To address this challenge, we propose a method that improves visual-textual sentiment analysis by introducing powerful expert visual features. The proposed method consists of four parts: (1) a visual-textual branch that learns features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders that extract effective visual features, (3) a CLIP branch that implicitly models visual-textual correspondence, and (4) a multimodal feature fusion network, based on either BERT or an MLP, that fuses the multimodal features and makes the sentiment prediction. Extensive experiments on three datasets show that our method produces better visual-textual sentiment analysis performance than existing methods.
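The four-part design can be illustrated with a minimal sketch of the MLP fusion variant. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature dimensions, the concatenation step, the two-layer network, the three sentiment classes, and the random vectors that stand in for the outputs of the visual-textual, visual expert, and CLIP branches. The helper names (`init_layer`, `fuse_and_predict`) are hypothetical.

```python
import math
import random

def linear(x, weights, bias):
    """Affine map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    # Subtract the maximum for numerical stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_in, n_out):
    """Hypothetical helper: small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def fuse_and_predict(branch_features, layers):
    """Concatenate per-branch features, then apply a two-layer MLP.

    Concatenation is an assumed fusion input; the paper may combine
    the branch outputs differently.
    """
    fused = [v for feats in branch_features for v in feats]
    (w1, b1), (w2, b2) = layers
    hidden = relu(linear(fused, w1, b1))
    return softmax(linear(hidden, w2, b2))

rng = random.Random(0)

# Placeholder vectors standing in for the three branch outputs
# (dimensions 8, 6, and 4 are arbitrary).
visual_textual = [rng.gauss(0, 1) for _ in range(8)]
visual_expert = [rng.gauss(0, 1) for _ in range(6)]
clip_features = [rng.gauss(0, 1) for _ in range(4)]
branches = [visual_textual, visual_expert, clip_features]

n_in = sum(len(b) for b in branches)
layers = [init_layer(rng, n_in, 16), init_layer(rng, 16, 3)]

# Untrained weights, so the probabilities are meaningless; only the
# data flow from branch features to a class distribution is shown.
probs = fuse_and_predict(branches, layers)
print(probs)
```

In a real system the placeholder vectors would come from trained encoders and the fusion weights would be learned end to end; the sketch only shows how separately extracted features could feed a single prediction head.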
