Training multimodal systems for classification with multiple objectives

Armitage, Jason; Thakur, Shramana; Tripathi, Rishi; Lehmann, Jens; Maleshkova, Maria

doi:10.48550/arXiv.2008.11450

Publication date

2020-01-01

Document type

Conference paper

Author

Armitage, Jason

Thakur, Shramana

Tripathi, Rishi

Lehmann, Jens

Maleshkova, Maria

Organisational unit

Universität Bonn

DOI

10.48550/arXiv.2008.11450

URI

https://openhsu.ub.hsu-hh.de/handle/10.24405/15234

Scopus ID

2-s2.0-85091068389

ISSN

1613-0073

Conference

1st International Workshop on Cross-lingual Event-centric Open Analytics co-located with the 17th Extended Semantic Web Conference (ESWC 2020) Heraklion, Crete, Greece, June 3, 2020

Series or journal

CEUR Workshop Proceedings

Periodical volume

2611

Book title

CLEOPATRA 2020: cross-lingual event-centric open analytics : proceedings of the 1st International Workshop on Cross-lingual Event-centric Open Analytics

First page

1

Last page

15

Peer-reviewed

✅

Part of the university bibliography

No

Keyword

Machine Learning

Multimodal Data

Probabilistic Method

Abstract

We learn about the world from a diverse range of sensory information. Automated systems lack this ability, as investigation has centred on processing information presented in a single form. Adapting architectures to learn from multiple modalities creates the potential to learn rich representations of the world, but current multimodal systems only deliver marginal improvements on unimodal approaches. Neural networks learn sampling noise during training, with the result that performance on unseen data is degraded. This research introduces a second objective over the multimodal fusion process learned with variational inference. Regularisation methods are implemented in the inner training loop to control variance, and the modular structure stabilises performance as additional neurons are added to layers. This framework is evaluated on a multilabel classification task with textual and visual inputs to demonstrate the potential for multiple objectives and probabilistic methods to lower variance and improve generalisation.
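
The idea of pairing a classification objective with a second, variational objective can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the fused text-image representation is modelled as a diagonal Gaussian with a standard normal prior, so that the second objective reduces to a closed-form KL term. The function names and the weighting parameter `beta` are hypothetical.

```python
import math


def bce_multilabel(logits, targets):
    """Mean binary cross-entropy over independent labels (multilabel task)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s(z)) + (1-y)*log(1-s(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def combined_loss(logits, targets, mu, log_var, beta=1.0):
    """First objective (classification) plus a weighted second objective.

    `mu` and `log_var` are assumed to parameterise a Gaussian over the
    fused multimodal representation; `beta` is a hypothetical weight.
    """
    return bce_multilabel(logits, targets) + beta * kl_to_standard_normal(
        mu, log_var
    )


if __name__ == "__main__":
    # Two labels, a two-dimensional fused latent (illustrative values only).
    loss = combined_loss(
        logits=[0.0, 0.0], targets=[1.0, 0.0], mu=[1.0, 0.0], log_var=[0.0, 0.0]
    )
    print(round(loss, 4))
```

In this sketch the KL term acts as a regulariser on the fused representation, which is one common way a variational objective can reduce variance; the paper itself should be consulted for the actual formulation.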

Version

Not applicable (or unknown)

Access Right

Metadata only access
