Enabling Acoustic Audience Feedback in Large Virtual Events

By: Tamay Aykut, Markus Hofbauer, Christopher Kuhn, Eckehard Steinbach, Bernd Girod

The COVID-19 pandemic shifted many events in our daily lives into the virtual domain. While virtual conference systems provide an alternative to physical meetings, larger events require a muted audience to avoid an accumulation of background noise and distorted audio. However, performing artists strongly rely on the feedback of their audience. We propose a concept for a virtual audience framework which supports all participants with the amb...

Multimedia | October 30, 2023 12:40pm

Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs

By: Ryota Tanaka, Tomohiro Suzuki, Kazuya Takeda, Keisuke Fujii

Automatic evaluation systems are a fundamental topic in sports technology. In many sports, such as figure skating, automated evaluation methods based on pose estimation have been proposed. However, previous studies have evaluated skaters' skills using 2D analysis. In this paper, we propose an automatic edge error judgment system that uses a monocular smartphone camera and inertial sensors, which enable us to analyze 3D motions. Edge error is one o...

Multimedia | October 27, 2023 9:36am

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

By: Jinzheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang

Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic value and wide range of applications. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, Bayesian-based filters can solve the problems of data association, audio-visual fusion, and track management. In this paper, we conduct a comprehensive overview of audio-visual ...

Multimedia | October 24, 2023 1:45pm

Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model

By: Joanna Hong, Se Jin Park, Yong Man Ro

We present a novel approach to multilingual audio-visual speech recognition tasks by introducing a single model trained on a multilingual dataset. Motivated by the human cognitive system, in which humans can intuitively distinguish different languages without any conscious effort or guidance, we propose a model that can identify which language is given as input speech by distinguishing the inherent similarities and differences between languages. To do ...

Multimedia | October 24, 2023 12:47pm

Generating Robust Adversarial Examples against Online Social Networks (OSNs)

By: Jun Liu, Jiantao Zhou, Haiwei Wu, Weiwei Sun, Jinyu Tian

Online Social Networks (OSNs) have blossomed into prevailing transmission channels for images in the modern era. Adversarial examples (AEs) deliberately designed to mislead deep neural networks (DNNs) are found to be fragile against the inevitable lossy operations conducted by OSNs. As a result, the AEs would lose their attack capabilities after being transmitted over OSNs. In this work, we aim to design a new framework for generating robus...

Multimedia | October 20, 2023 12:45pm

Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA

By: Sheng Zhou, Dan Guo, Jia Li, Xun Yang, Meng Wang

Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relationships. Existing works take all visual relationships into account for answer prediction. However, there are three observations: (1) a single subject in the images can be easily detected as multi...

Multimedia | October 16, 2023 7:51am

Interactive Interior Design Recommendation via Coarse-to-fine Multimodal Reinforcement Learning

By: He Zhang, Ying Sun, Weiyu Guo, Yafei Liu, Haonan Lu, Xiaodong Lin, Hui Xiong

Personalized interior decoration design often incurs high labor costs. Recent efforts in developing intelligent interior design systems have focused on generating decoration designs from textual requirements while neglecting the problem of how to mine homeowners' hidden preferences and choose a proper initial design. To fill this gap, we propose an Interactive Interior Design Recommendation System (IIDRS) based on reinforcement learning (...

Multimedia | October 12, 2023 5:49am

Encoding and Decoding Narratives: Datafication and Alternative Access Models for Audiovisual Archives

By: Yuchen Yang

Situated at the intersection of audiovisual archives, computational methods, and immersive interactions, this work probes increasingly important accessibility issues with a two-fold approach. Firstly, the work proposes an ontological data model to handle complex descriptors (metadata, feature vectors, etc.) with regard to user interactions. Secondly, this work examines text-to-video retrieval from an implementation perspective by propos...

Multimedia | October 11, 2023 12:00pm

Encoder-Decoder-Based Intra-Frame Block Partitioning Decision

By: Yucheng Jiang, Han Peng, Yan Song, Jie Yu, Peng Zhang, Songping Mai

The recursive intra-frame block partitioning decision process, a crucial component of the next-generation video coding standards, exerts significant influence over the encoding time. In this paper, we propose an encoder-decoder neural network (NN) to accelerate this process. Specifically, a CNN is utilized to compress the pixel data of the largest coding unit (LCU) into a fixed-length vector. Subsequently, a Transformer decoder is employed ...

Multimedia | October 11, 2023 11:28am

Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives

By: Yuchen Yang

Audiovisual (AV) archives, an essential reservoir of our cultural assets, suffer from accessibility issues. The complex nature of the medium itself makes processing and interaction still an open challenge in the fields of computer vision, multimodal learning, and human-computer interaction, as well as in culture and heritage. In recent years, with the rise of video retrieval tasks, methods in retrieving video content with n...

Multimedia | October 10, 2023 2:19pm