Music Augmentation and Denoising For Peak-Based Audio Fingerprinting

By: Kamil Akesbi, Dorian Desblancs, Benjamin Martin

Audio fingerprinting is a well-established solution for song identification from short recording excerpts. Popular methods rely on the extraction of sparse representations, generally spectral peaks, and have proven to be accurate, fast, and scalable to large collections. However, real-world applications of audio identification often happen in noisy environments, which can cause these systems to fail. In this work, we tackle this problem by ...

Sound | October 23, 2023 9:45am
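
The "spectral peaks" mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' method (the abstract is truncated before it is described); it is a generic local-maximum picker over a magnitude grid. The function name `spectral_peaks`, the 3x3 neighbourhood, and the `threshold` parameter are all assumptions made for illustration.

```python
def spectral_peaks(spectrogram, threshold=0.0):
    """Return (time, freq) indices of strict local maxima in a 2D magnitude grid.

    `spectrogram` is assumed to be a rectangular list of rows (time frames),
    each row a list of non-negative magnitudes (frequency bins).
    """
    peaks = []
    n_t = len(spectrogram)
    for t in range(n_t):
        n_f = len(spectrogram[t])
        for f in range(n_f):
            value = spectrogram[t][f]
            # Skip bins at or below the (hypothetical) magnitude threshold.
            if value <= threshold:
                continue
            # Collect the up-to-8 neighbours in the surrounding 3x3 window.
            neighbours = [
                spectrogram[tt][ff]
                for tt in range(max(0, t - 1), min(n_t, t + 2))
                for ff in range(max(0, f - 1), min(n_f, f + 2))
                if (tt, ff) != (t, f)
            ]
            # Keep the bin only if it is strictly larger than every neighbour.
            if all(value > n for n in neighbours):
                peaks.append((t, f))
    return peaks


grid = [
    [0, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 4],
]
print(spectral_peaks(grid))
```

Peak-based systems typically hash pairs or constellations of such peaks; that step is omitted here because the listing does not describe it.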

Two-Stage Triplet Loss Training with Curriculum Augmentation for Audio-Visual Retrieval

By: Donghuo Zeng, Kazushi Ikeda

The cross-modal retrieval model leverages the potential of triplet loss optimization to learn robust embedding spaces. However, existing methods often train these models in a single pass, overlooking the distinction between semi-hard and hard triplets in the optimization process, which leads to suboptimal model performance. In this paper, we introduce a novel approach rooted i...

Sound | October 23, 2023 9:27am
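
The distinction between semi-hard and hard cases drawn in the abstract above can be sketched as follows. The truncated abstract does not give the paper's own definitions, so this assumes the commonly used convention: a case is hard when the negative is closer to the anchor than the positive, and semi-hard when the negative is farther than the positive but still inside the margin. The function names and the `margin` value of 0.2 are hypothetical.

```python
from math import dist


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge-style triplet loss on Euclidean distances."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)


def classify_triplet(anchor, positive, negative, margin=0.2):
    """Label a triplet 'hard', 'semi-hard', or 'easy' (assumed convention)."""
    d_ap = dist(anchor, positive)
    d_an = dist(anchor, negative)
    if d_an < d_ap:
        # The negative is closer to the anchor than the positive.
        return "hard"
    if d_an < d_ap + margin:
        # The negative is farther than the positive but within the margin.
        return "semi-hard"
    # The margin is already satisfied, so the loss is zero.
    return "easy"


anchor, positive = (0.0, 0.0), (1.0, 0.0)
for negative in [(0.5, 0.0), (1.1, 0.0), (2.0, 0.0)]:
    print(classify_triplet(anchor, positive, negative),
          triplet_loss(anchor, positive, negative))
```

A staged schedule would feed one category first and the other later; which order the paper uses is not stated in the visible text.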

Energy-Based Models For Speech Synthesis

By: Wanli Sun, Zehai Tu, Anton Ragni

Recently there has been a lot of interest in non-autoregressive (non-AR) models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike AR models, these models do not have autoregressive dependencies among outputs, which makes inference efficient. This paper expands the range of available non-AR models with another member called energy-based models (EBMs). The paper describes how noise contrastive estimation, which relies on ...

Sound | October 20, 2023 12:37pm

EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks

By: Hanan Hamza, Fiza Gafoor, Fathima Sithara, Gayathri Anil, V. S. Anoop

In the era of advanced artificial intelligence and human-computer interaction, identifying emotions in spoken language is paramount. This research explores the integration of deep learning techniques in speech emotion recognition, offering a comprehensive solution to the challenges associated with speaker diarization and emotion identification. It introduces a framework that combines a pre-existing speaker diarization pipeline and an emotio...

Sound | October 20, 2023 12:14pm

Uncertainty Quantification of Bandgaps in Acoustic Metamaterials with Stochastic Geometric Defects and Material Properties

By: Han Zhang, Rayehe Karimi Mahabadi, Cynthia Rudin, Johann Guilleminot, L. Catherine Brinson

This paper studies the utility of techniques within uncertainty quantification, namely spectral projection and polynomial chaos expansion, in reducing sampling needs for characterizing acoustic metamaterial dispersion band responses given stochastic material properties and geometric defects. A novel method of encoding geometric defects in an interpretable, resolution-independent manner is showcased in the formation of input space probability distr...

Sound | October 20, 2023 12:09pm

Physics-informed Neural Network for Acoustic Resonance Analysis

By: Kazuya Yokota, Takahiko Kurahashi, Masajiro Abe

This study proposes a physics-informed neural network (PINN) framework to solve the wave equation for acoustic resonance analysis. ResoNet, the analytical model proposed in this study, minimizes a loss function for periodic solutions in addition to the conventional PINN loss functions, thereby exploiting the function approximation capability of neural networks while performing resonance analysis. Additionally, it can be easily appl...

Sound | October 20, 2023 10:37am

CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition

By: Kari A Noriy, Xiaosong Yang, Marcin Budka, Jian Jun Zhang

This paper proposes a novel framework for multilingual speech and sound representation learning using contrastive learning. The lack of sizeable labelled datasets hinders speech-processing research across languages. Recent advances in contrastive learning provide self-supervised techniques to learn from unlabelled data. Motivated by reducing data dependence and improving generalisation across diverse languages and conditions, we develop a m...

Sound | October 20, 2023 10:30am

BUT CHiME-7 system description

By: Martin Karafiát, Karel Veselý, Igor Szöke, Ladislav Mošner, Karel Beneš, Marcin Witkowski, Germán Barchi, Leonardo Pepino

This paper describes the joint effort of Brno University of Technology (BUT), AGH University of Krakow and University of Buenos Aires on the development of Automatic Speech Recognition systems for the CHiME-7 Challenge. We train and evaluate various end-to-end models with several toolkits. We relied heavily on Guided Source Separation (GSS) to convert multi-channel audio to a single channel. The ASR leverages speech representations from m...

Sound | October 20, 2023 10:03am

Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing

By: Yixiao Zhang, Akira Maezawa, Gus Xia, Kazuhiko Yamamoto, Simon Dixon

Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpret user intentions and select appropriate AI ...

Sound | October 20, 2023 8:15am

BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval

By: Kaixing Yang, Xukun Zhou, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan

Dance and music are closely related forms of expression, with mutual retrieval between dance videos and music being a fundamental task in various fields like education, art, and sports. However, existing methods often suffer from unnatural generation effects or fail to fully explore the correlation between music and dance. To overcome these challenges, we propose BeatDance, a novel beat-based model-agnostic contrastive learning framework. B...

Sound | October 17, 2023 7:25am