Announcing the VoiceMOS Challenge 2023!

In 2022 we ran the first edition of the VoiceMOS Challenge, a shared task for Mean Opinion Score (MOS) prediction of synthesized speech. Twenty-two teams from academia and industry participated, and we had a special session about the challenge at Interspeech. The main track was MOS prediction for a large-scale, diverse dataset of synthesized speech from text-to-speech and voice conversion systems, while an out-of-domain track focused on MOS prediction for a separate listening test using only a small amount of labeled data.

The 2023 challenge emphasized real-world, challenging zero-shot out-of-domain MOS prediction, with three tracks for three different voice evaluation scenarios. In collaboration with the 2023 Blizzard Challenge (BC) and Singing Voice Conversion Challenge (SVCC), we had a French speech synthesis track and a singing voice conversion track. Samples from BC and SVCC teams were provided while those challenges’ listening tests were still ongoing, and the task was to predict MOS before the true ratings became known. We also had a third track for noisy and enhanced speech, with a leaderboard on CodaLab. Unlike in 2022, no MOS-labeled audio data from the target domains was provided, in order to reflect a real-world MOS prediction scenario. Participants were encouraged to develop MOS predictors that are flexible and generalizable to a wide variety of audio evaluation tasks. As in 2022, the primary evaluation metrics for MOS prediction focused on correct ranking of synthesis systems in each track. Teams were allowed to participate in any subset of the three tracks.
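To make "correct ranking of synthesis systems" concrete, the sketch below computes a system-level Spearman rank correlation between true and predicted scores. It is an illustration only, not the official scoring script: the choice of Spearman correlation, the function names, and the toy data are all assumptions made for this example. Utterance-level scores are first averaged per system, and the resulting system means are then compared by rank.

```python
from collections import defaultdict
from statistics import mean


def system_means(utterance_scores):
    """Average utterance-level scores per system.

    utterance_scores: iterable of (system_id, score) pairs (hypothetical format).
    """
    by_system = defaultdict(list)
    for system_id, score in utterance_scores:
        by_system[system_id].append(score)
    return {system_id: mean(scores) for system_id, scores in by_system.items()}


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def system_level_srcc(true_scores, predicted_scores):
    """Spearman correlation between per-system mean true and predicted scores.

    Assumes both inputs cover the same set of systems.
    """
    true_means = system_means(true_scores)
    pred_means = system_means(predicted_scores)
    systems = sorted(true_means)
    true_ranks = average_ranks([true_means[s] for s in systems])
    pred_ranks = average_ranks([pred_means[s] for s in systems])
    return pearson(true_ranks, pred_ranks)


# Toy data: three hypothetical systems with two rated utterances each.
true_scores = [("sysA", 4.0), ("sysA", 4.4), ("sysB", 3.0),
               ("sysB", 3.2), ("sysC", 2.0), ("sysC", 1.8)]
pred_scores = [("sysA", 3.5), ("sysA", 3.7), ("sysB", 3.9),
               ("sysB", 3.6), ("sysC", 2.5), ("sysC", 2.2)]

srcc = system_level_srcc(true_scores, pred_scores)
print(srcc)  # the predictor swaps sysA and sysB, so the correlation is 0.5
```

A rank-based measure like this rewards a predictor for ordering systems correctly even if its absolute MOS values are offset from the listeners' ratings, which suits a zero-shot setting where no labeled target-domain data is available for calibration.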

The VoiceMOS Challenge 2023 has ended!

Thanks so much to everyone who participated! The VoiceMOS Challenge 2023 was accepted as a special session at ASRU 2023.

The Track 3 TMHINT-QI(S) test phase data is described in the paper “A Study on Incorporating Whisper for Robust Speech Assessment” by Ryandhimas E. Zezario, Yu-Wen Chen, Szu-Wei Fu, Yu Tsao, Hsin-Min Wang, and Chiou-Shann Fuh.

VoiceMOS 2023 Publications

Challenge summary paper: The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains

LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction

Tentative schedule

The tentative schedule for the VoiceMOS challenge is as follows:

Rules

Suggested public datasets

Open-source MOS predictors:

These were the baseline systems in last year’s challenge:

Organizers