ASVspoof 2019 Analyses

Special Session Description

The detection of fake audios to spoof automatic speaker verification (ASV) systems is in high demand to facilitate the security of a host of applications employed in smart home applications to online banking and payment solutions. The “Automatic Speaker Verification Spoofing and Countermeasures” (ASVspoof) challenges are bi-annual research challenges to accelerate anti-spoofing research with previous editions in 2015 and 2017. The ASVspoof 2019 edition has two sub-challenges involving a “Logical Access” scenario (LA; speech synthesis/conversion attacks) and a “Physical Access” scenario (PA; replay attacks) to advance the future horizons in spoofed and fake audio detection. For the Interspeech 2019 ASVspoof Challenge, more than 60 international industrial and academic teams submitted automated countermeasures without knowing: i) the attack instrument type (the used synthesis/conversion algorithm among eleven unknown attack types) in the case of the LA scenario and ii) the environment where human speech was captured and the replay setup in which (simulated and laboratory scenario testing) presentation attacks are carried out for the PA scenario.

In the ASRU “ASVspoof 2019 Analyses” Special Session, participants get full information access to analyze and strengthen countermeasures in known operational environments, in particular, the key, metadata, and additional data are released to participants via Edinburgh DataShare. The objectives of the ASVspoof 2019 Analysis special session are:

Logical Access: Classification of human (bona fide) speech signals vs. machine generated speech signals by
state-of-the-art text-to-speech (TTS) and voice conversion (VC) technology.
Physical Access: Classification of human (bona fide) speech signals vs. (simulated and laboratory scenario testing) captured and replayed speech signals.

For evaluation, ASVspoof 2019 adopted a new t-DCF evaluation metric that reflects the impact of spoofing and of countermeasures on ASV performance. The ASVspoof 2019 challenge reports on the t-DCF performance ‘pooled’ across all possible setting configurations for the “Logical Access” and for the “Physical Access” sub-challenges, respectively. Participants are free to choose metrics suitable to their purpose but should also report the primary metrics of the ASVspoof 2019 challenge for the sake of comparable results reporting. We genuinely suggest two alternatives without narrowing the creativity of participants. For one, participants could employ different ASV (or other speech technology) systems to investigate on the tandem performance of spoofed and fake audios for their operational case studies. For another, with the aim of analysing discrimination performance between bona fide and spoofed speech across operational settings, participants could report the maximum ‘min t-DCF’ across all “Logical Access” and “Physical Access” settings, respectively. Note, scores of countermeasure systems are expected to be high for bona fide speech, and low otherwise (as for ASV systems, high scores represent genuine verification attempts and low scores represent subversive access attempts). Based on the presented results, further conclusions can be drawn on the future horizons in spoofed and fake audio detection.

ASVspoof 2019 Analyses participants may wish to read our Odyssey 2018 paper which describes the t-DCF metric. It will be the default metric for ASVspoof 2019.

T. Kinnunen, K.-A. Lee, H. Delgado, N. Evans, M. Todisco, M. Sahidullah, J. Yamagishi, D.-A. Reynolds, "t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification", Proc. Odyssey 2018 - The Speaker and Language Recognition Workshop [PDF]

Software implementations of the t-DCF metric can be downloaded in MATLAB and Python formats.

ASVspoof 2019 participants may make use of the following baseline countermeasures:

M. Sahidullah, T. Kinnunen and C. Hanilci, "A comparison of features for synthetic speech detection", Proc. INTERSPEECH 2015, pp. 2087–2091 [PDF]

M. Todisco, H. Delgado and N. Evans, "Constant Q cepstral coefficients: a spoofing countermeasure for automatic speaker verification", Computer, Speech and Language, vol. 45, pp. 516 –535, 2017 [PDF]

Software implementations of both the LFCC-GMM and CQCC-GMM baselines are also available.