Overview
The field of speech emotion recognition (SER) aims to create scientifically rigorous systems that can reliably characterize emotional behaviors expressed in speech. A key step in building SER systems is obtaining emotional data that is both reliable and reproducible for practitioners. However, academic researchers often have difficulty accessing or collecting large-scale, reliable, naturalistic emotional recordings, and the best practices for data collection are rarely described or shared when emotional corpora are presented. To address this issue, this website presents the creation of an affective naturalistic database consortium (ANDC) intended to encourage multidisciplinary cooperation among researchers and practitioners in the field of affective computing.
The consortium website is divided into two parts: the customizable standard pipeline and the affective speech corpora. The website provides a comprehensive overview of the intelligent components of the standardized pipeline framework, allowing researchers to collect their own datasets based on specific criteria, such as speaker demographics, emotional categories, and audio quality. To make the data collection process transparent, a preview function lets researchers listen to short audio clips demonstrating the effects of selected intelligent components, such as music filtering and SNR filtering. Researchers can also access technical specifications, such as audio file format, sample rate, and bit depth, to support upgrades to the pipeline. The website includes various resources for practitioners, including code repositories, pre-trained models, and forums for community discussion and collaboration. A search function helps researchers find specific datasets based on keywords, affective states, or other criteria.
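To make the SNR filtering step concrete, here is a minimal sketch of how such a component might score and filter clips. The function names, the frame-based noise-floor estimate, and the 15 dB threshold are illustrative assumptions, not the pipeline's actual implementation:

```python
import math

def frame_rms(samples, frame_len=400):
    # RMS energy of each non-overlapping frame of the waveform.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]

def estimate_snr_db(samples, frame_len=400, quantile=0.1):
    # Rough SNR: loudest frames (speech) vs. quietest frames (noise floor).
    rms = sorted(frame_rms(samples, frame_len))
    k = max(1, int(len(rms) * quantile))
    noise = sum(rms[:k]) / k
    signal = sum(rms[-k:]) / k
    return 20 * math.log10(signal / max(noise, 1e-12))

def passes_snr_filter(samples, threshold_db=15.0):
    # Hypothetical filter decision: keep only clips above the SNR threshold.
    return estimate_snr_db(samples) >= threshold_db
```

A clip with loud speech over a quiet background would pass, while a clip whose energy is uniform (no clear speech/noise separation) would score near 0 dB and be rejected.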
To make the pipeline not only available but also customizable, practitioners can add their own components to meet specific needs or requirements. For instance, if data are collected for emotions in a different language, the emotion retrieval (ER) components can be adjusted to incorporate the relevant linguistic features. To support this collaboration, we will provide our pipeline and code to researchers via GitHub. The GitHub repository will include all resources researchers need to replicate our data collection methodology. Additionally, we have structured our code modularly, with documentation for each module; each module is self-contained and allows switching between multiple components using a universal schema. We welcome researchers and practitioners in the affective computing community to use the provided components, or to build and share their own code implementations, research findings, and model components that they find more effective for this kind of intelligence-governed data collection infrastructure, promoting a culture of collaboration and transparency in the field.
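The "universal schema" for swappable modules could be sketched as a shared component interface: every stage consumes and produces a list of clip records, so implementations can be exchanged freely. The interface name, record fields, and example filters below are hypothetical illustrations of the idea, not the repository's actual API:

```python
from typing import List, Protocol

class PipelineComponent(Protocol):
    """Assumed universal schema: each component maps a list of
    clip-metadata records to a (possibly filtered) list."""
    name: str
    def process(self, clips: List[dict]) -> List[dict]: ...

class SNRFilter:
    # Example component: drop clips whose precomputed SNR is too low.
    name = "snr_filter"
    def __init__(self, threshold_db: float = 15.0):
        self.threshold_db = threshold_db
    def process(self, clips: List[dict]) -> List[dict]:
        return [c for c in clips if c.get("snr_db", 0.0) >= self.threshold_db]

class MusicFilter:
    # Example component: drop clips flagged as containing music.
    name = "music_filter"
    def process(self, clips: List[dict]) -> List[dict]:
        return [c for c in clips if not c.get("has_music", False)]

def run_pipeline(clips: List[dict], components: List[PipelineComponent]) -> List[dict]:
    # Chain components in order; any stage can be replaced or reordered.
    for comp in components:
        clips = comp.process(clips)
    return clips
```

Because every stage honors the same `process` signature, a researcher adapting the pipeline to another language could replace one component (for example, an ER stage) without touching the rest of the chain.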
Shreya G. Upadhyay*, Woan-Shiuan Chien*, Bo-Hao Su, Lucas Goncalves, Ya-Tse Wu, Ali N. Salman, Carlos Busso, and Chi-Chun Lee, “An Intelligent Infrastructure Toward Large Scale Naturalistic Affective Speech Corpora Collection,” in 2023 11th International Conference on Affective Computing and Intelligent Interaction (ACII), IEEE, 2023. https://ieeexplore.ieee.org/#