Intelligently-Controlled Pipeline
The customizable, intelligently controlled framework shown in the figure is divided into three broad stages, each serving a distinct purpose: preprocessing, emotion retrieval, and perceptual evaluation.
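As a rough illustration of how these three stages connect, the following Python sketch outlines the control flow. All function names and the placeholder score are hypothetical stand-ins, not the framework's actual API.

```python
# High-level sketch of the three-stage flow; every name here is a
# hypothetical placeholder, not the framework's actual implementation.

def preprocess(raw_recordings):
    """Stage 1: segment recordings into utterances and clean the audio."""
    return [{"audio": path} for path in raw_recordings]

def retrieve_emotions(utterances):
    """Stage 2: score each utterance with emotion (and gender) predictors."""
    for utt in utterances:
        utt["emotion_score"] = 0.8  # placeholder; model inference goes here
    return utterances

def perceptual_evaluation(utterances):
    """Stage 3: route selected utterances to human annotators."""
    return [utt for utt in utterances if utt["emotion_score"] > 0.5]

annotation_queue = perceptual_evaluation(retrieve_emotions(preprocess(["a.wav"])))
```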
After the emotion retrieval step, the emotional utterances undergo a perceptual evaluation using Label Studio. In this stage, human annotators label the utterances with emotional attributes (Arousal, Valence, Dominance) and categorical emotions (Happiness, Anger, Sadness, etc.), a common approach in many existing affective data collections, and we follow it in our pipeline. The annotation questionnaire is organized as follows: the emotional utterances retrieved from the previous stage are annotated on a 7-point Likert scale for Valence (ranging from very negative to very positive), Arousal (ranging from very calm to very active), and Dominance (ranging from very weak to very strong). To assist the evaluators in annotating these dimensional attributes, we employ self-assessment manikins (SAMs) as a visual guide. The evaluators are also asked to select the one primary emotion that they perceive best characterizes the utterance from a list of eight primary emotions: Anger, Sadness, Happiness, Surprise, Fear, Disgust, Contempt, and Neutral.
Naturalistic speech recordings capture real-world communication, which often elicits emotional states that cannot be adequately expressed with a single emotion label. We therefore also annotate secondary emotions, for which the evaluators can select all the emotional states they perceive in an utterance (e.g., Anger + Depressed + Annoyed). The list of secondary emotional states includes Amused, Frustrated, Depressed, Concerned, Disappointed, Excited, Confused, and Annoyed. To reduce cognitive load, similar emotional categories are grouped together. In addition to emotional states, we also annotate each utterance for transcription correctness and speaker gender.
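To make the questionnaire concrete, here is a minimal sketch of one annotation record as a Python data structure. The AnnotationRecord class and its field names are hypothetical illustrations of the questionnaire described above, not the actual Label Studio labeling configuration.

```python
# Hypothetical schema mirroring the questionnaire described above;
# not the actual Label Studio configuration used in the pipeline.
from dataclasses import dataclass, field
from typing import List

PRIMARY_EMOTIONS = ["Anger", "Sadness", "Happiness", "Surprise",
                    "Fear", "Disgust", "Contempt", "Neutral"]
SECONDARY_EMOTIONS = ["Amused", "Frustrated", "Depressed", "Concerned",
                      "Disappointed", "Excited", "Confused", "Annoyed"]

@dataclass
class AnnotationRecord:
    utterance_id: str
    valence: int        # 7-point Likert: 1 = very negative ... 7 = very positive
    arousal: int        # 7-point Likert: 1 = very calm ... 7 = very active
    dominance: int      # 7-point Likert: 1 = very weak ... 7 = very strong
    primary_emotion: str                 # exactly one of PRIMARY_EMOTIONS
    secondary_emotions: List[str] = field(default_factory=list)  # any subset
    transcription_correct: bool = True   # transcript verification
    speaker_gender: str = "unknown"      # perceived speaker gender

    def __post_init__(self):
        # Basic validation of the Likert ranges and the category lists.
        for name in ("valence", "arousal", "dominance"):
            value = getattr(self, name)
            if not 1 <= value <= 7:
                raise ValueError(f"{name} must be on a 1-7 Likert scale, got {value}")
        if self.primary_emotion not in PRIMARY_EMOTIONS:
            raise ValueError(f"unknown primary emotion: {self.primary_emotion}")
        for emotion in self.secondary_emotions:
            if emotion not in SECONDARY_EMOTIONS:
                raise ValueError(f"unknown secondary emotion: {emotion}")
```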
This pipeline framework not only targets emotional content but also determines the gender of each unlabelled speech utterance: an intelligent component predicts speaker gender so that gender balance can be controlled. The retrieved emotion and gender predictions are then ranked using scores from all components to prioritize utterances with high emotional content and minority emotional states. This ranking is used to set thresholds that determine which genders and emotional states should be prioritized for annotation, yielding a more comprehensive and accurately annotated dataset while minimizing bias.
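The ranking step might look like the sketch below, assuming each component emits a confidence score per utterance. The weighting scheme, the minority-class boost, and the threshold value are illustrative assumptions rather than the pipeline's actual scoring rule.

```python
# Illustrative ranking of candidate utterances; the minority boost and
# the selection threshold are assumptions, not the pipeline's actual rule.
from collections import Counter

def rank_candidates(utterances, minority_boost=0.25, threshold=0.5):
    """Order utterances so that high emotional content, minority emotional
    states, and the under-represented gender are annotated first."""
    emotion_counts = Counter(u["predicted_emotion"] for u in utterances)
    gender_counts = Counter(u["predicted_gender"] for u in utterances)
    rarest_gender = min(gender_counts, key=gender_counts.get)
    mean_count = len(utterances) / len(emotion_counts)

    def score(u):
        s = u["emotion_score"]                 # confidence of emotional content
        if emotion_counts[u["predicted_emotion"]] < mean_count:
            s += minority_boost                # boost under-represented emotions
        if u["predicted_gender"] == rarest_gender:
            s += minority_boost                # boost under-represented gender
        return s

    ranked = sorted(utterances, key=score, reverse=True)
    # Only candidates above the threshold are forwarded to human annotation.
    return [u for u in ranked if score(u) >= threshold]

# Example usage with toy predictions:
pool = [
    {"predicted_emotion": "Anger", "emotion_score": 0.9, "predicted_gender": "female"},
    {"predicted_emotion": "Neutral", "emotion_score": 0.4, "predicted_gender": "male"},
    {"predicted_emotion": "Fear", "emotion_score": 0.7, "predicted_gender": "male"},
]
print([u["predicted_emotion"] for u in rank_candidates(pool)])  # ['Anger', 'Fear']
```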