Dataset : Perception of expressive vocal cues in musical sounds
local : FR-18008901306731-2022-02-28 external : doi:10.25666/DATAOSU-2022-02-28 , doi:10.1098/rstb.2020.0396
This dataset contains 340 audio stimuli (speech, singing voice, music) designed to contain emotional cues (e.g. smiles, tremor, etc.), as well as experimental data collected on 60 listeners rating the emotional content of these sounds. This dataset corresponds to the work reported in: Bedoya et al. (2021) Even violins can cry: specifically vocal emotional behaviours also drive the perception of emotions in non-vocal music. Philosophical Transactions of the Royal Society B, 376(1840).
computer science, information systems (engineering science), neurosciences (fundamental biology), language & linguistics (humanities), music (humanities), psychology, experimental (humanities)
Data acquisition : from Feb 2018 to Jul 2018
Data provision : 1 Dec 2021
Metadata record : Creation : 28 Feb 2022 Update : 30 Jun 2022
Additional information :
All participants were tested at the Sorbonne-INSEAD Center for Behavioural Science. The experiment was approved by the Institut Européen d’Administration des Affaires (INSEAD) IRB. All participants gave their informed consent and were debriefed about the purpose of the research immediately after the experiment.
Audience : University: master, Research, Informal Education
Daniel Bedoya, Pablo Arias, Laura Rachman, Marco Liuni, Clément Canonne, Louise Goupil, Jean-Julien Aucouturier (2021): Perception of expressive vocal cues in musical sounds. Royal Society. doi:10.25666/DATAOSU-2022-02-28
Spatial coverage :
- France: latitude between 51° 20' N and 41° 15' N, longitude between 5° 15' W and 9° 50' E
Time coverage :
Spectral coverage :
- Audible acoustics: between 20 Hz and 20 kHz
Data creators : Daniel Bedoya , Pablo Arias , Laura Rachman , Marco Liuni , Clément Canonne , Louise Goupil , Jean-Julien Aucouturier
Affiliations :
- Sciences et technologies de la musique et du son
- Faculty of Medical Sciences, University of Groningen
- Alta Voce
- BabyDevLab, University of East London
- Franche-Comté Electronique Mécanique Thermique et Optique - Sciences et Technologies (UMR 6174) (École Nationale Supérieure de Mécanique et des Microtechniques)
Publisher : Royal Society Publishing
Projects and funders :
REFLETS - Rétroaction Émotionnelle Faciale et Linguistique, et États de stress post-Traumatiques
- Projet-ANR-17-CE19-0020 (French National Agency for Research)
SEPIA - Processus sensoriels et émotionnels dans les troubles du spectre de l'autisme
- Projet-ANR-19-CE37-0022 (French National Agency for Research)
CREAM - Cracking the emotional code of music
- ERC Starting Grant 335536 (European project)
Fondation pour l'audition
- Fondation pour l'audition FPA-RD-2018-2 (other national funder)
Access : available
Formats : audio/x-wav, text/csv
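Both formats can be read with Python's standard library alone. The sketch below is illustrative only: the helper names and any file paths are assumptions, not part of the dataset documentation.

```python
import csv
import wave

def load_wav(path):
    """Return (sample_rate_hz, n_channels, raw_frames) for a PCM WAV stimulus."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.readframes(w.getnframes())

def load_ratings(path):
    """Return a CSV file of listener ratings as a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, `rate, channels, frames = load_wav("stimulus.wav")` (a hypothetical filename) yields the sample rate, channel count and raw PCM bytes of one stimulus.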
Data acquisition methods :
- Experimental data : Two groups of listeners (N = 29 musicians; N = 31 non-musicians) compared pairs composed of the manipulated and non-manipulated variants of each sound on two Likert scales (expressed emotional valence and arousal). We then examined whether the manipulations led to similar emotional interpretations when they occurred in speech and in music.
- Simulation or computational data : We selected 14 excerpts from songs of various popular music genres (pop, jazz, rock), available as unmixed, multi-track recordings from the free online resource 'Mixing Secrets For the Small Studio'. For each recording, we selected one full musical phrase (singing + accompaniment) of average duration M = 7 sec.
For each excerpt, we then used the available multi-tracks to create variants in 4 conditions: singing (the lead vocal track, without instrumental accompaniment), singing + accompaniment (the original song, composed of lead vocal track and instrumental accompaniment), violin + accompaniment (the original song in which the lead vocal track was replaced by a violin instrumental track matching the main melody) and speech (a recording of a transcription of the lyrics of the lead vocal track, performed as non-musical speech). None of the accompaniment tracks in conditions 'singing + accompaniment' and 'violin + accompaniment' contained additional background vocals.
The instrumental track in the 'violin + accompaniment' condition was recorded on the violin by a semi-professional musician (Choeurs et Orchestres des Grandes Écoles) in overdubbing conditions, matching the pitch and phrasing of the original vocal track. Speech tracks in the 'speech' condition were recorded by two native English speakers (one male, one female, matching the gender of the original singer), who performed a spoken, neutral-tone rendition of the lyrics without knowing or hearing that these were originally sung material. All recordings were made in music production studios at IRCAM (Paris, France) by a professional sound engineer (D.B.). In addition, we selected 12 'scream' stimuli from a previous study (Liuni et al., Behavioural Processes, 2020), which consisted of short, isolated shouts of the phoneme /a/, recorded by 6 male and 6 female actors. This resulted in 68 sets of multi-track stimuli, matched in 5 different conditions (speech: 14; singing: 14; singing + accompaniment: 14; violin + accompaniment: 14; and an unmatched set of 12 screams).
Before mixing, the lead track (vocal in conditions 'speech', 'screams', 'singing', 'singing + accompaniment'; violin in condition 'violin + accompaniment') of each multi-track stimulus was processed with three acoustic manipulations simulating specifically vocal behaviours: smiling (two levels: smile and unsmile), vocal tremor (one level: tremor) and vocal effort (one level: tension). Finally, the tracks of each stimulus were mixed by a professional sound engineer (D.B.), resulting in 68 non-manipulated and 272 manipulated stereo stimuli.
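The stimulus counts quoted above are internally consistent, as the following sketch checks (condition names paraphrased from the text):

```python
# Sets of multi-track stimuli per condition, from the description above.
sets_per_condition = {
    "speech": 14,
    "singing": 14,
    "singing + accompaniment": 14,
    "violin + accompaniment": 14,
    "screams": 12,
}
n_sets = sum(sets_per_condition.values())  # 68 non-manipulated stimuli

# Manipulation levels applied to the lead track of every set:
# smiling has two levels (smile, unsmile); tremor and tension one each.
levels_per_manipulation = {"smiling": 2, "tremor": 1, "tension": 1}
variants_per_set = sum(levels_per_manipulation.values())  # 4

n_manipulated = n_sets * variants_per_set  # 272 manipulated stimuli
n_total = n_sets + n_manipulated           # 340 audio stimuli in the dataset
print(n_sets, n_manipulated, n_total)      # 68 272 340
```

This matches the 340 audio stimuli stated in the dataset description.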
Datatype : Dataset
Associated publication :
- Even violins can cry: specifically vocal emotional behaviours also drive the perception of emotions in non-vocal music (doi:10.1098/rstb.2020.0396)