Perception of expressive vocal cues in musical sounds (2021)

[1] : Sciences et technologies de la musique et du son
[2] : Faculty of Medical Sciences, University of Groningen
[3] : Alta Voce
[4] : BabyDevLab, University of East London
[5] : Franche-Comté Electronique Mécanique Thermique et Optique - Sciences et Technologies (UMR 6174) (École Nationale Supérieure de Mécanique et des Microtechniques)
Description :
This dataset contains 340 audio stimuli (speech, singing voice, music) designed to contain emotional cues (e.g. smiles, tremor), together with experimental data collected from 60 listeners who rated the emotional content of these sounds. The dataset corresponds to the work reported in: Bedoya et al. (2021) Even violins can cry: specifically vocal emotional behaviours also drive the perception of emotions in non-vocal music. Philosophical Transactions of the Royal Society B, 376(1840).
Disciplines :
computer science, information systems (engineering science), neurosciences (fundamental biology), language & linguistics (humanities), music (humanities), psychology, experimental (humanities)
Keywords :

General metadata

Data acquisition date : from Feb 2018 to Jul 2018
Data acquisition methods :
  • Experimental data :
    We asked two groups of listeners (N = 29 musicians and N = 31 non-musicians) to compare pairs made up of the manipulated and non-manipulated variants of each sound, using two Likert scales for expressed emotional valence and arousal. We then examined whether the manipulations led to similar emotional interpretations when they occurred in speech and in music.
  • Simulation or computational data :
    We selected 14 excerpts from songs of various popular music genres (pop, jazz, rock), available as unmixed, multi-track recordings from the free online resource 'Mixing Secrets For the Small Studio'. For each recording, we selected one full musical phrase (singing + accompaniment) of average duration M = 7 sec.

    For each excerpt, we then used the available multi-tracks to create variants in 4 conditions: singing (the lead vocal track, without instrumental accompaniment), singing + accompaniment (the original song, composed of lead vocal track and instrumental accompaniment), violin + accompaniment (the original song in which the lead vocal track was replaced by a violin instrumental track matching the main melody) and speech (a recording of a transcription of the lyrics of the lead vocal track, performed as non-musical speech). None of the accompaniment tracks in conditions 'singing + accompaniment' and 'violin + accompaniment' contained additional background vocals.

    The instrumental track in the 'violin + accompaniment' condition was recorded on the violin by a semi-professional musician (Choeurs et Orchestres des Grandes Écoles) in overdubbing conditions, matching the pitch and phrasing of the original vocal track. Speech tracks in the 'speech' condition were recorded by two native English speakers (one male, one female, matching the gender of the original singer), who performed a spoken, neutral-tone rendition of the lyrics without knowing or hearing that the text was originally sung material. All recordings were made in music production studios at IRCAM (Paris, France) by a professional sound engineer (D.B.). In addition, we selected 12 'scream' stimuli from a previous study (Liuni et al. Behavioural Processes, 2020), which consisted of short, isolated shouts of the phoneme /a/, recorded by 6 male and 6 female actors. This resulted in 68 sets of multi-track stimuli in 5 different conditions (speech: 14; singing: 14; singing + accompaniment: 14; violin + accompaniment: 14; and an unmatched set of 12 screams).

    Before mixing, the lead track (vocal in conditions 'speech', 'screams', 'singing', 'singing + accompaniment'; violin in condition 'violin + accompaniment') in each of the multi-track stimuli was processed with three acoustic manipulations simulating specifically vocal behaviours: smiling (two levels: smile and unsmile), vocal tremor (one level: tremor) and vocal effort (one level: tension). Finally, the tracks of each stimulus were mixed by a professional sound engineer (D.B.), resulting in 68 non-manipulated and 272 manipulated stereo stimuli.
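
The stimulus counts given above can be checked with a short enumeration. This is only an illustrative sketch: the condition and manipulation labels are taken from the description, but the variable names, the `enumerate_stimuli` helper and the 'none' label for the non-manipulated mix are assumptions made here, and nothing in it reflects the actual file naming or layout of the dataset.

```python
from itertools import product

# Number of multi-track sets per condition, as stated in the description.
SETS_PER_CONDITION = {
    "speech": 14,
    "singing": 14,
    "singing + accompaniment": 14,
    "violin + accompaniment": 14,
    "screams": 12,
}

# Four manipulated variants: smiling has two levels, tremor and tension one each.
MANIPULATIONS = ["smile", "unsmile", "tremor", "tension"]


def enumerate_stimuli():
    """Return (condition, set_index, variant) triples.

    The variant 'none' is a hypothetical label for the non-manipulated mix.
    """
    stimuli = []
    for condition, n_sets in SETS_PER_CONDITION.items():
        variants = ["none"] + MANIPULATIONS
        for set_index, variant in product(range(1, n_sets + 1), variants):
            stimuli.append((condition, set_index, variant))
    return stimuli


stimuli = enumerate_stimuli()
n_sets = sum(SETS_PER_CONDITION.values())
n_original = sum(1 for _, _, variant in stimuli if variant == "none")
n_manipulated = len(stimuli) - n_original

print(n_sets, n_original, n_manipulated, len(stimuli))  # 68 68 272 340
```

Each of the 68 sets contributes one non-manipulated mix and four manipulated mixes, which gives the 68 + 272 = 340 stimuli reported for the dataset.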
Update periodicity : no update
Language : English (eng)
Formats : audio/x-wav, text/csv
Audience : University: master, Research, Informal Education


Spatial coverage :

  • France: latitude between 51° 20' N and 41° 15' N, longitude between 5° 15' W and 9° 50' E

Time coverage :

Spectral coverage :

Taxonomic coverage :

  • Psychological data on 60 young adults (mean age 23.1 years)
    Homo sapiens MSW (Human)
Publications :
  • Even violins can cry: specifically vocal emotional behaviours also drive the perception of emotions in non-vocal music (doi:10.1098/rstb.2020.0396)
Collection :
Projects and funders :
Additional information :
All participants were tested at the Sorbonne-INSEAD Center for Behavioural Science. The experiment was approved by the Institut Européen d’Administration des Affaires (INSEAD) IRB. All participants gave their informed consent and were debriefed about the purpose of the research immediately after the experiment.
Record created 28 Feb 2022 by Jean-Julien Aucouturier.
Last modification : 30 Jun 2022.
Local identifier: FR-18008901306731-2022-02-28.

