Musical material

Speech as musical material

Several ideas have influenced the music in this project, ranging from prosodic phenomena identified as significant in linguistics and expressive characteristics of speech genres to ideas inspired by technical possibilities and original musical ideas developed in the encounter with speech. An overview of how these influences have shaped instrument design and musical choices is given below.

Prosodic phenomena

What exactly constitutes the material when using speech as a source for music? Since speech includes language and language conveys ideas, it could from a conceptual point of view be almost anything in the sphere of human activity – the historical context, the site, the identities, the topic of conversation, the poetic qualities of words, the voice as instrument, or metaphor, and so forth. Speech is of course also experienced physically as sound and, above all, as highly structured sound, a feature it shares with music. One of my methods has been first of all to listen to speech as if it already were music: what kinds of qualities are already present, and how little would need to be changed in order to make it work as music (and what does it actually mean for something to work as music)? I wanted to avoid simply shoehorning the sounds of speech into my existing notions and aesthetic preconceptions of what music should be like. So I started by looking into the linguistic literature on prosodic structures in speech and the functions and meanings they might carry.

In “The Music of Everyday Speech”, the linguist Ann Wennerstrom gives a thorough account of how prosodic features are actively used to structure utterances and convey information (Wennerstrom, 2001). For example, strong accents are typically used to highlight the most important words, while high-pitched syllables are used to mark new information. At a larger scale, modulation to a higher “key” is often used to signify a change of subject, and similarly, a lower mean pitch is used to signal supplementary comments, as if they were in parentheses.

Though linguists typically operate with phonemes as the lowest level of segmentation, the syllable is regarded as the basic unit of rhythm. A syllable is usually based around a voiced vowel, and since it has a pitch it can be viewed as corresponding to the musical concept of a note. An interesting rhythmic phenomenon is how speakers usually adopt a shared semi-regular pulse or tempo. How syllables express this pulse can differ, however, as languages are generally classified into two categories of timing. In stress-timed languages (e.g. Germanic languages like English, German and Norwegian), the stressed syllables are placed at regular intervals approaching an even pulse, while the unstressed syllables in between are sped up or slowed down in order to match this pulse. In syllable-timed languages (e.g. Romance languages like French and Spanish), all syllables are timed more or less according to the underlying pulse. Interestingly enough, this timing difference has even been demonstrated in music, in a study of rhythmic differences between English and French classical music (Patel & Daniele, 2003). The adjustment to a shared pulse also extends across turns in a conversation, with speakers often timing their responses to the pulse implied by the previous speaker.
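The contrast between the two timing categories can be made concrete with the normalized pairwise variability index (nPVI), the measure used in the Patel & Daniele study to compare durations of successive vowels in speech and successive notes in music. The following is only a minimal sketch of that index. The function name and the duration values are invented for illustration and are not measurements from this project.

```python
def npvi(durations):
    """Normalized pairwise variability index of a sequence of durations.

    Each pair of neighbouring durations contributes its absolute difference
    divided by the mean of the pair; the average is scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    pairs = zip(durations, durations[1:])
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical syllable durations in seconds (illustration only).
even_syllables = [0.20, 0.21, 0.19, 0.20, 0.20]    # close to syllable timing
uneven_syllables = [0.30, 0.10, 0.12, 0.32, 0.09]  # long stressed, short unstressed

print(npvi(even_syllables), npvi(uneven_syllables))
```

A sequence with strongly alternating long and short durations gives a high value, while a sequence of nearly equal durations gives a value close to zero.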

These and other prosodic phenomena provided the background for identifying significant features that could serve as a foundation for exploring speech musically. These include the syllable as the basic rhythmic and melodic unit, the use of both stressed and high-pitched accents for creating rhythmic structures, and the possibility of rhythmic quantization to a grid derived from the underlying pulse. This background has consequently influenced design choices in the software instrument used in these explorations.
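Quantization to a pulse-derived grid can be sketched as follows. This is an illustrative sketch and not the instrument's actual code: it assumes that syllable onset times and a pulse period have already been estimated, and the function name, the `subdivisions` parameter and the `strength` parameter are hypothetical.

```python
def quantize_onsets(onsets, pulse_period, subdivisions=1, strength=1.0):
    """Move onset times (seconds) towards the nearest point on a regular grid.

    pulse_period -- assumed duration of one pulse in seconds
    subdivisions -- number of grid steps per pulse
    strength     -- 0.0 leaves onsets untouched, 1.0 snaps them fully
    """
    if pulse_period <= 0 or subdivisions < 1:
        raise ValueError("pulse_period must be positive, subdivisions >= 1")
    step = pulse_period / subdivisions
    quantized = []
    for t in onsets:
        nearest = round(t / step) * step
        quantized.append(t + strength * (nearest - t))
    return quantized


# Hypothetical syllable onsets against an assumed pulse of 0.25 s.
print(quantize_onsets([0.02, 0.27, 0.49, 0.80], pulse_period=0.25))
```

A `strength` below 1.0 keeps some of the original timing, which may be preferable when the aim is to hint at a grid without erasing the character of the speech rhythm.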

Speech genres

A popular theme that is often brought up when discussing speech as music is the apparent difference between local dialects, or the cultural stereotypes of different spoken languages. Linguists also often note how different phonetic structures make languages sound completely different. In addition, one can observe individual speaking styles arising from all kinds of personal idiosyncrasies, or from physical conditions relating to age, sex and other personal traits, such as stuttering or a hoarse voice.

However, when listening to recorded speech from different languages and settings, I have been surprised at how similar languages and people sound in comparable situations. The word “situation” is a clue here, as the style or genre used in a given situation conveys the social context and purpose of communication. It would probably sound strange in any language or culture to speak in a very formal tone to an infant (with the possible exception of ritual situations like baptism).

One explanation can perhaps be found looking at the function of speech as gestures in social situations. “Sound is touch at a distance”, Anne Fernald noted, observing how parents all started to speak in a comforting tone to stay in touch with their babies after putting them down (Radiolab, 2006). I think there is something fundamental about how speech can be perceived this way as touch – as physical sensations comprehended through the wider cognitive apparatus, including emotions. Sound is after all physical vibration, and the physiological foundation for this kind of sensation-based cognition means that it extends across cultures as something more universal than cultural codes (even across species, as dogs seem to have few problems interpreting intentions from speech gestures).

Speech genres can be viewed as formalized expressions of both such gestural sensations and social conventions. For musical purposes, this is interesting as it hints at a deeper level on which music might function as a kind of social language. This becomes even more interesting when the focus is on musical improvisation as social interaction.

Based on these perspectives, the motivation to focus on speech genres has first of all influenced the kind of recordings and speech I have used as subjects of study, but it has also defined the methods I have used for generating musical material, interpreting the character of such genres as musical expressions.

Technical possibilities

Developing an instrument has also contributed musical ideas, generated from the encounter with certain technical possibilities. The technical development process often involved the creative use (and abuse) of different techniques just to explore how they would sound and whether anything interesting would happen. The result of heavy cepstral smoothing can serve as one example of this. By reducing the frequency spectrum to just a few frequency bands, it can produce an apparently formalistic play between high and low registers, quite abstract but nevertheless conveying some dynamic traces of the original speech gestures.
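As a rough illustration of the technique, cepstral smoothing of a single analysis frame can be sketched with NumPy as below. This is a generic textbook formulation and not necessarily how the instrument implements it. The function name and the `n_coeffs` parameter are hypothetical; the fewer cepstral coefficients are kept, the heavier the smoothing.

```python
import numpy as np


def smoothed_log_spectrum(frame, n_coeffs):
    """Return a cepstrally smoothed log-magnitude spectrum of one frame.

    frame    -- 1-D array of audio samples
    n_coeffs -- number of low-quefrency cepstral coefficients to keep (>= 1)
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Log-magnitude spectrum of the windowed frame (small offset avoids log(0)).
    log_mag = np.log(np.abs(np.fft.rfft(frame * np.hanning(n))) + 1e-12)
    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n)
    # Lifter: keep only the lowest quefrencies and their symmetric mirror.
    lifter = np.zeros(n)
    lifter[:n_coeffs] = 1.0
    if n_coeffs > 1:
        lifter[-(n_coeffs - 1):] = 1.0
    # Back to the frequency domain: a smooth spectral envelope.
    return np.fft.rfft(cepstrum * lifter).real
```

Keeping only a handful of coefficients leaves a few broad spectral humps per frame, which is one way to arrive at the coarse high and low register movement described above.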

Sound example of cepstral smoothing:


Another example of a technically derived idea is the possibility of expressing vowels as musical chords. Formants are the characteristic spectral peaks that define vowels. They are not usually expressed as individual pitches but are more like filter frequencies that shape the spectrum of a source sound. It is fascinating how just a few formant frequencies are enough to express intelligible speech, even with no fundamental frequency or other spectral information present. Using the linear predictive technique to track formants and outputting the resulting frequencies as chords can result in an interesting abstraction of this “almost intelligible” spectral shape of speech.
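The chain from linear prediction to chords can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the instrument's implementation: it uses the autocorrelation method of linear prediction on one frame, takes the angles of the complex roots of the prediction polynomial as formant candidates, and rounds them to the nearest equal-tempered pitch with A4 = 440 Hz. All function names are hypothetical. Windowing, pre-emphasis and bandwidth-based filtering of candidates, which a real formant tracker would normally need, are left out.

```python
import math

import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def lpc_polynomial(frame, order):
    """Autocorrelation-method linear prediction; returns [1, -a1, ..., -ap]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    full = np.correlate(frame, frame, mode="full")
    r = full[n - 1 : n + order]  # autocorrelation at lags 0..order
    # Toeplitz system of normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    predictor = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -predictor))


def estimate_formants(frame, sample_rate, order):
    """Formant candidates in Hz, from the roots in the upper half plane."""
    roots = np.roots(lpc_polynomial(frame, order))
    roots = roots[np.imag(roots) > 1e-6]
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    return sorted(float(f) for f in freqs)


def hz_to_midi(freq_hz):
    """Nearest equal-tempered MIDI note number (A4 = 440 Hz = note 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


def midi_to_name(midi_note):
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"


def formants_to_chord(formants_hz):
    """Map a list of formant frequencies to note names, one note per formant."""
    return [midi_to_name(hz_to_midi(f)) for f in formants_hz]
```

Rounding to semitones is only one possible design choice. Keeping the fractional part of the MIDI value would preserve the microtonal detail of the formant movement.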

A mel-frequency spectrum showing the four lowest formants and the corresponding musical chords transcribed from the sound example below.

Sound example of iterative transformations of formants into abstracted chord sequences:


Musical ideas

Purely musical ideas have of course also driven musical development. The motivation was always to make music and not just didactic demonstrations, so at some point it is absolutely necessary that intuitive associations and connotations take over in the further process of making music from these sources. This can for example be the creation of complex polyrhythmic layers, not really a part of speech at all, but inspired by certain speech tempo variations or based on some strong rhythmical feature typical of a speech genre and therefore still related to this material. Or it can be stretching out vowels and layering several voices to create dense microtonal choral textures, only using the vowel timbres from speech to pursue an otherwise abstract musical idea:


In this way, the prosodic features, speech genres and technical possibilities have served as starting points, presenting interesting phenomena that generate new sonic ideas, explored further through musical discourses with this material.


Patel, A. D., & Daniele, J. R. (2003). An empirical comparison of rhythm in language and music. Cognition, 87(1), pp.35-45.

Radiolab. (2006). Musical Language [Audio Podcast]. New York: WNYC Radio. Retrieved from

Wennerstrom, A. (2001). The Music of Everyday Speech: Prosody and Discourse Analysis. Oxford: Oxford University Press.
