This section presents the ideas behind the project, including the development of its aims and areas of interest, and an account of the chosen approaches and artistic focus in relation to other artists who have worked with similar ideas.

Project topic

The topic for this project is the relationship between music and speech, in particular improvised music and everyday conversation. From a creative musician’s point of view, it explores how features of speech can be used as a source for making improvised music. The main methods for this exploration include the development of a new digital musical instrument system and performance concept for “orchestrating” speech into music in real time, the actual use of this instrument in practice through performance, and the method of improvisation as an intuitive way of approaching this by ear.

Key questions

Defining concise research questions does not always seem relevant in artistic research. Artistic exploration is not hypothesis-led but could rather be seen as discovery-led (Rubidge, 2005): it is about pursuing an artistic hunch, an intuitive search on the basis of observation and sensitivity towards certain phenomena that seem interesting to explore. For such a search, as Henk Borgdorff notes, the prevailing format for research design is basically inadequate (Borgdorff, 2012, p. 164).

Nevertheless, a certain delimitation of area of interest is necessary to focus the exploration, and the initial aim of this project was defined somewhat loosely as this:

To develop an improvisational foundation for making music that is closely related to the genuine human musicality inherent in spoken language.

During the work with the project, it has gradually become clearer which aspects of the speech/music relation I am actually exploring, and which aspects are not part of this particular investigation. For instance, when grappling with the many narratives and contexts that came into play when introducing speech into the music, I had the growing insight that I perhaps was as much trying to explore the communicative implications of improvised musical utterances as I was exploring the musical potential of speech. This led to a shift away from speech as spoken word, with its implicit references to the conceptual reality of semantics and language, to a more specific focus on the nonverbal aspects of spoken utterances that are similar to improvised musical gestures, and to how these gestural aspects of speech work in actual everyday conversation. Following this shift, the initial and somewhat general aim to explore speech in relation to improvisation has been developed further to be more specifically about the kind of speech going on in real life social interactions, how such social relations are reflected in stylized speech genres, about the shape of spoken utterances primarily as vocal ‘musical’ gestures, and about how such gestures work in relation to the interaction and interpretation going on in improvised music.

This can be expressed through a range of new artistic questions, both practical and more philosophical:

What is the musical and communicative potential of vocal gestures of everyday speech when used in a setting of musical improvisation?

How can the musical exploration of everyday speech be integrated in a performance concept based on improvisation?

How does this musical use of such speech gestures affect the perception of both music and speech in the music?

So, in a sense, what I have been trying to achieve is to juxtapose the act of engaging in musical improvisation with the everyday activity of engaging in spontaneous conversation in a way that hopefully can shed some light on connections between possible meanings and functions of sound gestures in both speech and music.

Background ideas: language, speech and music relations

In a very general sense, the project is based on the idea that spoken language and music are closely related and probably share evolutionary origins, and that it is reasonable to believe that some aspects of creating and experiencing music can be related to the communicative role of musical features in speech. Such ideas have been explored from several perspectives in a growing literature on the evolutionary origins and functions of art and ritual in recent decades, as in the interdisciplinary field mapped out by Ellen Dissanayake and others through compilations such as Communicative Musicality (Malloch & Trevarthen, 2009).

The topic, then, is not what we say but how we say it: how intonation, register, tempo, rhythm, dynamics, and voice quality form a communicative layer of their own in speech. This is what linguists call prosody (from Greek: towards song), and this music of everyday speech constitutes a huge semantic potential that, with or without our knowing, expresses our state of mind, our intentions, expectations, attitudes, relations, feelings, notions and views, and that in hermeneutical ways affects how our utterances are interpreted.

As the linguistic fields of prosody and conversation analysis show, these features have obvious pragmatic functions in helping to structure conversation (Szczepek Reed, 2011; Wennerstrom, 2001). But while linguists look at how such prosodic features are used for negotiating turn taking or highlighting new information, it is from a musical point of view interesting to see whether these structures, nuanced and intuitively meaningful vocal gestures, can also make sense as recognizable patterns in music. In this regard, improvised music can be viewed as a particularly close parallel to conversation, as both involve a continuous dialogical negotiation of content and development, conveying intentions with many of the same means and mechanisms.

Another interesting aspect of speech from a musical point of view is how prosodic styles can express social relationships. This is what the language philosopher and literary critic Mikhail Bakhtin referred to when discussing speech genres: stylistic templates that we tend to use as formal frameworks when constructing utterances on the fly (Bakhtin, 1986). Just like literary genres, they include choice of style and wording, but speech genres also include specific prosodic traits like the use of certain registers, dynamic ranges, vocal effort, tempi etc. Taken together, such traits can be seen to form musical characters that communicate something important about the social situation and thus provide the context for interpreting the actual words uttered. For instance, the degree of metric regularity of speech conveys something about the social distance, with very dynamic ‘tempo rubato’ signifying a close relationship, subjective opinion, private conversation etc., while more even, regular timing is used when referring to something objective, impersonal and formal (Leeuwen, 1999). Other significant genre characteristics typically include speech rate (tempo), register (mean pitch), voice quality, loudness, phrase and pause length, melodic contour, dynamic range, etc. All these prosodic traits affect the interpretation of the possible meaning and intention behind any utterance.

This basic gestural layer of meaning is deeply embedded in spoken language, and we intuitively use different genres in different social situations, such as talking to children, to a judge, to a lover, to an audience or a reporter on live TV. The genres are a natural part of the everyday social characters we take on, and only stick out when used differently from what is expected, like for instance the patronizing way of talking to adults as if they were children. According to Bakhtin, there are as many potential speech genres as there are potential social relations. Small talk, pillow talk, baby talk, interrogation, public address, report, confession, etc. are only some examples of speech genres where the form provides an important part of the meaning of an utterance.

Bakhtin’s emphasis on genres derived from his view of language: words do not have any meaning by themselves; it is how they are used in a particular utterance, within a specific social context, that actually provides the meaning. Speech genres are part of what expresses and generates this social context and thus convey a kind of social meaning of intention that we seem very attentive to. Interestingly enough, this is a kind of meaning that is expressed mainly through musical features like rhythm, melody and dynamics.

A musical exploration of the characteristics of such speech genres has therefore been one of the main themes in this project, and is one of the reasons for its focus on prosody as the main musical material of speech.

Musical context

In a historical context, making the connection between music and language is nothing new. In Europe during the 17th century in particular, music was increasingly seen in connection with Antiquity’s highly developed art of rhetoric (Bartel, 1997; Bonds, 1991). Music theory books from the period show how much these ideas of rhetoric influenced German baroque music (Mattheson, 1739), and that this music speaks is something that early music pioneers have later pointed to as a key for interpreting and performing this kind of music (Harnoncourt, 1982). Speech has never been far away in the recitatives of opera either, and it features in the instrumental music of some composers, as in the speech-melodies transcribed by Janáček, and in the Sprechgesang championed later by composers such as Schönberg, Berg and Webern.

Nevertheless, it is mainly during the last 60-70 years that the availability of sound recording technology has made possible a much more extensive musical exploration of speech and the voice. Cathy Lane has given a thorough overview of many compositional approaches and contributors in this field (Lane, 2006), many of which also feature in the compilation “Playing with Words” (Lane, 2008), and are also covered by Michael Vincent (Vincent, 2010). Lane identifies several distinct compositional approaches and techniques using speech and voice in music, from pure documentaristic pieces, montages (e.g. the radiophonic pieces of Glenn Gould), performative explorations of language and the voice (Aperghis, Berio, Ligeti), sound poetry (Schwitters, Jaap Blonk), different ways of electronically transforming recorded speech and song (Herbert Eimert, Stockhausen, among others) and the use of speech fragments as melodic motives (Steve Reich). Trevor Wishart in particular has explored many aspects of the voice and speech in his compositions, such as sonic transformations (Red Bird), the voice as icon of personality and identity (Two Women, American Triptych), phonetic units as musical material (Tongues of Fire, Globolalia) etc., and has also written extensively on composing using the expressivity of the human voice (Wishart, 1994, 1996, 2012). Other approaches include the connection between sound and text explored by the Swedish tradition of Text-Sound composition after the likes of Lars-Gunnar Bodin (Brunson, 2009). Many have used speech directly as a melodic source, such as Paul Lansky, Paul DeMarinis, Robert Ashley, Scott Johnson, Florent Ghys, Jacob ter Veldhuis, Michael Vincent as well as jazz pianist Jason Moran. Others have made instrumental music based in various ways on speech, like the spectral analyses transcribed by Jonathan Harvey in his 2008 orchestra piece “Speaking”.

On the video sharing web site YouTube, there is even a whole sub-genre of musicians “playing” the speech melodies in sync with videos of well-known speeches or TV shows.

Interesting technological approaches relating speech to gesture have also been explored in recent years, such as the analysis, modelling and transformation of speech expressivity by Grégory Beller and others in the speech research community at IRCAM (Beller, Schwarz, Hueber, & Rodet, 2005; Beller, 2009). Relevant for my project’s emphasis on improvisation is the music and research of pianist Sten Sandell, who from the perspective of a performer uses the act of speaking as an integral part of improvised piano performances (Sandell, 2011, 2013). Another relevant reference is the music of Peter Ablinger, especially his cycle of “Voices and Piano” pieces and his use of a mechanical piano to render speech.

These are only some of the multitude of ways that speech has been used in relation to music. Since speech and music are universal human phenomena and thus can be related to almost any aspect of human experience, a large number of interesting perspectives are possible. So even if the subject of speech and music is common, the particular focus of each individual approach can be quite different.

The focus of Ablinger for instance is on the representation of reality. He has described his use of the mechanical piano as imposing a grid on the sonic reality, a phonorealistic music as an analogy to photorealistic painting (Ablinger, n.d.). His voice pieces have the additional character of musical portraits of famous historical persons, placing the emphasis on personal idiosyncrasies, individual stories and shared cultural history. Sten Sandell on the other hand focuses on the act of speaking primarily as a performer, like a performing poet equating speaking with playing music as two possible outcomes of the same improvisational impulse. While Wishart has treated a particularly wide range of aspects of speech in his compositions, the focus is often on the sound and the voice as a much wider phenomenon than just speech. In the piece “Encounters in the Republic of Heaven”, which with its focus on everyday speech comes close to the approach of this project, there is also the overall concept of a voice portrait of the local community in Yorkshire.

To explain my own musical approach to this topic of speech, it is perhaps necessary to detail my musical background. Educated as a performer of the piano and Hammond organ, I have worked mostly with improvised music in jazz and contemporary genres. One direct influence for doing this project has been the experience as a performer that many of the things going on in improvised interplay are quite similar to the dynamics of spoken conversation. They are not just analogous or metaphorically similar, but at times actually the same. One example is the linguistic concept of backchannels: short responses such as “uh huh” or “yeah” that affirm and acknowledge that one follows the line of thought of fellow speakers, similar to the comping figures often used for the same purpose in jazz improvisation. In a previous project of developing a personal contemporary idiom for the Hammond organ, I used the concept of staging improvised musical dialogues with unusual instrument combinations as a method for provoking new ideas and coming to new musical conclusions, not unlike how one can reach new insights through spoken dialogues with different people.

Sound example: Musical dialogue as a method. Excerpt from Hammond Dialogues vol 2: Twined (Hammond B3 organ with string trio)

That experience led to the idea of using actual speech as material in improvised music, to further explore this connection and see how this could affect the perception of both speech and music. Rather than using stylized forms like recited poetry or public speeches, I wanted to pursue the connection of dynamic interaction and dialogical interpretation being present in both improvised music and spontaneous conversation, and the focus of this project has therefore been on the improvised speech going on in real life conversations. An additional approach has been to explore speech genres as social context and musical character, and one aim has been to highlight the connection between conversation and musical improvisation as similar modes of communicative interplay. Another important concern that emerged during the project was how these topics can be integrated into an appropriate performance concept, bridging the sound realms of acoustic instrumental performances and virtual electric soundscapes of recorded speech.

Seen in relation to the historical and musical context described above, this represents a slightly different musical approach to speech and spoken conversation primarily as gestural improvised interplay, highlighting improvisation as discourse and language-like process both in music and conversation. A more detailed review of how the actual artistic results of this project relate to the different contexts will be given later in the chapter “Reflections on Musical Results”. But before that, an account will be given of the results themselves and a description of the processes leading to these results, as well as some wider reflections on related issues, thoughts and ideas.

← Previous page: Introduction Next page: Work and results


Ablinger, P. (n.d.). Voices and Piano program note. Retrieved December 1, 2017, from

Bakhtin, M. M. (1986). The Problem of Speech Genres. In Speech Genres and Other Late Essays (pp. 60–102). Austin: University of Texas Press.

Bartel, D. (1997). Musica poetica: Musical-rhetorical figures in German Baroque music. Lincoln: University of Nebraska Press.

Beller, G. (2009). Analyse et Modèle Génératif de l’Expressivité, Application à la Parole et à l’Interprétation Musicale. (Doctoral thesis). Université Paris VI, Paris.

Beller, G., Schwarz, D., Hueber, T., & Rodet, X. (2005). Hybrid Concatenative Synthesis On The Intersection of Music and Speech. In Journées d’Informatique Musicale (pp. 41–45).

Bonds, M. E. (1991). Wordless Rhetoric. Musical Form and the Metaphor of the Oration. Cambridge, Mass: Harvard University Press.

Borgdorff, H. (2012). The conflict of the faculties: Perspectives on artistic research and academia. Leiden: Leiden University Press.

Brunson, W. (2009). Text-Sound Composition – The Second Generation. In Proc. of EMS-09 Conference on Electronic Music Studies.

Harnoncourt, N. (1982). Musik als Klangrede: Wege zu einem neuen Musikverständnis. Salzburg: Residenz.

Lane, C. (2006). Voices from the Past: compositional approaches to using recorded speech. Organised Sound, 11(1), 3–11.

Lane, C. (Ed.). (2008). Playing with Words. London: CRiSAP.

Leeuwen, T. van. (1999). Speech, Music, Sound. London: Macmillan Press.

Malloch, S., & Trevarthen, C. (Eds.). (2009). Communicative musicality: Exploring the basis of human companionship. Oxford: Oxford University Press.

Mattheson, J. (1739). Der vollkommene Capellmeister. Hamburg: Christian Herold.

Rubidge, S. (2005). Artists in the academy: reflections on artistic practice as research. In Dance Rebooted: Initializing the Grid Conference Proceedings. Retrieved from

Sandell, S. (2011). Music inside the Language [CD]. Steninge, Sweden: LJ Records.

Sandell, S. (2013). På insidan av tystnaden: en undersökning. (Doctoral thesis). Konstnärliga fakulteten, Göteborgs universitet, Göteborg.

Szczepek Reed, B. (2011). Analysing conversation: An introduction to prosody. Basingstoke: Palgrave Macmillan.

Vincent, M. (2010). Music & Language Interrelations. (Doctoral thesis). University of Toronto.

Wennerstrom, A. (2001). The Music of Everyday Speech: Prosody and Discourse Analysis. Oxford University Press.

Wishart, T. (1994). Audible design: A plain and easy introduction to practical sound composition. Orpheus the Pantomime.

Wishart, T. (1996). On sonic art. New York: Routledge.

Wishart, T. (2012). Sound Composition. Orpheus the Pantomime.
