Form, meaning and language

This is not meant to be a theoretical dissertation on matters of philosophy and aesthetics, but I think it nevertheless is interesting to touch upon the subjects of form, meaning and language in relation to the music developed in this project. Especially so since I am using the form of spoken language as the content for my music. In this case the form of speech refers to the prosodic structures that have been identified as significant in linguistics and conversation analysis: melodic intonation contours that express the course of utterances or cue turn-taking, change of key to signal change of subject, accents and stress to mark new or important information, convergence of tempo etc. These underlying musical structures of speech have been starting points for the musical explorations.

In relation to the significance of such structures, I think it is very interesting how infants seem to develop this kind of underlying melodic framework even before they learn a single word. Babytalk between infants and parents during the first months after birth seems to work as a fundamental coordination of vocal cues with actions, reactions, facial expressions, emotions, and intentions. This first step of language acquisition is apparently common to all cultures and involves not just learning simple vocal gestures, but complete melodic narrative structures that later provide the formal foundations for constructing utterances of speech (S. N. Malloch, 1999; S. Malloch & Trevarthen, 2009; Miall & Dissanayake, 2003; Snow & Balog, 2002). Such melodic messages in babytalk carry direct meaning by conveying the basic communicative intent of an utterance – whether its function is attention, prohibition, approval, comfort or play. This is particularly apparent in infant-directed speech, where prosodic contours tend to be exaggerated, but is also clearly identifiable in interaction between adults (Fernald, 1989). This is interesting as it shows how this musical foundation we learn as infants constitutes a kind of underlying melodic vocabulary that we use and understand intuitively in speech, and that most likely provides a background for our perception and appreciation of music.

Language and meaning

The significance of intention has also been stressed in thoughts about language, as in the speech act theory developed by John L. Austin (Austin, 1962) and later extended by John Searle (Searle, 1969). This theory seeks to understand the meaning in speech not primarily from the semantic content of words, but from the performative function and intention of the utterance as an act – from what it is trying to achieve. This is also related to Wittgenstein’s ideas about language-games, in which the meaning in language depends on its actual use, and to his view that philosophy cannot deduce some kind of essential meaning in language separate from that use (Wittgenstein, 1953). According to Searle, the very act of speaking presupposes an intention, defined in his concepts of different illocutionary forces, such as declaring, demanding, ordering, warning, promising, inquiring, exclaiming, asserting etc. Seen in relation to the kind of intention conveyed by speech melodies in babytalk, these forces can be viewed as a kind of abstracted intentional meaning that probably can be a part of musical experience as well, even without the particular propositional content of spoken utterances.

This inherently social function of speech as action and interaction can also be related to the ideas of Mikhail Bakhtin. In his view, utterances are not only shaped by the intention of the sender, but by the dialogical relationship between sender, receiver and the social circumstances. This is the process whereby meaning is created, and why speech genres form an important part of the meaning of utterances (Bakhtin, 1986). Since speech genres involve the use of certain prosodic patterns akin to musical characteristics, this is also why I have found speech genres to be an interesting approach to exploring the musical content of speech. Interesting because this layer of meaning forms a common reference – a kind of shared social musical language that can have as much precision as the specific words used. The features of such genres are often well defined, and as research on automatic genre classification shows, their acoustic characteristics can also be reliably identified by computer analysis (Obin, Dellwo, Lacheret, & Rodet, 2010; Obin, Lacheret-Dujour, Veaux, Rodet, & Simon, 2008).

This does not, however, mean that one can point at one specific meaning of a musical utterance. Music does not represent the kind of formal communication system that defines languages. The regularity, and the seemingly orderly harmonic, melodic and rhythmic “rules” observed in certain styles of music have nevertheless tempted many scholars to approach music as a formal language complete with grammar and syntax, from the highly developed rhetorical figures of baroque music, Rousseau’s view of music as “impassioned speech” (Rousseau, 1781), Leonard Bernstein’s Harvard lectures (Bernstein, 1976), Lerdahl and Jackendoff’s musical adaptation of Noam Chomsky’s generative grammar (Lerdahl & Jackendoff, 1983), and various approaches to music as semiotic systems of signs derived from the semiology of Roland Barthes etc. Raymond Monelle gives a thorough account of how such linguistic theories have been applied in musicology (Monelle, 1992). While many interesting perspectives can be gained from such approaches, there is an underlying assumption that all music can be explained with one scientific method. In the search for universality, such approaches overlook the multitude of ways music (and speech) makes sense at the same time. And while languages are well-defined, pragmatic communication systems that can be fairly easily described by rules of grammar and syntax, music is an open-ended poetic mode of expression in the aesthetic domain that produces meaning also by challenging such rules.

Regarding linguistic ideas, I have found the creative semiotic approach of the musician, filmmaker and semiotician Theo van Leeuwen to be more fruitful: Instead of the descriptive “what is” of scientific explanation, this approach offers a creative “what if”, treating sounds as untapped semiotic resources, structures with many layers of potential meaning (Leeuwen, 1999). Such potential includes for example how sounds with similar proportions to a human breath (duration, dynamics etc.) easily can be perceived as an intentional and communicative utterance (in fact, our perception seems so overly hardwired for interpreting such patterns that we end up seeing faces in clouds and hearing whispers in the wind as well). A further aspect of this focus on communicative intent is how the characters of utterances, actions, movements and gestures are intuitively interpreted and empathically mirrored as signs of inner states, thoughts, emotions, intentions etc. This need not, of course, be interpreted literally, as our infinite ability to create metaphors can easily make these signs into poetic images of something else. From the perspective of music psychology, John Sloboda has proposed that in the broadest sense, the perceptual background for experiencing the dynamic processes presented by music is our experience of the physical world in motion, and particularly the moving, living organism (Sloboda, 2012, p. 170). Not as mimicry, but on the deeper level of how motions are initiated, experienced and mediated by a human agent. This could even be true in a more general sense for abstract thought as well, like the way we tend to use spatial metaphors when talking about ideas behind concepts, thinking something through, being on top of things, looking further, viewing from another angle, against a background etc.

Form and meaning

Regarding the relationship between content, context and concept, it seems clear that my approach to music and speech is formalistic. But that does not mean that I am not concerned with meaning. There is a popular myth that instrumental music cannot express anything specific beyond general emotions, but my experience as an improviser is that music certainly can make sense – in very particular and nuanced ways as well. With Bakhtin we could counter that words do not mean anything specific in themselves either; their meaning lies in how they are used. That use can in turn create meaning on several different planes, both in speech and music.

In linguistics it seems easy to make a clear distinction between the form and content of language. Such a division has also been common in thoughts about art, for example in the view of art critic Clement Greenberg that form is the handle allowing content to be grasped (Kim-Cohen, 2009). That might sound easy enough, but according to philosopher Lars-Olof Åhlberg, this metaphor of form versus content can be misleading, as it is far from obvious what actually separates form from content in different art forms, or even within the same art form (Åhlberg, 2014).

In this work, I experienced that it quickly became very monotonous to listen to speech when the semantic meaning of the words – the content – was filtered out. At the small scale of phrases, such abstract speech sounds were still musically interesting, but at a larger timescale the uniform rate of events and the overall lack of diversity made it too monotonous to work as music.

Dealing with storytelling in theatre improvisation, Keith Johnstone has described the need to reincorporate elements to create coherence, otherwise it just becomes a meaningless sequence of events that can start and end anywhere (Johnstone, 1981). This is similar to how I have thought about improvisation in music. So it seemed that what was lacking was some kind of musical consequence, distinction or intended differentiation of the musical features. This is perhaps the kind of musical content I felt I needed to provide, to replace the semantic content that had been removed.

This can be related to the role of theme and variation in art, as discussed by Nelson Goodman in “Languages of Art” (Goodman, 1976). According to Goodman, the modification, elaboration, differentiation and transformation of motifs and patterns are processes of constructive search, and such progressive variation is a typical way of advancing knowledge. This seems to be especially true of how artworks explore the world, constructing their own formal languages through such processes of differentiation. In a more general sense, progressive variation is perhaps how knowledge is expanded in other fields as well, including in science. It can also describe how a topic might be explored in conversations or writing, but then these processes relate to the exploration of concepts, of thought ideas, while music is an organised exploration of sound ideas. Following this line of thought, improvised interplay can perhaps be viewed as the dialogical construction and exploration of such a formal language.

If I should attempt some kind of conclusion to these thoughts on meaning, it must be that music and musical utterances have no single meaning, but can convey countless meanings at the same time. This is however also the case with language, which is certainly not as precise as we like to think, and where meaning is just as often inferred from the context and intonation (the utterance “–apple?” might for example be an opening line, an inquiry about hunger, a piece of nutritional advice, a reference to the laws of gravity, or to a computer company, or it can imply that you are a princess, or that I am a witch or serpent, or none or all of these at the same time). Not to mention that some words, like swallow or hide, refer to completely different things and concepts when used in different contexts. While some have seen this ambiguity of language as a flaw in an otherwise near-perfect communication system, it has been viewed in cognitive science as an absolutely necessary feature. Language would be far too cumbersome to use if one had to specify everything exactly and unambiguously all the time, as one has to do in computer programming languages. The same ambiguity makes it possible to say several things at the same time, like in puns or poetry. Like all utterances, gestures and actions performed by living (and imagined) beings, speech can be interpreted with a whole range of possible meanings. This, I think, must also be the case with music – it has a multitude of potential meanings, all at once.

As a final remark, it must be stated that the focus on speech in this project has not been an attempt to reduce the content or meaning of music to identify one underlying universal “explanation” of music. Approaches to meaning in music have often taken the form of universalistic generalisations, like Leonard Meyer’s “Emotion and Meaning in Music”, where meaning is viewed essentially as arising from tension and release relating to the expectations formed by learned styles (Meyer, 1956).

The idea in this project was rather to broaden the experience of both music and speech, by shedding light on some interesting connections between these two universal human phenomena. One artistic aim was to see if it was possible to make music that could show both the musicality of speech and the language-like logic of music. I hope that this work also shows how music can make sense as a way of thinking, and, like speech, make sense as a way of being together.



Austin, J. L. (1962). How to do things with words. Cambridge, Mass: Harvard University Press.

Bakhtin, M. M. (1986). The Problem of Speech Genres. In Speech Genres and Other Late Essays (pp. 60–102). Austin: University of Texas Press.

Bernstein, L. (1976). The Unanswered Question: Six Talks at Harvard. Cambridge, Mass: Harvard University Press.

Fernald, A. (1989). Intonation and Communicative Intent in Mothers’ Speech to Infants: Is the Melody the Message? Child Development, 60(6), 1497–1510.

Goodman, N. (1976). Languages of art: An approach to a theory of symbols. Indianapolis: Hackett.

Johnstone, K. (1981). Impro : improvisation and the theatre. London: Methuen.

Kim-Cohen, S. (Ed.). (2009). In the blink of an ear : towards a non-cochlear sonic art. New York: Continuum.

Leeuwen, T. van. (1999). Speech, Music, Sound. London: Macmillan Press.

Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, Mass: MIT Press.

Malloch, S. N. (1999). Mothers and infants and communicative musicality. Musicae Scientiae, 3(1_suppl), 29–57.

Malloch, S., & Trevarthen, C. (2009). Musicality: Communicating the vitality and interests of life. In Communicative musicality: Exploring the basis of human companionship (pp. 1–11). Oxford: Oxford University Press.

Meyer, L. B. (1956). Emotion and meaning in music. Chicago: University of Chicago Press.

Miall, D. S., & Dissanayake, E. (2003). The poetics of babytalk. Human Nature, 14(4), 337–364.

Monelle, R. (1992). Linguistics and Semiotics in Music. Harwood Academic.

Obin, N., Dellwo, V., Lacheret, A., & Rodet, X. (2010). Expectations for Discourse Genre Identification: a Prosodic Study. In Interspeech-2010 (pp. 3070–3073). Retrieved from

Obin, N., Lacheret-Dujour, A., Veaux, C., Rodet, X., & Simon, A. C. (2008). A method for automatic and dynamic estimation of discourse genre typology with prosodic features. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1204–1207).

Rousseau, J. J. (1781). Essai sur l’origine des langues, où il est parlé de la Mélodie & de l’imitation Musicale. In Œuvres posthumes de J.J. Rousseau (le Pléiade, pp. 371–429). Genève.

Searle, J. R. (1969). Speech acts : an essay in the philosophy of language. Cambridge: Cambridge University Press.

Sloboda, J. (2012). Exploring the Musical Mind: Cognition, emotion, ability, function. Oxford: Oxford University Press.

Snow, D., & Balog, H. L. (2002). Do children produce the melody before the words? A review of developmental intonation research. Lingua, 112(12), 1025–1058.

Wittgenstein, L. (1953/2001). Philosophical Investigations. Blackwell Publishing.

Åhlberg, L.-O. (2014). On Form and Content. In Notions of the Aesthetic and of Aesthetics : Essays on Art, Aesthetics, and Culture (pp. 123–141). Frankfurt: Peter Lang.
