Possibilities and limitations

The present instrument system is the result of a long chain of decisions based on both articulated and unarticulated aesthetic notions of what kind of music would be interesting to make, each step imposing certain limitations and opening certain possibilities. It was designed to be quite flexible and to allow for a wide range of musical ideas, but any instrument will have practical and aesthetic constraints that shape its affordances. This is an attempt at describing the musical possibilities and limitations of the system, in terms of what kinds of musical ideas it can and cannot realize as well as in relation to musical form, interaction and instrumentation.


Though conceived as a specialized instrument for exploring the musical potential of speech gestures, the system has been designed to realize a wide variety of musical ideas based on such gestures: pure melodic passages, rhythmic imprints, dynamic envelopes, spectral shapes, collages of fragments, and sound textures and abstract timbre-based ideas very far removed from the gestural qualities of the speech sources used. The basic speech phrases can be radically transformed, either simplified into abstracted shapes or ornamented into far more complex gestures, for instance by inserting rhythmical subdivisions, arpeggiating through the partials of every syllable, assembling dense clouds of similar speech segments, playing the material back at several times the original speed, or pausing in the middle of a vowel to dwell on the details of its spectral characteristics on the microscopic timescale. One of the core ideas is the notion of a continuum between speech on the one hand and conventional melodic music on the other. This is reflected in the ability to reproduce speech sources anywhere in a continuous field of transformation, from the original free-flowing and continuously changing speech sounds to a highly abstracted rendering of the speech source in the stylized musical expression of conventional instruments, with clear attacks, tempered pitches and quantized rhythms, not unlike the musically quite traditional accompaniments derived from spoken poems in the music of Paul Lansky.
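The idea of such a continuum can be illustrated with a minimal sketch. It is not a description of how the system is implemented: the function names, the `amount` parameter and the 440 Hz reference tuning are all assumptions made for the sake of the example. Each function moves a measured speech value part of the way toward its stylized counterpart, so that an amount of 0 leaves the speech gesture untouched and an amount of 1 gives a fully tempered pitch or a fully quantized onset.

```python
import math

A4_HZ = 440.0  # assumed reference tuning, not taken from the system


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(midi):
    """Convert a (possibly fractional) MIDI note number to Hz."""
    return A4_HZ * 2.0 ** ((midi - 69.0) / 12.0)


def stylize_pitch(freq_hz, amount):
    """Move a pitch toward the nearest equal-tempered semitone.

    amount = 0.0 keeps the original speech pitch,
    amount = 1.0 gives the fully tempered pitch.
    """
    midi = hz_to_midi(freq_hz)
    target = round(midi)  # nearest semitone
    return midi_to_hz(midi + amount * (target - midi))


def stylize_onset(onset_s, grid_s, amount):
    """Move an onset time (seconds) toward the nearest point on a rhythmic grid.

    amount = 0.0 keeps the free speech rhythm,
    amount = 1.0 gives a fully quantized onset.
    """
    target = round(onset_s / grid_s) * grid_s  # nearest grid point
    return onset_s + amount * (target - onset_s)


# Hypothetical syllable: pitch 450 Hz, onset at 0.37 s, grid of 0.25 s.
for amount in (0.0, 0.5, 1.0):
    pitch = stylize_pitch(450.0, amount)
    onset = stylize_onset(0.37, 0.25, amount)
    print(f"amount={amount:.1f}  pitch={pitch:.2f} Hz  onset={onset:.3f} s")
```

Interpolating in the semitone domain rather than directly in Hz is a design choice of this sketch, made so that intermediate amounts correspond to perceptually even steps between the spoken and the tempered pitch.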

Regarding form, it is also possible to pursue many different ways of developing musical ideas: working with single soloistic ideas, supplementing a main motif with background and middle-ground accompaniments, layering several voices and counter-voices in multi-layered and multi-subject narratives, churning round in cyclic repetitive structures, ending parts through gradual transitions or abrupt changes, or coming to a total halt with stretched-out static soundscapes, as well as any combination of the above.

Another aspect that relates to form is the possibility of interaction, of having the system react in an apparently responsive way to sound input. This allows for a completely different way of relating to the instrument and the speech sources, as it highlights the dynamics of dialogue and active interpretation as a topic in itself in the performance situation. The interaction can take the form of direct call-and-response exchanges, of responses provided as shadowing accompaniment, as automatically generated textures or as repeating background patterns, or of accumulating live recordings and reusing the input sound, which allows one to develop ‘conversations with oneself’.

Finally, the system offers very different ways of producing sound through ideas of instrumentation or orchestration, using a combination of conventional loudspeakers, low-fidelity radios and acoustic instrument/loudspeaker hybrids as sound sources. This allows for a thematic play with sound sources and sound realms, and complements the electronic soundscape typical of digital musical instruments with the richness and spatial qualities of acoustic instrumental performances.


What kinds of musical ideas cannot be created on this instrument is harder for me to define, as the whole instrument is designed according to the kinds of musical ideas I am interested in working with in this project. It is nevertheless possible to identify some practical limitations of the instrument system, mostly resulting from aesthetic decisions, that affect the kinds of musical ideas it can realize. For instance, even though the playback of recordings is handled as granular synthesis, few of the common and idiomatic granular synthesis techniques are implemented, such as backward playback and random or fluctuating variation of pitch transposition, grain rate, grain size, grain clouds etc. Segments are also not analysed with any spectral descriptors, meaning that they cannot be used to create audio mosaics based on spectral characteristics. The decision to use syllables (vowels) as the lowest musical unit also imposes limitations, since it means that the system cannot make musical textures based on individual phonetic units. Considering that the instrument is more like a real-time interactive compositional aid or arrangement tool than a conventional gesture-to-sound instrument, it is also clear that it is not particularly suitable for synthesizing voice and speech directly from expressive performer gestures.

Another constraint related to form and musical development is the fairly limited way the system incorporates interaction. It is basically unable to act on its own and can really only react in direct response to input. This is largely due to it being conceived primarily as a solo instrument made for performance, and not as an automatic accompaniment system that can function like a semi-autonomous dialogue partner. One limitation this imposes on the music is perhaps that it will be perceived as mostly expressing one musical subject, that of the performer who controls the overall aspects of musical development, even when individual details and seemingly polyphonic layers are generated automatically by the system.

If we look beyond the explicit aim of facilitating improvisation in this project, the strict real-time mode of operation adopted in the system is another limitation. When performing with the instrument, it is quite hard to repeat something exactly the same way every time, and there is generally no way of recording the output symbolically as a score or other exact representation. This makes it hard to use the instrument for sketching ideas for written music, and also for performing detailed compositions with exact notation.

Finally, there is an obvious artistic potential on the semantic level of speech that is not touched upon at all in this project and is therefore not part of the system. There is no segmentation or recognition on the semantic level of words. One could easily imagine functions that also deal with semantic content and a wider poetic use of words, as for instance in the Mask Mirror performance system developed by Alessandro Bosetti (Bosetti, n.d.), where recordings of words are classified according to categories like adjectives, places, times, etc. and then used by the performer to dialogically construct and explore semi-random poetic narratives. So even if these kinds of features were omitted by choice, it is clear that this choice severely limits any ideas where semantic meaning is important.


Bosetti, A. (n.d.). Mask Mirror. Retrieved June 12, 2018, from http://www.melgun.net/live-projects/mask-mirror/
