<a name="article"></a><h3>1 Introduction</h3>
<p>Music is a temporal art: the sequencing of events in time is at its core. This

means that a listener can form expectations of what is to follow, and those

expectations may reflect every aspect of the music (pitch, timbre, loudness, event

timing) such that they may change not only as new events occur (<a href="#Jones1992">Jones, 1992</a>), 

but also as past information is integrated (<a href="#Bailes2013">Bailes <em>et al.</em>, 2013</a>). Concomitant 

with the musical flow and its accompanying expectations most listeners may perceive and

experience affect.</p>
<p>My purpose here is to use time series analysis (TSA) models of live-

performed musical flow and its cognition (<a href="#Dean2010">Dean &amp; Bailes, 2010</a>), to develop <em>BANG</em>, the beat and note generator, a live algorithm that generates further musical streams.

The Live Algorithms in Music network was created by Tim Blackwell and Michael

Young in 2004, who spearheaded this movement (<a href="#Blackwell2004">Blackwell &amp; Young, 2004</a>). Most 

approaches to generating musical streams are

consequent on input which have used Markov-chain models that primarily aim to generate

outputs that retain the conventions of the input system, be it tonality in music, or

syntacticality in text. <em>BANG</em> does not aim to retain conventions, and is thus widely

applicable; time series models, particularly multivariate models, seem not to have

been used previously in this generative context and are quite distinct from Markov

models. TSA recognises that streams of events such as those in music (<a href="#Beran2004">Beran, 2004</a>) or in most perceptions are autocorrelated: that is, a certain number of previous events

to predict (and may influence) the next few events to some degree and in a systematic

manner. Consequently the most common form of TSA is called autoregressive

modelling (AR), and may also use other “external” time series as predictors. Time

series of acoustic data are usually highly autoregressive and thus to some extent self-

predictive (e.g., <a href="#Dean2010">Dean &amp; Bailes, 2010</a>; <a href="#Hazan2009">Hazan <em>et al.</em>, 2009</a>; <a href="#Serra2012">Serra <em>et al.</em>, 2012</a>). A time series of acoustic intensity measures (now 

treated as external or “exogenous”) may also be useful in modelling listeners' perceptions (“endogenous”) 

of the accompanying structural change and expressed affect: more broadly, musical structure may aid

predicting perceived affect. AR models are a large and common univariate subset of

the broad field of time series analysis, which I will describe more fully in a later

section of the Introduction, where I also discuss somewhat more ecological

multivariate models.</p>
<p>A long-standing approach to understanding musical affect indeed focuses on

expectation. In Meyer’s original view (<a href="#Meyer1956">Meyer, 1956</a>), the fulfilment of an 

expectation might induce an impression of a positive progression, and vice versa, and these ideas

have been developed extensively (<a href="#Huron2006">Huron, 2006</a>). Since Meyer, two complementary

types of computational modelling approaches have been used to understand the

interfaces between music structure and affect: Markov chain modelling and its

application in information theoretic analysis (<a href="#Cope1991">Cope, 1991</a>; <a href="#Pearce2006">Pearce &amp; Wiggins, 2006</a>), and time series analysis. Some studies have begun to show how these 
    
approaches may complement each other (<a href="#Pearce2011">Pearce, 2011</a>; <a href="#Gingras2016">Gingras <em>et al.</em>, 2016</a>). Given their application to modelling both structure and affect, the 
    
approaches may be considered as potentially not only modelling music itself, but also music cognition. We 

aim to bring these two approaches together in the generation of music, and our purpose in the

present article is to develop a generative application of the time series modelling

approach. For Markovian models, Pachet’s <em>Continuator</em> is a well established approach

(<a href="#Pachet2003">Pachet, 2003</a>): Pachet and Roy view a Markov chain primarily as a local 

prediction mechanism: “items at locations early on in the sequence have no effect on the items at

locations further on” (<a href="#Pachet2013">Pachet &amp; Roy, 2013, p. 2</a>). We will compare <em>Continuator</em> with the present work in Section 6. There are numerous other data learning

algorithms, most reviewed in <a href="#Fernandez2013">Fernández &amp; Vico (2013)</a>, and some 

algorithms are newly attracting attention in event-based systems like music because of their power,

such as the extreme learning algorithm (<a href="#Tapson2015">Tapson <em>et al.</em>, 2015</a>).</p>
<p>The organisation of the article is as follows. In the remaining sections of the

Introduction, I describe briefly the background in information theoretic and more

extensively that in time series models of music, concluding with a description of the

objectives of <em>BANG</em> and some background in generative music systems. After a

description of a univariate implementation (Section 2), I proceed to a multivariate

system (Section 3). Section 4 considers transformations of the <em>BANG</em> output that a user may

apply, and Section 5 considers memory and autonomy. Section 6 addresses

creative and generative uses of <em>BANG</em>, and its future development and a path towards

evaluation.</p>
<h4>1.1 Information Theoretic Models of Music</h4>
<p>Information theoretic analysis stems from the classic work of Shannon, and

considers music, for example, as a discrete alphabet of pitches, whose sequencing is

determined systematically by a music creator, and may provide statistical

predictabilities to a listener. Modelling approaches, notably those encoded in IDyOM

(the information dynamics of music model: see <a href="#Pearce2005">Pearce, 2005</a>; <a href="#Pearce2012">Pearce &amp; Wiggins, 2012</a>)

treat note sequences as variable “order” Markov chains, and allow prediction of future

sequences on the basis of both current and past ones, treated as symbolic events.

“Order” refers to the number of immediately preceding events used in the model; note

that musical events are not usually evenly spaced in time, an issue to which we return

in the discussion. Multiple “viewpoints” may be used (<a href="#Conklin2013">Conklin, 2013</a>; <a href="#Conklin1995">Conklin &amp; Witten, 1995</a>), where pitch, velocity and duration (i.e., rhythmic) 

information streams, and derived information streams (such as pitch interval, the distance between 

pitches) can all be used for mutual prediction. An information theoretic approach can take

account both of the current piece, and also of the larger corpus of music in which it is

embedded, as part of its process of prediction. In spite of this, with IDyOM, the

longest order of Markov chain which is usually found useful in models is about 10

events, so that it provides a combination of local and corpus-wide modelling. What it

can predict at any point is the ongoing sequence.</p>
<p>While Markovian approaches have been developed so far primarily for

symbolic (that is note-based) music, such as that of the piano, they are extensible to

sound-based music. By sound-based music (<a href="#Landy2009">Landy, 2009</a>) we mean music 

that does not necessarily emphasise pitch and regular rhythms, nor even necessarily use

instrumental sounds (such as those of the piano or the violin). An information

theoretic approach to such music can for example use sequences of MFCC (mel-

frequency cepstral coefficient) time-windowed analyses of the spectral content, either

directly for Markovian analysis, or as the basis for discovery of recurrent MFCC-

sequence patterns that can be grouped as symbolic representations for treatment in a

similar way to pitch in note-based music. <em>PyOracle</em> is an open source Python tool for

this, and <em>OMAX</em> a commercial system, developed by Dubnov, together with colleagues

at IRCAM with large resources (<a href="#Bloch2008">Bloch <em>et al.</em>, 2008</a>; <a href="#Surges2013">Surges 

&amp; Dubnov, 2013</a>). These have been used both on note- and sound-based sequences for 

generative purposes. In all approaches, controlled temporal variation and imprecision occur (an

event in a notated piece shown as representing one unit of time may commonly

occupy anything between 0.75 and 1.5 units), and grouping needs to be able to allow

for this, a confluence of the continuous variable, time, and discrete event parameters

in the case of note-based music. In using this approach to generate music, simulation

(that is using a formed model to generate events) or analysis and response (that is,

using incoming data as the sole or a contributing source for ongoing modelling and

prediction or simulation) may be useful. IDyOM has been used generatively only to a

limited extent, to make (tonal) music (<a href="#Wiggins2009">Wiggins <em>et al.</em>, 2009</a>).</p>
<h4>1.2 Time Series Models of Music</h4>
<p>Autoregressive time series analysis (hereafter referred to as TSA) is important

in many fields from oceanography and ecology to cognitive science (<a href="#Box1994">Box <em>et al.</em>, 1994</a>; <a href="#Cryer2008">Cryer &amp; Chan, 2008</a>). In all these fields, serial correlations (termed 

autocorrelation) between events occur, in that an event is to some degree influenced and/or predicted

by the preceding event(s). In some cases, such as the motor activity of reaching for an

object, it is an obvious necessity that the next discernible position of the grasping

hand will be close to the last, which thus becomes both a strong predictor and

influence. In other cases, the expectation of such serial correlation is less obvious, and

its neglect has caused (and does cause) many serious errors of interpretation in data

analysis (<a href="#Bailes2015">Bailes <em>et al.</em>, 2015</a>). An autocorrelated series of data is not an 

independent set, and so most of the conventional statistical approaches to analysis (which assume

normal distributions of independent variables) are not meaningfully applicable. For

example, even a recurrent sequence of simple cognitive tests may reveal that the

result of one is influenced by those of preceding events (<a href="#Dyson2010">Dyson &amp; Quinlan, 2010</a>), and lack of attention to autocorrelation has been noted as a huge problem contributing to

the unreliability of many results in neuroscience (<a href="#Button2013">Button <em>et al.</em>, 2013</a>; <a href="#Carp2012">Carp, 2012</a>).</p>
<p>So one essence of time series analysis is discerning the nature of the

autocorrelations, and incorporating this into the development of autoregressive

models of the process. TSA has a long history in music as last reviewed extensively in

2004 (<a href="#Beran2004">Beran, 2004</a>). <a href="#Beran2004">Beran (2004)</a> especially discusses 

applications to the analysis of symbolic musical structure, such as note pitches, but also event timings. 
Early uses included identification of beats and rhythmic structure, evidenced by autocorrelation

(<a href="#Brown1993">Brown, 1993</a>), and many audio analyses involve elements of it (e.g., linear 

predictive coding). <a href="#Eck2006">Eck, (2006)</a> showed how an autocorrelation phase matrix could 

be useful for metrical and temporal structure analysis. It is perhaps helpful to note that

a temporal model of a signal, such as that of TSA, has a precise counterpart in

spectral models; and related to this, some acoustic filters have time series properties,

though only a few, such as the ARMA and Kalman filters (e.g., <a href="#Deng2015">Deng &amp; Leung,

2015</a>) are data-driven models, rather than fixed filters. <a href="#Harvey1990">Harvey (1990)</a> provides a clear in-depth explanation of these relationships.</p>
<p>TSA is also used in music information retrieval, often as a component of

systems for tasks such as detection of genre and instrument classification. For

example, in a study on the detection of cover songs in popular music (<a href="#Serra2012">Serra <em>et al.</em>, 

2012</a>), it was found that AR and threshold AR (in which different AR models apply in

different parameter ranges), could predict time series of acoustic and musical

descriptors, though modestly in most cases such as tonal descriptors. Prediction was

best for rhythm descriptors, probably because of the longer time windows over which

these are necessarily determined. The purpose of this study was not to test hypotheses

about the “process underlying music signals” (p. 21) but rather to develop models of the

descriptors on a piece-by-piece basis so that they could be used for cross-prediction

(the model of one piece predicting the descriptor series of another). The degree to

which this was successful was then harnessed successfully as a measure to detect the

similarities the covers share with their originals, notably in tonal descriptors. It should

be noted that one common acoustic analysis trait, often seen in MIR, is to measure

descriptors with overlapping windows, and then combine a group of window

measurements to form an element of the descriptor series studied: for TSA this has

dangers, as the overlap itself creates a degree of autocorrelation, since successive

measures share some data. Care needs to be taken with this.</p>
<p>Time series models can take account both of the autocorrelations and of so-

called “exogenous” factors in music perception, those external feature streams which

play a role in modelling and probably influencing listener perceptions. For a music

listener perceiving the expressed structural change and affect of a piece to which they

listen, predictive exogenous factors include the temporal pattern of pitch, timbre, and

loudness. We showed that perceived change and affect are substantially predicted by

temporal sequences of acoustic intensity (<a href="#Bailes2012">Bailes &amp; Dean, 2012</a>; <a href="#Dean2014b">Dean <em>et al.</em>, 2014b</a>; <a href="#Dean2014c">Dean <em>et al.</em>, 2014</a>), and translated this 

into a successful causal intervention experiment (<a href="#Dean2011">Dean <em>et al.</em>, 2011</a>). 

Subsequently we have obtained predictive models of

affect utilising also continuous measures of engagement, perceived effort, and sonic

source diversity (e.g., <a href="#Olsen2016">Olsen &amp; Dean, 2016</a>). Some of the more powerful and

informative models within this work are obtained using multivariate Vector

Autoregression (VAR), in which multiple outcome variables are considered, and they

may be mutually predictive (particularly in the case of perceptual features). For this

reason, we develop multivariate generative models later in this article.</p>
<p>Fig. 1 shows a simple example of univariate TSA, modelling the pitch

structure of part of a solo keyboard improvisation (the first 200 events). A

purely autoregressive model provides a reasonable fit (squared residuals are less than

0.4% of squared data values). Note that a precise fit of a model to a pre-existent time

series depends on knowledge of the whole series: it is not a prediction. If a well fitting

model is made of the first 90% of a time series, and then used to predict the remaining

10% without any further input data on which to autoregress, then its fit will be far

worse.</p>
<figure>
<img alt="AR modelling" class="centerImg" src="/media/jcms_old/9/Figure1.png" width="100%"/>
<figcaption>Figure 1 – An AR model of a segment of free improvisation.</figcaption>
</figure>
<p>The improvisation was amongst those by professional improvisers analysed in

published (<a href="#Dean2014a">Dean <em>et al.</em>, 2014a</a>; <a href="#Dean2013">Dean &amp; Bailes, 2013</a>) and 

forthcoming work, and was the first item performed by the musician (prior to any experimental 

instructions, which applied to subsequent pieces, as discussed there). The performer used a

Yamaha Grand Piano with MIDI-attachment (the Disklavier), as detailed previously.

The first 200 events of this improvisation are displayed as the “data” (solid line), and

the pitch series is provided within the supplementary material. An AR model derived

by our standard code is graphed as as the “model fit” (dotted line). The values

displayed are MIDI note numbers (where 60 corresponds to middle C of the piano).

The model was AR4 (four autoregressive lags), and required first differencing. The

statistical fit shown is that to the undifferenced original data (dotted line). It is

apparent that the fit has a more limited range than the data, but the squared residuals

are only about 0.4% of the squared data points, so the fit profile is fair.</p>
<p>Our purpose here is not to obtain a best fit (smallest residuals) model of the

past data, which is commonly a purpose in MIR analyses, though not in that described

above (<a href="#Serra2012">Serra <em>et al.</em>, 2012</a>). Nor is our purpose to obtain the optimised (lowest

information criterion) model. It is rather to obtain a representation of the process (in

this case, what creates the pitch progression) at work as fast as possible. So the

process is represented in the model as the autoregressive lags, and in later cases lags

of other predictors, together with corresponding coefficients. In some such models

and often in IDyOM, models can be improved by adding derived parameters such as

pitch class (recognising that pitches an octave apart, which have a frequency ratio of

1:2, bear a special relation and belong to the same pitch class). We do not pursue this

here because the aim is an approach that is neutral to tonality (loosely, the use of key 

centres) and to the choice of tuning system: to operate both with 12-tone equal

temperament (in which the piano and the orchestra are normally tuned) and in other

microtonal systems were there may not even be octave relationships (<a href="#Dean2009">Dean, 2009</a>).

Other work (<a href="#Gingras2016">Gingras <em>et al.</em>, 2016</a>) shows that the two approaches (<em>IDyOM</em> and TSA) can contribute mutually in modelling aspects of performance and reception. While the

IDyOM approach, as mentioned, is primarily a local model of upcoming events in the

music, though influenced by statistical “knowledge” of all the preceding events and

potentially of a corpus of related music, the TSA approach constitutes solely a model

of the piece as a whole, which it assumes represents a steady state single process. This

means that the coefficients remain constant and the selection of predictor variables in

the model remains unchanged throughout (though of course the variable values

change).</p>
<p>As the article proceeds, we focus first on univariate time series models: models

which predict a single outcome variable, regardless of how many other input variables

may potentially complement the autoregressive component. For example, a pitch

sequence may be modelled and simulated either on the basis solely of autoregression

(as illustrated in Fig. 1) or more plausibly, on the basis also of preceding exogenous

variables such as dynamics and rhythms (noting that these too are almost always

highly autoregressive). The article subsequently addresses such multivariate time

series analyses and simulations, which are counterparts to the multiple-viewpoint

approach of IDyOM.</p>
<h4>1.3 The Purpose of <em>BANG</em>, the Beat and Note Generator, and some Antecedent

Generative Systems</h4>
<p>The application is aimed to provide facilities for use in live performance, in

other words, to be a live algorithm. This term is used to delineate a performance

framework in which running algorithms permit user interaction, in three forms:

a)parameter adjustment within pre-formed models; b)continuous data inflow to an

analytical module resulting not only in new output (cf. classics such as <em>Cypher</em>

(<a href="#Rowe2001">Rowe, 2001</a>) and <em>Voyager</em> (<a href="#Lewis1993">Lewis, 1993</a>; <a href="#Lewis1999">1999</a>); but also c) creating new model structures on the fly, as with machine learning 

approaches, neural nets in general, and

the present case in particular. The normal performance mode is for a keyboard

improviser to play note data into <em>BANG</em> and simultaneously interact with its interface.

The live algorithm constructed in this article (<em>BANG</em>) is also capable of functioning

autonomously, that is, without requiring any live input whatsoever. Nor is real-time

(essentially instant) computational generation required, because this involves response

to very short chunks of input events, whereas <em>BANG</em> will model and generate larger

scale musical processes.</p>
<p>In his influential book on interactive music systems (<a href="#Rowe1993">Rowe, 1993</a>), Rowe uses

three dimensions to provide a useful classification of interactive systems. The first

separates <em>score-driven</em> from <em>performance-driven</em> systems. The second 

distinguishes <em>transformative</em>, <em>generative</em> or <em>sequenced</em> methods. And the third 

describes <em>instrument</em> and <em>player paradigms</em>. The first two dimensions are fairly self 

explanatory, while the third may require slight elaboration. The <em>instrument</em> approach provides an 

interface that interprets performer gestures; these are expected to be non-audio gestures, in that

Rowe indicates that a single performer would produce what seemed to be a solo using

the system. The <em>player</em> on the other hand tries “to construct an artificial player … [that]

may vary the degree to which it follows the lead of a human partner”. In its normal

usage with a single keyboard player also controlling the interface, <em>BANG</em> falls into

the <em>performance-driven</em> and <em>player</em> categories, but hybridises the <em>transformative</em> and <em>generative</em> methods. Rowe analyses several other interesting systems, 

amongst which is an early form of what is now known as <em>Voyager</em>. In one of several interesting

articles discussing and analysing this program, its author George Lewis explains its

“non-hierarchical” interactive environment for improvisation (<a href="#Lewis2000">Lewis, 2000</a>) 

and its embodiment of “African-American aesthetics and musical practices” (p. 33). A core of

this is the concept of multidominance, which includes treating multiple rhythms,

metres, melodies, timbres, tonalities as co-dominant rather than hierarchical. This is in

part reflected in the structure of the software, in part in its use in performance. Using

ensembles of instruments, 15 melody algorithms, and probabilistic control of most

elements which is generally reset at intervals between 5 and 7 seconds, it produces

“multiple parallel streams of music” (p. 34). It pays attention to input information, and

may imitate, oppose or ignore. It uses this information in both an immediate time

frame, and one smoothed or averaged over a “mid-level” duration (p. 35). Voyager is a

complex and powerful system, designed to avoid “uniformity” (p. 36).</p>
<p>A single instance of <em>BANG</em> is much simpler, but largely deals in the details of

longer periods of input information. But it can produce “beats”, short repetitive

rhythmic patterns like some used in electronic dance music (EDM). This occurs when

the input and corresponding output sequences are short, and then used repeatedly (and

sounded with appropriate midi instruments). It is interesting that traditionally, beats

(and more so glitches) have sometimes been used while hardly metrical, in that they

display little internal repetition to emphasise the overall periodic repetition of the

whole beat (<a href="#Kelly2009">Kelly, 2009</a>). In common with the ethos of <em>BANG</em>, these do 

not necessarily retain normal conventions, in this case, that of hierarchical isochronic

meters. Putting this another way, strong metricality not only requires a repeating

periodic structure, but also substructures which have durations that are either all the

same, or comprise multiples of a common duration, thus creating at least two

hierarchical layers. In sum, the concern in <em>BANG</em> is flexible computational generative

music making with relatively few stylistic constraints. The multiple strands of

information it outputs show modest multidominance, in Lewis’s terminology, and this

can be extended by use of multiple instances and co-improvisors.</p>
<p><a href="#Boden2009">Boden &amp; Edmonds (2009)</a> have elaborated on the

question “what is generative art?” They provide a definitional taxonomy of the field,

and I would like to indicate the intended position of the present work within it. Of the

11 categories they enumerate, <em>BANG</em>, the beat and note generator I describe below,

belongs to the first six, centrally category 5, “G-art” (generative art) but also category 6,

in which “CG-art is produced by leaving a computer program to run by itself, with

minimal or zero interference from a human being”. <em>BANG</em> can make “CG-art” but it

does so by use of either pre-stored human input, or random input. In Section 6, we consider the 

respects in which <em>BANG</em> provides computational creativity

so far. <em>BANG</em> seems to be the first system to significantly exploit multivariate TSA

for live generative music-making.</p>
<h3>2 Basic Implementation of the Beat and Note Generator (<em>BANG</em>): Univariate

Models</h3>
<p>The digital creativity design task undertaken here is now apparent: it requires

a) an input module that can receive a musical stream from a keyboard or other MIDI-

note-based performer; b) a module which can analyse appropriate large segments of

this stream, creating a time series model, and then use the time series model by

simulation and potentially further transformation to create a new musical stream; c)

an interface for user interaction in controlling parameters of the creation and/or

realisation of that new stream; and finally d), an output module which can create the

sounded musical counterpart to the resultant musical information. Note that the

MIDI-performer may play silently (in effect solely offer source materials) or audibly

(i.e., directly participate in the musical dialogue with <em>BANG</em>). Besides a–d, <em>BANG</em>

also offers: e) random sequence generation facilities (which can dispense with the live

performer); f) preformed time series models which can themselves be used without

further pitch sequence input, but which are based on ecologically valid models

observed in performance; and g) sequence stores (memory), so that preformed or

previously performed sequences can be used as the basis for live generation. The

random sequences may be obtained in <em>MAX</em> or by using any of the wide range of

statistical distributions available in R. Classic music generative systems operating live

or less frequently in real-time, like <em>Cypher</em> and <em>Voyager</em> share a similar general

framework, but the unusual feature in <em>BANG</em> is the use of time series modelling.</p>
<p>There is no single computational software platform which combines all our

required features, thus two platforms are used, each highly developed towards

efficient fulfilment of the functions they serve. R, the open access statistical platform

with probably the largest library of coded statistical and analytical tools available,

serves the time series analysis and simulation part. No other platform can compete

with it in this respect: Lisp (in which IDyOM was originally written) and C are poorly

supported with statistical code, and <em>Python</em> only moderately in comparison with R;

<em>Matlab</em> is well supported but not comparably. <em>MAXMSP</em> serves the other three

functions, and is well developed for this. It is object-oriented and has a

comprehensible graphic user interface, and is commercially maintained and widely

used. Alternatively Pd, its open source counterpart could be used, and for those

preferring line-coding, <em>XSuperCollider</em>.</p>
<p>Thus communication between <em>MAX</em> and R is required, and this is

implemented using simple computer sockets. A socket is an address-specified

connection between two computational threads, which may be on a single machine, or

distributed on several. We use a TCP socket which is capable of stream delivery.

Musical events are received by <em>MAX</em>, and stored as indexed collections of pitch, key

velocity, note duration, and event inter-onset time in a <em>MAX</em> “coll” object. A coll is

simply a collection of data with indices that permit the separate storage and retrieval

of individual items in the collection. The coll object corresponding to a particular

musical stream is continuously replenished, and initially a 200 event sequence is

chosen as its capacity. At any time, the data in this object can be sent by socket for

analysis on an R “server” present on the same (or another) computer. In R, there are

several alternative organisations of the univariate analyses available. We will

describe the simplest, most readily comprehensible organisation, and then summarise

some of the more complex forms already built, or potentially useful.</p>
<p>In the simplest implementation, each of the four 200-event streams of musical

information, pitch, velocity, note duration, and inter-onset interval (IOI), are each

modelled separately, and four independent model simulations are returned to <em>MAX</em>,

and reassembled into a single event stream combining the four components (this is

somewhat similar to the mode of operation of the <em>Continuator</em>). These models are thus

purely autoregressive: no additional factors are considered. Fig. 1 already

illustrates the modelling of the pitch sequence of a free improvisation, which happens

to be post-tonal, and non-metrical. The piece was performed on a Yamaha Grand

Piano with MIDI-attachment (the Disklavier).</p>
<p>The models are obtained by selection amongst candidates using the R function

auto.arima within Hyndman’s Forecast package (<a href="#Hyndman2011">Hyndman, 2011</a>). Automated

stepwise model selection (chosen for speed) is based on minimising the Bayesian

Information Criterion (BIC), and not simply on <em>MAX</em>imising the degree of fit. The use

of information criteria such as BIC has been well reviewed (<a href="#Lewandowsky2011">Lewandowsky &amp; 

Farrell, 2011</a>). This information criterion penalises strongly for increasing model complexity,

and thus helps to avoid overfitting, which otherwise tends to reduce the degree to

which a model can fit new or related data sets. Model selection is based on the search

of a space of permitted models: here autoregressive terms up to order 5 are permitted

(because such orders are common when irregular rhythms – determined by inter-onset

interval, IOI – are performed, but higher orders have not been detected (<a href="#Launay2013">Launay <em>et al.</em>, 2013</a>). Lower orders are often selected, and in the case of rhythms where the pattern

is fundamentally repetitive and isochronic (based on units of roughly constant length)

order 1 may suffice (<a href="#Wing2014">Wing <em>et al.</em>, 2014</a>), though this seems not to have been

exhaustively investigated.</p>
<p>We choose pure AR models for two reasons. One, that alternative moving

average components can be reformulated as AR terms, and quite often introduce

issues of model identifiability. And two, that our simplifying decision reduces the

search space and hence the time taken for model selection: <em>BANG</em> is used in live

performance, and so its model is made available very soon after the performed data

arrives. One step of differencing, which may be required to make the input data series

statistically stationary, is permitted. A differenced series is simply one whose values

correspond to the difference between successive members of the original series, and

which is thus one member shorter. As part of inducing stationarity, differencing also

removes long term (low frequency) patterns, somewhat akin to high pass filtering

(indeed a highly restricted form on non-ideal high pass filter can achieve precise

differencing). Stationarity basically implies that the autocorrelations between events

any fixed number of steps apart are constant across the series, which also has a

constant mean and variance (technical details of TSA are elaborated in depth in

<a href="#Hamilton1994">Hamilton (1994)</a>). In many circumstances, including performed series which 

show a gradual change in a fixed direction (e.g., generally moving up the keyboard or

gradually getting louder), such first differencing is necessary. The resultant model of

the differenced variable can still provide fits for the undifferenced data at its original

length (as in Fig. 1). The models are not allowed to include drift (that is trend),

because if it is originally present, differencing normally suffices to remove it from the

stream being modelled.</p>
<p>Once R has defined the selected model, it can be used to simulate a new data

series. If the model has been derived from a differenced input series, then it is

reconstituted into an output which also corresponds to an undifferenced data series,

like the input. So the model creates (by simulation) an output which itself contains the

same autoregressive features as the model itself: this is a key part of the structural

continuity (with concomitant diversity and change) that <em>BANG</em> creates. At this point

there is a choice for the user amongst a range of simulation options. The default here

is to set the mean and s.d. of the simulation to those of the input set (but the user may

change this). The range of values obtained from the simulation may still breach the

limits of the usable musical parameter values: 20–108 for MIDI pitch, covering the

conventional piano keyboard range and the majority of the audible range as used in

music; 1–127 for MIDI velocity, though velocities below about 20 are commonly

virtually inaudible; durations from 75–2000 msec; and IOIs from either 5–30msec or

75–2000msec (more than about 13 notes or chords per second is generally too much

of an auditory blur to be very useful, and at say 20 notes/second may sound more like

a single “brrr”). IOIs of 5–30msec are perceived as simultaneous, i.e., forming a chord

rather than a single note (<a href="#Dean2014a">Dean <em>et al.</em>, 2014a</a>; <a href="#Pressing1978">Pressing, 
1987</a>), so simulation IOI values

&gt;30&lt;75msec are shifted to 5msec. Normally I choose simply to shift the simulated

values overall into the appropriate range, and only rescale them if they have a greater

range than specified. Thus simulation streams which are narrow in range, responding

often to performance elements which are equally narrow, are normally maintained as

narrow (as in Fig. 1), and vice versa. Any of these choices might be varied by the

user: for example, the simulation s.d. could be reset to an arbitrary parameter or to

one calculated from any aspect of the data passing (or which has passed since startup)

through the system. Fig. 2 illustrates a sequence of three successive pitch

simulations from a single model, that of Fig. 1, to reveal that diversity is generated,

even without further transformations.</p>
<figure>
<img alt="Comparing Pitch Data" class="centerImg" src="/media/jcms_old/9/Figure2.png" width="100%"/>
<figcaption>Figure 2 – Three successive simulations of Pitch Series from an AR model.</figcaption>
</figure>
<p>The model of the input pitch series from Fig. 1 is shown here again as the solid line

in the top graph. The remaining three lines are successive simulations from the model,

illustrating the generation of diversity in comparison with the input: both forms of

mirroring and profile changes across segments are apparent.</p>
<h3>3 Multivariate Models within <em>BANG</em></h3>
<p>Modelling note sequence without regard for the other parameters of the stream,

or of duration without regard for IOI, may contribute towards a form of

multidominance (as delineated above) but is a limited approach if what is sought in the

model accuracy which permits imitation and transformation of the overall process in

play. On the other hand, as found with the approach just described, if what is sought is

a model from which we can generate diversity by simulation while maintaining some

pattern similarities with the input material, then the univariate approach has value.

Given this, I also developed a multivariate approach within <em>BANG</em>. As noted recently,

continuous-valued vector grammars have hardly, if at all, been applied to music

(<a href="#Rohrmeier2015">Rohrmeier <em>et al.</em>, 2015</a>). Vector autoregression with continuous 

values, which we have used extensively within models of perception of musical affect, constitutes the

TSA counterpart to such vector grammars, and so it is interesting and novel to apply it

here. The temporal variables are continuous in performed value, and pitch and

velocity are treated as continuous variables and then quantised.</p>
<p>Therefore, instead of modelling the four musical parameters independently and

recombining the results, as above, joint models of all four are made by Vector

AutoRegression (VAR), using the MTS (multivariate time series) R package (<a href="#Tsay2013">Tsay,

2013</a>). MTS can treat each musical stream as potentially a predictor of each other (an

“endogenous” variable in statistical terminology): prediction can be bidirectional, but

model selection will indicate which directions of influence are important (showing

significant coefficients) and which not. It is common in VAR to consider issues of

stationarity, as described above for AR. If one of the endogenous (to be modelled)

variables is not stationary, then all variables are brought to stationarity by the same

mechanism prior to modelling: usually by differencing. But there is a major debate as

to whether this is necessary or even desirable within VAR (e.g., <a href="#Sims1988">Sims, 1988</a>), and

since model output quality can be assessed, it is not a required step and it is not

included here. The four musical parameters pitch, velocity, duration and ioi are all

endogenous variables in a VAR, but just as there may be ARX models, so VARX

models may use eXternal predictors: for example, a desired overall loudness profile,

or perceptual affect profile, could be represented as a variable that impacts on a

VARX output simulation.</p>
<p>The first step of VAR analysis in R is to determine the BIC-optimised

regressive order, so that only variable lags up to that order are included in the joint

VAR analysis to follow. Most commonly the BIC-selected order is 1, and

occasionally it is 2. Given the desired constraints on analysis time within our

algorithm, we then seek simply the removal of any uniformly unnecessary higher

orders of the predictors, all of which remain endogenous (that is potentially mutually

influential). It is additionally feasible in such modelling, having determined for

example that variable X influences Y, but vice versa is not the case, to define (by

providing a constraint) that the coefficient on Y for the contained model of X is 0,

thereby effectively removing one predictor parameter, which no longer has to be

estimated. This was considered a priori not a worthwhile investment of computational

time in balancing considerations of minimising BIC vs. saving CPU time, and

subsequent observations supported this.</p>
<p>The VAR analysis in Tsay’s MTS is the step which consumes much time, and

not the subsequent model simulation. For an order 1 VAR (that is, one with one

autoregressive lag of each variable), there are 20 parameters to be determined in the

MTS function used (VARMA); for order 2 there are 36 parameters, and

correspondingly the modelling commonly takes about twice as long. As with AR

above, we do not use MA components in VAR (i.e., the corresponding parameters in

the models are set to 0). The highest order initially considered was 4, to be

coherent with the chosen maximum of 5 for the AR models, and given that VAR

virtually always selects lower orders. A VAR of order 4 has 68 parameters, and

commonly took almost ten times as long as order 1. A much faster version of the

VARMA function is available within MTS, VARMACpp (which in places directly

calls C code instead of using R code). We arbitrarily determined that for live

performance a model and its simulation had to be returned to <em>MAX</em> within 30 seconds

of request and hence constrained the order of the VAR to be 3 or lower, and used

VARMACpp. Although arbitrary, 30 seconds is reasonable given that most musical

phrases are complete within such a time period. The time required can be reduced

further, at the cost of slightly loss of flexibility in the models, by using the basic

multivariate autoregressive Yule-Walker function in R core’s stats package (ar.yw)

instead of VARMACpp from MTS. Then simulation (which is fast) from the resultant

model can still be done with MTS functions.</p>
<p>As with the univariate AR model simulations, a range of options apply before

and during VARMA simulation: many can be applied within the covariance matrix

upon which the simulation operates. As with AR, the resultant simulation again

reflects the autoregressive but now also the inter-variable relationships of the model

derived from the input, but transforms them further.</p>
<p>For illustration of the modelling, Fig. 3 shows a joint VAR model of all four

variables (p,v,d,ioi) within the improvisation performance analysed already, and

compares each with the original data. All variables are highly autoregressive, as

expected. The overall model preserves the patterns of all four features, and provides a

good model of duration and ioi, which turn out on inspection of the model to be

mutually influential. It provides a fair but poorer model of pitch than the AR model

described above (squared residuals are 0.8% of squared data values), and a fair model

of velocity. We do not require that such a model will necessarily be an improvement

in fit for any individual data component over an AR model, but rather that it will more

fairly represent the joint and mutual processes of the four parameters, and hence be

useful for simulation, which duly follows, with some analogous options to those

described for the univariate simulations. Note again that by design the R script does

not permit differencing prior to the VAR modelling, and so this can sometimes limit

precision in any case.</p>
<figure>
<img alt="VAR Modelling" class="centerImg" src="/media/jcms_old/9/Figure3.png" width="100%"/>
<figcaption>Figure 3 – Vector Autoregressive (VAR) modelling of the free improvisation segment.</figcaption>
</figure>
<p>Here all four components of the input 200 event time series are jointly

analysed, as described in the text. Each subsection of the graph compares the original

input (solid line) with the model (dotted). It is apparent that the temporal values are

better modelled than the pitch and key velocity (both shown as MIDI numbers, where

velocity can run from 0–127). Note that correspondingly the model was not derived

for <em>MAX</em>imum precision of fit, and differencing (found necessary for the AR model of

Fig. 1, was not permitted).</p>
<p>Fig. 4 shows an example of a VAR simulation, in comparison with the

original data input streams. In this particular case, the divergence of the simulation

series from those of the data is clear, but particularly for duration and IOI.</p>
<figure>
<img alt="VAR Simulation." class="centerImg" src="/media/jcms_old/9/Figure4.png" width="100%"/>
<figcaption>Figure 4 – VAR simulation.</figcaption>
</figure>
<p>The VAR model of Fig. 3 is used to generate a new VAR set of series by

simulation. Each panel shows the input data (solid line) and the simulated output

(dotted), again showing the generation of diversity, and illustrating that even the

better modelled temporal variables (duration and IOI) can readily diversify during the

simulation.</p>
<h3>4 Other Transformations of the R Model Output during its Realisation by <em>MAX</em></h3>
<p>Once the R simulation has been obtained, a wide range of further

transformations may be effected, as coded or interactively chosen by the user, and this

is done most conveniently in the <em>MAX</em> patch. An exhaustive discussion is not needed

here, but some key features can be brought out. Perhaps the most obvious possibilities

are choosing silent periods in the operation of <em>BANG</em> (no output), and controlling the

range over which its parameters function within the <em>MAX</em> patch. For example, a

performer may choose to have considerable gaps between successive realisations of a

generated 200 event series (which is set by default to recycle) and/or between the end

of a realisation and the initiation of a new model and its realisation. User interface

objects permit (or in some cases could permit) such control, on a continuous-valued

basis, which may be randomised. Similarly, there are control parameters for the range

of velocities sounded, or the range of IOIs. Preformed and partially randomised

velocity profiles, operating on the simulation output, can be valuable (and have been

used in the previous Serial Generator algorithm (<a href="#Dean2014">Dean, 2014</a>).</p>
<h3>5 Autonomy, Memory and Automaticity within <em>BANG</em></h3>
<p>There are two autonomous generators within <em>BANG</em> (one in <em>MAX</em>, based on

randomised generation; and one in R, based on AR simulation using preformed

models and coefficients, or using random outputs from a range of statistical distributions). Within 

limits, the (V)AR simulation values can themselves be randomised, the limits being the requirement of the 
simulation algorithm that the model in question be stationary.</p>
<p>Memory functions are also contained within <em>BANG</em>. The <em>MAX</em> patch contains

a store of pre-formed MIDI-performances, which can be played in to the system

(audibly or silently), and can then form part of the current input storage in a “coll”

object. In turn, the input coll is cumulated in an “input memory” coll, which currently

accumulates up to 2000 events. Similarly, the data from the coll in which TSA-

generated event series are temporarily stored for realisation are also accumulated in a

“generated memory” coll of up to 2000 events. These memory storages could be used

at any time for any of <em>BANG</em>’s purposes, and it also is useful to be able to shuffle

them. For example, playback of an event series obtained by hybridising two different

stored series provides a new mode of variation, which may be an interesting and

unpredictable component in an improvisation.</p>
<p>A brief comparison of the computational memory functions with those of

humans may be of interest. Human memory is conventionally divided into different

time ranges, such as working memory, short term memory, and long term memory

(see <a href="#Snyder2000">Snyder (2000)</a> for an introduction to memory studies in the context of 

music). The first two components are considered to occupy up to about 30 seconds, and this

correlates well with the observation that assimilation of the statistical impact of pitch

sequences in a harmonic context takes at least 20 seconds (<a href="#Bailes2013">Bailes <em>et al.</em>, 2013</a>). Long term memory can remain for any length of time, but of course is selective and

decaying (<a href="#Lewandowsky2011">Lewandowsky &amp; Farrell, 2011</a>). Recent studies in our lab (Herff 

et al, submitted for publication) contribute to an increasing body of work that suggests,

surprisingly, that memory for musical melodies can be prolonged and even perhaps

regenerative. We suggest that regeneration may be possible because of a

representation which involves a co-relation between components such as pitch and

rhythm such that memory of two parts of this three part system (two components, one

relation) can regenerate the third. Even after up to 195 intervening melodies, we

found unchanged recognition of an earlier presented melody, and others have

presented related observations (<a href="#Schellenberg2015">Schellenberg &amp; Habashi, 2015</a>) over 

shorter delays. Thus 2000 events in our computational coll is an appropriately comparable value

(corresponding roughly to 195 10-note melodies), though computer memory need not

be restricted in this way.</p>
<p>Given all the controls described, it is obvious that <em>BANG</em> can be provided

with start-up choices automatically (randomised, or pre-selected in systematic groups

using the <em>MAX</em> preset object) such that it will run without further performer

intervention, as an automaton. The duration of the run may be open-ended or pre-

determined. The <em>MAX</em> user interface (which displays only some of the switches and

controls available to the user) is illustrated in Fig. 5.</p>
<figure>
<img alt="Screen-dump." class="centerImg" src="/media/jcms_old/9/Figure5.png" width="100%"/>
<figcaption>Figure 5 – A screen-dump of the <em>MAX</em> user interface.</figcaption>
</figure>
<p>The interface is designed for simplicity, and a user may choose only to activate the

automatic play function, or alternatively may delve further, depending on whether

they are actively performing on a MIDI-instrument themselves. Only a few of the

controls available to a performer are directly visible at this level.</p>
<h3>6 Discussion: Creative and Generative uses of <em>BANG</em> and its Future Development

and Evaluation</h3>
<h4>6.1 Generative, Creative and Cognitive Aspects</h4>
<p>In summary, <em>BANG</em> is a computational music generator for use in live

performance: a live algorithm. It operates primarily on input performed (or

secondarily on preformed or randomly generated) musical data, and uses time series

models of various levels of complexity for responsive or autonomous music

generation, with a range of memory facilities. We structure the discussion by

considering first the creative/generative uses of <em>BANG</em> and their cognitive aspects,

and then delving further into some of the interesting specifics of its design and future

development and assessment.</p>
<p>Edmonds (<a href="#Boden2009">Boden &amp; Edmonds, 2009</a>), in a personal section of that paper,

makes an interesting distinction between user “interaction”, and “influence”. In

interaction, the human agent (in our case, performer) can sense almost immediately

the impact of their action upon the system, and hence is likely to detect its origin. In a

situation of influence, the response of the system is much later, perhaps at an

unpredictable time, but certainly less likely to be detected as a response to a particular

action either by performer or audience. <em>BANG</em> provides both interaction and

influence, and both can apply to the purely generative part of the system (the TSA

simulation and modelling) and to the use of the memory stores it contains. <em>BANG</em> is

thus an improvisation prosthesis, a tool of creative hyperimprovisation (<a href="#Dean2003">Dean, 2003

</a>). It can generate “beats” in the sense of EDM, and equally, beats and notes within or

without flexible metrical hierarchies, given that as in most music making, these

involve fluctuating rather than absolutely isochronic note inter-onset intervals.</p>
<p>Is <em>BANG</em> already a contribution to computational creativity, granted that it can

be developed further at the highest level (for example, the introduction of

evolutionary processes) or at the lower levels (the nature of the musical material

processed and generated)? <a href="#Colton2012">Colton &amp; Wiggins (2012)</a> offer a

useful standpoint on what computational creativity research is: “The philosophy,

science and engineering of computational systems which, by taking on particular

responsibilities, exhibit behaviours that unbiased observers would deem to be

creative”. The concept of “unbiased observer” is delineated sufficiently to make clear

that such a person, given all the usual difficulties, is able to accept that an unfamiliar

or even new style of work may well be creative, and that a cumulation of such works

can suggest their own criteria and extend previous ones. One might choose to put

greater emphasis on the possible virtue of computational creativity being able to

emerge with outputs that are well outside current human creative norms or even

possibilities, and <a href="#Colton2012">Colton and Wiggins (2012)</a> include this. <em>BANG</em> may 

well fit within this definition, pending scientific test (discussed in the final sub-section).</p>
<p><em>BANG</em> hands over significant “responsibilities” to the computer, and there is

essentially no “curation” (that is, selection by the user as discussed by <a href="#Colton2012">Colton &amp;

Wiggins (2012)</a>) amongst the outputs, required (or offered). Indeed, in an improvisation with

performer and <em>BANG</em>, the MIDI-instrument performer has all the facilities of

retrospective integration (<a href="#Bailes2013">Bailes <em>et al.</em>, 2013</a>) and even kinds of effective 

“erasure” (<a href="#Smith1997">Smith &amp; Dean, 1997</a>) available to them according to what they 

subsequently play, given that music is temporal and the impacts of previous events depend in part on

future ones. In a very limited sense, <em>BANG</em> uses case-based reasoning (like

MuzaCazUza (<a href="#Ribeiro2001">Ribeiro <em>et al.</em>, 2001</a>)) because each 200-event time series 

that it analyses, and from which it creates an output by simulating the model, can be seen as

a case. <em>BANG</em> has been used in conjunction with the <em>Serial Collaborator</em>, a real–

;time system developed to create and manipulate serial melodies and harmonies (<a href="#Dean2014">Dean,

2014</a>).</p>
<p>There are a range of performance modes available with <em>BANG</em>, some

mentioned already. The normal usage involves one MIDI-performer who also initiates

and controls aspects of <em>BANG</em> operations, and provides played series to it. There may

be multiple <em>BANG</em> instances available, and the flow of material to them can be

separated, as can their outputs. It is interesting to use two different tuning systems

simultaneously with <em>BANG</em> having access to two different player modules (MIDI- or

VST-plugin driven). Equally there can be a second performer who operates <em>BANG</em>,

while alternatively <em>BANG</em> may function entirely autonomously once started.</p>
<p>Many current computational creativity systems are concerned with generation

of artefacts which retain conventions of prior systems, be it tonality in music, or

syntaticality in text. But this is not a necessary feature, and <em>BANG</em> does not require it.

This is because it is conceived currently as a post-tonal and microtonal system: using

a performer’s input, or stored memories of them, it generates structurally related

material. If the inputs are highly tonal, the outputs will diverge, but still show signs of

tonalness; and vice versa. This flexibility is characteristic of much note-centred music

since about 1950 in particular. Thus a number of instances of <em>BANG</em> can cooperate,

and furthermore, the memory components of <em>BANG</em> would readily allow multiple

MIDI-players in a single instance to use the generated material simultaneously if

required. In this way harmony would be generated more extensively than it is

presently with a single instance of <em>BANG</em> in play (it generates chords as a minority of

events). Generating conventional tonal harmonies is a large scale and ongoing

problem (<a href="#Whorley2013">Whorley <em>et al.</em>, 2013</a>), but generating post-tonal harmony is in 

some senses simpler, and there are clear paths for its further development using VAR in which

time series of chords are modelled as vectors (c.f. <a href="#Tymoczko2011">Tymoczko, 2011</a>).</p>
<p>A model of the statistical patterns of the pitch, rhythm, and intensity

components of a piece of music (be it a model from IDyOM or from TSA) may

constitute a model of some the cognitive processes it entails. For example, IDyOM

can be used to model certain expectancies, which in turn may influence listeners' and

performers' cognitive responses to the music in other respects. Time series analysis,

while couched in quite different terms, also models expected notes given a preceding

set, and hence has some equivalent components. Both approaches may thus be seen to

inform music generation by means of process models of preceding music, notably

whole pieces (or 200-note series as presently) and corpora of such pieces. The TSA

generative mechanism of <em>BANG</em> is a rather different complement to those which use

knowledge of cognition of individual musical components to dictate the nature and

frequency of their occurrence in new pieces (<a href="#Brown2015">Brown <em>et al.</em>, 2015</a>) or those which 

take temporal streams of (neuro)physiological data and sonify them into musical elements

(<a href="#Daly2015">Daly <em>et al.</em>, 2015</a>). TSA can also provide models of musical affect based on input

acoustic and musical variables and these models could in the future be transformed in

various ways as potential music generators. In addition, <em>BANG</em> provides “memory”

stores, which can be used in a manner akin to genuine cognitive memory stores, for

intermingling, transformation, and comparison.</p>
<h4>6.2 Technical Aspects of <em>BANG</em></h4>
<p>It is appropriate to discuss further the contrasts between <em>BANG</em> and Pachet’s

<em>Continuator</em> (<a href="#Pachet2003">Pachet, 2003</a>). The <em>Continuator</em> (2003) is based 

on a variable order Markov chain analysis, which predominantly uses the pitch sequence. It is focused on

real-time operation on short chunks of input MIDI sequence. These are “systematically

segmented” into phrases “using a variable temporal threshold (typically about

250msecs)”. It seems that this “threshold” concerns gaps in the input sequence, but this

is not detailed. The <em>Continuator</em> has facilities for “biasing” the output, for example

based on changing harmony: this supports one of its intended uses, in conventional

jazz harmonic contexts. Although it determines note onset and offset times, the

original description of the <em>Continuator</em> emphasises solely note duration.</p>
<p>Simplification functions are provided so that when an input note sequence (within the

quantised, finite pitch alphabet) is not found in the database of previous chains, a

compromise solution can be returned. An example given considers a pattern in

semitones, so that the sequence C/D/E/G is distinct from C/D/Eb/G. Then it notes that

if the sequence C/D/Eb was not found in the current database, so that there was no

continuation solution, the sequence might instead be considered in groups of three

semitones starting on C. Thus C/C#/D would be coded 1, while Eb/E/F would be

coded 2, and F#/G/G# coded 3. Then both opening three note patterns (C/D/E, and

C/D/Eb) become 1, 1, 2, and 3 could be offered as a solution continuation for either.

This is an example of a “reduction function” in <em>Continuator</em>, and it seems that duration

and velocity measures of an input are used primarily in such reduction functions

rather than being jointly modelled as in our VAR approach.</p>
<p>In the case of output rhythm, a <em>Continuator</em> user has three choices: “natural

rhythm” (that “encountered during the learning phase” when the sequence chosen from

the database was input by a performer); “linear rhythm” (streams of quavers (eighth-notes)); or

“input rhythm”, the rhythm of the input but possibly warped in length. An emphasis on

fixed metrical contexts is included by allowing an approach in which the input is

segmented according to a specified metre, which can then ensure the output also

segments in accord with it. These features form a major contrast with <em>BANG</em>, which

is agnostic about rhythmic and velocity features, and in the VAR module, allows them

to be treated quite equally. <em>Continuator</em> adapts to features of the ongoing environment,

for example loudness or harmonic field, by use of “fitness functions”. An example is

given for loudness, showing that if the input is soft, a bias can be introduced into the

output such that it tends to match. The author finds that “the system generates musical

material which is both stylistically consistent [with the performed context], and

sensitive to the input” when bias values are intermediate. This is analogous to the idea

of an eXternal regressor series being used within <em>BANG</em>, in ARX or VARX (see

below). Broadly, because of the emphasis on using <em>Continuator</em> with children,

students and specifically in jazz, there is a tendency for the descriptions of it to

emphasise normative processes, what Rowe compared with “regression to the mean”

(<a href="#Rowe2001">Rowe, 2001</a>), but it is clear that these are not necessary features of its modes 

of operation.</p>
<p>Later work from Pachet and colleagues, notably a second patent, published in

2013 (<a href="#Pachet2013">Pachet &amp; Roy, 2013</a>), continues to turn more towards the use of 

constraints upon the Markovian sequences, but it seems that the computational complexity of a

hierarchical or joint modelling system, treating pitch, velocity, note duration and IOI

equally has not yet been fulfilled, and the discussion still emphasises parallel

generation of different streams, such as pitch, velocity, and duration. The main systems

illustrated in the 2013 patent emphasise melody generation for conventional jazz with

“a coherent harmonic structure (a sequence of chords)” (p. 9). Often a (metrical) beat

generator that has fixed beat lengths is used, and it decides simply whether the output

melody for a beat should be crotchets (quarter notes) or quavers (eighth notes) and “substantially constant

within one beat” (p. 8) (“Detailed Description” section of the Patent). In its final paragraph,

the patent points towards allowing “the elements of the sequence to have variation in

more than one of their properties” (p. 18), but illustrates this by suggesting parallel

<em>Continuator</em> instances. In contrast, <em>BANG</em> uses joint modelling by virtue of its

equanimity about its post-tonal and potentially non-metrical outputs, and is possibly

more amenable to the generation of short term large contrasts (sudden changes in

dynamic or harmony) that a post-tonal free improviser such as the author might want,

rather than a “regression towards the mean”. The <em>Continuator</em> is used as a learning

tool, and Pachet’s successor constraint-based projects (for example, Virtuoso) are

sometimes aiming for the other extreme of performer/improviser expertise.</p>
<p>The issue of the irregular spacing of the events is of interest. Unlike Markov

chains, TSA normally operates on the assumption (whether or not justified by the

data) that what is being modelled is regularly spaced in time. When it is not, as here,

the time gap is most often simply used as an additional time series (our IOI series) or

vector member. But alternatively, so called variogram techniques can be used that

deal directly with this IOI variation, removing the assumption of regular temporal

spacing, and they may produce distinct output features; these are more widely used in

spatial autocorrelation models. Conversely, <em>BANG</em> can quantise output IOIs to

comply with metrical structures, or modify them through a well-formed rhythm

generator system <em>MeanTimes</em> (<a href="#Milne2016">Milne &amp; Dean, 2016</a>).</p>
<h4>6.3 Future Work, and a Proposal for Evaluation of Solo Instrument-Computer-

Interactive Music Systems</h4>
<p>Simple extensions of the current system will gradually be implemented. For

example, dynamic TSA models can also be constructed, which change coefficients

(and sometimes predictors too) at different points in the piece. Control of outputs in

part by external independent regressor time series (perhaps an audience input stream)

is simple; and projecting a time series of perceived affect into a generative process on

the basis of the time series model of its production in response to exogenous variables

such as acoustic one, is equally straightforward. The internal elaboration of fitness

functions, and hence internal valuation of outputs would be of interest; and

collaboration with other systems, notably IDyOM (and/or IDyOT) (<a href="#Wiggins2015">Wiggins &amp; Forth,

2015</a>) is planned.</p>
<p>Turning to lower levels of the <em>BANG</em> system, there is the opportunity to use

pitch-class as a derived variable (as in IDyOM). This concept can only apply if the

tuning system in play involves a repeating identically subdivided interval such as an

octave (e.g., in conventional equal temperament with twelve subdivisions per octave),

or an octave and a fifth (<a href="#Bohlen1978">Bohlen, 1978</a>). This may be particularly useful in 

elucidating and integrating the symbolic structures of materials generated simultaneously in

different tuning systems. The nature of the TSA modelling can also be widened. For

example, the order of the VAR models could be permitted to increase (at the cost of

longer computational delay, though as noted this can often be overcome by use of the

Yule-Walker algorithm). Stronger and principled data-driven modelling constraints

could equally be permitted: these would allow some of the component predictor

coefficients of a VAR model (for example lag 2 of the IOI) to be set to 0, while

retaining the other lag 2 predictors. Indeed, VAR can be used to model pitch without

reference to prior pitch sequences, and correspondingly in modelling the other

generated parameters.</p>
<p>Whereas IDyOM commonly uses both a short term (memory) model (based

solely on the piece in question) and a long-term model (based on knowledge of a

corpus or related work), <em>BANG</em> normally operates solely on the current piece,

specifically primarily on the last 200 events. However, the memory stores in <em>BANG</em>,

and the possibility of using larger accumulations of keyboard outputs than from a

single performance or performer, mean that cross-sectional techniques of time series

analysis (CSTSA) could be developed in future (again consuming considerably more

computational time). CSTSA has been used in recent publications on perceived affect

in a range of diverse works: it is essentially a form of mixed effects analysis of time

series (<a href="#Dean2014b">Dean <em>et al.</em>, 2014b</a>; <a href="#Dean2014c">2014c</a>). Thus a 

set of pieces can be treated as items within CSTSA, potentially allowing new hybridities of composition 

and improvisation.</p>
<p>Let us turn finally to the possibilities for evaluation of interactive music

systems (IMS), particularly <em>BANG</em>. For this purpose we consider an interaction by a

single performer with a keyboard and with the <em>BANG</em> interface, and simultaneously

the provision of MIDI-data streams from the keyboard to <em>BANG</em>. So far most

published interactive music systems have not been subjected to any systematic

evaluation, and this is currently true of <em>BANG</em>. It probably passes the secondary “test”

in which a work declared as creative is functional within the professional artistic

community, “under terms of engagement usually reserved for people” (<a href="#Colton2012">Colton &amp;

Wiggins, 2012</a>), as with Pachet’s <em>Continuator</em> (<a href="#Pachet2003">Pachet, 2003

</a>). The main bases for this

claim in the case of the <em>Continuator</em> seem to be that users (children, students,

performers) enjoy it, and that listeners cannot readily distinguish when and if it is the

human and/or the <em>Continuator</em> which are sounding. In addition, children seem to be

more attentive to some tasks involving it than to alternatives (<a href="#Addessi2004">Addessi &amp; Pachet,

2004</a>). Correspondingly, <em>BANG</em> is already used in professional performance and at

least passes muster with both its hands-on users (such as the author) and similarly

with other collaborating performers.</p>
<p>However, this is no more than the most preliminary form of evaluation. What

might be done to pursue this in some depth? In my earlier work on interactive music

systems, what I called hyperimprovisation (<a href="#Dean2003">Dean, 2003</a>), I argued that critical to

evaluation would indeed be performers, but also others expert in the form(s) of music

in question. Developing this in the context of free improvisation IMS, and reviewing

the available alternatives, <a href="#Linson2012">Linson <em>et al.</em> (2012)</a> argues primarily for 

qualitative assessment by idiom experts, indicating that it is “especially relevant for determining

whether or not a player-paradigm system itself performs at the level of a human

expert”. On the other hand, a post-human aspect to an IMS is of interest, and this

could not be considered in relation to human “levels”. For related reasons, a pure

Turing test is not apt here.</p>
<p>In the broader context of free improvisation at large, it has been shown that the

consensual assessment technique using experts in the field can be effective in ranking

preferences (<a href="#Eisenberg2003">Eisenberg &amp; Thompson, 2003</a>). In this study, there were 10 

expert assessors and 16 keyboard improvisations, one by each participant, who had all

played the piano for a minimum of five years but had varied improvisation

experience. They improvised at the keyboard in response to a one minute excerpt of

Prokofiev’s <em>Romeo and Juliet</em> “in any way” they chose; no information about the

kinds of music they produced is given. Inter-rater agreement was statistically

acceptable, and much of the variance in the measured preferences could be explained

on the basis of complexity, creativity, and technical goodness as perceived by the

experts (each scored separately).</p>
<p>In contrast, potentially less subjective aspects can be evaluated too, with

reference to distinguishing human from machine contributions. An interesting case is

the study of an algorithmic machine drum and bass generator, where satisfactory

examples of both human and algorithmic drum patterns were obtained as judged by a

pre-trained multi-layer perceptron critic, and then assessed by 19 human listeners, in

this case not especially expert in drum and bass (<a href="#Pearce2001">Pearce &amp; Wiggins, 2001</a>). In 

three separate experiments, the appropriately briefed listeners considered whether the

patterns were human or machine generated, and then which style each belonged to,

amongst “drum and bass”, “techno” and “other”. The comparison of results with known

statistics of inputs was used to show for example that there were perceived differences

between the human and algorithmic patterns, and that the algorithmic patterns were

not perceived as belonging to the appropriate style.</p>
<p>We can synthesise these types of evaluations, and taking account of the fact

that <em>BANG</em> is not style-bound, but does normally depend on ongoing user keyboard

sounds and interface interactions, propose a future evaluation scheme for this

situation. The essence of the proposal is that several professional keyboard

improvisers familiar with <em>BANG</em> (or other qualitatively similar IMS) would perform

several improvisations, and the outputs of the played keyboard and <em>BANG</em> would be

recorded separately (this is abbreviated as the K(i)B, keyboard-interactive-<em>BANG</em>

condition). The performances would be required each to be of a length between 115

and 125 seconds (not an unreasonable precision to demand of a professional used to

recording-studio conditions). The improvisers would then perform a solo keyboard

piece of the same length, after which they would improvise a second keyboard part

with it (creating the KK condition). Again both keyboard results would be recorded

separately. The potentially informative aspect of the design would be that listeners

would evaluate both the complete K(i)B and KK performances, but also counterparts

in which K(i) and B recordings are recombined, so that what is heard was not played

simultaneously (and is in some sense unconnected). Similarly they would hear both

original and recombined KK performances. There would be both a naïve and an

expert group of evaluators, whose data are treated as distinct. Such participants could

evaluate in a between-participants manner (to avoid multiple demands, such as liking

and complexity becoming conflated) the complexity, technical goodness, and liking

for the target items of the types described. A mixed effects linear analysis would

permit the discrimination of impacts of K(i) vs. K, and real vs. recombined

performance. Furthermore, the possible influence of complexity that might be a

consequence of comparing K with KK or K(i) with KiB performance would be

overcome. It is intended to undertake such an evaluation, hopefully at the point at

which the intended TSA and IDyO(M/T) modules are both available in the

performance system.</p>
<h3>7 Online Supplementary Material</h3>
<p><a href="Supplement.pdf" target="_blank">A text file</a> provides more detail on the

algorithms, including R code for the analyses and simulations and the Pitch data

series which is modelled.</p>
<p><a href="https://goo.gl/oPZyk0" target="_blank">A Quicktime movie</a> illustrates <em>BANG</em> at work together with the author, a

professional improviser, playing a MIDI-keyboard and controlling the patch. All the

played and generated sounds are realised as physical synthesis equal-tempered piano

sounds, using PianoTeq 4. The movie is made more for the purpose of demonstration

than as a musical item, and so the R code is shown in operation, and Garage Band is

used half-way through to notate the ongoing events: those which remain white are

those played on the keyboard, while most events heard are generated by <em>BANG</em>.

Commonly <em>BANG</em> is used in ensemble improvisation, as part of the computer-

interactive aspect of the author’s contribution, alongside acoustic piano playing. The

version of the <em>BANG</em> code used is pre-final, and some of the comments and plans in it

(seen in the video) have since been implemented, as noted in the article itself.</p>
<h3>8 Disclosure Statement</h3>
<p>There have been no financial benefits from the application

of this research and none is planned.</p>
<h3>Acknowledgements</h3>
<p>This work on

music generation using time series analysis models was initiated by the author and

Geraint Wiggins (QMUL), during reciprocal laboratory visits, and is intended to

provide functionality that can cooperate with live use of information theoretic music

analysis and generation by IDyOM and its developments (discussed within).

Discussion with colleagues at both MARCS and QMUL, in particular Jamie Forth,

Andy Milne, and Marcus Pearce, is gratefully acknowledged. The research was

conducted at MARCS Institute and at The Centre for Digital Music and the

Computational Creativity Laboratory, Queen Mary University of London, where

Rodger T. Dean is a visiting researcher.</p>
<a name="references"></a><h3>References</h3>
<ul class="referencesList">
<a name="Addessi2004"></a><li> Addessi, A.R. &amp; Pachet, F. (2004). Child/computer interaction: Observation in classroom setting. In <em>Proceedings of the Conference on Interdisciplinarity Musicology (CIM04)</em>, Graz/Austria.</li>
<li><a name="Bailes2012"></a>Bailes, F. &amp; Dean, R.T. (2012). Comparative time series analysis of perceptual responses to electroacoustic music. <em>Music Perception, 29</em>, 359–375.</li>
<li><a name="Bailes2013"></a>Bailes, F., Dean, R.T. &amp; Pearce, M.T. (2013). Music cognition as mental time travel. <em>Scientific Reports, 3</em>.</li>
<li><a name="Bailes2015"></a>Bailes, F., Dean, R.T. &amp; Broughton, M.C. (2015). How different are our perceptions of equal-tempered and microtonal intervals? A Behavioural and EEG Survey. <em>PLoS One, 10</em>(8), e0135082.</li>
<li><a name="Beran2004"></a>Beran, J. (2004). <em>Statistics in musicology</em>. CRC Press: USA.</li>
<li><a name="Blackwell2004"></a>Blackwell, T. &amp; Young, M. (2004). Self-organised music. <em>Organised Sound, 9</em>(02), 123–136.</li>
<li><a name="Bloch2008"></a>Bloch, G., Dubnov, S. &amp; Assayag, G. (2008). Introducing video features and spectral descriptors in the Omax Improvisation System, <em>ICMC</em>.</li>
<li><a name="Boden2009"></a>Boden, M.A. &amp; Edmonds, E.A. (2009). What is generative art? <em>Digital Creativity, 20</em>(1–2), 21–46.</li>
<li><a name="Bohlen1978"></a>Bohlen, H. (1978). 13 tone steps in the twelfth. <em>Acta Acustica United with Acustica, 39</em>(2), 76–86.</li>
<li><a name="Box1994"></a>Box, G.G., Jenkins, G.M. &amp; Reinsel, G.C. (1994). Time series analysis – Forecasting and control. 3rd ed. Prentice Hall: New Jersey.</li>
<li><a name="Brown2015"></a>Brown, A.R., Gifford, T. &amp; Davidson, R. (2015). Techniques for generative melodies inspired by music cognition. <em>Computer Music Journal, 39</em>(1), 11–26.</li>
<li><a name="Brown1993"></a>Brown, J.C. (1993). Determination of the meter of musical scores by the method of autocorrelation. <em>Journal of the Acoustical Society of America, 94</em>(4), 1953–1957.</li>
<li><a name="Button2013"></a>Button, K.S., Ioannidis, J.P.A., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S.J. &amp; Munafo, M.R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. <em>Nature Reviews Neuroscience, 14</em>, 365–376.</li>
<li><a name="Carp2012"></a>Carp, J. (2012). The secret lives of experiments: Methods reporting in the fMRI literature. <em>Neuroimage, 63</em>, 289–300.</li>
<li><a name="Colton2012"></a>Colton, S. &amp; Wiggins, G.A. (2012). Computational creativity: The final frontier?, <em>ECAI</em>.</li>
<li><a name="Conklin1995"></a>Conklin, D. &amp; Witten, I.H. (1995). Multiple viewpoint systems for music prediction. <em>Journal Of New Music Research, 24</em>(1), 51–73.</li>
<li><a name="Conklin2013"></a>Conklin, D. (2013). Multiple viewpoint systems for music classification. <em>Journal Of New Music Research, 42</em>(1), 19–26.</li>
<li><a name="Cryer2008"></a>Cryer, J.D &amp; Chan, K.S. (2008). <em>Time series analysis with applications in R</em>. Springer: New York.</li>
<li><a name="Daly2015"></a>Daly, I., Malik, A., Weaver, J., Hwang, F., Nasuto, S.J., Williams, D. &amp; Miranda, E. (2015). Towards human-computer music interaction: Evaluation of an affectively-driven music generator via galvanic skin response measures. <em>Computer Science and Electronic Engineering Conference (CEEC)</em>.</li>
<li><a name="Dean2003"></a>Dean, R.T. (2003). <em>Hyperimprovisation: Computer interactive sound improvisation</em>. AR Editions: Madison.</li>
<li><a name="Dean2009"></a>Dean, R.T. (2009). Widening Unequal Tempered microtonal pitch space for metaphoric and cognitive purposes with New Prime Number Scales. <em>Leonardo, 42</em>(1), 94–95.</li>
<li><a name="Dean2010"></a>Dean, R.T. &amp; Bailes, F. (2010). Time series analysis as a method to examine acoustical influences on real-time perception of music. <em>Empirical Musicology Review, 5</em>, 152–175.</li>
<li><a name="Dean2011"></a>Dean, R.T., Bailes, F. &amp; Schubert, E. (2011). Acoustic intensity causes perceived changes in arousal levels in music: An experimental investigation. <em>PLoS One, 6</em>(4), e18591.</li>
<li><a name="Dean2013"></a>Dean, R.T. &amp; Bailes, F. (2013). Using time series analysis to evaluate skin conductance during movement in piano improvisation. <em>Psychology of Music</em>, doi:0305735613489917.</li>
<li><a name="Dean2014"></a>Dean, R.T. (2014). The Serial Collaborator: A meta-pianist for real-time tonal and non-tonal music generation. <em>Leonardo, 47</em>(3), 260–261.</li>
<li><a name="Dean2014a"></a>Dean, R.T., Bailes, F. &amp; Drummond, J. (2014a). Generative structures in improvisation: Computational segmentation of keyboard performances. <em>Journal Of New Music Research (Ahead-of-Print)</em>, 1–13.</li>
<li><a name="Dean2014b"></a>Dean, R.T., Bailes, F. &amp; Dunsmuir, W.T.M. (2014b). Shared and distinct mechanisms of individual and expertise-group perception of expressed arousal in four works. <em>Journal of Mathematics and Music, 8</em>(3), 207–223.</li>
<li><a name="Dean2014c"></a>Dean, R.T., Bailes, F. &amp; Dunsmuir, W.T.M. (2014c). Time series analysis of real-time music perception: Approaches to the assessment of individual and expertise differences in perception of expressed affect. <em>Journal of Mathematics and Music, 8</em>(3), 183–205.</li>
<li><a name="Deng2015"></a>Deng, J.J. &amp; Leung, C.H.C. (2015). Dynamic time warping for music retrieval using time series modeling of musical emotions. <em>IEEE Transactions on Affective Computing, 6</em>(2), 137–151.</li>
<li><a name="Dyson2010"></a>Dyson, B.J. &amp; Quinlan, P.T. (2010). Decomposing the Garner Interference Paradigm: Evidence for dissociations between macrolevel and microlevel performance. <em>Attention, Perception, &amp; Psychophysics, 72</em>(6), 1676–1691.</li>
<li><a name="Eck2006"></a>Eck, D. (2006). Identifying metrical and temporal structure with an autocorrelation phase matrix. <em>Music Perception: An Interdisciplinary Journal, 24</em>(2), 167–176.</li>
<li><a name="Eisenberg2003"></a>Eisenberg, J. &amp; Thompson, W.F. (2003). A matter of taste: Evaluating improvised music. <em>Creativity Research Journal, 15</em>(2–3), 287–296.</li>
<li><a name="Fernandez2013"></a>Fernández, J.D. &amp; Vico, F. (2013). AI methods in algorithmic composition: A comprehensive survey. <em>Journal of Artificial Intelligence Research</em>, 513–582.</li>
<li><a name="Gingras2016"></a>Gingras, B., Pearce, M.T., Goodchild, M., Dean, R.T., Wiggins, G. &amp; McAdams, S. (2016). Linking melodic expectation to expressive performance timing and perceived musical tension. <em>Journal of Experimental Psychology: Human Perception and Performance</em>, (tba).</li>
<li><a name="Hamilton1994"></a>Hamilton, J.D. (1994). <em>Time series analysis</em>. Princeton University Press: Princeton.</li>
<li><a name="Harvey1990"></a>Harvey, A.C. (1990). <em>Forecasting, structural time series models and the Kalman filter</em>. Cambridge: Cambridge University Press.</li>
<li><a name="Hazan2009"></a>Hazan, A., Marxer, R., Brossier, P., Purwins, H. ,Herrera, P. &amp; Serra, X. (2009). What/when causal expectation modelling applied to audio signals. <em>Connection Science, 21</em>(2–3), 119–143.</li>
<li><a name="Huron2006"></a>Huron, D. (2006). <em>Sweet anticipation</em>. MIT Press: Cambridge.</li>
<li><a name="Hyndman2011"></a>Hyndman, R.J. (2011). <em>Forecast: Forecasting functions for time series</em>. Rpackage version 3.18.</li>
<li><a name="Jones1992"></a>Jones, M.R. (1992). Attending to musical events. In M.R. Jones &amp; S. Holleran (Eds.), <em>Cognitive bases of musical communication</em> (pp. 91–110). American Psychological Association: Washington.</li>
<li><a name="Kelly2009"></a>Kelly, C. (2009). <em>Cracked media: The sound of malfunction</em>. MIT Press: Massachusetts.</li>
<li><a name="Landy2009"></a>Landy, L. (2009). Sound-based music 4 All. In R.T. Dean (Eds.), <em>The Oxford handbook of computer music</em> (pp. 518–535). Oxford University Press: New York.</li>
<li><a name="Launay2013"></a>Launay, J., Dean, R.T. &amp; Bailes, F. (2013). Evidence for multiple strategies in off-beat tapping with anisochronous stimuli. <em>Psychological Research</em>, 1–15.</li>
<li><a name="Lewandowsky2011"></a>Lewandowsky, S. &amp; Farrell, S. (2011). Computational modeling in cognition: Principles and practice. Sage Publications: Thousand Oaks, CA.</li>
<li><a name="Lewis1993"></a>Lewis, G. (1993). <em>Voyager</em>. Tokyo: Disk Union, CD AVAN 014.</li>
<li><a name="Lewis1999"></a>Lewis, G. (1999). Interacting with latter-day musical automata. <em>Contemporary Music Review, 18</em>(3), 99–112.</li>
<li><a name="Lewis2000"></a>Lewis, G.E. (2000). Too many notes: Computers, complexity and culture in Voyager. <em>Leonardo Music Journal, 10</em>, 33–39.</li>
<li><a name="Linson2012"></a>Linson, A., Dobbyn, C. &amp; Laney, R. (2012). Critical issues in evaluating freely improvising interactive music systems. <em>Proceedings of the Third International Conference on Computational Creativity</em>.</li>
<li><a name="Meyer1956"></a>Meyer, L.B. (1956). <em>Emotion and meaning in music</em>. University of Chicago Press: Chicago.</li>
<li><a name="Milne2016"></a>Milne, A.J. &amp; Dean, R.T. (2016). Computational creation and morphing of multi-level rhythms by control of evenness. <em>Computer Music Journal, 40</em>(1), (tba).</li>
<li><a name="Olsen2016"></a>Olsen, K.N. &amp; Dean, R.T. (2016). Does perceived exertion influence perceived affect in response to music? Investigating the 'FEELA' hypothesis. <em>Psychomusicology: Music, Mind, and Brain</em>, (In Press: accepted March 8, 2016).</li>
<li><a name="Pachet2003"></a>Pachet, F. (2003). The Continuator: Musical interaction with style. <em>Journal Of New Music Research, 32</em>(3), 333–341.</li>
<li><a name="Pachet2013"></a>Pachet, F. &amp; Roy, P. (2013). Markovian-sequence generator and new methods of generating markovian sequences: Google Patents.</li>
<li><a name="Pearce2001"></a>Pearce, M. &amp; Wiggins, G. (2001). Towards a framework for the evaluation of machine compositions. <em>Proceedings of the AISB'01 Symposium on Artificial Intelligence and Creativity in the Arts and Sciences</em>.</li>
<li><a name="Pearce2005"></a>Pearce, M.T. (2005). <em>The construction and evaluation of statistical models of melodic structure in music perception and composition</em>. London: City University.</li>
<li><a name="Pearce2006"></a>Pearce, M.T. &amp; Wiggins, G.A. (2006). Expectation in melody: The influence of context and learning. <em>Music Perception, 23</em>, 377–405.</li>
<li><a name="Pearce2011"></a>Pearce, M.T. (2011). Time-series analysis of music: Perceptual and information dynamics. <em>Empirical Musicology Review, 6</em>, 125–130.</li>
<li><a name="Pearce2012"></a>Pearce, M.T. &amp; Wiggins, G.A. (2012). Auditory expectation: The information dynamics of music perception and cognition. <em>Topics in Cognitive Science, 2012</em>, 1–28.</li>
<li><a name="Pressing1987"></a>Pressing, J. (1987). The micro- and macro-structural design of improvised music. <em>Music Perception</em>, 133–172.</li>
<li><a name="Ribeiro2001"></a>Ribeiro, P., Pereira, F.C., Ferrand, M., Cardoso, A.P. &amp; de Marrocos, P. (2001). Case-based melody generation with MuzaCazUza. <em>Proceedings of the AISB’01 Symposium on Artificial Intelligence and Creativity in Arts and Science</em>.</li>
<li><a name="Rohrmeier2015"></a>Rohrmeier, M., Zuidema, W., Wiggins, G.A. &amp; Scharff, C. (2015). Principles of structure building in music, language and animal song. <em>Philosophical Transactions of the Royal Society of London B: Biological Sciences, 370</em>(1664), 20140097.</li>
<li><a name="Rowe1993"></a>Rowe, R. (1993). <em>Interactive music systems: Machine listening and composing</em>. MIT Press: Cambridge.</li>
<li><a name="Rowe2001"></a>Rowe, R. (2001). <em>Machine musicianship</em>. MIT Press: Cambridge.</li>
<li><a name="Schellenberg2015"></a>Schellenberg, E.G. &amp; Habashi, P. (2015). Remembering the melody and timbre, forgetting the key and tempo. <em>Memory &amp; Cognition, 43</em>(7), 1021–1031.</li>
<li><a name="Serra2012"></a>Serra, J., Kantz, H., Serra, X. &amp; Andrzejak, R.G. (2012). Predictability of music descriptor time series and its application to cover song detection. <em>IEEE Transactions on Audio, Speech, and Language Processing, 20</em>(2), 514–525.</li>
<li><a name="Sims1988"></a>Sims, C.A. (1988). Bayesian skepticism on unit root econometrics. <em>Journal of Economic Dynamics and Control, 12</em>(2), 463–474.</li>
<li><a name="Smith1997"></a>Smith, H. &amp; Dean, R.T. (1997). <em>Improvisation, hypermedia and the arts since 1945</em>. Harwood Academic: London.</li>
<li><a name="Snyder2000"></a>Snyder, B. (2000). <em>Music and memory: An introduction</em>. Cambridge: MIT Press.</li>
<li><a name="Surges2013"></a>Surges, G. &amp; Dubnov, S. (2013). Feature selection and composition using PyOracle. <em>Ninth Artificial Intelligence and Interactive Digital Entertainment Conference</em>.</li>
<li><a name="Tapson2015"></a>Tapson, J., Cohen, G. &amp; van Schaik, A. (2015). ELM solutions for event-based systems. <em>Neurocomputing, 149</em>, 435–442.</li>
<li><a name="Tsay2013"></a>Tsay, R.S. (2013). <em>Multivariate time series analysis: With R and financial applications</em>. John Wiley &amp; Sons: USA.</li>
<li><a name="Tymoczko2011"></a>Tymoczko, D. (2011). <em>A geometry of music: Harmony and counterpoint in the extended common practice</em>. Oxford University Press: Oxford.</li>
<li><a name="Whorley2013"></a>Whorley, R., Wiggins, G.A., Rhodes, C. &amp; Pearce, M.T. (2013). Multiple viewpoint systems: Time complexity and the construction of domains for complex musical viewpoints in the harmonization problem. <em>Journal of New Music Research, 42</em>(3), 237–266.</li>
<li><a name="Wiggins2009"></a>Wiggins, G.A., Pearce, M.T. &amp; Mullensiefen, D. (2009). Computational modeling of music cognition and musical creativity. In R.T. Dean (Eds.), <em>The Oxford handbook of computer music</em> (pp. 383–420).</li>
<li><a name="Wiggins2015"></a>Wiggins, G.A. &amp; Forth, J. (2015). IDyOT: A computational theory of creativity as everyday reasoning from learned information. <em>Computational Creativity Research: Towards Creative Machines</em> (pp. 127–148). Springer: USA.</li>
<li><a name="Wing2014"></a>Wing, A.M., Endo, S., Bradbury, A. &amp; Vorberg, D. (2014). Optimal feedback correction in string quartet synchronization. <em>Journal of The Royal Society Interface, 11</em>(93), 20131125.</li>
</ul>
<hr/>
<a name="footnotes"></a><h3>Notes</h3>
<p>There are no Notes.</p>
<hr/>
<a name="contacts"></a><h3>Author Contacts</h3>
<ul class="contactList">
<li><em>Roger Dean</em></li>
<li>Roger.Dean@westernsydney.edu.au</li>
<li>University of Western Sydney</li>
<li>Locked Bag 1797</li>
<li>Penrith</li>
<li>Australia</li>
<li>NSW 2751</li>
</ul>
<br/>
<br/>
<a href="#"><div class="btn" id="go-to-top">Back to top</div></a>