Design and main ideas

Notes on the design of the saugns software and the SAU language as they have evolved, and the main ideas involved. (The language has evolved in parallel with the software implementing it.)

While somewhat structured according to the history of development, the new design largely builds on the old, and the early program already shows the main concepts that endure.

Contents

Roots: Early 2011 design

The program was developed from scratch, as a hobby experiment beginning in early 2011; the language started out as the most straightforward way I could get a trivial program to generate sounds and handle timing flexibly according to a script (some early examples are shown on the history page). While details (and further features) have been added and changed since, the core of the language remains similar. Consider the line below.

Wsin f440 t1.5

If a single capital letter (W was used early) is used to start a wave oscillator of some type, and a lowercase letter followed by a number assigns a value for frequency (f), amplitude (a), or time duration (t), then parsing is trivial. No abstractions for lexing or syntax trees, etc., are necessary. For each W the parser can simply add an oscillator node to a list, and go on to set values in the current node according to any parameter assignments (t, f, etc.) which follow while that oscillator is still being dealt with. Parameters can come in any order when name rather than placement indicates which is given a value – and it's also easy to allow skipping parameters, by simply giving them all default values.

Several parts creating new nodes may be written in a script. Additional syntax can tell the parser that the next node should be given a time offset value, so that it's not interpreted as something to run in parallel with the previously added node when generating audio. After parsing is done, the resulting list of nodes specifies what to run or do to generate audio.
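To illustrate how trivial such parsing can be, here is a minimal sketch in Python (the actual saugns code is written in C; the node fields and default values used here are made up for the example):

```python
# Minimal sketch (not the actual saugns parser) of reading statements like
# "Wsin f440 t1.5" into a flat list of oscillator nodes. Defaults here are
# hypothetical; a capital-letter prefix starts a node, lowercase prefixes
# assign parameter values to the current node.

DEFAULTS = {"freq": 440.0, "amp": 1.0, "time": 1.0}

def parse(script):
    nodes = []
    node = None  # assumes a script begins with a generator statement
    for token in script.split():
        if token.startswith("W"):             # new wave oscillator node
            node = dict(DEFAULTS, wave=token[1:])
            nodes.append(node)
        elif token.startswith("f"):           # frequency parameter
            node["freq"] = float(token[1:])
        elif token.startswith("a"):           # amplitude parameter
            node["amp"] = float(token[1:])
        elif token.startswith("t"):           # time duration parameter
            node["time"] = float(token[1:])
    return nodes
```

A real parser adds more parameters, error handling, and time-offset syntax on top of this shape, but the name-as-prefix pattern keeps the core a simple loop.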

Sequences and nesting

When a series of things are read from a script, the resulting list of nodes can be viewed as a sequential list of "steps" to take, "events" to handle, or "instructions" to follow, for setting up or changing audio generation. For a sequence of nodes with no time offsets between them, the steps taken are a series of configuration changes made one immediately after the other, to set up how audio generation should run thereafter. Audio generation using the things configured then runs until finished, except that when the time for the next configuration node arrives, audio rendering is interrupted until that node has been handled. To run it all, the list of nodes can be examined once to figure out what data structures need to be allocated and prepared (how many oscillators, etc.), and a second time when actually using those for running the simplistic "program" the script has been translated into.
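The two-pass idea can be sketched as follows – an assumed, simplified shape in Python rather than the program's actual C structures, with each step holding a time position and a set of parameter changes:

```python
# Sketch of the two-pass scheme (assumed data shapes, not the real saugns
# code): pass 1 examines the step list to allocate what's needed, pass 2
# runs it, rendering samples up to each step's time before applying it.

def run(steps, srate=48000):
    # Pass 1: look through the steps once to allocate what's needed
    # (here, just one dict per named object; the real program counts
    # oscillators, voices, etc.).
    objects = {obj: {} for step in steps for obj in step["set"]}
    # Pass 2: render up to each step's time position, then apply it.
    rendered = 0
    for step in steps:
        rendered = max(rendered, int(step["at"] * srate))  # audio before the update
        for obj, params in step["set"].items():
            objects[obj].update(params)                    # the update itself takes no time
    return rendered, objects
```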

To allow implementing modulation techniques, support for lists within which modulating oscillators could be placed was then added to the parser and audio generation code, including support for nesting lists within lists. Such a list of oscillators is assigned in connection with a parameter for frequency, phase, amplitude, etc. for an oscillator – naturally such text is simply parsed with recursion. This was a fairly simple extension for the language, with time proceeding along a linear list of steps to take like before, though the data used in connection with each main node and time position may thereafter branch out – recursion also entering in how audio generation code handles what's configured to run, mirroring what's specified in the scripts.
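How recursion in the audio code mirrors the nesting in scripts can be shown with a toy Python rendering function – a deliberately simplified model (real FM synthesis uses phase accumulation, and the actual node fields differ):

```python
import math

# Toy sketch of recursive rendering for nested modulator lists (assumed
# structure, naive frequency modulation): modulators attached to an
# oscillator's frequency are themselves oscillators, rendered the same way.

def sample(osc, t):
    """One sample of an oscillator, with nested frequency modulators."""
    freq = osc["freq"]
    for mod in osc.get("fmods", []):     # each modulator recurses identically
        freq += mod["amp"] * sample(mod, t)
    return math.sin(2 * math.pi * freq * t)
```

The recursion terminates naturally at oscillators with empty or absent modulator lists, just as parsing terminates at the innermost nested lists.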

It was when trying to add more flexible syntax for timing, on top of that, that complexity first began to grow, and I had difficulty back then keeping it all working. The complexity of behavior and difficulty of getting it all right grew much faster than the code size. (I first tried to solve this, a little later, by moving in the direction of adding another stage of script processing between parsing and audio generation. Much later, I saw how to undo that and resimplify.) The design didn't include much of the classic structure of compilers and interpreters, and I didn't have the experience to grow my own design well and make it maintainable. The language also looked quite different from typical well-known and well-described paradigms. This is part of why added features hit a plateau fairly early on.

Yet before complexity turns into a problem, in a small and tidy way it's very doable to support things like parallel audio generation (several "voices"), combined with a sequential "time separator", and a more flexible "insert time delay for what's placed after this" modifier. That kind of terse and simple language which a fairly trivial program can run could be ideal for growing into a flexible tone generator language. But there's a great, huge chasm between that and a powerful musical language.

Main characteristics of the language

Early choices were made for brevity and ease of writing at the small scale, including not requiring any symbol between a name and a value for assignment. Instead, name-value pairs form a large part of the script contents, along with whitespace, different types of brackets for grouping, and some extra symbols. The lack of an assignment symbol imposes some limits on the names used as the left-hand part of an expression, because such names still need to be distinguishable from what follows them – from the values read, which may be numerical, use alphabetical characters, and/or other things. The simplest thing to do is to use one-character names as prefixes, each followed by a value. (Regardless of the length of the prefix-names, having it the same for each prefix-name makes it clear where it ends and the value after it begins.) The pattern of such name-value pairs can be seen in many places in the language, but it doesn't always repeat when breaking down larger subexpressions into parts; e.g. for numerical expressions (later extended further), I settled for the ease of conventional infix syntax for arithmetic in values, rather than elaborating some (in my view clunkier) alternative to it.

The easy and terse solution of using one-letter names for the left-hand part of name-value pairs (in some cases with a special character as the name) works well enough as long as there are not many things to name. It limits possible additions to the language, but works well for smaller, fixed sets of named things. The limitation is also loosened somewhat by allowing several values of different types for the same name (e.g. a number and/or a modulator list). Beyond that, subnames nested under names can be added for extra related parameters. As used, the one-character names very loosely mirror ordinary written language, in that context for smaller things is set by a mixture of capital letter names (e.g. adding a new wave oscillator) and special symbols. Lowercase one-character names denote smaller things which are accessed and used specific to the context, a little like function parameters or record fields.

For user-defined variable names, longer and more flexible names are allowed using a special symbol placed before a name string as the left-hand part of a pair, the name string being the right-hand part. This in turn is followed by another name-value pair – something with a value – which, by being written after the former, is associated with it and labeled by it. The latter thing could be a wave oscillator; until 2022, variables were only used to label objects so that they can be referred back to by name. (The much later 2022 numerical variables feature uses a = symbol as the left-hand part of a second name-value pair, in an unusual imitation of conventional assignment syntax.)

Timing when running and generating

The basic design for how time works is very simple. Time for a script begins at 0 ms. A script is translated into a list of timed update instruction nodes, or "steps", each new step taking place after the previous, with or without time (samples to generate) passing between any two steps. Each step configures the system, e.g. telling it to start generating something or changing parameter values for some generator object.

The running of a script primarily advances through time, and secondarily through the timed update steps which are laid out in a list like a timeline of events. After parsing, time is translated into the number of samples to generate, or which should be generated before a time is reached. Time proceeds as output samples are written, while update events only come with time and do not advance it. The handling of such updates takes priority over output generation, pausing it until the updates at that time position have been handled.

Each thing which generates output, such as a wave oscillator, has a time duration of use, beginning at one time position and ending at one time position. The script has ended when both no things remain in use (the time durations set have expired), and no further update steps remain to be waited for (whether while creating sounds or in quiet). In other words, the duration of a script equals the sum of the times to wait before each new update step, plus the remaining duration of play after the last update step for still-active "things" (e.g. oscillators).
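The duration rule just described can be expressed as a short calculation – a sketch under assumed data shapes, with each step holding its wait time and the durations of the things it starts or restarts:

```python
# Sketch of the script duration rule (hypothetical data layout): total
# duration is the sum of waits before each step, extended by however long
# any generator started along the way keeps playing past the last step.

def script_duration(steps):
    elapsed = 0.0   # time position reached by summing waits
    end = 0.0       # latest expiry time of any started thing
    for step in steps:
        elapsed += step["wait"]                  # time before this update
        for dur in step.get("durations", []):    # things (re)started here
            end = max(end, elapsed + dur)        # track when each expires
    return max(end, elapsed)
```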

Limitations of the early design

Some functionality cannot be reached without complicating the core design and beginning to move beyond it. For example, implementing a syntax for more flexible timing arrangement in a script than globally, flatly time-ordered steps led to moving out and growing some data processing between the parsing and the interpreting/audio generation main parts of the program, so that time may branch during parsing but timed nodes can still be arranged into a flat timeline of steps before interpreting. That's a main example of something with a solution arrived at early on. (Unfortunately, the early work had bugs only fixed much later, though the basic design, with fixes made, remained viable.)

A main example of a limitation not dealt with early on is the nature of the nested list, i.e. tree structures, as the form of what can be specified in a script. Early on, the capabilities of old FM synthesizer systems had been an inspiration, but they also support connecting oscillators in arrangements other than the tree structures of carriers and modulators provided for by nested lists; e.g. several carriers may share a modulator, and in general the oscillator connection schema is a DAG (directed acyclic graph) in Yamaha's old FM "algorithms". (Technically, self-modulation could however be viewed as adding self-loops to an otherwise acyclic graph. Possibilities for going beyond acyclic graphs by supporting feedback loops more generally also exist, and are done in some synthesizer systems.)

But most conspicuously missing from the early language are features like defining and using functions with audio generation code in scripts, looping and other control flow constructs, etc. I skipped all that at first because I wanted to explore other things than inventing yet another "normal" language. Some things would be easier than others to add, in terms of the needed complexity, but after a while I really didn't know what else to work into it all, and how, and gradually ran out of practical ways to extend the early vision.

Generally, it isn't audio generation features which suggest the greatest departures in design, but programming language ideas which in one or another way allow re-using what's written one time more times when a script runs. And there's a variety of potential ideas to explore for turning audio definitions entered in a script into a "template" of sorts for further audio in that script, and other ideas increasing language expressiveness.

Relative to the early language, some kinds of extensions for it would mainly require reworking and complicating the design closer to the parsing end of the program, while others mainly require reworking the other end which ultimately runs what is produced. (When considering creating a more powerful interpreter, it's also worth noting that some basic big limitations in features are necessary for e.g. time durations for scripts to remain pre-calculable, as they ended up being. A Turing-complete language would not allow it.)

Trunk: Growth from main 2012 ideas

Experimenting on in 2011 and beyond, and then looking for potentially useful ideas for programming languages and compilers in 2012–2014 (in part while taking a few basic courses in related things), led to a series of old notes; they contain a list of thoughts on a new possible language, and ideas for possible design elaborations. Back then, while studying, I discounted my own early design and language as a starting point for something better, after learning some theory. In part, that was because basic standard concepts are usually connected to different-looking syntaxes, designs, and implementations, and I couldn't see how what I'd already come up with might correspond to those concepts. I vaguely dreamed of different things, and put it all aside for years; only roughly a decade later did I gradually bridge that gap in thought on my own, arriving at a road to working out more in practice.

Various good little ideas made it into the program design in the year between April 2011 and April 2012 (when the old work ended), but in a rough and sometimes buggy form, requiring later clean-up. Other ideas from 2012–2014, along with later ones, remain to be explored – or really elaborated into something both useful and concrete enough to explore – more thoroughly for extending and reworking features. The old code also contained some little false starts, more of them being added over time.

Early timing & nesting design, more processing stages

The April 2012 program had grown a parser producing a flat list structure for time-ordered events (the main nodes), combined with tree structures attached to events (data nodes for the things added or changed in a step, which may involve nested syntax elements). This corresponds fairly simply to the language: time-ordering is one dimension of structure, and nesting, as in e.g. setting up oscillators for modulation, is another. But some of the semantics had begun to be handled after parsing but before interpretation, a middle layer straightening out some final details of timing which seemed too messy to attempt during parsing. (It also counted and allocated needed voices for audio generation prior to running it.)

The way time placement and nesting work together in the language, one main node in the list of timed steps may provide a full tree of data nodes, used e.g. when an oscillator arrangement is first introduced, while another main node for a later time provides an update including a sub-tree with data nodes for only what was changed (an oscillator is changed to use a new modulator introduced after a time, say). An updating main node could also contain data which clears links, e.g. removing modulators from an oscillator.

Another type of nesting exists which applies to time, but it is flattened away by the post-parse semantics code. It was originally added early and buggily for the old "compound steps" feature – a syntax extension for writing a series of timed changes for one object together, without advancing the timing for other objects. That way, timed updates can be grouped per object, rather than rigidly according to a global flow of time. This was implemented by producing the global timeline just after parsing, first allowing timing to branch out – temporarily placing such main nodes in side-timeline lists attached to other main nodes. The same design was tweaked a little for the 2022 "gapshift" feature replacing an earlier oscillator parameter (s) for padding time durations with leading silent time.

Strictly speaking, there's been more than one extra pass of script processing between the two main stages, the parser and the interpreter (long called the "generator" module). Several loops working through all event nodes have long been used to finalize the data before it's ready for the audio generation end of the program. For a period of time in 2019–2021, the design was even temporarily complicated by adding another extra module with such a main loop, before later (having figured out that it was pointless to have yet another pass) replacing it with a small addition to another pass closer to the parser.

Later timing & nesting design, a new simplicity

After moving back from the complexity of extra post-parse processing passes during part of 2019–2021, over time I thought of going further, of more ways to reduce loops through event nodes, merging loops or using shorter ones. As of 2023, it turns out that the entire old feature set can be handled without any full loop between the parser and the audio generation end of the program, with a few changes that actually make behavior better (more consistent). The old extra pre-loop in the interpreter a.k.a. "generator" module could also be removed.

A very old feature of the SAU language provided the key: the time separator |, which is used to note that all which follows is separated in time from all which goes before, made a good basis for organizing the logic. Instead of either trying to do everything immediately, at the time of each new event node in the parser (or something close to that – which was the failed first 2011–2012 approach for fewer passes of processing), or the opposite of doing a lot only after all the nodes have already been created, a middle way turned out to be to organize a series of semantics loops per |-separated "duration group" in a script. Such a design is also more suitable for extending into further nesting and timing features (like allowing | inside of nested structures).
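The grouping itself is simple to picture; here is a hypothetical Python sketch of splitting an event list into |-separated duration groups, over each of which the semantics passes would then run in turn:

```python
# Sketch of per-"duration group" processing (assumed event shapes): the
# time separator '|' closes a group, letting semantics loops run over just
# that group's events instead of over the whole script's list at once.

def duration_groups(events):
    groups, current = [], []
    for ev in events:
        if ev == "|":              # separator: finalize the current group
            groups.append(current)
            current = []
        else:
            current.append(ev)
    if current:
        groups.append(current)     # last group (no trailing separator)
    return groups
```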

Of course, it would also be possible to alter the SAU language so that it is easy to do everything at once for each new event node, instead of needing to group semantics processing at all for series of event nodes like at present. But the present features, including flexible default time logic (the basic ideas of duration groups largely going back to 2011), seem worth it. Despite being a little tricky, these features can be implemented in a fairly elegant way, the design reaching a new simplicity.

Parsing like a calculator, calculating like a parser

Very early on, numerical infix expressions were added and handled in the parser with a recursive parsing function. The parser worked like a calculator, reducing numerical expressions to their results as part of its output. As long as numerical expressions either contain no side-effects (no modifying of state), or are never evaluated more than once each (unlike what could happen if features like looping or function calls were added) – as long as at least one of those two conditions holds – that simple design continues to work well, without imposing limitations, when other features are added. Otherwise, if that big featureful combination ends up needed, the parser would need a redesign, in order to allow a numerical expression to be evaluated separately from the initial crunching of the text – and thus possibly several times afterwards.

Such a redesign, to postpone (part of) the evaluation until later, would allow e.g. function bodies or loop contents, if such features are implemented, to behave differently each time they run, just the way they are usually expected to do when a numerical expression in them should evaluate to something different each time. Though it's of course also possible to make the unusual design choice of making each numerical expression a parse-time constant computed using parse-time state – while allowing other syntax to handle state in a more conventional way.

In 2022, one of the two crucial things – statefulness, in the form of mathematical functions like rand() – appeared in numerical expressions. It remains to be seen what design path to take next if, and if so when, the other also enters the picture... The purer, fuller extension of the older design would make parse-time calculation akin to a smaller language embedded within a larger language – at once playing a role a little like a conventional preprocessor featuring macros, but with different means of setting and getting things, and restricted to being used only in designated places allowed for by the larger language.

More generator types

Only in early 2023 did another main audio generator come along – the rumble/random line segments oscillator R, which combines line segment functions, PRNGs, and modulation options. (I've written more about how the R oscillator works on my personal site.) Before then, I stuck with only the wave oscillator in main versions of the program – but had toyed with a simpler noise generator, and thought of other things. (The one wave oscillator has however had several wave types to choose from since the start, the set of them being tweaked over the years.)

Adding more types of audio generators is mainly about audio generation code, when sticking to the same basic design – apart from data structures becoming a little more complicated for generality to support more types of things. The R and W generator types share most parameters and most parsing code. The line types, used by R similarly to how W uses wave types, already existed for a different purpose, added in 2011 for use with a parameter sweep feature since improved several times. (Extending the set of line types however became interesting with R; used by it, line types can find several new uses, also including as envelope shapes when R is used for modulation in particular ways.)

Odds and ends from clean-up work

Some general ideas for cleaner code have evolved since reviving the project in November 2017. One little discovery is that staggered add-only allocator mempools are a perfect fit for most dynamic memory allocations in a program like this. Most of the rest can be handled using a simple dynamic array module. Bye-bye, nearly all memory clean-up code.