Design and main ideas

Notes on the design of the saugns software (and SAU language), as it has evolved, and the main ideas involved. (The SAU language has evolved in parallel with the software implementing it.)

Roots: Early 2011 design

The program was developed from scratch, as a hobby experiment beginning in early 2011; the language started out as the most straightforward way I could get a trivial program to generate sounds and handle timing flexibly according to a script (some early examples are shown on the history page). While details (and further features) have been added and changed since, the core of the language remains similar. Consider the line below.

Wsin f440 t1.5

If a single big letter (W was used early) is used to start a wave oscillator of some type, and a small letter followed by a number assigns a value for frequency (f), amplitude (a), or time duration (t), then parsing is trivial. No abstractions for lexing or syntax trees, etc., are necessary. For each W the parser can simply add an oscillator node to a list, and go on to set values in the current node according to any parameter assignments which follow while that oscillator is still being dealt with. Additional syntax can tell the parser that the next node should be given a time offset value, so that it's not interpreted as something to run in parallel with the previous when generating audio. After parsing is done, the resulting list of nodes specifies what to run or do to generate audio.
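The kind of trivial parsing described can be sketched in a few lines. The following Python model is purely illustrative – names and structure here are invented, not taken from the actual C implementation – and handles just the big-letter/small-letter pattern:

```python
def parse(script):
    """Parse whitespace-separated tokens into a flat list of oscillator nodes."""
    nodes = []
    for token in script.split():
        if token[0] == 'W':           # big letter: add a new wave oscillator node
            nodes.append({'wave': token[1:]})
        elif token[0] in 'fat':       # small letter: set a parameter on the current node
            nodes[-1][token[0]] = float(token[1:])
    return nodes

nodes = parse("Wsin f440 t1.5")
# nodes == [{'wave': 'sin', 'f': 440.0, 't': 1.5}]
```

A real parser reads character by character and handles far more, but the essential loop – append a node on a big letter, set a field on a small letter – is the same.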

When a series of things are read from a script, the resulting list of nodes produced can be viewed as a sequential list of "steps" to take, "events" to handle, or "instructions" to follow, for setting up or changing audio generation. For a sequence of nodes with no time offsets between them, the steps taken are a series of configuration changes made one immediately after the other, to set up how audio generation should run thereafter. Audio generation using the things configured runs until finished, or, when the time arrives for a next configuration node to handle, is interrupted until that has been done. To run it all, the list of nodes can be examined once to figure out what data structures need to be allocated and prepared (how many oscillators, etc.), and a second time when actually using those for running the simplistic "program" the script has been translated into.
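The two passes over the node list might be sketched like this – a hypothetical Python model, where the 'delay' and 'wave' fields and the sample counting are stand-ins for illustration, not the real program's data:

```python
def run(steps, srate=44100):
    """Two passes over a flat list of timed update steps:
    first find out what to allocate, then apply the steps in order."""
    # Pass 1: count what needs allocating (here, just the oscillators).
    osc_count = sum(1 for s in steps if 'wave' in s)
    state = [None] * osc_count
    # Pass 2: apply each step; any delay before a step means samples
    # are generated (here merely counted) up to that time position.
    samples_done = 0
    osc_i = 0
    for s in steps:
        samples_done += round(s.get('delay', 0.0) * srate)
        if 'wave' in s:
            state[osc_i] = dict(s)   # start a new oscillator with this config
            osc_i += 1
    return osc_count, samples_done
```

For example, two oscillator steps with a one-second delay before the second would allocate two oscillators and generate one second of samples before handling the second step.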

To allow implementing modulation techniques, support for lists within which modulating oscillators could be placed was then added to the parser and audio generation code, including support for nesting lists within lists. Such a list of oscillators is assigned in connection with a parameter for frequency, phase, amplitude, etc. for an oscillator – naturally such text is simply parsed with recursion. This was a fairly simple extension for the language, with time proceeding along a linear list of steps to take like before, though the data used in connection with each main node and time position may thereafter branch out – recursion also entering in how audio generation code handles what's configured to run, mirroring what's specified in the scripts.
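The recursion involved can be illustrated with a small Python sketch, using a hypothetical bracketed token syntax rather than the actual SAU syntax for modulator lists:

```python
def parse_list(tokens, i=0):
    """Recursively parse tokens into a tree of oscillator nodes;
    a '[' ... ']' list is attached as modulators of the current node."""
    nodes = []
    while i < len(tokens):
        tok = tokens[i]
        if tok == ']':                    # end of this nesting level
            return nodes, i + 1
        if tok == '[':                    # nested list: recurse one level down
            sub, i = parse_list(tokens, i + 1)
            nodes[-1]['mods'] = sub       # attach sub-tree to current oscillator
            continue
        if tok[0] == 'W':
            nodes.append({'wave': tok[1:]})
        else:
            nodes[-1][tok[0]] = float(tok[1:])
        i += 1
    return nodes, i

tree, _ = parse_list("Wsin f440 [ Wsin f110 ] t2".split())
```

The audio generation side mirrors this: rendering a carrier recursively renders whatever modulators hang under it in the tree.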

It was when trying to add more flexible syntax for timing, on top of that, that complexity first began to grow, and I had difficulty back then keeping it all working. The complexity of behavior and difficulty of getting it all right grew much faster than the code size. The design remained far too simple to include most of the classic main structure of compilers and interpreters, and I didn't have the experience to grow my own design well and make it maintainable. The language also looked quite different from typical well-known and well-described paradigms.

Yet before complexity turns into a problem, it's very doable, in a small and tidy way, to support things like parallel audio generation (several "voices"), combined with a sequential "time separator", and a more flexible "insert time delay for what's placed after this" modifier. That kind of terse and simple language which a fairly trivial program can run could be ideal for growing into a flexible tone generator language. But there's a huge chasm between that and a powerful musical language.

Main characteristics of the language

Early choices were made for brevity and ease of writing at the small scale, including not requiring any symbol between a name and a value for assignment. Instead, name-value pairs form a large part of the script contents, along with whitespace, different types of brackets for grouping, and some extra symbols. The lack of an assignment symbol imposes some limits on the names used as the left-hand part of an expression, because such names still need to be distinguishable from what follows them – from the values read, which may be numerical, use alphabetical characters, or take other forms. Generally, the easiest thing to do is to use a one-character name as a prefix followed by a value. The pattern of such name-value pairs can be seen in many places, but not always in subexpressions which are part of a larger value; e.g. for numerical expressions (later extended further), I settled for the ease of conventional infix syntax, rather than elaborating some (in my view clunkier) alternative to it.

The easy and terse solution of using one-letter names for the left-hand part of name-value pairs (in some cases with a special character as the name) works well enough as long as there aren't many things to name. It limits possible additions to the language, but works well for smaller, fixed sets of named things. The limitation is also loosened somewhat by allowing several values of different types for the same name, and by adding subnames nested under names for extra related parameters. As used, the one-character names very loosely mirror ordinary written language, in that context for smaller things is set by a mixture of capital-letter names (e.g. adding a new wave oscillator) and special symbols. Lowercase one-character names denote smaller things which are accessed and used specific to the context, a little like function parameters or record fields.

For user-defined variable names, longer and more flexible names are allowed using a special symbol placed before a name string as the left-hand part of a pair, the name string being the right-hand part. This in turn is followed by another thing of the usual form – another name-value pair, which gives the variable a value. (This is also the case for the much later 2022 addition of numerical variables, where a = symbol is the left-hand part of a second pair, in an unusual imitation of conventional assignment syntax.)

Timing when running and generating

The basic design for how time works is very simple. Time for a script begins at 0 ms. A script is translated into a list of timed update instruction nodes, or "steps", each new step taking place after the previous, with or without time (samples to generate) passing between any two steps. Each step configures the system, e.g. telling it to start generating something or changing parameter values for some generator object.

The running of a script primarily advances through time, and secondarily through the timed update steps which are laid out in a list like a timeline of events. After parsing, time is translated into the number of samples to generate, or which should be generated before a time is reached. Time proceeds as output samples are written, while update events only come with time and do not advance it. The handling of such updates takes priority over output generation, pausing it until the updates at that time position have been handled.

Each thing which generates output, such as a wave oscillator, has a time duration of use, beginning at one time position and ending at another. The script has ended when both no things remain in use (the time durations set have expired), and no further update steps remain to be waited for (whether while creating sounds or in quiet). In other words, the duration of a script is equal to the sum of the times to wait before each new update step, plus the remaining duration of play, after the last update step, for still-active "things" (e.g. oscillators).
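The duration rule can be restated as a small function. This Python sketch assumes made-up step fields – a 'wait' before each step, and an optional play 'time' duration set by it – purely for illustration:

```python
def script_duration(steps):
    """Total script duration: the sum of waits before steps, extended by
    however long any object set in motion keeps playing past the last step."""
    elapsed = 0.0   # sum of waits before each step handled so far
    end = 0.0       # furthest end position of any play duration set
    for s in steps:
        elapsed += s.get('wait', 0.0)
        if 'time' in s:
            end = max(end, elapsed + s['time'])
    return max(elapsed, end)
```

This pre-calculability of duration is exactly what a Turing-complete extension of the language would give up, as noted further below in the section on limitations.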

Limitations of the early design

Some functionality cannot be reached without complicating the core design and beginning to move beyond it. For example, implementing a syntax for more flexible timing arrangement in a script than globally, flatly time-ordered steps led to moving some data processing out of the parser and growing it between the parsing and the interpreting/audio generation main parts of the program, so that time may branch during parsing while timed nodes are still arranged into a flat timeline of steps before interpreting. That's a main example of something with a solution arrived at early on. (Unfortunately, the early work had bugs which were only fixed much later, though the basic design, with fixes made, remained viable.)

A main example of a limitation not dealt with early on is the nature of the nested list, i.e. tree structures, as the form of what can be specified in a script. Early on, the capabilities of old FM synthesizer systems had been an inspiration, but they also support connecting oscillators in arrangements other than the tree structures of carriers and modulators provided for by nested lists; e.g. several carriers may share a modulator, and in general the oscillator connection schema is a DAG (directed acyclic graph) in Yamaha's old FM "algorithms". (Technically, self-modulation could however be viewed as adding self-loops to an otherwise acyclic graph. Possibilities for going beyond acyclic graphs by supporting feedback loops more generally also exist, and are done in some synthesizer systems.)

But most conspicuously missing from the early language are features like defining and using functions with audio generation code in scripts, looping and other control flow constructs, etc. I skipped all that at first because I wanted to explore other things than inventing yet another "normal" language. Some things would be easier than others to add, in terms of the needed complexity, but after a while I really didn't know what else to work into it all, and how, and gradually ran out of practical ways to extend the early vision.

Generally, it isn't audio generation features which suggest the greatest departures in design, but programming language ideas which in one or another way allow re-using what's written one time more times when a script runs. And there's a variety of potential ideas to explore for turning audio definitions entered in a script into a "template" of sorts for further audio in that script, and other ideas increasing language expressiveness.

Relative to the early language, some kinds of extensions for it would mainly require reworking and complicating the design closer to the parsing end of the program, while others mainly require reworking the other end which ultimately runs what is produced. (When considering creating a more powerful interpreter, it's also worth noting that some basic big limitations in features are necessary for e.g. time durations for scripts to remain pre-calculable, as they ended up being. A Turing-complete language would not allow it.)

Trunk: Experimentation and 2012 ideas

Experimenting on in 2011 and beyond, and then looking for potentially useful ideas for programming languages and compilers in 2012–2014 (in part while taking a few basic courses in related things), led to a series of old notes; they contain a list of thoughts on a new possible language, and ideas for possible design elaborations. Back then, while studying, I discounted my own early design and language as a starting point for something better, after learning some theory. In part, that was because basic standard concepts are usually connected to different-looking syntaxes, designs, and implementations, and I couldn't see how what I'd already come up with might correspond to those concepts. I vaguely dreamed of different things, and put it all aside for years, until, roughly a decade later and on my own, I gradually bridged that gap in thought, arriving at a road to working out more in practice.

Various good little ideas made it into the program design in the year between April 2011 and April 2012 (when the old work ended), but in a rough and sometimes buggy form, requiring later clean-up. Other ideas from 2012–2014, along with later ones, remain to be explored – or really elaborated into something both useful and concrete enough to explore – more thoroughly for extending and reworking features. The old code also contained some little false starts, more of them being added over time.

Growing the early design

The April 2012 program had grown a curious approach to parsing, in which a combination of flat list structures (for time-ordering) and a simplified tree structure (for nested syntax elements) was produced. This corresponds, in a fairly basic way, to the nature of the language: time-ordering is one dimension of structure, and nesting, as in nested elements such as lists, is another. (Nested lists of oscillator objects are the main and most obvious type of nesting.) As for the program structure, a middle layer between the parsing end and the interpreting and audio generation end of the program had come along by 2012, initially from moving semantic handling out of the parser after it became overgrown. This middle layer developed to track timing in simple ways, and to count and allocate voices for audio generation prior to running it.

The way time placement and nesting works together in the language, one main node from the parser describing something placed at a time (in a list of timed such nodes) may provide a tree of attached data, and another main node for a later time may provide an update for part of that data and include a sub-tree which is to become connected to the data tree (a modulator is added after a time delay to an oscillator which in turn was and remains used as a modulator for a carrier, say). Such a main node with an update could also include data which clears links, i.e. removes branches, from an object part of a tree (modulators of a given type are removed from an oscillator, say).

Another type of nesting applies to time; originally added only for a syntax extension in the early design, it places a series of those main nodes in a side-list attached to a main node. After parsing, such side-lists are merged into a single time-ordered list of the main nodes. This allows time placement to branch out several times in a script and during parsing, while the data to later be interpreted or "run" remains a flat update-after-update list. This is used for the old "compound steps" feature, and for the new "gapshift" variation of it added in 2022 to replace an earlier oscillator parameter (s) for padding time durations with leading silent time.

The basic idea for the compound steps feature is to allow a series of timed steps in a script which update the same object (an oscillator, say), to be written together as a kind of unit, not advancing the timing for later-placed steps for other objects. That way, timed updates can be grouped per object, rather than rigidly according to the flow of time. (The 2012 program had several types of bugs for that feature, which could however all be fixed without greatly complicating anything.) Once that functionality is there, it can also be used for differently-looking things of an essentially similar kind.
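The merging of side-lists into one flat timeline might look roughly like this – a Python sketch with invented fields (absolute 'at' times are assumed for simplicity; the real program resolves timing on its own node structures):

```python
def flatten(main_steps):
    """Merge per-object side-lists of timed steps into one flat list,
    sorted by time position. Each step: {'at': time, 'id': ..., 'sub': [...]}."""
    out = []
    for s in main_steps:
        out.append((s['at'], s['id']))
        for sub in s.get('sub', []):      # steps grouped under this object
            out.append((sub['at'], sub['id']))
    out.sort(key=lambda pair: pair[0])    # stable sort keeps same-time order
    return out
```

Steps written together as a per-object group in the script thus end up interleaved, by time, with steps for other objects in the final update-after-update list.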

Parsing like a calculator, calculating like a parser

Very early on, numerical infix expressions were added and handled in the parser with a recursive parsing function for such. The parser worked like a calculator, its output reducing numerical expressions to their results. As long as numerical expressions don't contain side effects (modification of state), or as long as none is evaluated more than once (unlike what could happen if features like looping or function calls were added) – as long as at least one of those two conditions holds – that simple design continues to work well, without imposing limitations, when other features are added. Otherwise, if that big featureful combination ends up needed, the parser would then need a redesign, in order to allow a numerical expression to be evaluated separately from the initial crunching of the text – and thus possibly several times afterwards.
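The calculator-like behavior – reducing an expression directly to a number during the parse itself, rather than building a tree for later, repeated evaluation – can be sketched as follows (Python, with tokens pre-split for brevity; the real parser reads characters and supports more):

```python
def eval_expr(tokens):
    """Evaluate an infix expression like ['2', '+', '3', '*', '4'] at
    'parse' time, with the usual precedence; returns only the result."""
    def term(i):
        value = float(tokens[i]); i += 1
        while i < len(tokens) and tokens[i] in '*/':
            rhs = float(tokens[i + 1])
            value = value * rhs if tokens[i] == '*' else value / rhs
            i += 2
        return value, i
    value, i = term(0)
    while i < len(tokens) and tokens[i] in '+-':
        op = tokens[i]
        rhs, i = term(i + 1)
        value = value + rhs if op == '+' else value - rhs
    return value
```

Nothing of the expression survives parsing except its numerical result – which is precisely what would have to change to support evaluating it again later.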

Such a redesign, to postpone (part of) the evaluation until later, would allow e.g. function bodies or loop contents, if such features are implemented, to behave differently each time they run, just the way they are usually expected to do when a numerical expression in them should evaluate to something different each time. Though it's of course also possible to make the unusual design choice of making each numerical expression a parse-time constant computed using parse-time state – while allowing other syntax to handle state in a more conventional way.

In 2022 one of the two crucial things – statefulness, in the form of mathematical functions like rand() – appeared in numerical expressions. It remains to be seen what design path to take next if, and if so when, the other also enters the picture... The purer, fuller extension of the older design would make parse-time calculation akin to a smaller language embedded within a larger one – at once playing a role a little like a conventional preprocessor featuring macros, but with different means of setting and getting things, and restricted to use only in designated places allowed for by the larger language.

Odds and ends from clean-up work

Some general ideas have evolved since reviving the project in November 2017, concerning making and keeping it cleaner and more modular. Small-scale experiments in how to (re)structure or otherwise improve little pieces of the whole have taken a large part of the focus up to around November 2021, in a slow-going pedantic clean-up phase for the project. (It's not been pragmatic at all, but it's been a good-enough way to spend some time in a longer, slow and somewhat glum period of life.)

To be continued...