Design and main ideas

Notes on the design of the saugns software (and SAU language), as it has evolved, and the main ideas involved. (The SAU language has evolved in parallel with the software implementing it.)

While somewhat structured according to the history of development, the new design largely builds on the old, and the early program shows main concepts that endure.

Roots: Early 2011 design

The program was developed from scratch, as a hobby experiment beginning in early 2011; the language started out as the most straightforward way I could get a trivial program to generate sounds and handle timing flexibly according to a script (some early examples are shown on the history page). While details (and further features) have been added and changed since, the core of the language remains similar. Consider the below line (which also runs with modern versions).

Wsin f440 t1.5

If a single big letter (W here) is used to start a wave oscillator of some type, and a small letter followed by a number assigns a value for frequency (f), amplitude (a), or time duration (t), then parsing is trivial. No abstractions for lexing or syntax trees, etc., are necessary. For each W the parser can simply add an oscillator node to a list, and go on to set values in the current node according to any parameter assignments (t, f, etc.) which follow while that oscillator is still being dealt with. Parameters can come in any order when name rather than placement indicates which one is given a value – and it's also easy to allow skipping parameters, by simply giving them all default values.
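The parsing approach described above can be sketched in a few lines of Python. This is only an illustration, not the saugns code: the names Osc, PARAMS and parse are made up for it, tokens are assumed to be separated by whitespace, and the default frequency and amplitude are assumptions (only the 1 second default time comes from these notes).

```python
from dataclasses import dataclass


@dataclass
class Osc:
    """One oscillator node; every parameter has a default, so any may be skipped."""
    wave: str
    freq: float = 440.0  # f, in Hz (default assumed for this sketch)
    amp: float = 1.0     # a (default assumed for this sketch)
    time: float = 1.0    # t, in seconds


# One-letter parameter names mapped to the node fields they set.
PARAMS = {"f": "freq", "a": "amp", "t": "time"}


def parse(script):
    """Turn a script into a flat list of oscillator nodes."""
    nodes = []
    for word in script.split():
        name, value = word[0], word[1:]
        if name == "W":
            # A capital letter starts a new node; the rest names the wave type.
            nodes.append(Osc(wave=value))
        elif name in PARAMS and nodes:
            # A small letter sets a value in the node currently being dealt with.
            setattr(nodes[-1], PARAMS[name], float(value))
        else:
            raise ValueError(f"unexpected {word!r}")
    return nodes


if __name__ == "__main__":
    print(parse("Wsin f440 t1.5"))
```

Because each value is looked up by name, `parse("Wsin t2 f220")` and `parse("Wsin f220 t2")` give the same node, and a bare `Wsin` simply keeps all the defaults.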

Several parts creating new nodes may be written in a script. Additional syntax can tell the parser that the next node should be given a time offset value, so that it's not interpreted as something to run in parallel with the previously added node when generating audio. After parsing is done, the resulting list of nodes specifies what to run or do to generate audio.

Sequences and nesting

When a series of things is read from a script, the resulting list of nodes produced can be viewed as a sequential list of "steps" to take, "events" to handle, or "instructions" to follow, for setting up or changing audio generation. For a sequence of nodes with no time offsets between them, the steps taken are a series of configuration changes made one immediately after the other, to set up how audio generation should run thereafter. Audio generation using the things configured then runs until finished; if the time arrives for a next configuration node to be handled, audio rendering is interrupted until that has been done. To run it all, the list of nodes can be examined once to figure out what data structures need to be allocated and prepared (how many oscillators, etc.) in advance, and a second time when actually using those for running the simplistic "program" the script has been translated into.

To allow implementing modulation techniques (basic options for PM, FM and AM), support for lists within which modulating oscillators could be placed was then added to the parser and audio generation code, including support for nesting lists within lists (for modulation chains longer than 2 in depth). Such a list of oscillators is assigned in connection with a parameter for frequency, phase, amplitude, etc. for an oscillator – naturally such text is simply parsed with recursion. This was a fairly simple extension for the language, with time proceeding along a linear list of steps to take like before, though the data used in connection with each main node and time position may branch like a tree – recursion also entering in how audio generation code handles what's configured to run, mirroring what's specified in the scripts.
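How recursion handles such nested lists can be sketched as follows. The bracket notation used here (a parameter name followed by [ ... ] holding modulator oscillators) is an assumption made for the example and should not be read as a description of the exact SAU syntax; the dictionary layout and function names are likewise hypothetical.

```python
import re

# A token is a name with an optional value attached, or a single bracket.
TOKEN = re.compile(r"[A-Za-z][^\s\[\]]*|\[|\]")


def parse_list(tokens, pos=0, depth=0):
    """Parse oscillators until the list closes; return (nodes, next position)."""
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "]":
            if depth == 0:
                raise ValueError("unmatched ]")
            return nodes, pos + 1
        if tok == "[":
            raise ValueError("list without a parameter name")
        if tok.startswith("W"):
            nodes.append({"wave": tok[1:], "params": {}, "mods": {}})
            pos += 1
            continue
        if not nodes:
            raise ValueError(f"parameter {tok!r} before any oscillator")
        name, value = tok[0], tok[1:]
        pos += 1
        if value:
            nodes[-1]["params"][name] = float(value)
        if pos < len(tokens) and tokens[pos] == "[":
            # A list after a parameter name: parse its contents recursively.
            nodes[-1]["mods"][name], pos = parse_list(tokens, pos + 1, depth + 1)
    if depth > 0:
        raise ValueError("missing ]")
    return nodes, pos


def parse(script):
    nodes, _ = parse_list(TOKEN.findall(script))
    return nodes


if __name__ == "__main__":
    print(parse("Wsin f440 p[Wsin f220 p[Wsin f110]] t2"))
```

The result is a tree: each node carries its own plain values plus, per parameter, a list of modulator nodes shaped exactly like the top-level list.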

It was when trying to add more flexible syntax for timing, on top of that, that complexity first began to grow unmanageable for my past self. In the early project I had difficulty keeping it all working – I had not developed the methodology I use nowadays. (I first tried to solve this, a little later, by moving in the direction of adding another stage of script processing between parsing and audio generation. Much later, I saw how to undo that and resimplify.) The design didn't include much of the classic structure of compilers and interpreters, and I didn't have the experience to grow my own design well and make it maintainable. The language also looked quite different from typical well-known and well-described paradigms. This is part of why added features hit a plateau fairly early on. The rest of the reason is that making it all do more required exponentially more work, the further I wanted to go towards reaching my goals.

Yet before complexity grows much, in a small and tidy way it's very doable to support things like parallel audio generation (several "voices"), combined with a sequential "time separator", and a more flexible "insert time delay for what's placed after this" modifier. In it can be seen the outline for what may be an ideal, flexible tone, sound, and soundscape generator language.

Main characteristics of the language

Early choices in how the language looks were made for brevity and ease of writing at the small scale, including not requiring any symbol between a name and a value for assignment. Instead, immediate name-value pairs form a large part of the script contents, along with whitespace, different types of brackets for grouping, and some extra symbols. The lack of an assignment symbol imposes some limits on the names used as the left-hand part of an expression, because such names still need to be distinguishable from what follows them – from the values read, which may be numerical, use alphabetical characters, and/or other things. The simplest thing to do is to use one-character names as prefixes, each followed by the value assigned. (Regardless of the length of the prefix-names, having it the same for every prefix-name makes it clear where it ends and the value after it begins.) The pattern of such name-value pairs can be seen in many places in the language, but it doesn't always repeat when breaking down larger subexpressions into parts; e.g. for numerical expressions (later extended further), I settled for the ease of conventional infix syntax for arithmetic in values, rather than elaborating some (in my view clunkier) alternative to it.

The easy and terse solution of using one-letter names for the left-hand part of name-value pairs (in some cases with a special character as the name), works well enough as long as there's not many things to name. It limits possible additions to the language, but works well for smaller, fixed sets of named things. The limitation is also loosened somewhat by allowing several values of different types for the same name (e.g. a number and/or a modulator list). Beyond that, subnames nested under names can be added for extra related parameters. As used, the one-character names very loosely mirror ordinary written language, in that context for smaller things is set by a mixture of capital letter names (e.g. adding a new wave oscillator) and special symbols. Lowercase one-character names denote smaller things which are accessed and used specific to the context, a little like function parameters or record fields.

For user-defined label/variable names, longer and more flexible names are allowed using a special symbol placed before a name string as the left-hand part of a pair, the name string being the right-hand part. This is how a label for an object (e.g. an oscillator) is written after it, as 'name. (Until 2022, variables were only used to label objects like that, so that they can be referred back to by name. The 2022 numerical variables feature uses a = symbol as the left-hand part of a second name-value pair, in an unusual imitation of conventional assignment syntax. Eventually it was made to look like $name = number, whitespace optional.)

Timing when running and generating audio

The basic design for how time works is very simple. Time for a script begins at 0 ms. A script is translated into a list of timed update instruction nodes, or "steps", each new step taking place after the previous, with or without time (samples to generate) passing between any two steps. Each step configures the system, e.g. telling it to start generating something or changing parameter values for some generator object.

The running of a script primarily advances through time, and secondarily through the timed update steps which are laid out in a list like a timeline of events. After parsing, time is translated into the number of samples to generate, or which should be generated before a time is reached. Time proceeds as output samples are written, while update events only come with time and do not advance it. The handling of such updates takes priority over output generation, pausing it until the updates at that time position have been handled.
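A minimal sketch of such a run loop is given below, assuming a toy setup in which the only kind of update is "start a sine oscillator". The Step fields, the SRATE value and the function names are all invented for the example; it is not how the saugns generator module is written.

```python
import math
from dataclasses import dataclass

SRATE = 1000  # samples per second; kept tiny so the numbers are easy to follow


@dataclass
class Step:
    wait_ms: int   # time which passes before this step is handled
    freq: float    # the update: start a sine oscillator at this frequency...
    time_ms: int   # ...to be used for this long


def ms_to_samples(ms):
    return ms * SRATE // 1000


def run(steps):
    """Follow the steps in order, returning all output samples."""
    active = []  # [frequency, phase, samples left] per running oscillator
    out = []

    def generate(count):
        for _ in range(count):
            sample = 0.0
            for osc in active:
                sample += math.sin(osc[1])
                osc[1] += 2 * math.pi * osc[0] / SRATE
                osc[2] -= 1
            # Drop oscillators whose time duration has expired.
            active[:] = [osc for osc in active if osc[2] > 0]
            out.append(sample)

    for step in steps:
        # Time advances only while samples are written...
        generate(ms_to_samples(step.wait_ms))
        # ...while handling an update takes no time in itself.
        length = ms_to_samples(step.time_ms)
        if length > 0:
            active.append([step.freq, 0.0, length])
    # After the last update, keep going until nothing remains in use.
    while active:
        generate(max(osc[2] for osc in active))
    return out


if __name__ == "__main__":
    print(len(run([Step(0, 100.0, 20), Step(10, 200.0, 5)])))
```

Note how output generation is simply paused between two calls to generate() while an update is applied, which is the priority rule described above.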

Each thing which generates output, such as a wave oscillator, has a time duration of use, beginning at one time position and ending at one time position. The script has ended when both no things remain in use (the time durations set have expired), and no further update steps remain to be waited for (whether while creating sounds or in quiet). In other words, the duration of a script is equal to the total sum length of times to wait before each new update step, plus the remaining duration of play after the last update step for still-active "things" (e.g. oscillators).
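The duration rule in the last sentence can be written out directly. In this sketch each step is reduced to a hypothetical (wait_ms, time_ms) pair: the wait before the step, and the time duration of the thing it starts.

```python
def script_duration_ms(steps):
    """Total duration for a list of (wait_ms, time_ms) pairs."""
    now = 0         # time position of the step being looked at
    latest_end = 0  # latest end time of anything started so far
    for wait_ms, time_ms in steps:
        now += wait_ms
        latest_end = max(latest_end, now + time_ms)
    # Sum of waits, plus whatever still plays after the last step.
    remaining = max(0, latest_end - now)
    return now + remaining


if __name__ == "__main__":
    print(script_duration_ms([(0, 1500)]))
    print(script_duration_ms([(0, 10), (30, 10)]))
```

Since this is one pass over a finite list, the duration is always known before any audio is generated.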

Limitations of the early design

A main example of a limitation not dealt with early on is the nature of the nested list, i.e. tree structures, as the form of what can be specified in a script. Early on, the capabilities of old FM synthesizer systems had been an inspiration, but they also support connecting oscillators in arrangements other than the tree structures of carriers and modulators provided for by nested lists; e.g. several carriers may share a modulator, and in general the oscillator connection schema is a DAG (directed acyclic graph) in Yamaha's old FM "algorithms". (Technically, self-modulation could however be viewed as adding self-loops to an otherwise acyclic graph. Possibilities for going beyond acyclic graphs by supporting feedback loops more generally also exist, and are done in some synthesizer systems.)

But most conspicuously missing from the early language are features like defining and using functions with audio generation code in scripts, looping and other control flow constructs, etc. I skipped all that at first because I wanted to explore other things rather than inventing yet another "typical" language. The absence of such things is half of what defines the old design. Often, it isn't audio generation features which suggest the greatest departures in design, but conventional programming language ideas.

Relative to the early language, some kinds of extensions for it would mainly require reworking and complicating the design closer to the parsing end of the program – maybe using another layer of early data structures and processing of them to preprocess script contents into something with a form closer to the old parser output. Other ideas would mainly require reworking the other end of the program, which in the simple design does audio rendering and can be viewed as an interpreter that only follows a flat line of instructions. (When considering creating a more powerful interpreter, whether at the parsing end or at the rendering end, it's also worth noting that some basic big limitations in features are necessary for e.g. time durations for scripts to remain pre-calculable, as they ended up being. A Turing-complete language would not allow it.)

Default values and flexible default times

Eschewing numerical variables and such in the early project, instead I added a very simply parsed and used mechanism for setting script options with S, a pseudo-type name used like an oscillator W with such parameters, but with the effect of changing default values and other settings at parse time instead of adding an object. This has remained, been tweaked (making it lexically scoped in 2023, imitating how generator objects are scoped in lists), and extended whenever convenient.

Other features also tie into default values. The flexible default values for time durations were both designed more intricately than needed, and ended up buggier and trickier to get right than much else in the language. In wanting the most concise language possible, I put some thought into how the time duration t parameter for an audio generator should be filled in if nothing is written, and it turned out that the intuitive "make it fit other durations as used in the script at the current time position" behavior was deceptively simple.

The old default time logic was debugged and preserved when the project was revived, and as of 2025 can also be found in the latest versions. In the following script, two audio generators are inserted at the same time, but time is set only for one of them. Yet, both are given the same time value, as the default value for the left-out time is increased to match the other one. (Default values may be increased, but never decreased, from the short default value, which is 1 second unless changed with the S t option in a script.)

Wsin f220 t2 Wsin f440

But what if audio generators are inserted at different times, not at the same placement in seconds? In such cases, the default time logic counts down by subtracting time advancement in the script from the current longest time at the current position. Values at a later time position however also count when setting values at an earlier position; for consistency, default values must count up going backward. If the above script had instead been Wsin f220 /1 Wsin f440, the result would be the same in that both generators again end together – 2 seconds for the 1st generator, 1 second for the 2nd, as the 2nd is given a 1 second (short) default time, and the backward counting-up adds the time shift in order to arrive at the longer default time for the 1st generator.

The use of the time separator | separates duration groups in scripts – basically the time scopes under consideration – so that what's after a | never alters default values for what's before and vice-versa.
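The following is a deliberately simplified model of that behavior, written to reproduce the two example scripts above and nothing more; the actual logic in saugns handles more cases and is not organized like this. Each duration group is assumed to be a list of (start, time) pairs in seconds, with start relative to the beginning of the group and time set to None when left out.

```python
SHORT_DEFAULT = 1.0  # seconds; the short default time


def fill_default_times(group):
    """Fill in left-out times for one duration group."""
    if not group:
        return []
    # Latest end in the group, counting left-out times as the short default.
    end = max(start + (SHORT_DEFAULT if time is None else time)
              for start, time in group)
    filled = []
    for start, time in group:
        if time is None:
            # Lengthen to reach the latest end, but never go below the default.
            time = max(SHORT_DEFAULT, end - start)
        filled.append((start, time))
    return filled


def fill_script(groups):
    """Groups separated by | never affect each other's default times."""
    return [fill_default_times(group) for group in groups]


if __name__ == "__main__":
    print(fill_default_times([(0, 2.0), (0, None)]))   # Wsin f220 t2 Wsin f440
    print(fill_default_times([(0, None), (1, None)]))  # Wsin f220 /1 Wsin f440
```

Taking the latest end over the whole group is what lets a later node lengthen an earlier one (counting up going backward), while `end - start` is the counting down as time advances.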

It's tricky and a matter of taste how to combine some other features with the default time logic. The "compound steps" feature, for arranging a series of sub-steps for one generator as if one step in a script, can either receive longer default times for the first sub-step (later sub-steps have the time of the prior one as their default), or give a longer default time from the combined duration of all sub-steps, but not both (as then forcing consistent behavior demands an infinite loop of ever-lengthening times, the alternative being inconsistent and confusing behavior). I opted for the second option, to give a longer default time using the sum duration, using an unextended (short) default time for the first sub-step.

Trunk: Growth from main 2012 ideas

Experimenting on in 2011 and beyond, and then looking for potentially useful ideas for programming languages and compilers in 2012–2014 (in part while taking a few basic courses in related things), led to a series of old notes; they contain a list of thoughts on a new possible language, and ideas for possible design elaborations. Back then, while studying, I discounted my own early design and language as a starting point for something better, after both learning some theory and having got stuck with the old project. In part, that was because basic standard concepts are usually connected to different-looking syntaxes and designs and implementations, and I couldn't see how what I'd already come up with may correspond to those concepts. I vaguely dreamed of different things, and put it all aside for years, until, on my own, gradually bridging that gap in thought roughly a decade later, arriving at a road to working out more in practice.

Programming-wise, the old project ended in April 2012, though I went on considering various ideas during 2012–2014. Since the clean-up of the old program from 2018 on, some smaller old ideas (alongside new ones) have been explored from time to time and made it into the program, but I maintain a conservative approach towards adding "typical" programming language features.

Early timing & nesting design, more processing stages

The April 2012 program had grown a parser producing a flat list structure for time-ordered events (the main nodes), combined with tree structures attached to events (for data nodes for the things added or changed in a step, which may involve nested syntax elements). This corresponds fairly simply to the language: time-ordering is one dimension of structure, and nesting as in e.g. setting up oscillators for modulation is another. But! Some of the semantics had begun to be handled after parsing but before interpretation, a middle-layer finishing some details of timing which seemed too messy to attempt during parsing. (It also counted and allocated needed voices for audio generation prior to running it.)

Apart from nesting in the form of lists of objects, another type of nesting exists which applies to time, but it is flattened away by the post-parse semantics code. It was originally added early and buggily for the old "compound steps" feature – a syntax extension for writing a series of timed changes for one object together, without advancing the timing for other objects. That way, timed updates can be grouped per object, rather than rigidly according to a global flow of time. This was implemented by producing the global timeline (flat event/step list) in a step after parsing, while parsing first allows timing to branch out. Temporarily, event nodes for follow-on compound steps are placed in side-timeline lists attached to other event nodes. The same design was tweaked a little for the 2022 "gapshift" feature replacing an earlier oscillator parameter (s) for padding time durations with leading silent time.
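The flattening of side-timelines into one global timeline can be sketched like this. The Event layout (a wait relative to the previous event in the same timeline, plus an attached list of follow-on events) is a guess at the general shape for illustration, not the data structures used in the program.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    wait: float  # time since the previous event in the same timeline
    label: str
    side: list = field(default_factory=list)  # follow-on compound steps


def flatten(events):
    """Merge the main timeline and all side-timelines into one flat list."""
    timed = []

    def walk(timeline, start):
        now = start
        for ev in timeline:
            now += ev.wait
            timed.append((now, ev.label))
            # A side-timeline branches off here, starting at this event's
            # time, without advancing time for the timeline it hangs from.
            walk(ev.side, now)

    walk(events, 0.0)
    timed.sort(key=lambda pair: pair[0])  # stable: ties keep insertion order
    flat, prev = [], 0.0
    for when, label in timed:
        # Convert absolute times back into waits between consecutive events.
        flat.append(Event(when - prev, label))
        prev = when
    return flat


if __name__ == "__main__":
    main = [Event(0, "A", side=[Event(1, "A2"), Event(1, "A3")]),
            Event(0.5, "B")]
    for ev in flatten(main):
        print(ev.wait, ev.label)
```

In the example, B keeps its place 0.5 seconds after A even though A's two follow-on steps were written in between.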

Strictly speaking, there's been more than one extra pass of script processing between the two main stages, the parser and the final interpreter (long usually called the "generator" module, as it runs all audio generation). Several loops working through all event nodes have been used to finalize the data before it's fully ready to use. For a period of time 2019–2021, the design was even temporarily complicated by adding another extra module with such a main loop. Then I began to figure out how to reduce loops and passes step by step, and such changes followed.

Later timing & nesting design, a new simplicity

After moving back from the complexity of extra post-parse processing passes during part of 2019–2021, over time I thought of going further. In 2023, it turned out that the entire old feature set could be handled without any full loop through the event nodes between the parser and the audio generation end of the program, with a few changes that actually make behavior better (more consistent), as in how voices are minimally numbered and how amplitude is scaled accordingly by default. The old extra pre-loop in the generator (a.k.a. final interpreter) module could also be removed.

It was clear that everything can't be done event-by-event in a flat way as parsing produces the nodes, but how much grouping is needed for extra parser semantics? It turns out the only limitation in the old design is defined by the time separator | in the SAU language, which is used to note that all which follows is separated in time from all which goes before. Contents between such separators became the units for the mini-pass that replaced a full separate pass for extra semantics. (If no | is ever used in a script, it does in practice still become a full pass.)

Of course, it would also be possible to change the SAU language further so that it is easier to do everything at once for each new event node, instead of needing to group semantics processing like now for series of event nodes. The present feature making things tricky is the flexible default time logic (the basic ideas of duration groups largely going back to 2011); if default values were made simpler, this stuff could be even more simplified.

Parsing like a calculator, calculating like a parser

Very early on, numerical infix expressions were added and handled in the parser with a recursive parsing function for such. The parser worked like a calculator, its output reducing numerical expressions to their results. As long as numerical expressions don't contain side-effects (modifying of state), or aren't evaluated more than once each (unlike what could happen if features like looping or function calls were added) – as long as at least one of those types of features is missing – that simple design continues to work well, without imposing limitations, when other features are added. Otherwise, if that big featureful combination ends up needed, the parser would then need a redesign, in order to allow a numerical expression to be evaluated separately from the initial crunching of the text – and thus possibly several times afterwards.
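A parser which works like a calculator can be sketched as a small recursive-descent evaluator: each parsing function returns the value of what it just read, so no syntax tree is ever built. The grammar below (the four basic operators, unary minus and parentheses) is a generic textbook subset chosen for the example, not the full expression syntax of SAU.

```python
import re

# Optional leading whitespace, then a number or a single operator character.
TOKEN = re.compile(r"\s*(\d+\.?\d*|\.\d+|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"bad character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Parse and compute in one go, returning only the result."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def expr():  # handles + and -, the lowest precedence
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # handles * and /
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():  # number, unary minus, or parenthesized subexpression
        tok = take()
        if tok == "-":
            return -factor()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected )")
            return value
        return float(tok)  # raises ValueError for a misplaced operator

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return result


if __name__ == "__main__":
    print(evaluate("2 + 3 * (4 - 1)"))
```

Since only the final number survives, an expression handled this way cannot be evaluated again later with different state, which is the limitation discussed above.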

Such a redesign, to postpone (part of) the evaluation until later, would allow e.g. function bodies or loop contents, if such features are implemented, to behave differently each time they run, just the way they are usually expected to do when a numerical expression in them should evaluate to something different each time. Though it's of course also possible to make the unusual design choice of making each numerical expression a parse-time constant computed using parse-time state – while allowing other syntax to handle state in a more conventional way.

In 2022 one of the two crucial things – statefulness, in the form of mathematical functions like rand() – has appeared in numerical expressions. It remains to be seen what design path to take next if, and if so when, the other also enters the picture... The purer, fuller extension of the older design would make parse-time calculation akin to a smaller language embedded within a larger language – at once playing a role a little like a conventional preprocessor featuring macros, but with different means of setting and getting things, and restricted to being used only in designated places allowed for by the larger language.

More audio generator types

Only in early 2023 did another main audio generator come along – the rumble/random line segments oscillator R, which combines line segment functions, PRNGs, and modulation options. (It's a bigger side-project of mine; I've written more about how the R oscillator works on my personal site.) Before then, I stuck with only the wavetable oscillator in main versions of the program. Later a simpler noise generator N and an amplitude generator A (for DC, line segments, and summing other signals) joined R.

Adding more types of audio generators is mainly about audio generation code, and so requires little design change. The R and W oscillators share most parameters and most parsing code, for example. Line types, used by R similarly to how W uses wave types, already existed for a different purpose, added in 2011 for use with parameter sweeps – though more variations on value-filling functions implementing each line were needed to fit the new needs.

More audio features under parameters

The feature of sweeps for parameters is the earliest (2011) example of attaching extra logic and audio rendering features to parameters, beyond support for modulators in lists. Such features have been expanded over the years. ADSR envelopes, added later (2025), take what sweeps do to a next level, and similarly could be added onto the older design by mainly some parsing changes and some extra audio rendering code. Sweeps and envelopes are both little state machines used as subcomponents for the larger audio generator types.
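To illustrate what "little state machine" means here, below is a generic textbook-style ADSR envelope with linear segments, stepped one sample at a time. It is an assumption-laden sketch (class name, per-sample stepping, segment lengths counted in samples), not a description of how envelopes or sweeps are implemented in saugns.

```python
class Adsr:
    """Linear ADSR envelope; segment lengths are counted in samples."""

    NEXT = {"attack": "decay", "decay": "sustain", "release": "done"}

    def __init__(self, attack, decay, sustain, release):
        self.lengths = {"attack": attack, "decay": decay, "release": release}
        self.targets = {"attack": 1.0, "decay": sustain, "release": 0.0}
        self.state = "attack"
        self.pos = 0       # samples produced within the current state
        self.level = 0.0   # current output level
        self.start = 0.0   # level when the current state was entered

    def _enter(self, state):
        self.state, self.pos, self.start = state, 0, self.level

    def note_off(self):
        if self.state not in ("release", "done"):
            self._enter("release")

    def next(self):
        """Return the next envelope value, changing state when needed."""
        # Leave any timed state which has run its full length.
        while self.state in self.targets and self.pos >= self.lengths[self.state]:
            self.level = self.targets[self.state]
            self._enter(self.NEXT[self.state])
        if self.state in self.targets:
            self.pos += 1
            span = self.targets[self.state] - self.start
            self.level = self.start + span * self.pos / self.lengths[self.state]
        # The "sustain" and "done" states simply hold the current level.
        return self.level


if __name__ == "__main__":
    env = Adsr(attack=2, decay=2, sustain=0.5, release=2)
    print([env.next() for _ in range(6)])
    env.note_off()
    print([env.next() for _ in range(3)])
```

A parameter sweep is the same idea with a single timed segment: a start value, a target value, a length, and a position which advances as values are filled in.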

Odds and ends from clean-up work

Some general ideas for cleaner code have evolved since reviving the project in November 2017. One little discovery is that staggered region/arena add-only allocator mempools are a perfect fit for much of the dynamic memory allocation in a program like this. Most of the rest can be handled using a generic dynamic array module (which can be done pretty neatly in C).