Notes on the design of the saugns software (and SAU language), as it has evolved, and the main ideas involved. (The SAU language has evolved in parallel with the software implementing it.)
While somewhat structured according to the history of development, the new design largely builds on the old, and the early program shows main concepts that endure.
The program was developed from scratch, as a hobby experiment beginning in early 2011; the language started out as the most straightforward way I could get a trivial program to generate sounds and handle timing flexibly according to a script (some early examples are shown on the history page). While details (and further features) have been added and changed since, the core of the language remains similar. Consider the line below (which also runs with modern versions).
Wsin f440 t1.5
If a single big letter (W here) is used to start a wave oscillator of some type, and a small letter followed by a number assigns a value for frequency (f), amplitude (a), or time duration (t), then parsing is trivial. No abstractions for lexing or syntax trees, etc., are necessary. For each W the parser can simply add an oscillator node to a list, and go on to set values in the current node according to any parameter assignments (t, f, etc.) which follow while that oscillator is still being dealt with. Parameters can come in any order when name rather than placement indicates which one is given a value – and it's also easy to allow skipping parameters, by simply giving them all default values.
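To make the idea concrete, here is a minimal C sketch of such a parser. It handles only W with a wave name and the f, a, and t parameters. The OscNode struct, the parse_script function, and the default values (440 Hz, amplitude 1.0, 1 second) are hypothetical illustrations, not the saugns code.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical node type: one oscillator and its parameter values. */
struct OscNode {
  char wave[8];   /* wave type name, e.g. "sin" */
  double freq;    /* f */
  double amp;     /* a */
  double time;    /* t, in seconds */
};

/* Parse a script like "Wsin f440 t1.5" into at most max_nodes nodes.
 * Returns the number of nodes added. Other characters are skipped. */
static int parse_script(const char *s, struct OscNode *nodes, int max_nodes) {
  int count = 0;
  struct OscNode *cur = NULL;
  while (*s != '\0') {
    char c = *s++;
    if (c == 'W' && count < max_nodes) {
      /* A capital letter starts a new node with default values. */
      cur = &nodes[count++];
      cur->freq = 440.0;
      cur->amp = 1.0;
      cur->time = 1.0;
      size_t n = 0;
      while (isalpha((unsigned char)*s) && n < sizeof(cur->wave) - 1)
        cur->wave[n++] = *s++;
      cur->wave[n] = '\0';
    } else if (cur != NULL && (c == 'f' || c == 'a' || c == 't')) {
      /* A small letter names a parameter; the number follows directly. */
      char *end;
      double v = strtod(s, &end);
      if (end == s) continue; /* no number: ignore the letter */
      s = end;
      if (c == 'f') cur->freq = v;
      else if (c == 'a') cur->amp = v;
      else cur->time = v;
    }
  }
  return count;
}
```

Because each parameter is identified by its letter, "Wsin t1.5 f440" fills in the same values, and any parameter left out keeps its default.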
Several parts creating new nodes may be written in a script. Additional syntax can tell the parser that the next node should be given a time offset value, so that it's not interpreted as something to run in parallel with the previously added node when generating audio. After parsing is done, the resulting list of nodes specifies what to run or do to generate audio.
When a series of things is read from a script, the resulting list of nodes can be viewed as a sequential list of "steps" to take, "events" to handle, or "instructions" to follow, for setting up or changing audio generation. For a sequence of nodes with no time offsets between them, the steps taken are a series of configuration changes made one immediately after the other, to set up how audio generation should run thereafter. Audio generation using the things configured runs until finished, or, if the time for the next configuration node to handle arrives first, audio rendering is interrupted until that node has been handled. To run it all, the list of nodes can be examined once to figure out in advance what data structures need to be allocated and prepared (how many oscillators, etc.), and a second time when actually using those for running the simplistic "program" the script has been translated into.
To allow implementing modulation techniques (basic options for PM, FM and AM), support for lists within which modulating oscillators could be placed was then added to the parser and audio generation code, including support for nesting lists within lists (for modulation chains longer than 2 in depth). Such a list of oscillators is assigned in connection with a parameter for frequency, phase, amplitude, etc. for an oscillator – naturally such text is simply parsed with recursion. This was a fairly simple extension for the language, with time proceeding along a linear list of steps to take like before, though the data used in connection with each main node and time position may branch like a tree – recursion also entering in how audio generation code handles what's configured to run, mirroring what's specified in the scripts.
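The recursion described above can be sketched for phase modulation: rendering a carrier first renders its modulators, which may in turn have modulators of their own. This is a minimal C sketch under stated assumptions: the Osc struct and osc_run are hypothetical, and a sine-like triangle wave stands in for the wavetable oscillator so that the example needs no math library.

```c
#include <stddef.h>

/* Hypothetical oscillator with a nested list of phase modulators. */
struct Osc {
  double freq;          /* Hz */
  double amp;           /* output scale */
  double phase;         /* current phase, in cycles [0, 1) */
  struct Osc **pmods;   /* phase modulators (may themselves be modulated) */
  size_t pmod_count;
};

/* Wrap a phase in cycles into [0, 1). */
static double wrap_phase(double x) {
  x -= (double)(long long)x;
  return (x < 0.0) ? x + 1.0 : x;
}

/* Sine-like triangle wave: 0 at 0, 1 at 0.25, 0 at 0.5, -1 at 0.75. */
static double tri(double x) {
  x = wrap_phase(x);
  if (x < 0.25) return 4.0 * x;
  if (x < 0.75) return 2.0 - 4.0 * x;
  return 4.0 * x - 4.0;
}

/* Produce one sample, recursing into modulators first: a depth-first
 * walk that mirrors the nesting of modulator lists in a script. */
static double osc_run(struct Osc *o, double srate) {
  double pm = 0.0;
  for (size_t i = 0; i < o->pmod_count; ++i)
    pm += osc_run(o->pmods[i], srate);
  double out = o->amp * tri(o->phase + pm);
  o->phase = wrap_phase(o->phase + o->freq / srate);
  return out;
}
```

Longer modulation chains need no extra code: a modulator with its own pmods list is simply recursed into one level deeper.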
It was when trying to add more flexible syntax for timing, on top of that, that complexity first began to grow unmanageable for my past self. In the early project I had difficulty keeping it all working – I had not developed the methodology I use nowadays. (I first tried to solve this, a little later, by moving in the direction of adding another stage of script processing between parsing and audio generation. Much later, I saw how to undo that and resimplify.) The design didn't include much of the classic structure of compilers and interpreters, and I didn't have the experience to grow my own design well and make it maintainable. The language also looked quite different from typical well-known and well-described paradigms. This is part of why added features hit a plateau fairly early on. The rest of the reason is that making it all do more required exponentially more effort the further I wanted to go towards reaching my goals.
Yet before complexity grows much, in a small and tidy way it's very doable to support things like parallel audio generation (several "voices"), combined with a sequential "time separator", and a more flexible "insert time delay for what's placed after this" modifier. In it can be seen the outline for what may be an ideal, flexible tone, sound, and soundscape generator language.
Early choices in how the language looks were made for brevity and ease of writing at the small scale, including not requiring any symbol between a name and a value for assignment. Instead, immediate name-value pairs form a large part of the script contents, along with whitespace, different types of brackets for grouping, and some extra symbols. The lack of an assignment symbol imposes some limits on the names used as the left-hand part of an expression, because such names still need to be distinguishable from what follows them – from the values read, which may be numerical, use alphabetical characters, and/or other things. The simplest thing to do is to use one-character names as prefixes, each followed by the value assigned. (Regardless of the length of the prefix-names, having it the same for every prefix-name makes it clear where it ends and the value after it begins.) The pattern of such name-value pairs can be seen in many places in the language, but it doesn't always repeat when breaking down larger subexpressions into parts; e.g. for numerical expressions (later extended further), I settled for the ease of conventional infix syntax for arithmetic in values, rather than elaborating some (in my view clunkier) alternative to it.
The easy and terse solution of using one-letter names for the left-hand part of name-value pairs (in some cases with a special character as the name) works well enough as long as there aren't many things to name. It limits possible additions to the language, but works well for smaller, fixed sets of named things. The limitation is also loosened somewhat by allowing several values of different types for the same name (e.g. a number and/or a modulator list). Beyond that, subnames nested under names can be added for extra related parameters. As used, the one-character names very loosely mirror ordinary written language, in that context for smaller things is set by a mixture of capital letter names (e.g. adding a new wave oscillator) and special symbols. Lowercase one-character names denote smaller things which are accessed and used specific to the context, a little like function parameters or record fields.
For user-defined label/variable names, longer and more flexible names are allowed by using a special symbol placed before a name string as the left-hand part of a pair, the name string being the right-hand part. For example, a label for an object (e.g. an oscillator) is written after the object as 'name. (Until 2022, variables were only used to label objects like that, so that they can be referred back to by name. The 2022 numerical variables feature uses a = symbol as the left-hand part of a second name-value pair, in an unusual imitation of conventional assignment syntax. Eventually it was made to look like $name = number, whitespace optional.)
The basic design for how time works is very simple. Time for a script begins at 0 ms. A script is translated into a list of timed update instruction nodes, or "steps", each new step taking place after the previous, with or without time (samples to generate) passing between any two steps. Each step configures the system, e.g. telling it to start generating something or changing parameter values for some generator object.
The running of a script primarily advances through time, and secondarily through the timed update steps which are laid out in a list like a timeline of events. After parsing, time is translated into the number of samples to generate, or which should be generated before a time is reached. Time proceeds as output samples are written, while update events only come with time and do not advance it. The handling of such updates takes priority over output generation, pausing it until the updates at that time position have been handled.
Each thing which generates output, such as a wave oscillator, has a time duration of use, beginning at one time position and ending at a later one. The script has ended when no things remain in use (the time durations set have expired) and no further update steps remain to be waited for (whether while creating sounds or in quiet). In other words, the duration of a script equals the total sum of the times to wait before each new update step, plus the remaining duration of play after the last update step for still-active "things" (e.g. oscillators).
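A minimal C sketch of this timing model follows, assuming a single generator whose level and remaining duration each new step replaces (the Step struct and run_steps are hypothetical names; real steps configure many objects). Updates due at the current position are applied before any further samples are written, and the loop ends when nothing plays and no steps remain to be waited for.

```c
#include <stddef.h>

/* Hypothetical timed update step: wait `wait` samples, then set the
 * generator's output level and remaining play duration (in samples). */
struct Step {
  size_t wait;
  double level;
  size_t dur;
};

/* Run the step list, writing at most `max` samples to `out`.
 * Pending updates take priority: output pauses while the steps due at
 * the current position are applied, then resumes.
 * Returns the number of samples written (the script's total length). */
static size_t run_steps(const struct Step *steps, size_t n,
                        double *out, size_t max) {
  size_t pos = 0, next = 0;
  size_t until_step = (n > 0) ? steps[0].wait : 0; /* samples before next step */
  double level = 0.0;
  size_t remaining = 0; /* play time left for the generator */
  while (pos < max) {
    /* Apply every step due now before generating any more output. */
    while (next < n && until_step == 0) {
      level = steps[next].level;
      remaining = steps[next].dur;
      if (++next < n) until_step = steps[next].wait;
    }
    /* Done when nothing plays and no further steps are waited for. */
    if (next == n && remaining == 0) break;
    out[pos++] = (remaining > 0) ? level : 0.0; /* silence while waiting */
    if (remaining > 0) --remaining;
    if (next < n) --until_step;
  }
  return pos;
}
```

For the steps {wait 0, play 3 samples} and {wait 5, play 2 samples}, the output is 3 samples of sound, 2 of silence, then 2 more of sound: 7 samples, which is the sum of the waits (0 + 5) plus the 2 samples still playing after the last step.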
A main example of a limitation not dealt with early on is the nature of the nested list, i.e. tree structures, as the form of what can be specified in a script. Early on, the capabilities of old FM synthesizer systems had been an inspiration, but they also support connecting oscillators in arrangements other than the tree structures of carriers and modulators provided for by nested lists; e.g. several carriers may share a modulator, and in general the oscillator connection schema is a DAG (directed acyclic graph) in Yamaha's old FM "algorithms". (Technically, self-modulation could however be viewed as adding self-loops to an otherwise acyclic graph. Possibilities for going beyond acyclic graphs by supporting feedback loops more generally also exist, and are done in some synthesizer systems.)
But most conspicuously missing from the early language are features like defining and using functions with audio generation code in scripts, looping and other control flow constructs, etc. I skipped all that at first because I wanted to explore other things rather than inventing yet another "typical" language. The absence of such things is half of what defines the old design. Often, it isn't audio generation features which suggest the greatest departures in design, but conventional programming language ideas.
Relative to the early language, some kinds of extensions for it would mainly require reworking and complicating the design closer to the parsing end of the program – maybe using another layer of early data structures and processing of them to preprocess script contents into something with a form closer to the old parser output. Other ideas would mainly require reworking the other end of the program, which in the simple design does audio rendering and can be viewed as an interpreter that only follows a flat line of instructions. (When considering creating a more powerful interpreter, whether at the parsing end or at the rendering end, it's also worth noting that some basic big limitations in features are necessary for e.g. time durations for scripts to remain pre-calculable, as they ended up being. A Turing-complete language would not allow it.)
Eschewing numerical variables and such in the early project, instead I added a very simply parsed and used mechanism for setting script options with S, a pseudo-type name used like an oscillator W with such parameters, but with the effect of changing default values and other settings at parse time instead of adding an object. This has remained, been tweaked (making it lexically scoped in 2023, imitating how generator objects are scoped in lists), and extended whenever convenient.
Other features also tie into default values. Something which was both designed more intricately than needed and ended up buggier and trickier to get right than much else in the language was the flexible default values for time durations. In wanting the most concise language possible, I put some thought into how the time duration t parameter for an audio generator should be filled in if nothing is written, and it turned out that the intuitive "make it fit other durations as used in the script at the current time position" behavior was only deceptively simple.
The old default time logic was debugged and preserved when the project was revived, and as of 2025 can also be found in the latest versions. In the following script, two audio generators are inserted at the same time, but time is set only for one of them. Yet both are given the same time value, as the default value for the left-out time is increased to match the other one. (Default values may be increased, but never decreased, from the short default value, which is 1 second unless changed with the S t option in a script.)
Wsin f220 t2 Wsin f440
But what if audio generators are inserted at different times, not at the same placement in seconds? In such cases, the default time logic counts down by subtracting time advancement in the script from the current longest time at the current position. Values at a later time position however also count when setting values at an earlier position; for consistency, default values must count up going backward. If the above script had been Wsin f220 /1 Wsin f440, both generators would again end at the 2-second mark: 2 seconds for the 1st generator, 1 second for the 2nd, as the 2nd is given a 1 second (short) default time, and the backward counting-up adds the time shift in order to arrive at the longer default time for the 1st generator.
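Within one duration group, the net effect of counting down and up can be approximated by a simpler rule, assumed here rather than taken from the saugns code: every generator without a t value is lengthened so that it ends together with the latest-ending generator in the group, but is never made shorter than the short default. A minimal C sketch with hypothetical names (GenTime, fill_default_times), ignoring compound steps:

```c
#include <stddef.h>

/* Hypothetical generator entry within one duration group. */
struct GenTime {
  double pos;   /* start time relative to the group start, in seconds */
  double t;     /* duration; negative means "not set in the script" */
};

/* Simplified model: each generator left without a time is lengthened,
 * never shortened, to end with the longest-running generator of the
 * group. `short_t` is the short default (1 s unless changed by S t). */
static void fill_default_times(struct GenTime *g, size_t n, double short_t) {
  double group_end = 0.0;
  for (size_t i = 0; i < n; ++i) {
    double t = (g[i].t < 0.0) ? short_t : g[i].t;
    if (g[i].pos + t > group_end) group_end = g[i].pos + t;
  }
  for (size_t i = 0; i < n; ++i) {
    if (g[i].t < 0.0) {
      double t = group_end - g[i].pos; /* counts up going backward */
      g[i].t = (t > short_t) ? t : short_t;
    }
  }
}
```

For the two scripts above, this yields 2 s and 2 s for Wsin f220 t2 Wsin f440, and 2 s and 1 s for Wsin f220 /1 Wsin f440. Each group delimited by | would be passed to the function separately, so that no group's times affect another's.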
The use of the time separator | separates duration groups in scripts – basically the time scopes under consideration – so that what's after a | never alters default values for what's before and vice-versa.
It's tricky and a matter of taste how to combine some other features with the default time logic. The "compound steps" feature, for arranging a series of sub-steps for one generator as if one step in a script, can either receive longer default times for the first sub-step (later sub-steps have the time of the prior one as their default), or give a longer default time from the combined duration of all sub-steps, but not both (as then forcing consistent behavior demands an infinite loop of ever-lengthening times, the alternative being inconsistent and confusing behavior). I opted for the second option, to give a longer default time using the sum duration, using an unextended (short) default time for the first sub-step.
Experimenting on in 2011 and beyond, and then looking for potentially useful ideas for programming languages and compilers in 2012–2014 (in part while taking a few basic courses in related things), led to a series of old notes; they contain a list of thoughts on a new possible language, and ideas for possible design elaborations. Back then, while studying, I discounted my own early design and language as a starting point for something better, after both learning some theory and having got stuck with the old project. In part, that was because basic standard concepts are usually connected to different-looking syntaxes and designs and implementations, and I couldn't see how what I'd already come up with might correspond to those concepts. I vaguely dreamed of different things and put it all aside for years, until roughly a decade later, when, gradually bridging that gap in thought on my own, I arrived at a road to working out more in practice.
Programming-wise, the old project ended in April 2012, though I kept considering various ideas during 2012–2014. After cleaning up the old program from 2018 on, some smaller old ideas (alongside new ones) have been explored from time to time and made it into the program, but I maintain a conservative approach towards adding "typical" programming language features.
The April 2012 program had grown a parser producing a flat main list of time-ordered events (the main nodes), combined with tree structures attached to those events (for data nodes for the things added or changed in a step, which may involve nested syntax elements). This corresponds fairly simply to the language: time-ordering (with time placement syntax) is one dimension of structure, and nesting as in e.g. setting up oscillators for modulation in lists is another. But! Some of the semantics had begun to be handled after parsing but before interpretation, a middle-layer finishing some details of timing which seemed too messy to attempt during parsing. (It also counted and allocated needed voices for audio generation prior to running it.)
One kind of nesting in the SAU language however applies to time, the ; "compound steps" feature. This nesting, allowing time placement to branch out, is flattened away by the post-parse semantics code. It was (buggily) added early on to make script text neater, allowing writing a series of timed changes for one object together, without advancing the timing for other objects. That way, timed updates can be grouped per object, rather than mainly according to a global flow of time. To implement this, during parsing the event nodes for follow-on compound steps are placed in side-timeline lists attached to the main event node made before the first ;. Then the timelines are merged into one. The same design was tweaked a little for the 2022 "gapshift" feature, which uses the same mechanism to replace an earlier oscillator parameter (s) for padding time durations with leading silent time.
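Merging a side timeline into the main one can be sketched as an ordinary merge of two time-sorted lists, converting the relative wait times to absolute positions for comparison and back again for the output. This is a hypothetical C sketch (the Ev struct and merge_timelines are not from saugns), assuming both timelines measure time from the same origin; ties go to the main timeline.

```c
#include <stddef.h>

/* Hypothetical timed event: `wait` is the time to wait after the
 * previous event of the same timeline; `id` just identifies it. */
struct Ev {
  unsigned wait;
  int id;
};

/* Merge two relative-time timelines into one (out has room for na + nb).
 * Events are compared by absolute time; on ties, the main timeline `a`
 * goes first. Waits in `out` are recomputed relative to the merged list. */
static size_t merge_timelines(const struct Ev *a, size_t na,
                              const struct Ev *b, size_t nb,
                              struct Ev *out) {
  size_t i = 0, j = 0, k = 0;
  unsigned long ta = (na > 0) ? a[0].wait : 0; /* absolute time of a[i] */
  unsigned long tb = (nb > 0) ? b[0].wait : 0; /* absolute time of b[j] */
  unsigned long now = 0;                       /* time of last output */
  while (i < na || j < nb) {
    int take_a = (j >= nb) || (i < na && ta <= tb);
    unsigned long t = take_a ? ta : tb;
    out[k].id = take_a ? a[i].id : b[j].id;
    out[k].wait = (unsigned)(t - now);
    now = t;
    ++k;
    if (take_a) { if (++i < na) ta += a[i].wait; }
    else        { if (++j < nb) tb += b[j].wait; }
  }
  return k;
}
```

With main events at 0, 10 and 20 and side events at 5 and 15, the merged list alternates between the two, each event waiting 5 after the previous one.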
Strictly speaking, there's been even more than one extra pass of script processing between the two main stages, the parser and the final interpreter (long usually called the "generator" module, as it runs all audio generation). Several loops working through all event nodes have been used to finalize the data before it's fully ready to use. For a period of time 2019–2021, the design was temporarily complicated by adding another extra module and set of data structures, with another conversion pass. Then I began to figure out how to do more with fewer loops, and over time (2023–2025) simplified away all the extra layers and loops until only one parser pass and one audio rendering pass remained.
While it was clear that everything can't be done event-by-event straight away while parsing produces nodes, it turns out it's enough to accommodate the ; compound step (smaller grouping) and the time separator | (larger grouping) in the SAU language. The latter says that all which follows it is separated in time from all which goes before it (and is tied to the old (2011) feature of flexible default time logic). Contents delimited by | became the units for the mini-pass that replaced a full separate pass for extra semantics just after parsing. (If no | is ever used in a script, it does in practice still become a full pass.)
Voices are simultaneous sounds, but technically more like "main audio generators" or outputs which behave as such. One thing done first more as a puzzle, later tied to features, is counting and allocating the voices before running audio generation, while keeping the number down. Code to count voices had been in my program since early on, but only after resuming the project did I suddenly realize the pre-counted value could be used to auto-adjust amplitude per script (like a global volume control, see S a.m).
But how to count that value, the voice number? For each voice, in short a current time duration and a main audio generator is tracked. There's an incoming list of events from the script with a "wait for a time" value attached to each (meaning subtract the wait time from the duration of all current voices), and some audio generators configured by events play the main role and are given a voice each, with updates for them often setting a new time duration which translates into a new or extended voice duration. Either you can reuse an expired voice or you need to add another one when faced with a new audio generator needing a voice.
The early implementation avoided the simplest, greedy algorithm which just reuses a voice as soon as something stops running and something is added in the script, because I didn't think to allow arbitrary renumbering of voices. What if an old voice, now expired, comes of use later again? An audio generator may have a label allowing it to be given a new time later. If free renumbering of such voices is allowed (a later change), then the crudest greedy algorithm is also the optimal solution; it is the approach which avoids excess voice count using minimal computation. It can also be implemented without requiring any separate full semantics pass/loop, unlike an approach which avoids renumbering while keeping inflation of the voice count minimal.
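The greedy approach with free renumbering can be sketched in a few lines of C. In this hypothetical model (VoiceEv, count_voices, and the fixed MAX_VOICES limit are assumptions for illustration), each event says that a generator plays for some duration after a wait; the generator keeps its voice if that voice is still playing, and otherwise takes any expired voice, adding a new one only when none is free.

```c
#include <stddef.h>

#define MAX_VOICES 64 /* hypothetical fixed limit for the sketch */

/* Hypothetical event: after waiting `wait` time units, generator `gen`
 * (re)starts playing for `dur` time units. */
struct VoiceEv {
  unsigned wait;
  int gen;
  unsigned dur;
};

/* Greedy voice count with free renumbering. Returns the number of
 * voices needed, or -1 if the sketch's fixed limit is exceeded. */
static int count_voices(const struct VoiceEv *ev, size_t n) {
  unsigned long end[MAX_VOICES]; /* time at which each voice expires */
  int owner[MAX_VOICES];         /* generator currently using each voice */
  int count = 0;
  unsigned long now = 0;
  for (size_t i = 0; i < n; ++i) {
    now += ev[i].wait;
    int v = -1;
    for (int k = 0; k < count; ++k) /* still-playing voice of this gen? */
      if (owner[k] == ev[i].gen && end[k] > now) { v = k; break; }
    if (v < 0)
      for (int k = 0; k < count; ++k) /* reuse any expired voice */
        if (end[k] <= now) { v = k; break; }
    if (v < 0) {
      if (count == MAX_VOICES) return -1;
      v = count++;
    }
    owner[v] = ev[i].gen;
    end[v] = now + ev[i].dur;
  }
  return count;
}
```

Note how a generator returning after its voice expired (gen 0 in a sequence like the one tested below) may land on a different voice number than before; that renumbering is what lets the single greedy loop reach the minimal count.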
But what about counting and allocating a minimal number of audio generator objects? I put it off until 2025, and after experimenting, ultimately settled on an approach without renumbering of non-dead objects, which avoids reusing objects given labels in scripts. (To allow free renumbering for a minimal count has a complication here – audio generators refer to each other by their IDs, and dependable IDs are internally needed for modulator lists and such. Thus you then need another set of IDs which is not freely renumbered, in part defeating the point, unless solving the issue in an even more complicated manner.)
The semantics code which also counts audio generators does a depth-first traversal of linked modulators from a carrier for several purposes. One purpose is to mark audio generator objects nested under dead ones also dead, so ID reuse can then happen. Traversal is done per voice, with the voice and its "root generator object" as starting point, with timing and tracking for voices and audio generators managed in tandem. When marking an audio generator as expired, two things need to be avoided: it can't be reachable from a label (directly or nested beneath an object which is), and it can't have follow-on events inside the current "duration group" (tied to ; and | timing syntax). Following those two conditions for exclusion, reuse becomes safe. Compared to a "free renumbering" approach, the only waste is the exclusion of labeled objects from reuse; other objects become available once the t times set for them have run out, as soon as no follow-on events exist for them, given the constraints of the SAU language.
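The marking part of that traversal can be sketched as a recursive C function, called on the root generator object of a voice that has expired. Under the simplifying assumptions here (GenObj and mark_dead are hypothetical, and label reachability is checked only along the traversal path), an object that is labeled or has follow-on events in the current duration group stays alive, and so does everything nested beneath it, since the walk does not descend past it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical generator object with a list of modulators. */
struct GenObj {
  bool labeled;       /* referred to by a label in the script */
  bool has_followon;  /* more events for it in the current duration group */
  bool dead;          /* its ID may be reused */
  struct GenObj **mods;
  size_t mod_count;
};

/* Depth-first walk from the root of an expired voice. Objects are only
 * marked dead when nested under dead ones (or when they are the root),
 * so protection by a label or follow-on events covers whole subtrees.
 * Returns the number of objects newly marked dead. */
static size_t mark_dead(struct GenObj *o) {
  if (o->labeled || o->has_followon) return 0; /* must stay usable */
  size_t n = 0;
  if (!o->dead) {
    o->dead = true;
    ++n;
  }
  for (size_t i = 0; i < o->mod_count; ++i)
    n += mark_dead(o->mods[i]);
  return n;
}
```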
Very early on, numerical infix expressions were added and handled in the parser with one dedicated recursive parsing function for such, handling all subexpressions with recursive calls. The parser parsed and calculated in-place, nested calls combining and reducing numerical expressions to their results. As long as numerical expressions don't contain both side-effects (modifying of state in a script), and re-evaluation of script contents (like in loops or script-defined functions), no more is needed. Only if both are added would the parser need a redesign, in order to allow a numerical expression to be evaluated separately from the initial crunching of the text – and thus possibly several times afterwards.
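A minimal C sketch of such parse-and-calculate recursion follows: a classic recursive descent evaluator supporting only +, -, *, / and parentheses. The function names are hypothetical, and the saugns parser handles more, such as math functions. Each function consumes text and returns a number, so no expression tree is ever built, which is exactly why such an expression cannot be evaluated again later without re-parsing the text.

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluates while parsing: each call consumes text and returns a value,
 * so nested subexpressions are reduced to numbers in place. */
static double parse_expr(const char **s);

static void skip_space(const char **s) {
  while (isspace((unsigned char)**s)) ++*s;
}

/* primary: number | '(' expr ')' | '-' primary */
static double parse_primary(const char **s) {
  skip_space(s);
  if (**s == '(') {
    ++*s;
    double v = parse_expr(s);
    skip_space(s);
    if (**s == ')') ++*s;
    return v;
  }
  if (**s == '-') { ++*s; return -parse_primary(s); }
  char *end;
  double v = strtod(*s, &end);
  *s = end;
  return v;
}

/* term: primary { ('*' | '/') primary } */
static double parse_term(const char **s) {
  double v = parse_primary(s);
  for (;;) {
    skip_space(s);
    if (**s == '*') { ++*s; v *= parse_primary(s); }
    else if (**s == '/') { ++*s; v /= parse_primary(s); }
    else return v;
  }
}

/* expr: term { ('+' | '-') term } */
static double parse_expr(const char **s) {
  double v = parse_term(s);
  for (;;) {
    skip_space(s);
    if (**s == '+') { ++*s; v += parse_term(s); }
    else if (**s == '-') { ++*s; v -= parse_term(s); }
    else return v;
  }
}
```

For example, "1 + 2 * (3 - 1) / 4" evaluates to 2 as the text is consumed.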
In 2022, one of the two crucial things, statefulness, was added to numerical expressions in the form of mathematical functions like rand().
Only in early 2023 did another main audio generator come along – the rumble/random line segments oscillator R, which combines line segment functions, PRNGs, and modulation options. (A bigger side-project of mine, I've written more about how the R oscillator works on my personal site.) Before then, I stuck with only the wavetable oscillator in main versions of the program. Later a simpler noise generator N and an amplitude generator A (for DC, line segments, and summing other signals) joined R.
Adding more types of audio generators is mainly about audio generation code, and so requires little design change. The R and W oscillators share most parameters and most parsing code, for example. Line types, used by R similarly to how W uses wave types, already existed for a different purpose, added in 2011 for use with parameter sweeps – though more variations on value-filling functions implementing each line were needed to fit the new needs.
The feature of sweeps for parameters is the earliest (2011) example of attaching extra logic and audio rendering features to parameters, beyond support for modulators in lists. Such features have been expanded over the years. ADSR envelopes, added later (2025), take what sweeps do to a next level, and similarly could be added onto the older design by mainly some parsing changes and some extra audio rendering code. Sweeps and envelopes are both little state machines used as subcomponents for the larger audio generator types.
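As an illustration of such a state machine, here is a hypothetical linear ADSR envelope in C (the Adsr struct, its fields, and the linear segment shapes are assumptions, not how saugns implements envelopes). Each call advances one sample; the envelope moves through attack, decay, and sustain on its own, and enters release when adsr_release is called.

```c
/* Hypothetical ADSR envelope as a small per-sample state machine.
 * Times are in samples; levels are in [0, 1]. */
enum AdsrStage { ADSR_ATTACK, ADSR_DECAY, ADSR_SUSTAIN, ADSR_RELEASE, ADSR_DONE };

struct Adsr {
  enum AdsrStage stage;
  unsigned attack, decay, release; /* stage lengths in samples (>= 1) */
  double sustain;                  /* sustain level */
  double level;                    /* current output level */
  double release_from;             /* level when release began */
  unsigned pos;                    /* samples spent in current stage */
};

/* Switch to the release stage, fading out from the current level. */
static void adsr_release(struct Adsr *e) {
  e->stage = ADSR_RELEASE;
  e->release_from = e->level;
  e->pos = 0;
}

/* Advance by one sample and return the new level (linear segments). */
static double adsr_next(struct Adsr *e) {
  switch (e->stage) {
  case ADSR_ATTACK:
    e->level = (double)++e->pos / e->attack;
    if (e->pos >= e->attack) { e->stage = ADSR_DECAY; e->pos = 0; }
    break;
  case ADSR_DECAY:
    e->level = 1.0 - (1.0 - e->sustain) * ((double)++e->pos / e->decay);
    if (e->pos >= e->decay) { e->stage = ADSR_SUSTAIN; e->pos = 0; }
    break;
  case ADSR_SUSTAIN:
    e->level = e->sustain;
    break;
  case ADSR_RELEASE:
    e->level = e->release_from * (1.0 - (double)++e->pos / e->release);
    if (e->pos >= e->release) { e->stage = ADSR_DONE; e->level = 0.0; }
    break;
  case ADSR_DONE:
    e->level = 0.0;
    break;
  }
  return e->level;
}
```

A sweep could be seen as the same pattern with a single stage: one segment from the current value to a goal value over a set time.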
Some general ideas for cleaner code have evolved since reviving the project in November 2017. One little discovery is that staggered region/arena add-only allocator mempools are a perfect fit for much of the dynamic memory allocation in a program like this. Most of the rest can be handled using a generic dynamic array module (which can be done pretty neatly in C).
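A minimal C11 sketch of a staggered add-only pool, assuming the simplest design: allocations are carved out of the newest block, each new block is twice the size of the previous one (or larger if the request needs it), and nothing is freed until the whole pool is. The MemPool and PoolBlock names and the doubling policy are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* One block of pool memory; blocks form a list, newest first. */
struct PoolBlock {
  struct PoolBlock *prev; /* previously filled block */
  size_t size, used;      /* capacity and bytes handed out */
  max_align_t data[];     /* storage, suitably aligned for any type */
};

struct MemPool {
  struct PoolBlock *head;
  size_t next_size;       /* capacity for the next block; doubles */
};

#define POOL_ALIGN _Alignof(max_align_t)

/* Add-only allocation: returns aligned memory, or NULL on failure. */
static void *mempool_alloc(struct MemPool *p, size_t n) {
  n = (n + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN; /* keep alignment */
  struct PoolBlock *b = p->head;
  if (b == NULL || b->size - b->used < n) {
    size_t size = (p->next_size > n) ? p->next_size : n;
    b = malloc(sizeof(*b) + size);
    if (b == NULL) return NULL;
    b->prev = p->head;
    b->size = size;
    b->used = 0;
    p->head = b;
    p->next_size = size * 2; /* stagger: each new block is larger */
  }
  void *mem = (unsigned char *)b->data + b->used;
  b->used += n;
  return mem;
}

/* Free every block at once; individual allocations are never freed. */
static void mempool_free_all(struct MemPool *p) {
  while (p->head != NULL) {
    struct PoolBlock *prev = p->head->prev;
    free(p->head);
    p->head = prev;
  }
}
```

The design suits data like parser nodes, which are all created during one stage and all discarded together at the end; any space left over in a block when a new one is started is simply wasted.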