Foundations of Dawn: The Untyped Multistack Concatenative Calculus

By Scott J Maddox.

First published 2022-02-24.

Updated on 2022-11-01 to distinguish between sequence termination indices when composing two quotes. Thank you, Maja Kądziołka, for identifying this error and notifying me via email.

In the last Foundations of Dawn post, I formally defined the untyped (single stack) concatenative calculus (UCC), demonstrated encodings for booleans and natural numbers, and provided a toy programming language based on the UCC. In this post I will do the same for the untyped multistack concatenative calculus (UMCC), which enhances the UCC with an arbitrary number of arbitrarily named stacks. These extra stacks relieve much of the pain associated with stack shuffling in the UCC. They also provide some other benefits that we’ll begin to look at in this post and explore further in future posts.

If you identify any errors or omissions in this post or in the associated toy programming language, I would greatly appreciate it if you would notify me. I would be happy to give attribution for any corrections or suggestions.

Syntax

The syntax of the UMCC extends that of the UCC by introducing the concept of stack contexts, which we denote by (se)(s|e), where ss is a stack identifier and ee is an expression:

Expressionse=iintrinsic[e]quotee1 e2 ... encompose(se)where sS(e)stack context\begin{array}{ l l l l l l } \text{Expressions} & e & = & i & & \text{intrinsic} \\ & & | & [e] & & \text{quote} \\ & & | & e_1~e_2~...~e_n & & \text{compose} \\ & & | & (s|e) & \text{where}~s\notin \mathbb{S}(e) & \text{stack context} \\ \end{array}

The meta function S(e)\mathbb{S}(e) returns the set of unquoted stack identifiers contained in ee:

S(i)={}S([e])={}S(e1 e2 ... en)=S(e1)S(e2)...S(en)S((se))={s}S(e)\begin{array}{ l l l l l l } \mathbb{S}(i ) & = & \{\} \\ \mathbb{S}([e] ) & = & \{\} \\ \mathbb{S}(e_1~e_2~...~e_n) & = & \mathbb{S}(e_1) \bigcup \mathbb{S}(e_2) \bigcup ... \bigcup \mathbb{S}(e_n) \\ \mathbb{S}((s|e) ) & = & \{s\} \bigcup \mathbb{S}(e) \\ \end{array}

The sS(e)s\notin \mathbb{S}(e) restriction essentially means that nested, unquoted stack contexts must have disjoint stack identifiers. Later in this post, we’ll look at how the sS(e)s\notin \mathbb{S}(e) restriction can effectively be lifted when implementing this calculus as a programming language.

Semantics

The semantics of this calculus are defined by the effects that expressions have on value multistacks, which are essentially maps from stack identifiers to value stacks:

Valuesv=[e]quoted expressionValue StackssV=sempty stack ssV vpush v onto stack sValue MultistacksV=Vϵempty multistackVsVadd stack s\begin{array}{ l l r l l } \text{Values} & v & = & [e] & \text{quoted expression} \\ \\ \text{Value Stacks} & \braket{s|V} & = & \braket{s|} & \text{empty stack}~s \\ & & | & \braket{s|V~v} & \text{push}~v~\text{onto stack}~s \\ \\ \text{Value Multistacks} & \mathbb{V} & = & \mathbb{V}_\epsilon & \text{empty multistack} \\ & & | & \mathbb{V} \braket{s|V} & \text{add stack}~s \\ \end{array}

Compared to the UCC, in the UMCC we replace the swap\text{swap} intrinsic with push\text{push} and pop\text{pop}, which move values between stacks. So, in the UMCC there are seven intrinsic concatenative terms, push\text{push}, pop\text{pop}, clone\text{clone}, drop\text{drop}, quote\text{quote}, compose\text{compose}, and apply\text{apply}, which are defined by the following small-step semantics:

VsV vsV v(s(spush))VsV v vsVVsV vsV v(s(spop))VsVsV v vVsV v(s(sclone))VsV v vVsV v(s(sdrop))VsVVsV v(s(squote))VsV [v]VsV [e1 ... en] [e1 ... em](s(scompose))VsV [e1 ... en e1 ... em]VsV [e](s(sapply))VsV(s(se))\begin{array}{ l l l l } \mathbb{V}\braket{s|V~v}\braket{s'|V'~v'} & (s'|(s|\text{push})) & \longrightarrow & \mathbb{V}\braket{s|V~v~v'}\braket{s'|V'} \\ \mathbb{V}\braket{s|V~v}\braket{s'|V'~v'} & (s'|(s|\text{pop})) & \longrightarrow & \mathbb{V}\braket{s|V}\braket{s'|V'~v'~v} \\ \mathbb{V}\braket{s|V~v} & (s'|(s|\text{clone})) & \longrightarrow & \mathbb{V}\braket{s|V~v~v} \\ \mathbb{V}\braket{s|V~v} & (s'|(s|\text{drop})) & \longrightarrow & \mathbb{V}\braket{s|V} \\ \mathbb{V}\braket{s|V~v} & (s'|(s|\text{quote})) & \longrightarrow & \mathbb{V}\braket{s|V~[v]} \\ \mathbb{V}\braket{s|V~[e_1~...~e_n]~[e'_1~...~e'_m]} & (s'|(s|\text{compose})) & \longrightarrow & \mathbb{V}\braket{s|V~[e_1~...~e_n~e'_1~...~e'_m]} \\ \mathbb{V}\braket{s|V~[e]} & (s'|(s|\text{apply})) & \longrightarrow & \mathbb{V}\braket{s|V} (s'|(s|e)) \\ \end{array}

Just as in the UCC, in the UMCC, quotes reduce directly to values:

VsV(s(s[e]))VsV [e]\begin{array}{ l l l l } \mathbb{V}\braket{s|V} & (s'|(s|[e])) & \longrightarrow & \mathbb{V} \braket{s|V~[e]} \\ \end{array}

And composed expressions are reduced from left to right:

V e1 e2 ... en=((((V e1) e2) ...) en)\begin{array}{ l l l } \mathbb{V}~e_1~e_2~...~e_n & = & ((((\mathbb{V}~e_1)~e_2)~...)~e_n) \\ \end{array}

Compared to the UCC, three additional rules are needed in order to reduce all valid UMCC expressions. First, stack contexts distribute over composed sub-expressions:

(se1 e2 ... en)(se1) (se2) ... (sen)\begin{array}{ l l l } (s|e_1~e_2~...~e_n) & \longleftrightarrow & (s|e_1)~(s|e_2)~...~(s|e_n) \\ \end{array}

Second, when more than two stack contexts are directly nested, the outer-most context is redundant and can be removed:

(s(s(se)))(s(se))\begin{array}{ l l l } (s''|(s'|(s|e))) & \longrightarrow & (s'|(s|e)) \\ \end{array}

Finally, empty stack contexts have no effect and can be removed:

(s(s))(s).\begin{array}{ l l l l l } (s'(s|)) & \longrightarrow & (s'|) & \longrightarrow & . \\ \end{array}

This completes the operational semantics for the UMCC. By extending the UCC with stack contexts, a few new rules for reducing them, and two new intrinsics for moving values between stacks, we arrive at a concatenative calculus that is still incredibly small and simple while being considerably more practical for writing useful programs. As I began describing in Introducing Dawn (Part 1), multistacks alleviate the pain of stack shuffling and significantly improve readability and writability compared to other concatenative languages, while retaining the positive aspects of concatenative languages. Furthermore, as we’ll begin to see later in this post, multistack concatenative languages have other interesting features such as something like global variables but without the drawbacks that global variables typically entail.

Embedding the UCC

We can embed the UCC in the UMCC by translating any UCC expression, eUCCe_{UCC}, into the corresponding UMCC expression, eUMCC=(s(seUCC))e_{UMCC} = (s'|(s|e_{UCC})), and by defining a swap\text{swap} term. To define swap\text{swap}, we simply need to pick two fresh stack identifiers, e.g. s1s_1 and s1s'_1, push the top two values onto them, and then pop them off in swapped order:

swap=(s1push)(s1push)(s1pop)(s1pop) \qquad\text{swap} = (s_1|\text{push}) (s'_1|\text{push}) (s_1|\text{pop}) (s'_1|\text{pop})

Stack Identifier Deshadowing

When implementing the UMCC as a programming language, we will want to extend the core calculus with user-defined terms. Unfortunately, the sS(e)s\notin \mathbb{S}(e) restriction on stack contexts presents a problem that we must address. If the user defines a term that contains the stack identifier ss, then obeying the sS(e)s\notin \mathbb{S}(e) restriction would preclude use of that term inside of any other ss stack context, which is an undesirable limitation. If we ignore this restriction without any other changes, though, then that term would behave differently depending on whether or not it’s used inside an ss stack context, which could result in considerable confusion.

Luckily, we can easily resolve this issue by having the interpreter/compiler automatically rename unquoted stack identifiers when they shadow an outer stack context. We will refer to this transformation as stack identifier deshadowing. There are two situations in which deshadowing is necessary—first, when evaluating the apply\text{apply} intrinsic, and second, when evaluating (i.e. expanding) a user-defined term. In order to make it trivial for users to inline term definitions even when they contain unquoted shadowing stack identifiers, we can additionally apply deshadowing when defining new terms and when beginning to reduce a new expression in the interpreter’s read-eval-print loop (REPL). If applied in all four of these situations, deshadowing effectively lifts the sS(e)s\notin \mathbb{S}(e) restriction, entirely.

Implementation: UMCCI

Just as for the UCC, in order to test and validate everything written in this post, I have implemented an interpreter, the UMCCI, for a toy programming language based on this calculus. The toy programming language extends the core calculus with user-defined terms, which can be directly or mutually recursive, and uses stack identifier deshadowing to lift the sS(e)s\notin \mathbb{S}(e) restriction, as described above. In addition, in order to reduce boilerplate, when evaluating an expression without any stack contexts, the interpreter automatically wraps the expression in default stack contexts.

The interpreter is available to try in your browser, and the source code is available on Github. Most of the terms described in this post come predefined, but any and all terms can be redefined by the user. Example sessions will be shown in the remainder of this post.

Encoding Data Types and Computations

Next, we’ll take a look at how we might perform some computations in this calculus, just as we did for the UCC. This calculus still does not include any built-in data types, so we will still encode data types as quoted expressions. While we could use the same encodings for booleans and natural numbers that we did in the UCC, we’ll instead use a more general and systematic encoding, and we’ll take advantage of multistacks to improve ergonomics. In particular, we will take inspiration from the Scott encoding for arbitrary (including recursive) algebraic data types (ADTs) in the untyped lambda calculus, and apply the same concept to the UMCC. First, we’ll formally define the encoding, and then we’ll take a look at several examples including booleans and natural numbers.

Encoding Arbitrary Algebraic Data Types (ADTs)

The basic idea behind Scott encoding is to encode each variant of an algebraic data type as a higher-order term that acts as both the constructor and destructor (a.k.a. eliminator). When applied to a variant’s input data, the constructor returns a closure over that input data. When that closure is then applied to a case expression for each possible variant, it applies only the case expression for the encoded variant to the closed-over data. So the closure acts as the destructor. In the untyped lambda calculus, the destructor is applied to case expressions in a particular order. Similarly, in the UCC, we would place quoted case expressions on top of the stack in a particular order and then apply the destructor. In the UMCC, we’ll instead place quoted case expressions on top of particular stacks and then apply the destructor.

Formally, let DD be an algebraic data type with NN constructors, {Ci}i=1N\{C_i\}_{i=1}^N, such that constructor CiC_{i} has arity AiA_{i}. The encoding of constructor CiC_{i} of the data type DD is given by

Ci=quoteAi [Ci] compose\qquad C_i = \text{quote}_{A_i}~[C'_i]~\text{compose}

where

Ci=(case_C1drop)...(case_Ci1drop)(case_Cipop)(case_Ci+1drop)...(case_CNdrop) apply\qquad C'_i = (\text{case\_}C_1|\text{drop}) ... (\text{case\_}C_{i-1}|\text{drop})(\text{case\_}C_i|\text{pop})(\text{case\_}C_{i+1}|\text{drop})... (\text{case\_}C_N|\text{drop})~\text{apply}

and where quoten\text{quote}_{n} accepts nn inputs and returns a single quote containing all of the inputs:

VsV v1...vn(s(squoten))VsV [v1...vn]\qquad\mathbb{V}\braket{s|V~v_1...v_n} (s'|(s|\text{quote}_n)) \Downarrow \mathbb{V}\braket{s|V~[v_1...v_n]}

The definitions for quoten\text{quote}_{n} with n=0n=0, n=1n=1, and n>1n>1 are given by

quote0=[]\qquad\text{quote}_{0} = []

quote1=quote\qquad\text{quote}_{1} = \text{quote}

quoten>1=(snpush)...(s1push) (s1quote pop)...(snquote pop) composen\qquad\text{quote}_{n>1} = (s_n| \text{push})...(s_1| \text{push})~(s_1|\text{quote}~\text{pop})...(s_n|\text{quote}~\text{pop})~\text{compose}_n

where composen\text{compose}_n composes nn quotes and has the same definition as in the previous post on the UCC.

To destructure an ADT encoded in this way, place a quoted case expression on top of each case stack and then apply the destructor:

(case_C1[e1])...(case_CN[eN]) apply\qquad(\text{case\_}C_1|[e_1]) ... (\text{case\_}C_N|[e_N])~\text{apply}

As we’ll see in the examples below, destructuring closely resembles simple exhaustive pattern matching in ML-family languages. Ultimately, Dawn will include more ergonomic forms of pattern matching, such as recursive and multi-value matching. In a future post, we’ll examine such ergonomic pattern matching forms and derive how they can be lowered to the limited form of destructing available in the UMCC.

Encoding Booleans

In the last post on the UCC, we encoded booleans as quoted higher-order terms that, when applied, accepted two values, one for the false case and one for the true case, dropped the non-matching case value, and kept the matching case value. Here, our encoding of booleans will accept two quoted case expressions, one for the false case and one for the true case, drop the non-matching quoted case expression, and keep and apply the matching quoted case expression. The result is that applying a boolean is analogous to simple exhaustive pattern matching on a boolean in an ML-family language, such as Haskell. Thus, the following Haskell and UMCC definitions are equivalent:

Haskell:

data Bool = False | True

not a = case a of
  False -> True
  True -> False

or a b = case b of
  False -> case a of
    False -> False
    True -> True
  True -> True

and a b = case b of
  False -> False
  True -> case a of
    False -> False
    True -> True

UMCCI:

{term False = quote0 [_False] compose}
{term True  = quote0 [_True]  compose}

{term _False = (case_False|pop)  (case_True|drop) apply}
{term _True  = (case_False|drop) (case_True|pop)  apply}

{term not =
  (case_False|[True])
  (case_True|[False])
  apply
}

{term or =
  (case_False|[
    (case_False|[False])
    (case_True|[True])
    apply
  ])
  (case_True|[drop True])
  apply
}

{term and =
  (case_False|[drop False])
  (case_True|[
    (case_False|[False])
    (case_True|[True])
    apply
  ])
  apply
}

In the UMCC version, to destructure a value we first place a quoted case expression for each variant on its own stack. So the quoted case expression for False is placed on the case_False stack, and the quoted case expression for True is placed on the case_True stack. Then we apply the value, and the value drops all of the non-matching quoted case expressions and keeps and applies the matching quoted case expression. So False drops the quoted case expression from the case_True stack, pops the quoted case expression off of the case_False stack, and then applies it, while True drops the quoted case expression from the case_False stack, pops the quoted case expression off of the case_True stack, and then applies it.

Note how each data type constructor acts as its own destructor (a.k.a eliminator), when applied on quoted case expressions. This is directly analogous to Scott encoding in the untyped lambda calculus.

Also note how the translation from Haskell to UMCC is nearly syntax directed. The only complication is the need to use drop\text{drop} (and in general, clone\text{clone}) to obey linearity (i.e. exactly single use) of values in the UMCC. In general, though, the mismatch between Haskell’s lazy evaluation and UMCC’s eager evaluation can make term translation more difficult.

A UMCCI session using these definitions is shown below:

>>> False not
    (__|(_|False not))
⇓ ⟨_|[_True]⟩

>>> drop
⟨_|[_True]⟩    (__|(_|drop))
⇓

>>> False True or
    (__|(_|False True or))
⇓ ⟨_|[_True]⟩

>>> True and
⟨_|[_True]⟩    (__|(_|True and))
⇓ ⟨_|[_True]⟩

To trace each small step reduction, you can use the :trace directive:

>>> :trace False not
    (__|(_|False not))
‒StkCtxDistr⟶     (__|(_|False) (_|not))
‒StkCtxDistr⟶     (__|(_|False)) (__|(_|not))
‒LitCall⟶     (__|(_|quote0 [_False] compose)) (__|(_|not))
‒StkCtxDistr⟶     (__|(_|quote0) (_|[_False] compose)) (__|(_|not))
‒StkCtxDistr⟶     (__|(_|quote0)) (__|(_|[_False] compose)) (__|(_|not))
‒LitCall⟶     (__|(_|[])) (__|(_|[_False] compose)) (__|(_|not))
‒LitQuote⟶ ⟨_|[]⟩    (__|(_|[_False] compose)) (__|(_|not))
‒StkCtxDistr⟶ ⟨_|[]⟩    (__|(_|[_False]) (_|compose)) (__|(_|not))
‒StkCtxDistr⟶ ⟨_|[]⟩    (__|(_|[_False])) (__|(_|compose)) (__|(_|not))
‒LitQuote⟶ ⟨_|[] [_False]⟩    (__|(_|compose)) (__|(_|not))
‒IntrCompose⟶ ⟨_|[_False]⟩    (__|(_|not))
‒LitCall⟶ ⟨_|[_False]⟩    (__|(_|(case_False|[True]) (case_True|[False]) apply))
‒StkCtxDistr⟶ ⟨_|[_False]⟩    (__|(_|(case_False|[True])) (_|(case_True|[False]) apply))
‒StkCtxDistr⟶ ⟨_|[_False]⟩    (__|(_|(case_False|[True]))) (__|(_|(case_True|[False]) apply))
‒StkCtx3Redund⟶ ⟨_|[_False]⟩    (_|(case_False|[True])) (__|(_|(case_True|[False]) apply))
‒LitQuote⟶ ⟨_|[_False]⟩ ⟨case_False|[True]⟩    (__|(_|(case_True|[False]) apply))
‒StkCtxDistr⟶ ⟨_|[_False]⟩ ⟨case_False|[True]⟩    (__|(_|(case_True|[False])) (_|apply))
‒StkCtxDistr⟶ ⟨_|[_False]⟩ ⟨case_False|[True]⟩    (__|(_|(case_True|[False]))) (__|(_|apply))
‒StkCtx3Redund⟶ ⟨_|[_False]⟩ ⟨case_False|[True]⟩    (_|(case_True|[False])) (__|(_|apply))
‒LitQuote⟶ ⟨_|[_False]⟩ ⟨case_False|[True]⟩ ⟨case_True|[False]⟩    (__|(_|apply))
‒IntrApply⟶ ⟨case_False|[True]⟩ ⟨case_True|[False]⟩    (__|(_|_False))
‒LitCall⟶ ⟨case_False|[True]⟩ ⟨case_True|[False]⟩    (__|(_|(case_False|pop) (case_True|drop) apply))
‒StkCtxDistr⟶ ⟨case_False|[True]⟩ ⟨case_True|[False]⟩    (__|(_|(case_False|pop)) (_|(case_True|drop) apply))
‒StkCtxDistr⟶ ⟨case_False|[True]⟩ ⟨case_True|[False]⟩    (__|(_|(case_False|pop))) (__|(_|(case_True|drop) apply))
‒StkCtx3Redund⟶ ⟨case_False|[True]⟩ ⟨case_True|[False]⟩    (_|(case_False|pop)) (__|(_|(case_True|drop) apply))
‒IntrPop⟶ ⟨_|[True]⟩ ⟨case_True|[False]⟩    (__|(_|(case_True|drop) apply))
‒StkCtxDistr⟶ ⟨_|[True]⟩ ⟨case_True|[False]⟩    (__|(_|(case_True|drop)) (_|apply))
‒StkCtxDistr⟶ ⟨_|[True]⟩ ⟨case_True|[False]⟩    (__|(_|(case_True|drop))) (__|(_|apply))
‒StkCtx3Redund⟶ ⟨_|[True]⟩ ⟨case_True|[False]⟩    (_|(case_True|drop)) (__|(_|apply))
‒IntrDrop⟶ ⟨_|[True]⟩    (__|(_|apply))
‒IntrApply⟶     (__|(_|True))
‒LitCall⟶     (__|(_|quote0 [_True] compose))
‒StkCtxDistr⟶     (__|(_|quote0) (_|[_True] compose))
‒StkCtxDistr⟶     (__|(_|quote0)) (__|(_|[_True] compose))
‒LitCall⟶     (__|(_|[])) (__|(_|[_True] compose))
‒LitQuote⟶ ⟨_|[]⟩    (__|(_|[_True] compose))
‒StkCtxDistr⟶ ⟨_|[]⟩    (__|(_|[_True]) (_|compose))
‒StkCtxDistr⟶ ⟨_|[]⟩    (__|(_|[_True])) (__|(_|compose))
‒LitQuote⟶ ⟨_|[] [_True]⟩    (__|(_|compose))
‒IntrCompose⟶ ⟨_|[_True]⟩

It’s perhaps worth noting that while this toy implementation of the UMCC is extremely inefficient, since it performs every small step reduction explicitly, it should be fairly straight-forward to write either a fairly efficient threaded interpreter or a quite efficient compiler for the UMCC.

Encoding Natural Numbers

Next, let’s take a look at the encoding for natural numbers. The following Haskell and UMCC definitions are equivalent:

Haskell:

data Nat = Z | S Nat

succ a = S a

add a b = case b of
  Z -> a
  S b' -> add (succ a) b'

mul a b = _mul Z a b

_mul acc a b = case b of
  Z -> acc
  S b' -> mul (add acc a) a b'

UMCCI:

{term Z = quote0 [_Z] compose}
{term S = quote1 [_S] compose}

{term _Z = (case_Z|pop)  (case_S|drop) apply}
{term _S = (case_Z|drop) (case_S|pop)  apply}

{term succ = S}

{term add =
  (case_Z|[])
  (case_S|[(b|push) succ (b|pop) add])
  apply}

{term mul = (_|push push) Z (_|pop pop) _mul}

{term _mul =
  (case_Z|[drop])
  (case_S|[(b|push) clone (a|push) add (a|pop) (b|pop) _mul])
  apply
}

This example demonstrates how captured values are placed on top of the current stack when destructuring. Destructuring a Z produces zero captured values as inputs to the Z case expression, while destructuring an S produces one captured value as an input to the S case expression. Again, the primary difference between the Haskell and UMCC implementations are that the UMCC implementation must use drop\text{drop} and clone\text{clone} terms to obey linearity.

A UMCCI session using these definitions is shown below:

>>> Z succ
    (__|(_|Z succ))
⇓ ⟨_|[[_Z] _S]⟩

>>> succ
⟨_|[[_Z] _S]⟩    (__|(_|succ))
⇓ ⟨_|[[[_Z] _S] _S]⟩

>>> Z S add
⟨_|[[[_Z] _S] _S]⟩    (__|(_|Z S add))
⇓ ⟨_|[[[[_Z] _S] _S] _S]⟩

>>> Z S S mul
⟨_|[[[[_Z] _S] _S] _S]⟩    (__|(_|Z S S mul))
⇓ ⟨_|[[[[[[[_Z] _S] _S] _S] _S] _S] _S]⟩

Encoding Lists

We can also encode parametric types in the UMCC, such as lists:

data List a = Nil | Cons a (List a)
{term Nil  = quote0 [_Nil]  compose}
{term Cons = quote2 [_Cons] compose}

{term _Nil  = (case_Nil|pop)  (case_Cons|drop) apply}
{term _Cons = (case_Nil|drop) (case_Cons|pop)  apply}

However, since the UMCC is untyped, there’s nothing to enforce proper usage. So there’s nothing preventing you from, for example, using a natural number where a list is expected, e.g. Z Z Cons.

Encoding the Abstract Syntax Tree (AST) for the UMCC

And finally, as a more complex example, we can encode the abstract syntax tree (AST) for the UMCC:

Haskell:

data Expr = Intrinsic Intrinsic
          | Quote Expr
          | Compose (List Expr)
          | StackContext String Expr

data Intrinsic = Push | Pop | Clone | Drop | Quote | Compose | Apply

UMCC:

{term Intrinsic    = quote1 [_Intrinsic]    compose}
{term Quote        = quote1 [_Quote]        compose}
{term Compose      = quote1 [_Compose]      compose}
{term StackContext = quote2 [_StackContext] compose}

{term _Intrinsic    = (case_Intrinsic|pop)  (case_Quote|drop) (case_Compose|drop) (case_StackContext|drop) apply}
{term _Quote        = (case_Intrinsic|drop) (case_Quote|pop)  (case_Compose|drop) (case_StackContext|drop) apply}
{term _Compose      = (case_Intrinsic|drop) (case_Quote|drop) (case_Compose|pop)  (case_StackContext|drop) apply}
{term _StackContext = (case_Intrinsic|drop) (case_Quote|drop) (case_Compose|drop) (case_StackContext|pop)  apply}

{term Push    = quote0 [_Push]    compose}
{term Pop     = quote0 [_Pop]     compose}
{term Clone   = quote0 [_Clone]   compose}
{term Drop    = quote0 [_Drop]    compose}
{term Quote   = quote0 [_Quote]   compose}
{term Compose = quote0 [_Compose] compose}
{term Apply   = quote0 [_Apply]   compose}

{term _Push    = (case_Push|pop)  (case_Pop|drop) (case_Clone|drop) (case_Drop|drop) (case_Quote|drop) (case_Compose|drop) (case_Apply|drop) apply}
{term _Pop     = (case_Push|drop) (case_Pop|pop)  (case_Clone|drop) (case_Drop|drop) (case_Quote|drop) (case_Compose|drop) (case_Apply|drop) apply}
{term _Clone   = (case_Push|drop) (case_Pop|drop) (case_Clone|pop)  (case_Drop|drop) (case_Quote|drop) (case_Compose|drop) (case_Apply|drop) apply}
{term _Drop    = (case_Push|drop) (case_Pop|drop) (case_Clone|drop) (case_Drop|pop)  (case_Quote|drop) (case_Compose|drop) (case_Apply|drop) apply}
{term _Quote   = (case_Push|drop) (case_Pop|drop) (case_Clone|drop) (case_Drop|drop) (case_Quote|pop)  (case_Compose|drop) (case_Apply|drop) apply}
{term _Compose = (case_Push|drop) (case_Pop|drop) (case_Clone|drop) (case_Drop|drop) (case_Quote|drop) (case_Compose|pop)  (case_Apply|drop) apply}
{term _Apply   = (case_Push|drop) (case_Pop|drop) (case_Clone|drop) (case_Drop|drop) (case_Quote|drop) (case_Compose|drop) (case_Apply|pop)  apply}

Note that in the UMCC implementation I assume that String is defined elsewhere as a list of bytes or similar.

Global Stacks: Global Variables Without the Drawbacks

Global variables are a useful feature that is available in many existing programming languages. Global variables allow the user to access a value from many different functions without having to explicitly thread the value through every function call. Unfortunately, global variables have some drawbacks. Most importantly, accessing a mutable global variable breaks referential transparency, which is a desirable property that makes programs easier to understand, analyze, test, and optimize. And in statically typed languages with global variables, the type of a function (typically) does not reflect whether or not it accesses any global variables, which exacerbates the problem.

In a multistack concatenative calculus, we can use the global availability of arbitrarily named stacks to gain the benefits of global variables without encountering these drawbacks. Namely, we do not need to explicitly thread values through each function call. Instead, every value is available somewhere on the value multistack. In the UMCC, there isn’t much of a benefit, since letting any term access or alter any value isn’t particularly different from letting any function access or alter any global variable. However, once we add a static type system, the type for every term will reflect which values it touches. As a result, all terms in such a multistack concatenative calculus would be referentially transparent.

Generators and Iterators

Generators and iterators are two other useful features that are available in a variety of existing programming languages. Generators, which are also known as coroutines, are like functions that can be paused and resumed and that yield one or more values each time they are resumed, while iterators are essentially generators that yield elements from an existing data structure. Can we implement generators and iterators in the UCC/UMCC? If so, how?

In order to pause and resume a function in the UCC/UMCC, we will need to store the paused function as a value. Luckily, this is exactly what quotes do. So, in order to implement a generator, we need to define a term that returns both the next value and the next quoted term. Then we must simply apply\text{apply} the quoted term each time we would like to resume the generator.

Below, we apply this concept to define a generator, nat_gen, that yields increasing natural numbers:

>>> {term nat_gen = clone succ quote [nat_gen] compose}
Redefined `nat_gen`.

>>> [Z nat_gen]
    (__|(_|[Z nat_gen]))
⇓ ⟨_|[Z nat_gen]⟩

>>> apply
⟨_|[Z nat_gen]⟩    (__|(_|apply))
⇓ ⟨_|[_Z] [[[_Z] _S] nat_gen]⟩

>>> apply
⟨_|[_Z] [[[_Z] _S] nat_gen]⟩    (__|(_|apply))
⇓ ⟨_|[_Z] [[_Z] _S] [[[[_Z] _S] _S] nat_gen]⟩

>>> apply
⟨_|[_Z] [[_Z] _S] [[[[_Z] _S] _S] nat_gen]⟩    (__|(_|apply))
⇓ ⟨_|[_Z] [[_Z] _S] [[[_Z] _S] _S] [[[[[_Z] _S] _S] _S] nat_gen]⟩

A similar approach could presumably be used to define a variety of generators, as well as to define iterators over various data types, such as lists.

Conclusion

In this second Foundations of Dawn post, I have formally defined the untyped multistack concatenative calculus, which brings us one step closer to the Dawn programming language, and I have explored some of the properties of this calculus. I have also provided an interpreter for a toy programming language based on this calculus, which I have used to test and validate everything written here.

In future Foundations of Dawn posts, I’ll formalize a few more features that we’ll want in order to define a practical and productive programming language. In particular, we’ll want:

I make no promises about the order in which each of these will be addressed, but I plan to address all of them eventually.


Discuss this post on Hacker News or Lobsters.