2024 eclipse timelapse by Ashley Lian

Sartoris — A custom LISP clone

Lisp is a family of languages that treats all data structures as lists, even its source code is comprised of lists (hence the name LISt Processor). Lisp is the second oldest programming language, preceded only by Fortran.

In this exercise, I use C to create my own Lisp. Originally, I embarked on this particularly difficult task because I wanted to add features to the open source photo editing tool Ronin which I began playing around with recently. Ronin is written and operated in its own version of Lisp which was prohibitively difficult for me to understand. I figure that learning to make my own will give me a leg up on how to make the appropriate edits and additions to the open source project. I will be following along Daniel Holden's Build Your Own Lisp project. I also suspect that Ronin's creator took this exact same route in the journey to create the tool.

Developer's note

As of 01D03, Sartoris remains incomplete. Simple arithmetic is possible with it and it supports other opcodes like mod, max, min as well as some simple list manipulation. Sartoris does not yet support variables, functions, or loops. Operationally, Sartoris lives within a complete and accessible REPL allowing for in-line editing and past commands cached in memory.

What is a programming language

A programming language, like any other language, is a mutable combination of words or phrases that form some coherent structure which can be understood by entities other than the utterer. This ability to understand comes from the fact that language must follow a (mostly) self-consistent set of rules. I call this set rules a grammar. In our everyday languages, the use of a grammar rule set comes fairly naturally, but in programming I have to be more rigorous, computers are tough conversational partners.

To build my own programming language I use the mpc Parser Combinator Library. mpc allows me to simultaneously define the foundational elements of a language (e.g. nouns, adjectives, verbs), the options within those elements (cat, fluffy, run), as well as the rules by which such elements must be combined together to form an expression ("the fluffy cat runs" rather than "run cat the fluffy").

Regular Expressions

Regular expressions are ways to write grammars for small sections of text. Below is an example of how I would construct the bedrock elements of an ultra-simple language called "Sartoris".

mpc_parser_t* Number   = mpc_new("number");
mpc_parser_t* Operator = mpc_new("operator");
mpc_parser_t* Expr     = mpc_new("expr");
mpc_parser_t* Sartoris    = mpc_new("sartoris");

mpca_lang(MPCA_LANG_DEFAULT,
  "                                                     \
    number   : /-?[0-9]+/ ;                             \
    operator : '+' | '-' | '*' | '/' ;                  \
    expr     : <number> | '(' <operator> <expr>+ ')' ;  \
    sartoris : /^/ <operator> <expr>+ /$/ ;             \
  ",
  Number, Operator, Expr, Sartoris);
      

And with the below grammatical rules for the regular expressions (defined within the parser combinator mpc), I've established the grammar of "Sartoris". It's that easy!

SyntaxGrammar Rule
'.'Any character is required.
'^'The start of input is required.
'$'The end of input is required.
'ab'The string ab is required.
'a'The character a is required.
'[abcdef]'Any character in the set abcdef is required.
'[a-f]'Any character in the range a to f is required.
'a' 'b'First a is required then b is required.
'a' | 'b'Either a is required or b is required.
'a'*Zero or more a are required.
'a'+One or more a are required.
'a'?The charater a us optional.
<abba>The rule abba is required.

Of course, as with any language, the number of the foundational elements is significantly larger, but luckily the rules by which they are applied are relatively constrained. The average programmer knows this to be true as there is not that many ways you can construct or manipulate simple mathematical expressions.

The next steps require some pointer magic to direct a Parser to take some input text and pass through the Sartoris language structure. If I pass through an input such as + 5 (* 2 2), the output would look something like the following, it may not be obvious, but this is a nested tree structure with leafs and leafs with leafs. As with other tree data structures, a recursive function is a very effective way to move through a tree.

regex
  operator|char:1:1 '+'
  expr|number|regex:1:3 '5'
  expr|>
    char:1:5 '('
    operator|char:1:6 '*'
    expr|number|regex:1:8 '2'
    expr|number|regex:1:10 '2'
    char:1:11 ')'
regex
    

Stepping Through the Tree

From our interpreter's perspective the tree that I have generated, while detailed, is ultimately a bunch of nonsense. Evaluating our input requires stepping through the tree, extracting and interpreting the elements encounted, and outputting a result.

Daunting to be sure. But if I examine the tree structure long enough and think about how the tree would expand with a more complex input, some obvious and exploitable patterns emerge. First, the 0th element encountered will always be an operator, followed by a number as the 1st element. Next, either a '(' character or another number. If it's a number(s), I can extract the elements into a separate funciton that checks the operator and performs the appropriate math on the numbers. But, if it's a '(', then I can repeat the cycle; our evaluation is internally recursive. The recursive nature of LISPs allows for complex nested inputs to be parsed in an ultra-compact manner, requiring just at few dozen lines (when including element typing to assist in error handling).

Once the parsing and interpreting structure is established, I expaned the types of operators that a user can pass to include power: '^', mod: '%', as well as externally defined functions MAX() MIN().

Adding new features to SLISP

The typical approach to adding new features to the language consists of a number of steps:

SyntaxAdd new rule to the language grammar.
RepresentationAdd new data type variation.
ParsingAdd new functions for reading the feature from the abstract syntax tree.
SemanticsAdd new functions for evauating and manipulating.

Q-expressions

listTakes one oor more args and returns new Q-Expr containing the args.
headReturns a Q-Expr and returns the first element of another Q-Expr.
tailReturns Q-Expr with first element removed.
joinReturns sinlge Q-Expr of two or more conjoined Q-expr.
evalEvaluates Q-Expr as if it were an S-Expr.