01D03
, Sartoris remains incomplete. Simple arithmetic is possible with it and it supports other opcodes like mod, max, min as well as some simple list manipulation. Sartoris does not yet support variables, functions, or loops. Operationally, Sartoris lives within a complete and accessible REPL allowing for in-line editing and past commands cached in memory.
Sartoris — A custom LISP clone
Lisp is a family of languages that treats all data structures as lists, even its source code is comprised of lists (hence the name LISt Processor). Lisp is the second oldest programming language, preceded only by Fortran.
In this exercise, I use C to create my own Lisp. Originally, I embarked on this particularly difficult task because I wanted to add features to the open source photo editing tool Ronin which I began playing around with recently. Ronin is written and operated in its own version of Lisp which was prohibitively difficult for me to understand. I figure that learning to make my own will give me a leg up on how to make the appropriate edits and additions to the open source project. I will be following along Daniel Holden's Build Your Own Lisp project. I also suspect that Ronin's creator took this exact same route in the journey to create the tool.
Developer's note
What is a programming language
A programming language, like any other language, is a mutable combination of words or phrases that form some coherent structure which can be understood by entities other than the utterer. This ability to understand comes from the fact that language must follow a (mostly) self-consistent set of rules. I call this set rules a grammar. In our everyday languages, the use of a grammar rule set comes fairly naturally, but in programming I have to be more rigorous, computers are tough conversational partners.
To build my own programming language I use the
Regular Expressions
Regular expressions are ways to write grammars for small sections of text. Below is an example of how I would construct the bedrock elements of an ultra-simple language called "Sartoris".
mpc_parser_t* Number = mpc_new("number"); mpc_parser_t* Operator = mpc_new("operator"); mpc_parser_t* Expr = mpc_new("expr"); mpc_parser_t* Sartoris = mpc_new("sartoris"); mpca_lang(MPCA_LANG_DEFAULT, " \ number : /-?[0-9]+/ ; \ operator : '+' | '-' | '*' | '/' ; \ expr : <number> | '(' <operator> <expr>+ ')' ; \ sartoris : /^/ <operator> <expr>+ /$/ ; \ ", Number, Operator, Expr, Sartoris);
And with the below grammatical rules for the regular expressions (defined within the parser combinator
Syntax | Grammar Rule |
Any character is required. | |
The start of input is required. | |
The end of input is required. | |
The string | |
The character | |
Any character in the set | |
Any character in the range | |
First | |
Either | |
Zero or more | |
One or more | |
The charater | |
The rule |
Of course, as with any language, the number of the foundational elements is significantly larger, but luckily the rules by which they are applied are relatively constrained. The average programmer knows this to be true as there is not that many ways you can construct or manipulate simple mathematical expressions.
The next steps require some pointer magic to direct a Parser to take some input text and pass through the Sartoris language structure. If I pass through an input such as + 5 (* 2 2)
, the output would look something like the following, it may not be obvious, but this is a nested tree structure with leafs and leafs with leafs. As with other tree data structures, a recursive function is a very effective way to move through a tree.
regex operator|char:1:1 '+' expr|number|regex:1:3 '5' expr|> char:1:5 '(' operator|char:1:6 '*' expr|number|regex:1:8 '2' expr|number|regex:1:10 '2' char:1:11 ')' regex
Stepping Through the Tree
From our interpreter's perspective the tree that I have generated, while detailed, is ultimately a bunch of nonsense. Evaluating our input requires stepping through the tree, extracting and interpreting the elements encounted, and outputting a result.
Daunting to be sure. But if I examine the tree structure long enough and think about how the tree would expand with a more complex input, some obvious and exploitable patterns emerge. First, the 0th element encountered will always be an operator, followed by a number as the 1st element. Next, either a
Once the parsing and interpreting structure is established, I expaned the types of operators that a user can pass to include
Adding new features to SLISP
The typical approach to adding new features to the language consists of a number of steps:
Syntax | Add new rule to the language grammar. |
Representation | Add new data type variation. |
Parsing | Add new functions for reading the feature from the abstract syntax tree. |
Semantics | Add new functions for evauating and manipulating. |
Q-expressions
list | Takes one oor more args and returns new Q-Expr containing the args. |
head | Returns a Q-Expr and returns the first element of another Q-Expr. |
tail | Returns Q-Expr with first element removed. |
join | Returns sinlge Q-Expr of two or more conjoined Q-expr. |
eval | Evaluates Q-Expr as if it were an S-Expr. |