Building the Rue Type System
This post goes into detail on how the type system of the Rue programming language works and the rationale behind its design. If you need a refresher, this is a link to the previous blog post where I described what Rue Lang is and what sets it apart from Chialisp:
Type Systems
It’s important to first understand what the goal of a type system is. The Wikipedia article goes into a lot of detail, but essentially it boils down to preventing compilation of programs that use data in the wrong way. For example, if you try to assign an integer to a function that expects a boolean, it will fail to compile. This prevents a large array of logic errors that could lead a program to either crash or have an unintended result.
Many languages employ dynamic type checking, wherein the values have type information associated with them at runtime. While interpreting the program, the runtime will check the type and raise an error if there is a mismatch. However, static type checking enforces these checks at compile time, before the program is ever run. This can be useful for catching bugs earlier on, improving maintainability, and optimizing compiled programs.
For these reasons, and because it makes code easier to audit, Rue is a strictly typed programming language. Types are required for all values, whether they are inferred or explicitly specified. Bypassing the type checker or overriding the type of a value is intentionally cumbersome to discourage doing so, since it reduces the safety of the program.
However, making a language statically typed will be frustrating unless the type system is powerful enough to sufficiently express the intention of complex programs with types. Every time the compiler isn’t smart enough to understand what you’re trying to do, and gives you a type error that you know is unnecessary, it takes you out of the flow of writing code and forces you to find workarounds.
CLVM Types
There are only two fundamental types in CLVM (the bytecode that Rue compiles down into):
Atom - an arbitrary sequence of bytes
Pair - two values (each of which can either be an atom or pair)
Thus, the first two Rue types are born: Atom and (A, B).
You can nest these types:
(Atom, (Atom, Atom))
Atoms
Often atoms have specific semantics or constraints that you’d like to encode in the type system.
There are various subtypes that are provided:
Bytes - a sequence of bytes
Bytes32 - a sequence of exactly 32 bytes (useful for SHA-256 hashes)
PublicKey - a BLS-12381 public key
Int - an arbitrary precision signed integer
Bool - a union of true or false
nil - an empty atom
As seen with booleans and nil, it’s possible to use specific values as types. For example, you can use 42 as a type that represents that value. Anything else, be it an arbitrary int or a different value, will not be assignable to that type.
Unions
Now that you can get specific about whether a type is an atom, a pair, or a specific value, it’s important to be able to represent union types. A union represents more than one type - where A | B means it matches either A or B.
Type Aliases
A type alias is like a variable for types. It lets you hide the complexity behind a reusable name.
type Owner = PublicKey | nil;
This can also be used to create recursive data types (the Wikipedia article linked actually represents the equivalent of the following example in Lisp):
type List<T> = nil | (T, List<T>);
This means that a list can either be nil, or one element followed by zero or more other elements (another List). Even though List is a built-in type in the language, this is the exact way that it’s implemented (and you can write the type yourself if you wanted to).
Recursive types were the biggest challenge to implement in the type system. Every operation that interacts with types, including comparison, type checking, subtraction, and replacement, must account for the fact that you might end up infinitely checking the same type over and over. Solutions in the Rue compiler include maximum recursion depths, a stack of “visited” comparisons, and a cache of references to results that are yet to be determined. I’m very excited that I was able to preserve such a powerful aspect of the type system and fight through all of the problems I faced when designing it.
Another example of a recursive type is Any (which represents, well, any CLVM value):
type Any = Atom | (Any, Any);
Structs
Now that you can represent any CLVM structure in Rue, it’s important to be able to name the fields in a list.
This is where structs come in:
struct Coin {
parent_coin_id: Bytes32,
puzzle_hash: Bytes32,
amount: Int,
}
This type has the same structure as (Bytes32, (Bytes32, (Int, nil))), but it allows you to reference the fields by name (such as coin.amount).
Structs also have the unique property that they are semantically different from one another. If you define two identical structs named Coin1 and Coin2, you cannot assign from one to the other (even though they have the same structure), unless you cast them explicitly using the as operator.
A struct can have default values, which makes it easier to define types such as conditions:
struct ReserveFee {
opcode = 52,
amount: Int,
}
Now you can construct this with ReserveFee { amount: 42 } rather than needing to specify the opcode as well.
Type Guards
Well, now we have a bunch of types that we can use to represent the structure and semantics of CLVM values accurately. But we’re missing a way to check what type a value is at runtime. This is where type guards come in.
A type guard is a combination of two things:
Type check - which generates code that checks the type of a value at runtime
Type replacement - which modifies the type of a symbol or field at compile time
As a concise example:
fn count(list: List<Int>) -> Int {
// We’ve reached the end of the list, so there’s no length.
if list is nil {
return 0;
}
// Now we know that list is (Int, List<Int>) rather than nil
1 + count(list.rest)
}
When you use the expression list is nil you are creating a type guard - it will output the CLVM (not (l list)), which checks that the value is an atom (nil in this case). Inside of the if branch, the type of list is nil. However, on the outside, the type is (Int, List<T>) thanks to subtraction! This means that we’ve narrowed the type down by checking the value at runtime.
This feature is the most important aspect of the type system, and also what took the most effort to get right. It’s optimized such that it won’t output unnecessary checks (if the value can never be a type, there’s no point in checking for it).
Subtraction
Subtraction is instrumental in making sure that inverting a type guard for else branches works.
Fundamentally, these are the steps to implement subtraction:
Recursively extract the original type into a list of union variants (along with their corresponding semantic types, if any)
Remove any union variants that are assignable to the right hand side
Repackage and group the union variants into their corresponding semantic types
Essentially this means that any types that were a subset of the if branch are filtered out of the else branch (which matches the behavior at runtime).
Replacement
Type guarding doesn’t just work on entire symbols - if you check the type of a field in a symbol whose type is a struct, the symbol’s type will be replaced with one wherein that field is subtracted in the else branch.
This involves recursively walking the tree of pairs and substituting based on the path of the field that you type guarded. If there are multiple type guards on separate fields, they will be aggregated together in the final type when referenced later on in the branch.
Function Types
Function types are a bit special - technically a function could be any CLVM value, but not every CLVM value could be a function. And there’s really no way to determine whether a value is a certain function type at compile time. Thus, it’s possible to type guard on a function type, but not check whether a value is a function.
Although, a function type is always assignable to itself or any function type with a superset of its argument and return types.
Conclusion
We can build fully functional Chia puzzles by putting all these pieces together (pun intended). The hope is that you get the same level of flexibility that Chialisp provides, with the benefits of static typing at compile time.
In future blog posts, I’ll cover how the dependency graph, HIR lowerer, and optimizer work. In the meantime, I’ll keep working on the compiler, and getting it closer to a new alpha release. The rewrite has been quite the project, but it’s nearing initial completion.
Thanks for reading!