Chapter 2: The Kaleidoscope Dialect
In this chapter, we’ll learn some of pliron’s core IR concepts
by defining a dialect for the Kaleidoscope language. As new IR
concepts are introduced, you may find it useful to refer to the
pliron documentation
for details and examples.
The implementation for this chapter lives in examples/kaleidoscope/dialect.rs.
What is a Dialect?
pliron is an extensible compiler IR framework. This means that it does not
restrict you to a fixed set of operations (or instructions) or types.
Instead, you can define your own operations and types that capture the
semantics of your source language.
A dialect is merely a grouping of operations, types, and attributes that define the IR for a particular domain, language or purpose. It serves as the foundation for representing and manipulating programs in that domain.
IR Entities / Structure
Every IR entity (such as an operation or type) is owned by a Context.
Almost every method in the framework requires a Context argument to
either look up existing entities or create new ones.
An Operation is a basic unit of execution. It has operands (or input values),
results (or output values), attributes (or metadata), and may own regions.
A sequential list of operations forms a BasicBlock. At the end of a block,
a terminator operation (like ReturnOp or BranchOp) indicates the end
of the sequence and a possible transfer of control. Basic-blocks can have
one or more arguments.
Operation results and BasicBlock arguments are
SSA values.
BasicBlock arguments are an alternative (but equivalent and more convenient)
form of SSA ϕ-nodes.
The control-flow-graph (i.e., a directed graph of BasicBlocks) is contained
within a Region. The Region itself is owned by a parent operation.
For example, an IfOp owns two regions: one for the “then” block and one for
the “else” block.
These three entities (Operation, BasicBlock, Region) form the core structural IR
elements. They are the building blocks for representing programs in pliron.
All of these entities are owned by a Context, and are handled in the IR
through a Ptr (e.g., Ptr<Operation>, Ptr<Region>, etc.). A Ptr
can be dereferenced to access the underlying entity using the deref
and deref_mut methods, which return the underlying Ref or RefMut
of the entity. See interior mutability.
Types and Attributes
Every value (BasicBlock operand or Operation result) has a type.
Types are first-class IR entities that can be defined by the user.
For example, IntegerType in the builtin dialect represents integers.
Dialects can define new types as needed.
Attributes allow attaching arbitrary metadata to operations. For example,
in a ConstantOp, the constant value is stored as an attribute (e.g., IntegerAttr).
Ops
Operations are the core IR entities that represent instructions or statements in the program.
Ops are thin wrappers around the underlying Operation struct, providing “opcode”
specific APIs and invariants. For example, a BinOp represents a binary operation (like addition)
and has specific attributes to indicate the kind of binary operation (e.g., BinOpKind::Add).
You can imagine an Op to look like:
struct MyOp {
op: Ptr<Operation>
}
impl MyOp {
...
}
Defining The Kaleidoscope Dialect
Let’s dive in now and see how we can define a ConstantOp.
ConstantOp: A Simple Example
The constant operation represents a literal integer value in the IR. It has no operands and one result (the constant value). The constant value is stored as an attribute of the operation.
#![allow(unused)]
fn main() {
#[pliron_op(
name = "kaleidoscope.constant",
format = "attr($value, $IntegerAttr) ` : ` type($0)",
interfaces = [NOpdsInterface<0>, OneResultInterface, NResultsInterface<1>],
attributes = (value: IntegerAttr),
verifier = "succ",
)]
pub struct ConstantOp;
}
We define ConstantOp as simply an empty struct, annotated with the pliron_op
macro. In addition to adding the op: Ptr<Operation> field (that we saw above),
this macro fills up boilerplate code required for implementing pliron’s
Op trait.
The pliron_op macro has one mandatory argument: A string specifying the dialect
and opcode name (e.g., "kaleidoscope.constant").
We then specify, using the attributes field in the macro, that this operation has
one attribute named value of type IntegerAttr.
In this example, we’re also specifying the format for the Op. This defines
the syntax for the operation when printed as IR. The full specification for the format
syntax string is available in the docs. In this case, the format string says that
the attribute (named value) containing the constant value (as an IntegerAttr)
should be printed first, followed by the result type (after a colon).
Defining a syntax amounts to automatically deriving the Printable and Parsable
traits for the operation, which are used by pliron’s IR printer and parser respectively.
The format field can have its string argument skipped, in which case a canonical
(default) syntax is used.
Next, we have the interfaces field in the macro. Interfaces are Rust
traits that are annotated with the #[op_interface] macro. Interfaces allow
capturing common behaviour and invariants across multiple operations.
In this case, NOpdsInterface<0> says that ConstantOp has zero operands,
and OneResultInterface says that it has exactly one result.
And finally we have the verifier field, which specifies the verifier function for
this operation. The verifier is responsible for enforcing invariants not captured
by the interfaces that this Op implements. In this case, we can use the built-in succ
verifier, which simply returns Ok(()) to indicate that the operation is always valid.
For more complex invariants, you can write a custom verifier function by implementing
the Verify trait for the Op.
Note: We have not defined IntegerAttr in this dialect. We are using the IntegerAttr
defined in the builtin dialect. This is an important point: The IR is not restricted
to using only the operations, types, and attributes defined in a single dialect. You can
mix and match entities from different dialects as needed.
DeclOp: Variable Declarations
A DeclOp returns a pointer-typed result while storing source-level variable type as a TypeAttr.
This represents local allocation of variables. (Analogous to LLVM’s AllocaInst)
The PointerType is taken from pliron’s LLVM dialect.
#![allow(unused)]
fn main() {
#[pliron_op(
name = "kaleidoscope.decl",
format = "attr($var_type, $TypeAttr) ` : ` type($0)",
interfaces = [
NOpdsInterface<0>,
OneResultInterface,
NResultsInterface<1>,
ResultNOfType<0, PointerType>
],
attributes = (var_type: TypeAttr),
verifier = "succ",
)]
pub struct DeclOp;
}
BinOp: Binary Operations
BinOp has two operands (the left-hand side and right-hand side of the binary operation)
and one result (the computed value). The specific binary operation (e.g., add, subtract,
multiply, divide) is captured as an attribute (BinOpKind).
#![allow(unused)]
fn main() {
#[pliron_op(
name = "kaleidoscope.binop",
format = "$0 ` `attr($kind, $BinOpKind) ` ` $1 ` : ` type($0)",
interfaces = [
AtLeastNOpdsInterface<1>,
AtLeastNResultsInterface<1>,
NOpdsInterface<2>,
OneResultInterface,
NResultsInterface<1>,
SameOperandsType,
SameResultsType,
SameOperandsAndResultType
],
attributes = (kind: BinOpKind),
verifier = "succ"
)]
pub struct BinOp;
}
Here we show how one can define convenience methods on the typed wrapper (Op):
#![allow(unused)]
fn main() {
impl BinOp {
/// Creates a new `BinOp` of the specified kind with the given operands.
pub fn new(ctx: &mut Context, kind: BinOpKind, lhs: Value, rhs: Value) -> Self {
let result_ty = lhs.get_type(ctx);
let op = Operation::new(
ctx,
Self::get_concrete_op_info(),
vec![result_ty],
vec![lhs, rhs],
vec![],
0,
);
let op = BinOp { op };
op.set_attr_kind(ctx, kind);
op
}
/// Returns the `BinOpKind` of this `BinOp`.
pub fn kind(&self, ctx: &Context) -> BinOpKind {
*self
.get_attr_kind(ctx)
.expect("BinOp must carry kind attribute")
}
/// Returns the left-hand side operand of this `BinOp`.
pub fn lhs(&self, ctx: &Context) -> Value {
self.get_operation().deref(ctx).get_operand(0)
}
/// Returns the right-hand side operand of this `BinOp`.
pub fn rhs(&self, ctx: &Context) -> Value {
self.get_operation().deref(ctx).get_operand(1)
}
}
}
CallOp
The CallOp represents a function call. It has a variable number of operands (the arguments to the function) and one result (the return value of the function). The callee function is captured as an attribute (an IdentifierAttr).
#![allow(unused)]
fn main() {
#[pliron_op(
name = "kaleidoscope.call",
format = "`@`attr($callee, $IdentifierAttr) `(` operands(CharSpace(`,`)) `)` : type($0)",
interfaces = [NResultsInterface<1>],
attributes = (callee: IdentifierAttr),
verifier = "succ",
)]
pub struct CallOp;
}
Regions and Blocks
Structured control flow is represented through regions and blocks:
IfOp
The IfOp has two regions: one for the “then” block and one for the “else” block.
Each region here contains a single block (captured by the SingleBlockRegionInterface),
which in turn contains a sequence of operations.
#![allow(unused)]
fn main() {
#[pliron_op(
name = "kaleidoscope.if",
format = "$0 ` then ` region($0) ` else ` region($1)",
interfaces = [
NOpdsInterface<1>,
NResultsInterface<0>,
NRegionsInterface<2>,
SingleBlockRegionInterface
],
verifier = "succ",
)]
pub struct IfOp;
}
WhileOp
The WhileOp has a single region capturing the loop body. The loop condition is represented
as an operand of the op. Here we define the semantics of the condition to be a pointer
(to an integer value), captured by the OperandNOfType<0, PointerType> interface specification,
where a non-zero value (of the pointee) is considered true and zero is false.
The operand being a pointer allows the loop condition to be updated within the loop body.
Similar to DeclOp, the pointer type is taken from pliron’s LLVM dialect.
#![allow(unused)]
fn main() {
#[pliron_op(
name = "kaleidoscope.while",
format = "`*`$0 ` do ` region($0)",
interfaces = [
OperandNOfType<0, PointerType>,
NResultsInterface<0>,
NRegionsInterface<1>,
SingleBlockRegionInterface
],
verifier = "succ",
)]
pub struct WhileOp;
}
YieldOp
The YieldOp serves as a terminator for the body region of WhileOp and IfOp.
It has no operands or results, and simply indicates the end of the structured region.
#![allow(unused)]
fn main() {
#[pliron_op(
name = "kaleidoscope.yield",
format = "",
interfaces = [IsTerminatorInterface, NResultsInterface<0>, NOpdsInterface<0>],
verifier = "succ",
)]
pub struct YieldOp;
}
ReturnOp
The ReturnOp serves as a terminator from a function body region.
It has one operand, which is the value being returned from the function.
To illustrate the use of a custom verifier, ReturnOp implements the
Verify trait to enforce the invariant that it must have exactly one operand
(this could also have been enforced declaratively using NOpdsInterface<1>).
#![allow(unused)]
fn main() {
#[pliron_op(
name = "kaleidoscope.return",
format = "$0",
interfaces = [IsTerminatorInterface],
)]
pub struct ReturnOp;
}
The custom verifier:
#![allow(unused)]
fn main() {
impl Verify for ReturnOp {
fn verify(&self, ctx: &Context) -> Result<()>
where
Self: Sized,
{
let n_operands = self.get_operation().deref(ctx).get_num_operands();
if n_operands != 1 {
return verify_err!(
self.loc(ctx),
"kaleidoscope.return expects exactly 1 operand, found {}",
n_operands
);
}
Ok(())
}
}
}
Construction Example
In this section, we manually construct IR, using pliron APIs and utilities for a specific simple Fibonacci function. This is to illustrate how the IR is built up from the basic entities (operations, blocks, regions) and how the dialect ops are used in practice. The next chapter shows how to do this systematically by translating (lowering) the AST into this IR.
Pseudocode in Kaleidoscope for the function being generated:
#![allow(unused)]
fn main() {
// Kaleidoscope pseudocode for the generated function:
//
// def main() {
// var a = 0;
// var b = 1;
// var i = 0;
// var n = 10;
//
// while i < n {
// var tmp = a + b;
// a = b;
// b = tmp;
// i = i + 1;
// }
//
// return b;
// }
}
Rather than reading the entire construction test as one block, it is easier to follow it in the same order as the IR is assembled.
1. Set up the context, module, and entry block
First we create the Context, define the function type, create a module and a
main function, and position an inserter at the end of the entry block.
#![allow(unused)]
fn main() {
let ctx = &mut Context::new();
let i64_ty = IntegerType::get(ctx, 64, Signedness::Signless);
let module = ModuleOp::new(ctx, "fib_test".try_into().expect("valid module name"));
let main_ty = FunctionType::get(ctx, vec![], vec![i64_ty.into()]);
let main_fn = FuncOp::new(
ctx,
"main".try_into().expect("valid function name"),
main_ty,
);
module.append_operation(ctx, main_fn.get_operation(), 0);
let entry = main_fn.get_entry_block(ctx);
let mut ins = OpInserter::new_at_block_end(entry);
}
At this point we have the outer container structure in place, but the function body is still empty.
2. Allocate mutable slots for the variables
The Fibonacci implementation uses mutable local state (a, b, tmp, i,
n, and the loop-condition slot), so the next step is to create DeclOps for
each one.
#![allow(unused)]
fn main() {
// Mutable slots: a, b, tmp, i, and the iteration limit n.
let slot_a = DeclOp::new(ctx, i64_ty.into());
set_operation_result_name(ctx, slot_a.get_operation(), 0, "a".try_into().ok());
ins.append_op(ctx, &slot_a);
let slot_b = DeclOp::new(ctx, i64_ty.into());
set_operation_result_name(ctx, slot_b.get_operation(), 0, "b".try_into().ok());
ins.append_op(ctx, &slot_b);
let slot_tmp = DeclOp::new(ctx, i64_ty.into());
set_operation_result_name(ctx, slot_tmp.get_operation(), 0, "tmp".try_into().ok());
ins.append_op(ctx, &slot_tmp);
let slot_i = DeclOp::new(ctx, i64_ty.into());
set_operation_result_name(ctx, slot_i.get_operation(), 0, "i".try_into().ok());
ins.append_op(ctx, &slot_i);
let slot_n = DeclOp::new(ctx, i64_ty.into());
set_operation_result_name(ctx, slot_n.get_operation(), 0, "n".try_into().ok());
ins.append_op(ctx, &slot_n);
// A mutable slot for the loop condition (while cond_ptr do ...).
let slot_cond_ptr = DeclOp::new(ctx, i64_ty.into());
set_operation_result_name(
ctx,
slot_cond_ptr.get_operation(),
0,
"cond_ptr".try_into().ok(),
);
ins.append_op(ctx, &slot_cond_ptr);
}
Notice that we also attach result names such as a and b. These names are
not required for correctness, but they make the printed IR much easier to read.
3. Initialize the state and create the loop shell
Next we materialize the constants, store the initial values into the mutable
slots, compute the initial loop condition i < n, and build the WhileOp
itself.
#![allow(unused)]
fn main() {
// Initialize: a=0, b=1, i=0, n=10.
let c0 = ConstantOp::new_i64(ctx, 0);
ins.append_op(ctx, &c0);
let c1 = ConstantOp::new_i64(ctx, 1);
ins.append_op(ctx, &c1);
let c10 = ConstantOp::new_i64(ctx, 10);
ins.append_op(ctx, &c10);
let store_a0 = StoreOp::new(ctx, slot_a.get_result(ctx), c0.get_result(ctx));
ins.append_op(ctx, &store_a0);
let store_b1 = StoreOp::new(ctx, slot_b.get_result(ctx), c1.get_result(ctx));
ins.append_op(ctx, &store_b1);
let store_i0 = StoreOp::new(ctx, slot_i.get_result(ctx), c0.get_result(ctx));
ins.append_op(ctx, &store_i0);
let store_n10 = StoreOp::new(ctx, slot_n.get_result(ctx), c10.get_result(ctx));
ins.append_op(ctx, &store_n10);
// Loop condition i < n.
let i_before = LoadOp::new(ctx, slot_i.get_result(ctx), i64_ty.into());
ins.append_op(ctx, &i_before);
let n_before = LoadOp::new(ctx, slot_n.get_result(ctx), i64_ty.into());
ins.append_op(ctx, &n_before);
let cond = BinOp::new(
ctx,
BinOpKind::Lt,
i_before.get_result(ctx),
n_before.get_result(ctx),
);
ins.append_op(ctx, &cond);
let store_cond_ptr = StoreOp::new(ctx, slot_cond_ptr.get_result(ctx), cond.get_result(ctx));
ins.append_op(ctx, &store_cond_ptr);
let while_op = WhileOp::new(ctx, slot_cond_ptr.get_result(ctx));
ins.append_op(ctx, &while_op);
}
The important detail here is that WhileOp takes a pointer to the condition
slot, not the condition value directly. That lets the loop body recompute and
update the condition on every iteration.
4. Fill in the loop body region
Once the WhileOp exists, we create its body block and populate it with the
actual Fibonacci update logic.
#![allow(unused)]
fn main() {
let while_region = while_op.body_region(ctx);
let while_block = BasicBlock::new(
ctx,
Some("while_body".try_into().expect("valid block name")),
vec![],
);
while_block.insert_at_front(while_region, ctx);
let mut while_ins = OpInserter::new_at_block_end(while_block);
// tmp = a + b; a = b; b = tmp; i = i + 1
let a_val = LoadOp::new(ctx, slot_a.get_result(ctx), i64_ty.into());
while_ins.append_op(ctx, &a_val);
let b_val = LoadOp::new(ctx, slot_b.get_result(ctx), i64_ty.into());
while_ins.append_op(ctx, &b_val);
let sum = BinOp::new(
ctx,
BinOpKind::Add,
a_val.get_result(ctx),
b_val.get_result(ctx),
);
while_ins.append_op(ctx, &sum);
let store_tmp = StoreOp::new(ctx, slot_tmp.get_result(ctx), sum.get_result(ctx));
while_ins.append_op(ctx, &store_tmp);
let b_for_a = LoadOp::new(ctx, slot_b.get_result(ctx), i64_ty.into());
while_ins.append_op(ctx, &b_for_a);
let store_a = StoreOp::new(ctx, slot_a.get_result(ctx), b_for_a.get_result(ctx));
while_ins.append_op(ctx, &store_a);
let tmp_for_b = LoadOp::new(ctx, slot_tmp.get_result(ctx), i64_ty.into());
while_ins.append_op(ctx, &tmp_for_b);
let store_b = StoreOp::new(ctx, slot_b.get_result(ctx), tmp_for_b.get_result(ctx));
while_ins.append_op(ctx, &store_b);
let i_val = LoadOp::new(ctx, slot_i.get_result(ctx), i64_ty.into());
while_ins.append_op(ctx, &i_val);
let one_step = ConstantOp::new_i64(ctx, 1);
while_ins.append_op(ctx, &one_step);
let i_next = BinOp::new(
ctx,
BinOpKind::Add,
i_val.get_result(ctx),
one_step.get_result(ctx),
);
while_ins.append_op(ctx, &i_next);
let store_i = StoreOp::new(ctx, slot_i.get_result(ctx), i_next.get_result(ctx));
while_ins.append_op(ctx, &store_i);
// i is updated, now do the comparison again
let i_inc = LoadOp::new(ctx, slot_i.get_result(ctx), i64_ty.into());
while_ins.append_op(ctx, &i_inc);
let n_for_cond = LoadOp::new(ctx, slot_n.get_result(ctx), i64_ty.into());
while_ins.append_op(ctx, &n_for_cond);
let cond_next = BinOp::new(
ctx,
BinOpKind::Lt,
i_inc.get_result(ctx),
n_for_cond.get_result(ctx),
);
while_ins.append_op(ctx, &cond_next);
let store_cond = StoreOp::new(
ctx,
slot_cond_ptr.get_result(ctx),
cond_next.get_result(ctx),
);
while_ins.append_op(ctx, &store_cond);
let while_yield = YieldOp::new(ctx);
while_ins.append_op(ctx, &while_yield);
}
This block mirrors the source program closely: compute tmp = a + b, rotate
a and b, increment i, recompute the loop condition, then end the region
with YieldOp.
5. Return the result and verify the IR
After the loop, the function loads the final value of b, returns it, and runs
verification on the constructed module.
#![allow(unused)]
fn main() {
let fib_result = LoadOp::new(ctx, slot_b.get_result(ctx), i64_ty.into());
ins.append_op(ctx, &fib_result);
let ret = ReturnOp::new(ctx, fib_result.get_result(ctx));
ins.append_op(ctx, &ret);
verify_op(&module, ctx).expect("constructed fibonacci IR should verify");
println!("{}", module.get_operation().disp(ctx));
}
This last step is worth keeping in mind whenever you construct IR manually: verification is the cheapest way to catch structural mistakes early.
Try it out
cargo test --example kaleidoscope -- build_fib_example --show-output
This prints Kaleidoscope IR for the Fibonacci function shown above. We suggest studying the test and the printed IR together to understand how the dialect ops are used to construct the IR.
Exercise: Try adding a test to build the factorial function similarly and print its IR.
Next step
In this chapter we manually built the IR for a specific Kaleidoscope program. Chapter 3 shows how to lower the AST into this IR, which means we can parse any Kaleidoscope program into an AST and then lower it into IR.