Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 2: The Kaleidoscope Dialect

In this chapter, we’ll learn some of pliron’s core IR concepts by defining a dialect for the Kaleidoscope language. As new IR concepts are introduced, you may find it useful to refer to the pliron documentation for details and examples.

The implementation for this chapter lives in examples/kaleidoscope/dialect.rs.

What is a Dialect?

pliron is an extensible compiler IR framework. This means that it does not restrict you to a fixed set of operations (or instructions) or types. Instead, you can define your own operations and types that capture the semantics of your source language.

A dialect is merely a grouping of operations, types, and attributes that define the IR for a particular domain, language or purpose. It serves as the foundation for representing and manipulating programs in that domain.

IR Entities / Structure

Every IR entity (such as an operation or type) is owned by a Context. Almost every method in the framework requires a Context argument to either look up existing entities or create new ones.

An Operation is a basic unit of execution. It has operands (or input values), results (or output values), attributes (or metadata), and may own regions.

A sequential list of operations forms a BasicBlock. At the end of a block, a terminator operation (like ReturnOp or BranchOp) indicates the end of the sequence and a possible transfer of control. Basic-blocks can have one or more arguments.

Operation results and BasicBlock arguments are SSA values. BasicBlock arguments are an alternative (but equivalent and more convenient) form of SSA ϕ-nodes.

The control-flow-graph (i.e., a directed graph of BasicBlocks) is contained within a Region. The Region itself is owned by a parent operation. For example, an IfOp owns two regions: one for the “then” block and one for the “else” block.

These three entities (Operation, BasicBlock, Region) form the core structural IR elements. They are the building blocks for representing programs in pliron.

All of these entities are owned by a Context, and are handled in the IR through a Ptr (e.g., Ptr<Operation>, Ptr<Region>, etc.). A Ptr can be dereferenced to access the underlying entity using the deref and deref_mut methods, which return the underlying Ref or RefMut of the entity. See interior mutability.

Types and Attributes

Every value (BasicBlock operand or Operation result) has a type. Types are first-class IR entities that can be defined by the user. For example, IntegerType in the builtin dialect represents integers. Dialects can define new types as needed.

Attributes allow attaching arbitrary metadata to operations. For example, in a ConstantOp, the constant value is stored as an attribute (e.g., IntegerAttr).

Ops

Operations are the core IR entities that represent instructions or statements in the program. Ops are thin wrappers around the underlying Operation struct, providing “opcode” specific APIs and invariants. For example, a BinOp represents a binary operation (like addition) and has specific attributes to indicate the kind of binary operation (e.g., BinOpKind::Add). You can imagine an Op to look like:

struct MyOp {
    op: Ptr<Operation>
}

impl MyOp {
  ...
}

Defining The Kaleidoscope Dialect

Let’s dive in now and see how we can define a ConstantOp.

ConstantOp: A Simple Example

The constant operation represents a literal integer value in the IR. It has no operands and one result (the constant value). The constant value is stored as an attribute of the operation.

#![allow(unused)]
fn main() {
#[pliron_op(
    name = "kaleidoscope.constant",
    format = "attr($value, $IntegerAttr) ` : ` type($0)",
    interfaces = [NOpdsInterface<0>, OneResultInterface, NResultsInterface<1>],
    attributes = (value: IntegerAttr),
    verifier = "succ",
)]
pub struct ConstantOp;
}

We define ConstantOp as simply an empty struct, annotated with the pliron_op macro. In addition to adding the op: Ptr<Operation> field (that we saw above), this macro fills up boilerplate code required for implementing pliron’s Op trait.

The pliron_op macro has one mandatory argument: A string specifying the dialect and opcode name (e.g., "kaleidoscope.constant").

We then specify, using the attributes field in the macro, that this operation has one attribute named value of type IntegerAttr.

In this example, we’re also specifying the format for the Op. This defines the syntax for the operation when printed as IR. The full specification for the format syntax string is available in the docs. In this case, the format string says that the attribute (named value) containing the constant value (as an IntegerAttr) should be printed first, followed by the result type (after a colon).

Defining a syntax amounts to automatically deriving the Printable and Parsable traits for the operation, which are used by pliron’s IR printer and parser respectively.

The format field can have its string argument skipped, in which case a canonical (default) syntax is used.

Next, we have the interfaces field in the macro. Interfaces are Rust traits that are annotated with the #[op_interface] macro. Interfaces allow capturing common behaviour and invariants across multiple operations. In this case, NOpdsInterface<0> says that ConstantOp has zero operands, and OneResultInterface says that it has exactly one result.

And finally we have the verifier field, which specifies the verifier function for this operation. The verifier is responsible for enforcing invariants not captured by the interfaces that this Op implements. In this case, we can use the built-in succ verifier, which simply returns Ok(()) to indicate that the operation is always valid. For more complex invariants, you can write a custom verifier function by implementing the Verify trait for the Op.

Note: We have not defined IntegerAttr in this dialect. We are using the IntegerAttr defined in the builtin dialect. This is an important point: The IR is not restricted to using only the operations, types, and attributes defined in a single dialect. You can mix and match entities from different dialects as needed.

DeclOp: Variable Declarations

A DeclOp returns a pointer-typed result while storing source-level variable type as a TypeAttr. This represents local allocation of variables. (Analogous to LLVM’s AllocaInst) The PointerType is taken from pliron’s LLVM dialect.

#![allow(unused)]
fn main() {
#[pliron_op(
    name = "kaleidoscope.decl",
    format = "attr($var_type, $TypeAttr) ` : ` type($0)",
    interfaces = [
        NOpdsInterface<0>,
        OneResultInterface,
        NResultsInterface<1>,
        ResultNOfType<0, PointerType>
    ],
    attributes = (var_type: TypeAttr),
    verifier = "succ",
)]
pub struct DeclOp;
}

BinOp: Binary Operations

BinOp has two operands (the left-hand side and right-hand side of the binary operation) and one result (the computed value). The specific binary operation (e.g., add, subtract, multiply, divide) is captured as an attribute (BinOpKind).

#![allow(unused)]
fn main() {
#[pliron_op(
    name = "kaleidoscope.binop",
    format = "$0 ` `attr($kind, $BinOpKind) ` ` $1 ` : ` type($0)",
    interfaces = [
        AtLeastNOpdsInterface<1>,
        AtLeastNResultsInterface<1>,
        NOpdsInterface<2>,
        OneResultInterface,
        NResultsInterface<1>,
        SameOperandsType,
        SameResultsType,
        SameOperandsAndResultType
    ],
    attributes = (kind: BinOpKind),
    verifier = "succ"
)]
pub struct BinOp;
}

Here we show how one can define convenience methods on the typed wrapper (Op):

#![allow(unused)]
fn main() {
impl BinOp {
    /// Creates a new `BinOp` of the specified kind with the given operands.
    pub fn new(ctx: &mut Context, kind: BinOpKind, lhs: Value, rhs: Value) -> Self {
        let result_ty = lhs.get_type(ctx);
        let op = Operation::new(
            ctx,
            Self::get_concrete_op_info(),
            vec![result_ty],
            vec![lhs, rhs],
            vec![],
            0,
        );
        let op = BinOp { op };
        op.set_attr_kind(ctx, kind);
        op
    }

    /// Returns the `BinOpKind` of this `BinOp`.
    pub fn kind(&self, ctx: &Context) -> BinOpKind {
        *self
            .get_attr_kind(ctx)
            .expect("BinOp must carry kind attribute")
    }

    /// Returns the left-hand side operand of this `BinOp`.
    pub fn lhs(&self, ctx: &Context) -> Value {
        self.get_operation().deref(ctx).get_operand(0)
    }

    /// Returns the right-hand side operand of this `BinOp`.
    pub fn rhs(&self, ctx: &Context) -> Value {
        self.get_operation().deref(ctx).get_operand(1)
    }
}
}

CallOp

The CallOp represents a function call. It has a variable number of operands (the arguments to the function) and one result (the return value of the function). The callee function is captured as an attribute (an IdentifierAttr).

#![allow(unused)]
fn main() {
#[pliron_op(
    name = "kaleidoscope.call",
    format = "`@`attr($callee, $IdentifierAttr) `(` operands(CharSpace(`,`)) `)` : type($0)",
    interfaces = [NResultsInterface<1>],
    attributes = (callee: IdentifierAttr),
    verifier = "succ",
)]
pub struct CallOp;
}

Regions and Blocks

Structured control flow is represented through regions and blocks:

IfOp

The IfOp has two regions: one for the “then” block and one for the “else” block. Each region here contains a single block (captured by the SingleBlockRegionInterface), which in turn contains a sequence of operations.

#![allow(unused)]
fn main() {
#[pliron_op(
    name = "kaleidoscope.if",
    format = "$0 ` then ` region($0) ` else ` region($1)",
    interfaces = [
        NOpdsInterface<1>,
        NResultsInterface<0>,
        NRegionsInterface<2>,
        SingleBlockRegionInterface
    ],
    verifier = "succ",
)]
pub struct IfOp;
}

WhileOp

The WhileOp has a single region capturing the loop body. The loop condition is represented as an operand of the op. Here we define the semantics of the condition to be a pointer (to an integer value), captured by the OperandNOfType<0, PointerType> interface specification, where a non-zero value (of the pointee) is considered true and zero is false. The operand being a pointer allows the loop condition to be updated within the loop body. Similar to DeclOp, the pointer type is taken from pliron’s LLVM dialect.

#![allow(unused)]
fn main() {
#[pliron_op(
    name = "kaleidoscope.while",
    format = "`*`$0 ` do ` region($0)",
    interfaces = [
        OperandNOfType<0, PointerType>,
        NResultsInterface<0>,
        NRegionsInterface<1>,
        SingleBlockRegionInterface
    ],
    verifier = "succ",
)]
pub struct WhileOp;
}

YieldOp

The YieldOp serves as a terminator for the body region of WhileOp and IfOp. It has no operands or results, and simply indicates the end of the structured region.

#![allow(unused)]
fn main() {
#[pliron_op(
    name = "kaleidoscope.yield",
    format = "",
    interfaces = [IsTerminatorInterface, NResultsInterface<0>, NOpdsInterface<0>],
    verifier = "succ",
)]
pub struct YieldOp;
}

ReturnOp

The ReturnOp serves as a terminator from a function body region. It has one operand, which is the value being returned from the function.

To illustrate the use of a custom verifier, ReturnOp implements the Verify trait to enforce the invariant that it must have exactly one operand (this could also have been enforced declaratively using NOpdsInterface<1>).

#![allow(unused)]
fn main() {
#[pliron_op(
    name = "kaleidoscope.return",
    format = "$0",
    interfaces = [IsTerminatorInterface],
)]
pub struct ReturnOp;
}

The custom verifier:

#![allow(unused)]
fn main() {
impl Verify for ReturnOp {
    fn verify(&self, ctx: &Context) -> Result<()>
    where
        Self: Sized,
    {
        let n_operands = self.get_operation().deref(ctx).get_num_operands();
        if n_operands != 1 {
            return verify_err!(
                self.loc(ctx),
                "kaleidoscope.return expects exactly 1 operand, found {}",
                n_operands
            );
        }
        Ok(())
    }
}
}

Construction Example

In this section, we manually construct IR, using pliron APIs and utilities for a specific simple Fibonacci function. This is to illustrate how the IR is built up from the basic entities (operations, blocks, regions) and how the dialect ops are used in practice. The next chapter shows how to do this systematically by translating (lowering) the AST into this IR.

Pseudocode in Kaleidoscope for the function being generated:

#![allow(unused)]
fn main() {
// Kaleidoscope pseudocode for the generated function:
//
// def main() {
//   var a = 0;
//   var b = 1;
//   var i = 0;
//   var n = 10;
//
//   while i < n {
//     var tmp = a + b;
//     a = b;
//     b = tmp;
//     i = i + 1;
//   }
//
//   return b;
// }
}

Rather than reading the entire construction test as one block, it is easier to follow it in the same order as the IR is assembled.

1. Set up the context, module, and entry block

First we create the Context, define the function type, create a module and a main function, and position an inserter at the end of the entry block.

#![allow(unused)]
fn main() {
        let ctx = &mut Context::new();
        let i64_ty = IntegerType::get(ctx, 64, Signedness::Signless);

        let module = ModuleOp::new(ctx, "fib_test".try_into().expect("valid module name"));
        let main_ty = FunctionType::get(ctx, vec![], vec![i64_ty.into()]);
        let main_fn = FuncOp::new(
            ctx,
            "main".try_into().expect("valid function name"),
            main_ty,
        );
        module.append_operation(ctx, main_fn.get_operation(), 0);

        let entry = main_fn.get_entry_block(ctx);
        let mut ins = OpInserter::new_at_block_end(entry);
}

At this point we have the outer container structure in place, but the function body is still empty.

2. Allocate mutable slots for the variables

The Fibonacci implementation uses mutable local state (a, b, tmp, i, n, and the loop-condition slot), so the next step is to create DeclOps for each one.

#![allow(unused)]
fn main() {
        // Mutable slots: a, b, tmp, i, and the iteration limit n.
        let slot_a = DeclOp::new(ctx, i64_ty.into());
        set_operation_result_name(ctx, slot_a.get_operation(), 0, "a".try_into().ok());
        ins.append_op(ctx, &slot_a);
        let slot_b = DeclOp::new(ctx, i64_ty.into());
        set_operation_result_name(ctx, slot_b.get_operation(), 0, "b".try_into().ok());
        ins.append_op(ctx, &slot_b);
        let slot_tmp = DeclOp::new(ctx, i64_ty.into());
        set_operation_result_name(ctx, slot_tmp.get_operation(), 0, "tmp".try_into().ok());
        ins.append_op(ctx, &slot_tmp);
        let slot_i = DeclOp::new(ctx, i64_ty.into());
        set_operation_result_name(ctx, slot_i.get_operation(), 0, "i".try_into().ok());
        ins.append_op(ctx, &slot_i);
        let slot_n = DeclOp::new(ctx, i64_ty.into());
        set_operation_result_name(ctx, slot_n.get_operation(), 0, "n".try_into().ok());
        ins.append_op(ctx, &slot_n);
        // A mutable slot for the loop condition (while cond_ptr do ...).
        let slot_cond_ptr = DeclOp::new(ctx, i64_ty.into());
        set_operation_result_name(
            ctx,
            slot_cond_ptr.get_operation(),
            0,
            "cond_ptr".try_into().ok(),
        );
        ins.append_op(ctx, &slot_cond_ptr);
}

Notice that we also attach result names such as a and b. These names are not required for correctness, but they make the printed IR much easier to read.

3. Initialize the state and create the loop shell

Next we materialize the constants, store the initial values into the mutable slots, compute the initial loop condition i < n, and build the WhileOp itself.

#![allow(unused)]
fn main() {
        // Initialize: a=0, b=1, i=0, n=10.
        let c0 = ConstantOp::new_i64(ctx, 0);
        ins.append_op(ctx, &c0);
        let c1 = ConstantOp::new_i64(ctx, 1);
        ins.append_op(ctx, &c1);
        let c10 = ConstantOp::new_i64(ctx, 10);
        ins.append_op(ctx, &c10);

        let store_a0 = StoreOp::new(ctx, slot_a.get_result(ctx), c0.get_result(ctx));
        ins.append_op(ctx, &store_a0);
        let store_b1 = StoreOp::new(ctx, slot_b.get_result(ctx), c1.get_result(ctx));
        ins.append_op(ctx, &store_b1);
        let store_i0 = StoreOp::new(ctx, slot_i.get_result(ctx), c0.get_result(ctx));
        ins.append_op(ctx, &store_i0);
        let store_n10 = StoreOp::new(ctx, slot_n.get_result(ctx), c10.get_result(ctx));
        ins.append_op(ctx, &store_n10);

        // Loop condition i < n.
        let i_before = LoadOp::new(ctx, slot_i.get_result(ctx), i64_ty.into());
        ins.append_op(ctx, &i_before);
        let n_before = LoadOp::new(ctx, slot_n.get_result(ctx), i64_ty.into());
        ins.append_op(ctx, &n_before);
        let cond = BinOp::new(
            ctx,
            BinOpKind::Lt,
            i_before.get_result(ctx),
            n_before.get_result(ctx),
        );
        ins.append_op(ctx, &cond);
        let store_cond_ptr = StoreOp::new(ctx, slot_cond_ptr.get_result(ctx), cond.get_result(ctx));
        ins.append_op(ctx, &store_cond_ptr);

        let while_op = WhileOp::new(ctx, slot_cond_ptr.get_result(ctx));
        ins.append_op(ctx, &while_op);
}

The important detail here is that WhileOp takes a pointer to the condition slot, not the condition value directly. That lets the loop body recompute and update the condition on every iteration.

4. Fill in the loop body region

Once the WhileOp exists, we create its body block and populate it with the actual Fibonacci update logic.

#![allow(unused)]
fn main() {
        let while_region = while_op.body_region(ctx);
        let while_block = BasicBlock::new(
            ctx,
            Some("while_body".try_into().expect("valid block name")),
            vec![],
        );
        while_block.insert_at_front(while_region, ctx);
        let mut while_ins = OpInserter::new_at_block_end(while_block);

        // tmp = a + b; a = b; b = tmp; i = i + 1
        let a_val = LoadOp::new(ctx, slot_a.get_result(ctx), i64_ty.into());
        while_ins.append_op(ctx, &a_val);
        let b_val = LoadOp::new(ctx, slot_b.get_result(ctx), i64_ty.into());
        while_ins.append_op(ctx, &b_val);
        let sum = BinOp::new(
            ctx,
            BinOpKind::Add,
            a_val.get_result(ctx),
            b_val.get_result(ctx),
        );
        while_ins.append_op(ctx, &sum);
        let store_tmp = StoreOp::new(ctx, slot_tmp.get_result(ctx), sum.get_result(ctx));
        while_ins.append_op(ctx, &store_tmp);

        let b_for_a = LoadOp::new(ctx, slot_b.get_result(ctx), i64_ty.into());
        while_ins.append_op(ctx, &b_for_a);
        let store_a = StoreOp::new(ctx, slot_a.get_result(ctx), b_for_a.get_result(ctx));
        while_ins.append_op(ctx, &store_a);

        let tmp_for_b = LoadOp::new(ctx, slot_tmp.get_result(ctx), i64_ty.into());
        while_ins.append_op(ctx, &tmp_for_b);
        let store_b = StoreOp::new(ctx, slot_b.get_result(ctx), tmp_for_b.get_result(ctx));
        while_ins.append_op(ctx, &store_b);

        let i_val = LoadOp::new(ctx, slot_i.get_result(ctx), i64_ty.into());
        while_ins.append_op(ctx, &i_val);
        let one_step = ConstantOp::new_i64(ctx, 1);
        while_ins.append_op(ctx, &one_step);
        let i_next = BinOp::new(
            ctx,
            BinOpKind::Add,
            i_val.get_result(ctx),
            one_step.get_result(ctx),
        );
        while_ins.append_op(ctx, &i_next);
        let store_i = StoreOp::new(ctx, slot_i.get_result(ctx), i_next.get_result(ctx));
        while_ins.append_op(ctx, &store_i);

        // i is updated, now do the comparison again
        let i_inc = LoadOp::new(ctx, slot_i.get_result(ctx), i64_ty.into());
        while_ins.append_op(ctx, &i_inc);
        let n_for_cond = LoadOp::new(ctx, slot_n.get_result(ctx), i64_ty.into());
        while_ins.append_op(ctx, &n_for_cond);
        let cond_next = BinOp::new(
            ctx,
            BinOpKind::Lt,
            i_inc.get_result(ctx),
            n_for_cond.get_result(ctx),
        );
        while_ins.append_op(ctx, &cond_next);
        let store_cond = StoreOp::new(
            ctx,
            slot_cond_ptr.get_result(ctx),
            cond_next.get_result(ctx),
        );
        while_ins.append_op(ctx, &store_cond);

        let while_yield = YieldOp::new(ctx);
        while_ins.append_op(ctx, &while_yield);
}

This block mirrors the source program closely: compute tmp = a + b, rotate a and b, increment i, recompute the loop condition, then end the region with YieldOp.

5. Return the result and verify the IR

After the loop, the function loads the final value of b, returns it, and runs verification on the constructed module.

#![allow(unused)]
fn main() {
        let fib_result = LoadOp::new(ctx, slot_b.get_result(ctx), i64_ty.into());
        ins.append_op(ctx, &fib_result);
        let ret = ReturnOp::new(ctx, fib_result.get_result(ctx));
        ins.append_op(ctx, &ret);

        verify_op(&module, ctx).expect("constructed fibonacci IR should verify");
        println!("{}", module.get_operation().disp(ctx));
}

This last step is worth keeping in mind whenever you construct IR manually: verification is the cheapest way to catch structural mistakes early.

Try it out

cargo test --example kaleidoscope -- build_fib_example --show-output

This prints Kaleidoscope IR for the Fibonacci function shown above. We suggest studying the test and the printed IR together to understand how the dialect ops are used to construct the IR.

Exercise: Try adding a test to build the factorial function similarly and print its IR.

Next step

In this chapter we manually built the IR for a specific Kaleidoscope program. Chapter 3 shows how to lower the AST into this IR, which means we can parse any Kaleidoscope program into an AST and then lower it into IR.