Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 3: Lowering the AST to IR

In Chapter 2 we hand-built IR for a specific Fibonacci function. This chapter shows how to do that systematically for any Kaleidoscope program by lowering the AST produced in Chapter 1 into the dialect ops defined in Chapter 2.

The implementation for this chapter lives in examples/kaleidoscope/from_ast.rs

Design

Every Kaleidoscope value is a 64-bit signless integer (i64). The key design decisions are:

  • All variables are memory-backed. Every local variable and every function parameter is backed by a DeclOp slot (analogous to LLVM’s alloca). Reads become LoadOps; writes become StoreOps. This avoids having to worry about SSA form.
  • Control flow uses regions. IfOp owns two regions (then / else) and WhileOp owns one region (the loop body). Each region gets its own new BasicBlock.
  • WhileOp holds a condition pointer. Rather than threading a boolean value as an argument, the condition is stored in a dedicated DeclOp slot in the outer block and updated at the end of each loop iteration.

Entry point

The public API is a single function. It takes an AST Function node and produces a FuncOp inside the provided Context:

#![allow(unused)]
fn main() {
pub fn lower_function(ctx: &mut Context, func: &Function) -> Result<FuncOp> {
    let i64_ty = IntegerType::get(ctx, 64, Signedness::Signless);

    // Build the function type: all params are i64, single i64 return.
    let param_tys: Vec<_> = func.params.iter().map(|_| i64_ty.into()).collect();
    let func_ty = FunctionType::get(ctx, param_tys, vec![i64_ty.into()]);

    let func_op = FuncOp::new(
        ctx,
        func.name
            .as_str()
            .try_into()
            .expect("invalid function name"),
        func_ty,
    );

    let entry = func_op.get_entry_block(ctx);
    let mut ins = OpInserter::new_at_block_end(entry);
    let mut var_map = VarMap::default();

    // Spill each function parameter (block argument) into a mutable DeclOp slot.
    for (idx, param_name) in func.params.iter().enumerate() {
        let param_val = entry.deref(ctx).get_argument(idx);
        let slot = DeclOp::new(ctx, i64_ty.into());
        let slot_val = slot.get_result(ctx);
        ins.append_op(ctx, &slot);
        let store = StoreOp::new(ctx, slot_val, param_val);
        ins.append_op(ctx, &store);
        var_map.insert(param_name.clone(), slot_val);
    }

    lower_stmts(ctx, &mut ins, &mut var_map, &func.body)?;

    // If no terminator was emitted (e.g., function ends with an `if` where both
    // branches return), add a fallback return of 0 to satisfy the verifier.
    if entry.deref(ctx).get_terminator(ctx).is_none() {
        let zero = ConstantOp::new_i64(ctx, 0);
        let zero_val = zero.get_result(ctx);
        ins.append_op(ctx, &zero);
        let ret = ReturnOp::new(ctx, zero_val);
        ins.append_op(ctx, &ret);
    }
    Ok(func_op)
}
}

Two type aliases keep the rest of the code concise:

#![allow(unused)]
fn main() {
/// Inserter type used throughout this module.
type OpInserter = IRInserter<DummyListener>;

/// Maps variable names to the slot-pointer [`Value`] produced by their [`DeclOp`].
type VarMap = FxHashMap<String, Value>;
}

OpInserter is an IRInserter with a no-op listener. It tracks a cursor position inside a block and appends ops there via append_op. VarMap maps source-level variable names to the Value (result of a DeclOp) that holds the variable’s storage slot.

Creating the FuncOp

lower_function first creates an empty FuncOp with the right type, then obtains the entry block and initialises the inserter at its end:

#![allow(unused)]
fn main() {
pub fn lower_function(ctx: &mut Context, func: &Function) -> Result<FuncOp> {
    let i64_ty = IntegerType::get(ctx, 64, Signedness::Signless);

    // Build the function type: all params are i64, single i64 return.
    let param_tys: Vec<_> = func.params.iter().map(|_| i64_ty.into()).collect();
    let func_ty = FunctionType::get(ctx, param_tys, vec![i64_ty.into()]);

    let func_op = FuncOp::new(
        ctx,
        func.name
            .as_str()
            .try_into()
            .expect("invalid function name"),
        func_ty,
    );

    let entry = func_op.get_entry_block(ctx);
    let mut ins = OpInserter::new_at_block_end(entry);
    let mut var_map = VarMap::default();

    // Spill each function parameter (block argument) into a mutable DeclOp slot.
    for (idx, param_name) in func.params.iter().enumerate() {
        let param_val = entry.deref(ctx).get_argument(idx);
        let slot = DeclOp::new(ctx, i64_ty.into());
        let slot_val = slot.get_result(ctx);
        ins.append_op(ctx, &slot);
        let store = StoreOp::new(ctx, slot_val, param_val);
        ins.append_op(ctx, &store);
        var_map.insert(param_name.clone(), slot_val);
    }

    lower_stmts(ctx, &mut ins, &mut var_map, &func.body)?;

    // If no terminator was emitted (e.g., function ends with an `if` where both
    // branches return), add a fallback return of 0 to satisfy the verifier.
    if entry.deref(ctx).get_terminator(ctx).is_none() {
        let zero = ConstantOp::new_i64(ctx, 0);
        let zero_val = zero.get_result(ctx);
        ins.append_op(ctx, &zero);
        let ret = ReturnOp::new(ctx, zero_val);
        ins.append_op(ctx, &ret);
    }
    Ok(func_op)
}
}

(Only the setup and fallback parts are discussed in detail below.)

Spilling parameters

Every function parameter arrives as a basic-block argument. To allow mutation later, each argument is immediately spilled into a DeclOp slot:

#![allow(unused)]
fn main() {
    // Spill each function parameter (block argument) into a mutable DeclOp slot.
    for (idx, param_name) in func.params.iter().enumerate() {
        let param_val = entry.deref(ctx).get_argument(idx);
        let slot = DeclOp::new(ctx, i64_ty.into());
        let slot_val = slot.get_result(ctx);
        ins.append_op(ctx, &slot);
        let store = StoreOp::new(ctx, slot_val, param_val);
        ins.append_op(ctx, &store);
        var_map.insert(param_name.clone(), slot_val);
    }
}

entry.deref(ctx).get_argument(idx) returns the Value of the idx-th block argument. DeclOp::new_with_type allocates a pointer-typed slot; StoreOp writes the initial value into it.

Fallback terminator

After lowering the body, the entry block might still lack a terminator. This can happen when the source function ends with an if statement where both branches contain a return. In that case a fallback return 0 is appended:

#![allow(unused)]
fn main() {
    // If no terminator was emitted (e.g., function ends with an `if` where both
    // branches return), add a fallback return of 0 to satisfy the verifier.
    if entry.deref(ctx).get_terminator(ctx).is_none() {
        let zero = ConstantOp::new_i64(ctx, 0);
        let zero_val = zero.get_result(ctx);
        ins.append_op(ctx, &zero);
        let ret = ReturnOp::new(ctx, zero_val);
        ins.append_op(ctx, &ret);
    }
}

Lowering statements

Statements are lowered by two helpers. lower_stmts iterates a slice and delegates each item to lower_stmt, short-circuiting on the first error:

#![allow(unused)]
fn main() {
fn lower_stmts(
    ctx: &mut Context,
    ins: &mut OpInserter,
    var_map: &mut VarMap,
    stmts: &[Stmt],
) -> Result<bool> {
    let mut terminated = false;
    for stmt in stmts {
        terminated = lower_stmt(ctx, ins, var_map, stmt)?;
    }
    Ok(terminated)
}
}

Both functions return Result<bool> where true means the last op emitted was a block terminator (ReturnOp). This signal is used to suppress redundant YieldOp terminators in if / while regions.

Variable declaration

var name; or var name = expr; allocates a slot, registers it in the VarMap, and optionally stores an initial value:

#![allow(unused)]
fn main() {
        Stmt::VarDecl { name, init } => {
            let slot = DeclOp::new(ctx, i64_ty.into());
            let slot_val = slot.get_result(ctx);
            ins.append_op(ctx, &slot);
            var_map.insert(name.clone(), slot_val);

            if let Some(init_expr) = init {
                let val = lower_expr(ctx, ins, var_map, init_expr)?;
                let store = StoreOp::new(ctx, slot_val, val);
                ins.append_op(ctx, &store);
            }
            Ok(false)
        }
}

Assignment

name = expr; looks up the existing slot in VarMap and emits a StoreOp. If the variable was never declared, input_error! returns a structured error instead of panicking:

#![allow(unused)]
fn main() {
        Stmt::Assign { name, value } => {
            let val = lower_expr(ctx, ins, var_map, value)?;
            let slot = *var_map.get(name.as_str()).ok_or_else(|| {
                input_error!(
                    Location::Unknown,
                    "assignment to undeclared variable: {name}"
                )
            })?;
            let store = StoreOp::new(ctx, slot, val);
            ins.append_op(ctx, &store);
            Ok(false)
        }
}

Return

return expr; lowers the expression, emits ReturnOp, and signals that the block is now terminated:

#![allow(unused)]
fn main() {
        Stmt::Return(expr) => {
            let val = lower_expr(ctx, ins, var_map, expr)?;
            let ret = ReturnOp::new(ctx, val);
            ins.append_op(ctx, &ret);
            Ok(true) // ReturnOp is a block terminator
        }
}

If statement

if cond { then } else { else } emits an IfOp, then builds two new blocks — one for each region. Each branch is lowered recursively. A YieldOp is appended only when the branch did not end with a return:

#![allow(unused)]
fn main() {
        Stmt::If {
            cond,
            then_body,
            else_body,
        } => {
            let cond_val = lower_expr(ctx, ins, var_map, cond)?;
            let if_op = IfOp::new(ctx, cond_val);
            ins.append_op(ctx, &if_op);

            // Then region: add YieldOp only if the branch didn't terminate.
            let then_block = BasicBlock::new(ctx, None, vec![]);
            then_block.insert_at_front(if_op.then_region(ctx), ctx);
            let mut then_ins = OpInserter::new_at_block_end(then_block);
            let mut then_vars = var_map.clone();
            let then_terminated = lower_stmts(ctx, &mut then_ins, &mut then_vars, then_body)?;
            if !then_terminated {
                let then_yield = YieldOp::new(ctx);
                then_ins.append_op(ctx, &then_yield);
            }

            // Else region: add YieldOp only if the branch didn't terminate.
            let else_block = BasicBlock::new(ctx, None, vec![]);
            else_block.insert_at_front(if_op.else_region(ctx), ctx);
            let mut else_ins = OpInserter::new_at_block_end(else_block);
            let mut else_vars = var_map.clone();
            let else_terminated = lower_stmts(ctx, &mut else_ins, &mut else_vars, else_body)?;
            if !else_terminated {
                let else_yield = YieldOp::new(ctx);
                else_ins.append_op(ctx, &else_yield);
            }

            Ok(false) // IfOp itself is not a terminator in the outer block
        }
}

Note that IfOp itself is not a block terminator in the outer block, so lower_stmt_if returns Ok(false).

While loop

while cond { body } uses the memory-backed condition pattern described in the design section above. Before entering the loop, the condition is evaluated and stored in a fresh DeclOp slot. At the end of each iteration, the condition is re-evaluated and the slot is updated. A YieldOp closes the body region (if it doesn’t already end with a return):

#![allow(unused)]
fn main() {
        Stmt::While { cond, body } => {
            // Allocate the condition slot in the outer block.
            let cond_slot = DeclOp::new(ctx, i64_ty.into());
            let cond_slot_val = cond_slot.get_result(ctx);
            ins.append_op(ctx, &cond_slot);

            // Compute the initial condition and store it.
            let init_cond = lower_expr(ctx, ins, var_map, cond)?;
            let init_store = StoreOp::new(ctx, cond_slot_val, init_cond);
            ins.append_op(ctx, &init_store);

            let while_op = WhileOp::new(ctx, cond_slot_val);
            ins.append_op(ctx, &while_op);

            // Build the loop body.
            let body_block = BasicBlock::new(ctx, None, vec![]);
            body_block.insert_at_front(while_op.body_region(ctx), ctx);
            let mut body_ins = OpInserter::new_at_block_end(body_block);
            let mut body_vars = var_map.clone();
            let body_terminated = lower_stmts(ctx, &mut body_ins, &mut body_vars, body)?;

            if !body_terminated {
                // Re-evaluate the condition at the end of the body and update the slot.
                let next_cond = lower_expr(ctx, &mut body_ins, &body_vars, cond)?;
                let next_store = StoreOp::new(ctx, cond_slot_val, next_cond);
                body_ins.append_op(ctx, &next_store);
                let body_yield = YieldOp::new(ctx);
                body_ins.append_op(ctx, &body_yield);
            }

            Ok(false) // WhileOp itself is not a terminator in the outer block
        }
}

Expression statement

A bare expr; lowers the expression for its side effects and discards the resulting value:

#![allow(unused)]
fn main() {
        Stmt::Expr(expr) => {
            lower_expr(ctx, ins, var_map, expr)?;
            Ok(false)
}

Lowering expressions

lower_expr matches on the Expr variant and returns the Value that holds the computed result:

#![allow(unused)]
fn main() {
fn lower_expr(
    ctx: &mut Context,
    ins: &mut OpInserter,
    var_map: &VarMap,
    expr: &Expr,
) -> Result<Value> {
    let i64_ty = IntegerType::get(ctx, 64, Signedness::Signless);

    match expr {
        // ── integer literal ───────────────────────────────────────────────
        Expr::Integer(n) => {
            let op = ConstantOp::new_i64(ctx, *n);
            let val = op.get_result(ctx);
            ins.append_op(ctx, &op);
            Ok(val)
        }

        // ── variable reference ────────────────────────────────────────────
        Expr::Variable(name) => {
            let slot = *var_map.get(name.as_str()).ok_or_else(|| {
                input_error!(
                    Location::Unknown,
                    "reference to undeclared variable: {name}"
                )
            })?;
            let load = LoadOp::new(ctx, slot, i64_ty.into());
            let val = load.get_result(ctx);
            ins.append_op(ctx, &load);
            Ok(val)
        }

        // ── binary operation ──────────────────────────────────────────────
        Expr::BinOp { op, lhs, rhs } => {
            let lhs_val = lower_expr(ctx, ins, var_map, lhs)?;
            let rhs_val = lower_expr(ctx, ins, var_map, rhs)?;
            let kind = ast_binop_to_kind(op);
            let bin_op = BinOp::new(ctx, kind, lhs_val, rhs_val);
            let val = bin_op.get_result(ctx);
            ins.append_op(ctx, &bin_op);
            Ok(val)
        }

        // ── function call ─────────────────────────────────────────────────
        Expr::Call { callee, args } => {
            let mut arg_vals = Vec::with_capacity(args.len());
            for a in args {
                arg_vals.push(lower_expr(ctx, ins, var_map, a)?);
            }
            let callee_id = callee.as_str().try_into().expect("valid callee identifier");
            let call_op = CallOp::new(ctx, IdentifierAttr::new(callee_id), arg_vals, i64_ty.into());
            let val = call_op.get_operation().deref(ctx).get_result(0);
            let val = { val }; // reborrow to release ctx ref before append
            ins.append_op(ctx, &call_op);
            Ok(val)
    }
}
}

Integer literal

A literal is wrapped in a ConstantOp. The result value is returned:

#![allow(unused)]
fn main() {
        Expr::Integer(n) => {
            let op = ConstantOp::new_i64(ctx, *n);
            let val = op.get_result(ctx);
            ins.append_op(ctx, &op);
            Ok(val)
        }
}

Variable reference

A variable reference looks up the slot in VarMap and emits a LoadOp to read its current value. An undeclared variable is reported as an InvalidInput error using the input_error! macro:

#![allow(unused)]
fn main() {
        Expr::Variable(name) => {
            let slot = *var_map.get(name.as_str()).ok_or_else(|| {
                input_error!(
                    Location::Unknown,
                    "reference to undeclared variable: {name}"
                )
            })?;
            let load = LoadOp::new(ctx, slot, i64_ty.into());
            let val = load.get_result(ctx);
            ins.append_op(ctx, &load);
            Ok(val)
        }
}

Binary operation

Both operands are lowered recursively, then a BinOp is created with the appropriate BinOpKind attribute:

#![allow(unused)]
fn main() {
        Expr::BinOp { op, lhs, rhs } => {
            let lhs_val = lower_expr(ctx, ins, var_map, lhs)?;
            let rhs_val = lower_expr(ctx, ins, var_map, rhs)?;
            let kind = ast_binop_to_kind(op);
            let bin_op = BinOp::new(ctx, kind, lhs_val, rhs_val);
            let val = bin_op.get_result(ctx);
            ins.append_op(ctx, &bin_op);
            Ok(val)
        }
}

Function call

Arguments are lowered in order and collected into a Vec. A CallOp is then created with the callee name as an IdentifierAttr:

#![allow(unused)]
fn main() {
        Expr::Call { callee, args } => {
            let mut arg_vals = Vec::with_capacity(args.len());
            for a in args {
                arg_vals.push(lower_expr(ctx, ins, var_map, a)?);
            }
            let callee_id = callee.as_str().try_into().expect("valid callee identifier");
            let call_op = CallOp::new(ctx, IdentifierAttr::new(callee_id), arg_vals, i64_ty.into());
            let val = call_op.get_operation().deref(ctx).get_result(0);
            let val = { val }; // reborrow to release ctx ref before append
            ins.append_op(ctx, &call_op);
            Ok(val)
}

Note the reborrow trick: call_op.get_operation().deref(ctx) borrows ctx immutably, so the result val must be captured before passing call_op to append_op, which borrows ctx mutably.

Note: The general guideline around calling deref or deref_mut on Ptr<T> is to keep the borrowed value (Ref<T> or RefMut<T>) around for as little time as possible, and to avoid passing it to any function that might call back into the context.

Error handling

All lowering functions return pliron::result::Result<T>. Errors are propagated with ?. The input_error! macro creates a pliron::result::Error with kind ErrorKind::InvalidInput and an optional source location. Because the Kaleidoscope AST does not carry source positions, Location::Unknown is used throughout.

Tests

The test module defines a helper that parses a source string, lowers every function into a fresh module, verifies the IR, and prints it:

#![allow(unused)]
fn main() {
    fn lower_program(src: &str) -> String {
        let funcs = parse_program(src).expect("parse error");
        let ctx = &mut Context::new();
        let module = ModuleOp::new(ctx, "test".try_into().expect("valid module name"));
        for func in &funcs {
            let func_op = lower_function(ctx, func).expect("lowering failed");
            module.append_operation(ctx, func_op.get_operation(), 0);
        }
        verify_op(&module, ctx).expect("IR verification failed");
        format!("{}", module.get_operation().disp(ctx))
    }
}

Try it out

cargo test --example kaleidoscope -- fibonacci_from_ast --show-output
cargo test --example kaleidoscope -- factorial_from_ast --show-output
cargo test --example kaleidoscope -- inline_fibonacci_from_ast --show-output

Each test prints the lowered IR so you can inspect how the AST maps to ops.

Next step

Chapter 4 lowers the Kaleidoscope dialect into a lower-level LLVM dialect.