GitHub

Architecture Deep Dive

FORMA: The engineering decisions behind verifiable AI-generated code.

This is the technical architecture document for FORMA. It covers the contract system, MIR intermediate representation, verification toolchain integration, and the roadmap to formal verification for critical systems.

Build Provenance

FORMA's implementation was largely implemented with Claude Code & Opus 4.6, with code review, regression-oriented evaluation, and architectural critique performed using OpenAI GPT-5.3-Codex. This project intentionally uses a multi-model workflow: high-throughput generation paired with independent review and validation.

~35%

Fewer tokens than Rust for AI generation

Zero

Lifetime annotations—memory safety without complexity

@pre/@post

First-class contracts with quantifiers

SMT-ready

MIR architecture designed for formal proofs

The Problem

AI writes code faster than humans can review it

This is the core engineering challenge FORMA was built to solve. When AI generates thousands of lines of systems code, traditional code review becomes a bottleneck. You need machine-verifiable guarantees.

1. The trust gap

AI can generate syntactically correct code that passes type checking but violates semantic invariants. You can't review 10,000 lines of generated code line-by-line. You need to specify what the code should do, then verify that it does.

FORMA solution: Contract expressions that capture intent, verification tools that check behavior.

2. The compilation wall

In repository-level translation benchmarks, 94.8% of AI failures targeting Rust are compilation errors—with dependency resolution (unresolved imports, missing methods) accounting for 61.9% of failures. Rust's strict compiler catches errors early, but its complexity—lifetimes, trait bounds, module paths—creates a high barrier for AI-generated code.

FORMA solution: Second-class references (no lifetime syntax), batteries-included stdlib (no dependency maze), JSON error output for AI self-repair.

3. The feedback loop

When AI-generated code fails, the error messages need to be machine-readable for automated correction. Human-readable diagnostics don't help an AI agent self-repair.

FORMA solution: JSON error output, structured diagnostics, grammar export for constrained generation.

4. The verification ceiling

Runtime testing catches bugs after the fact. For critical systems—medical, financial, aerospace—you need static guarantees before deployment. Property-based testing helps, but formal verification is the goal.

FORMA solution: Contract system designed for eventual SMT solver integration and formal proofs.

Contract System Architecture

Specifications that AI can generate and machines can verify

FORMA's contract system is the foundation of its verification story. Contracts capture what code should do in a form that's both human-readable and machine-checkable.

Contract Expression Language

FORMA contracts support a rich expression language designed for specifying program behavior precisely.

# Basic preconditions and postconditions
@pre(n >= 0)
@post(result >= 1)
f factorial(n: Int) -> Int

# Old expressions - capture values at function entry
@post(result == old(balance) + amount)
f deposit(amount: Int) -> Int

# Quantifiers - express properties over collections
@pre(forall i in 0..len(arr)-1: arr[i] <= arr[i+1])  # sorted input
@post(exists i in 0..len(arr): arr[i] == target)     # element found

# Multiple preconditions for conditional constraints
@pre(amount > 0)
@pre(balance >= amount)
f withdraw(amount: Int) -> Int!Str

# Membership testing
@pre(status in ["active", "pending", "closed"])
f update_status(status: Str) -> Bool

Named Contract Patterns

Common verification patterns are captured as reusable named contracts. This reduces specification boilerplate and establishes a shared vocabulary for common guarantees.

# Built-in patterns (35 total, 6 categories)
# Numeric: @positive @nonnegative @nonzero @even @odd @divisible @bounded @in_range
# Collection: @nonempty @contains @all_positive @all_nonnegative @all_nonzero @valid_index @valid_range
# Set: @subset @superset @disjoint @equals @same_length @permutation
# Sequence: @prefix @suffix @reversed @rotated @unique
# Ordering: @sorted @sorted_desc @strictly_sorted @strictly_sorted_desc @sorted_by @partitioned @stable
# State: @unchanged @pure

# Usage
@nonempty(items)
@sorted(result)
@permutation(items, result)
f verified_sort(items: [Int]) -> [Int]
    sort_ints(items)

Contract Evaluation Strategy

Current implementation: Runtime contract checking with property-based test generation.

How it works:

  • forma verify generates random inputs satisfying preconditions
  • Executes the function and checks postconditions
  • Reports violations with counterexamples
  • Deterministic seeding (--seed) for reproducible CI runs

Why runtime first: Property-based testing provides immediate value while the formal verification infrastructure is built. The contract language is designed to be forward-compatible with SMT solver backends.

# Property-based verification with 1000 test cases
forma verify src/math.forma --examples 1000 --seed 42

# JSON output for CI integration
forma verify src --report --format json

Contract AST Representation

Contracts are first-class AST nodes, not comments or annotations that get stripped. This enables:

  • Type checking of contract expressions against function signatures
  • Extraction for documentation generation
  • Translation to verification conditions
  • Round-tripping through forma fmt
# Internal AST representation (simplified)
Contract {
    kind: Pre | Post | Invariant,
    expression: Expr,
    pattern_name: Option<String>,  // For @sorted, @pure, etc.
    span: SourceSpan,
}

MIR Architecture

The intermediate representation that enables verification

FORMA's Mid-level Intermediate Representation (MIR) is a control-flow graph designed for both execution and analysis. It's the foundation for future formal verification integration.

MIR Design Principles

Why MIR matters for verification: High-level source code is too complex for direct analysis. MIR normalizes control flow into a form suitable for both interpretation and symbolic execution.

  • Explicit control flow: All branches, loops, and early returns become explicit CFG edges
  • SSA-like properties: Variables have single assignment points within basic blocks
  • Type-erased operations: Generic code is monomorphized before MIR generation
  • Ownership tracking: Move/borrow semantics are explicit in MIR operations

MIR Structure

# MIR for a simple function (conceptual representation)
f abs(x: Int) -> Int:
    bb0:
        _1 = x
        switchInt(_1 < 0) -> [true: bb1, false: bb2]
    bb1:
        _2 = Neg(_1)
        goto -> bb3
    bb2:
        _2 = _1
        goto -> bb3
    bb3:
        return _2

Key properties:

  • Basic blocks (bb0, bb1, ...) contain sequential operations
  • Terminators (switchInt, goto, return) control flow between blocks
  • Temporaries (_1, _2, ...) are SSA-style intermediate values
  • All operations are explicit—no implicit conversions or coercions

Compilation Pipeline

FORMA's compilation pipeline produces a single MIR representation that serves both execution and verification:

┌─────────────────────────────────────────────────────────────────────┐
│                        FORMA Source Code                            │
│                    (with @pre/@post contracts)                      │
└───────────────────────────────┬─────────────────────────────────────┘
                                │ Parse
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         Typed AST                                   │
│              (contracts are first-class AST nodes)                  │
└───────────────────────────────┬─────────────────────────────────────┘
                                │ Lower
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                          MIR (CFG)                                  │
│         (explicit control flow, ownership tracking)                 │
└───────────────────────────────┬─────────────────────────────────────┘
                                │ Optimize
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      Optimized MIR                                  │
│   (constant fold, copy propagation, dead blocks, peephole)         │
└──────────────┬────────────────────────────────────┬─────────────────┘
               │                                    │
               ▼                                    ▼
    ┌──────────────────────┐             ┌──────────────────────┐
    │     Interpreter      │             │   VCGen (planned)    │
    │   (forma run/verify) │             │    (SMT export)      │
    └──────────────────────┘             └──────────────────────┘

Key insight: The same MIR serves runtime execution today and formal verification tomorrow. An optimization pass runs between lowering and execution, eliminating redundant temporaries and simplifying control flow. No new IR needed—just a new backend that generates SMT queries instead of executing code.

Why This Architecture Enables Formal Verification

  • Contracts at AST level: Specifications are preserved through compilation, not lost as comments
  • Explicit CFG: Loop invariants and path conditions can be computed from MIR structure
  • No implicit behavior: Every operation is explicit, making symbolic execution tractable
  • Ownership in IR: Memory safety properties can be verified alongside functional correctness

Verification Toolchain

Tools that close the trust gap

FORMA's verification toolchain is designed for integration into AI-assisted development workflows. Every tool produces machine-readable output suitable for automated pipelines.

forma explain — Contract Translation

Translates contract expressions into human-readable or machine-readable explanations. This bridges the gap between formal specifications and code review.

# Human-readable explanation
$ forma explain math.forma --format human

┌─ factorial(n: Int) -> Int
│  Requires:
│    - n is at least 0
│  Guarantees:
│    - result is at least 1
└─

┌─ binary_search(arr: [Int], target: Int) -> Int?
│  Requires:
│    - [@sorted] for every i in 0..arr.len() - 1,
│      arr[i] is at most arr[i + 1]
└─

# JSON output for tooling integration
$ forma explain math.forma --format json
{
  "functions": [{
    "name": "factorial",
    "signature": "factorial(n: Int) -> Int",
    "preconditions": [{
      "expression": "n >= 0",
      "english": "n is at least 0"
    }],
    "postconditions": [{
      "expression": "result >= 1",
      "english": "result is at least 1"
    }]
  }]
}

Use case: AI generates code with contracts. Human reviews the explain output to verify that the contracts match intent, without reading implementation details.

forma verify — Property-Based Verification

Runs property-based testing against contract specifications. Generates inputs satisfying preconditions, executes functions, and checks postconditions.

# Report-style verification with deterministic seed
$ forma verify src/ --report --format human --seed 42

FORMA Verification Report

src/math.forma
  ✓ PASS factorial        contracts:2  examples:20/20
  ✓ PASS binary_search    contracts:2  examples:20/20
  ✗ FAIL quicksort        contracts:3  examples:18/20

Summary
  Functions: 3
  Verified: 2, Failures: 1
  Examples passed: 58/60

# CI-friendly JSON report
$ forma verify src --report --format json --examples 100 --seed 42
{
  "files": [{"path": "src/math.forma", "functions": [
    {"name": "factorial", "status": "PASS",
     "contract_count": 2, "examples_run": 100,
     "examples_passed": 100, "issues": []},
    {"name": "quicksort", "status": "FAIL",
     "contract_count": 3, "examples_run": 100,
     "examples_passed": 97, "issues": [...]}
  ]}],
  "summary": {
    "total_functions": 3, "verified": 2,
    "failures": 1, "total_examples": 300,
    "examples_passed": 297
  }
}

Key features:

  • Deterministic seeding for reproducible CI results
  • Configurable example count for thoroughness vs. speed tradeoff
  • Counterexample reporting with failing input details
  • JSON output for automated failure triage

forma grammar — Constrained Generation

Exports FORMA's grammar in formats suitable for constrained decoding in LLM pipelines.

# EBNF format for parser generators
$ forma grammar --format ebnf > forma.ebnf

# JSON format for LLM constrained decoding
$ forma grammar --format json > forma-grammar.json

# Integration with constrained decoding
# (pseudocode for LLM pipeline)
grammar = load_grammar("forma-grammar.json")
completion = llm.generate(
    prompt="Write a sorting function in FORMA",
    grammar_constraint=grammar
)

Why this matters: Constrained decoding ensures AI-generated code is always syntactically valid. Combined with contract verification, this creates a closed loop: generate valid syntax, verify semantic correctness.

forma check --error-format json — Machine-Readable Diagnostics

Structured error output enables AI agents to self-correct without parsing human-readable messages.

$ forma check broken.forma --error-format json
{
  "success": false,
  "errors": [{
    "file": "broken.forma",
    "line": 15,
    "column": 12,
    "end_line": 15,
    "end_column": 18,
    "severity": "error",
    "code": "TYPE",
    "message": "type mismatch: expected Int, found Str"
  }]
}

Fields designed for AI consumption:

  • line/column: Exact location for targeted fixes
  • message: Human-readable description with expected/found types
  • code: Stable error category (TYPE, PARSE, MODULE) for pattern matching
  • success: Quick pass/fail check for automation

The AI-Assisted Development Workflow

1

Constrain

Export grammar, feed to AI. Generation is syntactically valid by construction.

2

Generate

AI writes code with contracts specifying behavior.

3

Explain

Review contracts in English. Understand promises without reading implementation.

4

Verify

Run verification report. Get PASS/FAIL for every contract.

5

Ship

Trust the code because you verified the specification.

Path to Formal Verification

From property testing to mathematical proofs

FORMA's architecture was designed from day one with formal verification as the end goal. Here's where we are, what's next, and why this matters for critical systems.

What's Already Built

The foundation for formal verification is in place:

  • Contract AST nodes: Contracts are first-class, type-checked, and preserved through compilation
  • Rich expression language: Quantifiers (forall, exists), old() expressions, membership testing
  • Named patterns: @sorted, @permutation, @pure—reusable specifications
  • MIR CFG: Structured intermediate representation suitable for symbolic execution
  • Property-based verification: Runtime contract checking with counterexample generation
  • Machine-readable output: All tools produce JSON for pipeline integration

Status: The specification language and IR infrastructure are in active development. Property-based testing provides immediate value while formal verification is developed.

What's Next: SMT Solver Integration

The path from property testing to formal proofs requires three components:

1. Verification Condition Generation (VCGen)

Transform MIR + contracts into logical formulas. For each function:

  • Precondition becomes an assumption
  • Postcondition becomes an assertion to prove
  • Loop invariants (when specified) enable bounded verification
  • Path conditions encode control flow
# Source
@pre(n >= 0)
@post(result >= 1)
f factorial(n: Int) -> Int

# Generated verification condition (simplified)
∀n: Int.
  (n >= 0) →   # precondition assumed
  (factorial(n) >= 1)  # postcondition to prove

2. SMT Solver Backend

Integration with Z3 or CVC5 to discharge verification conditions:

  • Translate VCs to SMT-LIB format
  • Query solver for satisfiability of negated postcondition
  • UNSAT = postcondition always holds (proven)
  • SAT = counterexample found (violation)

3. Refinement Types (Future)

Embed specifications in the type system itself:

# Future syntax concept
type PosInt = Int where self > 0
type SortedList = [Int] where @sorted

f factorial(n: Nat) -> PosInt  # Nat = Int where self >= 0

Possible Future: Formal Verification for Critical Systems

The architecture we've built doesn't just support runtime verification—it's structured to enable formal proofs if and when they're needed. This is a potential direction, not a promise.

Where the Architecture Could Go

The MIR representation and contract system are already compatible with SMT solver integration. If formal verification becomes a priority, the path is clear:

  • Export verification conditions to SMT-LIB format
  • Generate proof certificates for audit trails
  • Enable bounded verification for critical functions

For most use cases, the current property-based verification is sufficient. But for teams that need mathematical guarantees—medical, financial, aerospace—the foundation is ready.

Why This Matters Now

AI code generation is accelerating. The question isn't whether AI will write critical systems code—it's whether that code will be verifiable.

  • Regulation is coming: EU AI Act, FDA software guidelines, and financial regulations increasingly require demonstrable correctness for automated systems
  • Scale demands automation: Human code review doesn't scale to AI generation rates. Machine-checkable proofs do.
  • The window is now: Languages designed today will be used for critical systems in 5 years. Verification must be built in, not bolted on.

Core Architectural Decisions

The engineering choices that made this possible

Decision: Second-class references instead of explicit lifetimes

The problem: In translation benchmarks, 94.8% of AI failures targeting Rust are compilation errors. While dependency resolution dominates (61.9%), lifetime and ownership errors add a second layer of complexity that AI systems struggle with.

The solution: References can be passed and used locally, but cannot be stored in data structures or returned from functions. This eliminates the need for lifetime annotations while preserving memory safety.

The tradeoff: Some patterns (self-referential structs, arena allocators) require different idioms. For AI-generated code, this is a net win—the common case is dramatically simpler.

Decision: Contracts are AST nodes, not comments

The problem: Languages that treat specifications as comments lose them during compilation. This makes verification an afterthought rather than a first-class concern.

The solution: Contracts are parsed into the AST, type-checked against function signatures, and preserved through all compilation stages.

The payoff: Contracts can be extracted for documentation, checked for consistency, translated to verification conditions, and round-tripped through the formatter.

Decision: MIR as the semantic anchor

The problem: Multiple backends (interpreter, LLVM) risk semantic drift. High-level AST is too complex for direct analysis.

The solution: A single MIR representation that both backends consume. All semantic analysis (borrow checking, type checking, contract extraction) happens before MIR generation.

The payoff: One IR to analyze, one IR to verify, one IR to execute. Future formal verification operates on the same representation as the production runtime.

Decision: Interpreter-first, LLVM-later

The problem: Native compilation is slow to iterate on. Language evolution requires fast feedback.

The solution: forma run uses the interpreter for development velocity. forma build targets LLVM for production performance.

The payoff: Fast iteration during language development, native performance when it matters. The MIR-based architecture means both paths have identical semantics.

Development Timeline

51 sprints from concept to optimizing compiler

Phase 1-2

Complete

Language Foundation + Production Features

Core syntax, type system, memory model, modules, async/await, HTTP, sockets, FFI, LSP server.

Phase 3-4

Complete

Stability + Backend Hardening

Pattern matching fixes, import resolution, formatter, REPL, MIR robustness, CLI ergonomics, LLVM pipeline.

Phase 5-6

Complete

Memory Safety + Launch Hardening

FFI safety layer, capability security model, panic hardening, transitive imports, CI coverage expansion.

Phase 7-8

Complete

LSP Expansion + Verification UX

Full LSP features (formatting, symbols, signature help, references), JSON error contracts, contract language expansion (old(), quantifiers, named patterns), forma explain and forma verify --report.

Phase 9-10

Complete

Contract Patterns + MIR Optimization + Stdlib

35 named contract patterns across 6 categories, runtime contract helpers, pattern correctness fixes (@rotated, @stable), MIR optimization pass (constant folding, copy propagation, dead block elimination, peephole), public repo hardening (CI supply-chain, action pinning, security docs), stdlib completeness (higher-order builtins: map/filter/reduce/any/all, vec_sort, str_to_float, math functions, stdlib modules).

Future

Planned

Formal Verification Integration

VCGen implementation, SMT solver backend (Z3/CVC5), proof certificate export, refinement types exploration.

Design Inspirations

Standing on the shoulders of giants

FORMA synthesizes ideas from several language traditions, combining them in a way that's optimized for AI code generation and verification.

Rust

What we took: Memory safety without garbage collection. Ownership semantics. Algebraic data types and pattern matching. Systems-level performance through LLVM.

What we changed: Eliminated lifetime annotations entirely. Second-class references provide memory safety with dramatically simpler syntax—exactly what AI code generation needs.

Python

What we took: Indentation-based block structure. Clean, readable syntax. Implicit variable declaration.

What we changed: Added static typing and compile-time safety. Kept the visual clarity while enabling the performance and correctness guarantees Python can't provide.

Dafny & Eiffel

What we took: First-class contract expressions. @pre/@post annotations. The philosophy that specifications belong in the language, not in comments.

What we changed: Designed contracts for AI consumption: machine-readable output, property-based testing as immediate value, SMT integration as the future goal.

Haskell & ML

What we took: Pipeline operators for data transformation. Functional patterns. Strong type inference.

What we changed: Made pipelines the primary composition mechanism, reducing token count and making data flow explicit and left-to-right readable.

The Synthesis

FORMA isn't just a remix of existing languages. The combination creates something new: a systems language with built-in verification that AI can actually generate correctly. Each design choice—from second-class references to pipeline syntax to contract expressions—was made with a single question in mind: does this help AI write correct code?

Competitive Landscape

How FORMA compares

Capability FORMA Rust Mojo Dafny
Memory safe
No lifetime annotations
Contract expressions (@pre/@post)
Grammar export for constrained decoding
JSON diagnostics for self-correction Partial Partial
Contract explanation (explain command)
Automated verification reports Partial
Capability-based sandboxing
Native compilation

FORMA's unique position: Combines systems-language capability (like Rust) with verification tooling (like Dafny), while being designed from the ground up for AI code generation workflows.

The bottom line

FORMA is not just another programming language. It's infrastructure for a future where AI generates critical systems code and humans verify it through specifications, not line-by-line review. The contract system, MIR architecture, and verification toolchain are the foundation. SMT integration is the next milestone. Formally verified AI-generated code is the destination.