Emergent Hardware Verification

Chapter E The Synthesizable Form of the Actor Framework

This appendix establishes the foundational technical fact the rest of the book’s hardware argument rests on: an actor is a finite state machine, finite state machines synthesize, and therefore an actor is synthesizable. Chapter 7’s automatic substrate swap, Appendix D’s spec-to-silicon continuum, and Appendix G’s emulation flow all assume this; here it is demonstrated, twice, on real gates.

The class-based actor framework Chapter 6 defines uses runtime constructs — mailbox, fork/join, virtual dispatch, dynamic allocation — that do not synthesize. It is worth being precise about what that means and what it does not. Those constructs are the simulation rendering of an actor: the conveniences a software simulator offers for executing the actor’s behavior on a workstation. They are not the actor. The actor — its local state, its single typed-message handler, its ‘WIRE wiring — is the authored knowledge, and that knowledge has more than one rendering. This appendix defines the second one: the synthesizable rendering, in which the same state machine runs as gates.

E.1 The thesis: one authored actor, rendered per substrate

An actor is local state, a handler that consumes one typed message and may publish typed messages, and a set of ‘WIRE edges declaring who receives what. That is, exactly, a finite state machine with typed input and output channels. A finite state machine is the canonical synthesizable object — it is what every RTL designer writes by hand. So the actor is synthesizable for the same reason any FSM is.

What changes between substrates is the rendering, not the actor:

  • For software simulation, the framework renders the actor as a SystemVerilog class (Chapter 6) or a C++ object (Appendix K): new(), a mailbox, a forked run() loop, virtual dispatch on the message type. These are the fast, flexible mechanisms a simulator gives you for free.

  • For FPGA, emulator, or silicon, the framework renders the same actor as synthesizable RTL: registers for the state, an always_comb next-state block for the handler, FIFOs for the mailboxes, ready/valid wires for the channels. This appendix’s Five Rules are the discipline that makes the rendering mechanical.

The renderings the book develops, and where each lives. The simulation-versus-hardware split above is the conceptual core, but the book carries the hardware side into several concrete targets, each in its own appendix and all from the same authored actor. This appendix is the anchor the others point back to:

  • Synthesizable RTL — this appendix: a counter actor synthesized and a two-stage counter chain placed and routed on iCE40 at 126.6 MHz (§E.3) and a whole verification loop, scoreboard included, synthesized to gates (§E.4).

  • The constraint solver — Appendix F: the one verification actor whose synthesizability is least obvious, the constrained-random stimulus generator’s solver, compiled to fabric on a production corpus (an open-source RISC-V instruction generator), checked both directions against the reference solver and certified in Lean. The stimulus actor of §E.4 is an LFSR; this is the same actor with a real constraint set behind it.

  • The host C++ object — Appendix K §K.5: the substrate-swap example’s four actors, rendered as C++ objects, run as the software substrate and give results bit-identical to the same four synthesized onto the fabric; the same object is also the host side of every genuine hardware seam.

  • SystemC for high-level synthesis — Appendix J §J.6: the Five Rules transfer unchanged, and because Catapult and Stratus already accept synthesizable SystemC, this is a near-term path to RTL with no new tooling.

  • The GPU kernel and the accelerator tile — Appendix M: a warp or persistent kernel, or a Cerebras/Tenstorrent tile, is one more backend for the same lowering, pointed at CUDA or an on-chip mesh in place of a netlist.

What unifies them is §E.5: the lowering from the authored actor to any of these targets is a compiler pass, not a re-authoring, and on the hardware targets it carries a formal cycle-exactness guarantee. Authoring happens once; the rendering is chosen at build time.

The author writes the actor once; the framework renders every node of the graph for whatever substrate the run targets, with no user rewrite. Pointing at the class form’s new() and fork and concluding “actors do not synthesize” is the same error as pointing at a Verilog testbench’s $display and concluding Verilog does not synthesize: it mistakes one rendering for the thing rendered. The synthesizable form below is the structural shape that was always behind the class.

The contrast that makes it leverage. A UVM component has no FSM behind its class and no hardware rendering — uvm_scoreboard, sequences, and the register layer are software with no path to gates, which is why moving them onto an emulator means throwing them away and hand-writing RTL (Chapter 7 §7.17). The actor carries its hardware rendering; the UVM component does not. That asymmetry is the whole of the methodology’s leverage at the emulation boundary, and this appendix is its engineering basis.

E.2 The Five Rules

The synthesizable rendering of an actor obeys five rules. They are the discipline that restricts the class form into RTL-emissible shape, and they are methodological, not language-specific.

  • 1. No dynamic allocation. No new(), no growing queues, no recursion. Mailbox depth is a parameter; if zero the actor accepts one message at a time directly into its registers.

  • 2. No virtual dispatch in the hot path. A class hierarchy may be used during elaboration to generate parameterized modules (the same way Chisel and Amaranth use Scala or Python generators), but at runtime the handler is a fixed FSM, not a virtual call.

  • 3. Bounded mailboxes only. Every queue has a parameterized depth set at elaboration. The synthesizable FIFO primitive is prim_fifo_sync (OpenTitan’s standard), an inferred FIFO, or any standard-library equivalent.

  • 4. Fixed-cardinality fan-out. ‘WIRE is resolved at elaboration; the topology is fixed at before_end_of_elaboration (SystemC) or generate-block resolution (SystemVerilog). Dynamic subscription/unsubscription is not synthesizable.

  • 5. Ready/valid handshake on every channel. Backpressure-aware composition; never assume a downstream actor is always ready. The handshake is what makes the topology safe under arbitrary timing — and, not incidentally, what makes every ‘WIRE edge a latency-insensitive cut a partitioner can place across an FPGA boundary at near-zero cost (§E.5; Chapter 7 §7.17).

These are the same restrictions Bluespec, Chisel, and Amaranth impose on their synthesizable forms; the actor framework is in the same family. What is unique about the actor pattern is that the rules are methodological: SystemVerilog, SystemC, Bluespec, Chisel, and Amaranth can all host an actor implementation that obeys them, and the synthesizable form is mostly the framework minus dynamic allocation, virtual-class polymorphism, and unbounded queues — which is to say, minus the parts that were only ever the simulation rendering.

E.3 Worked example one: a single actor to real silicon

The mapping is mechanical. A class-based actor in actor_pkg/ style:

class CounterActor extends Actor;
  int count;
  task act(MsgBase msg);
    count++;
    `PUBLISH(count);
  endtask
endclass

becomes a synthesizable SystemVerilog module:

module counter_actor #(parameter int unsigned MSG_W = 32)(
     input   logic                clk_i,
     input   logic                rst_ni,
     // inbound channel
     input   logic                in_valid_i,
     output logic                 in_ready_o,
     input   logic [MSG_W-1:0]    in_data_i,
     // outbound channel
     output logic                 out_valid_o,
     input   logic                out_ready_i,
     output logic [MSG_W-1:0]      out_data_o
);
     logic [MSG_W-1:0] count_q, count_d;
     logic                out_valid_q, out_valid_d;
     logic [MSG_W-1:0] out_data_q,      out_data_d;


     logic in_fire, out_fire;
     assign in_fire     = in_valid_i   && in_ready_o;
     assign out_fire = out_valid_q && out_ready_i;


     assign in_ready_o     = !out_valid_q || out_ready_i;
     assign out_valid_o = out_valid_q;
     assign out_data_o     = out_data_q;


     always_comb begin
       count_d = count_q; out_valid_d = out_valid_q; out_data_d = out_data_q;
       if (out_fire) out_valid_d = 1'b0;
       if (in_fire) begin
           count_d      = count_q + 1'b1;
           out_data_d   = count_q + 1'b1;
           out_valid_d = 1'b1;
       end
     end


     always_ff @(posedge clk_i or negedge rst_ni) begin
       if (!rst_ni) begin count_q <= '0; out_valid_q <= 1'b0; out_data_q <= '0;
       end else         begin count_q <= count_d; out_valid_q <= out_valid_d;
                              out_data_q    <= out_data_d; end
     end
endmodule

(-tikz- diagram)

Figure E.1: Structural shape of the synthesizable counter actor (counter_actor.sv). An always_comb block computes the next-state values — count_q+1 through a carry chain, plus the ready/valid handshake — and an always_ff bank of D flip-flops latches count_q, out_data_q, and out_valid_q on clk; count_q and out_valid_q feed back into the combinational logic. Yosys synthesizes one such actor to 98 standard cells; the two-instance counter_chain places and routes to 44 iCE40 LCs at 126.6 MHz.

The translation has three structural moves:

  • • Class becomes module; class state becomes registered *_q signals.

  • mailbox.get() becomes a ready/valid handshake on the input channel.

  • publish() becomes a ready/valid handshake on the output channel.

The topology composition (‘WIRE) becomes a wire connection at the parent module:

counter_actor u_stage0 (
     .clk_i, .rst_ni,
     .in_valid_i   (trig_valid_i),   .in_ready_o   (trig_ready_o), .in_data_i ('0),
     .out_valid_o (stage0_valid),    .out_ready_i (stage0_ready), .out_data_o(stage0_data)
);
counter_actor u_stage1 (
     .clk_i, .rst_ni,
     .in_valid_i   (stage0_valid),   .in_ready_o   (stage0_ready), .in_data_i (stage0_data),
     .out_valid_o (final_valid_o), .out_ready_i (final_ready_i),.out_data_o(final_data_o)
);

Two actors wired in series: first counts inbound triggers, second counts how many times the first emitted. The ‘WIRE edge in class form is now a direct wire connection. There is no dynamic dispatch, no runtime configuration; the topology is fixed at elaboration.

Synthesis through Yosys

Both shipped modules pass Verilator lint with -Wall clean (the file carries a two-line unused_in_data tie-off, elided from the listing above, so the deliberately unread in_data_i does not trip UNUSEDSIGNAL). Yosys synthesizes the chain to a gate-level netlist. From make synth:

=== counter_actor (instantiated as u_stage0 and u_stage1) ===


         47 wires             264 wire bits           8 ports
         98 cells
         30   $_AND_           1   $_NAND_            30   $_XOR_
         33   $_DFFE_PN0P_    1    $_NOT_             1    $_XNOR_
          1   $_MUX_           1   $_ORNOT_


=== counter_chain (top) ===
         10 wires              72 wire bits           7 ports
         (instantiates u_stage0 and u_stage1)

98 standard cells per actor instance: 33 DFFE flip-flops for the registers (32-bit counter plus the output-valid latch — out_data_q always carries the same value as count_q, so synthesis merges the two), 30 XOR + 30 AND + 1 XNOR for the 32-bit increment, plus a small handful of cells for the handshake decode. Real hardware. The actor pattern does not impede synthesis; it produces clean, comprehensible RTL. The counter_chain top wires two instances together — the same ‘WIRE-style composition that connects scoreboards to monitors in the architecture model, now connecting RTL stages at synthesis time.

End-to-end FPGA flow on real silicon cells

The Yosys generic synthesis result is the entry point. The full FPGA flow goes one step further: real cell-library mapping (Lattice iCE40), place-and-route (nextpnr), and timing analysis. Running make fpga drives Yosys’s synth_ice40 followed by nextpnr-ice40 place-and-route on an HX8K device.

=== counter_chain ===
   30 SB_CARRY      (carry-chain cells for the 32-bit increment)
   34 SB_DFFER      (D flip-flops with reset and clock-enable)
   38 SB_LUT4       (4-input lookup tables)
   61 wires


Info: Device utilisation:
Info:    ICESTORM_LC :    44 / 7680     ( 0% used)
Info:    ICESTORM_RAM:      0 /    32   ( 0% used)
Info:    SB_IO      :     38 /    256   (14% used)


Info: Max frequency for clock 'clk_i': 126.60 MHz (PASS at 100.00 MHz)
Info: Routing complete.
Info: Max frequency for clock 'clk_i': 157.51 MHz (PASS at 100.00 MHz)
Info: Critical path: 32-bit carry chain through the increment

Each cell has an exact correspondence to the actor’s structural shape: the 30 SB_CARRY cells implement one 32-bit +1 increment; the 34 SB_DFFER hold a 32-bit count plus the output-valid and handshake bits; the 38 SB_LUT4 implement the next-state logic and the handshake decode. The chain wires two counter instances, but stage 0’s count payload is never observed downstream (u_stage1 counts stage 0’s emissions, not its data), so synth_ice40 folds the two stages to a single counter’s worth of carry and register cells — hence 44 LCs rather than \(\sim \)88. Two actor instances composed by a wire connection synthesize to 0.6% of an HX8K, meet timing at 126.6 MHz by nextpnr’s conservative post-placement bound — 26% margin over the 100 MHz target — and 157.5 MHz once routed (the conservative bound is the number quoted throughout the book; the routed figure varies with toolchain version and seed), use no BRAM, and have a clean critical path (the carry chain through the +1 increment, exactly what the actor’s behavior implies). A bitstream produced by icepack would program an HX8K board and run at 100 MHz immediately.

The numbers are not chosen for headline appeal — they fall out of the design. A counter actor in the synthesizable form is exactly as small as the equivalent hand-written counter, because the synthesizable form is the structural shape a careful designer would write. The actor methodology adds no overhead at the silicon level.

E.4 Worked example two: the whole verification loop synthesizes

The counter is a design-side actor, and a skeptic could grant that a DUT-like actor synthesizes while doubting the verification actors — the scoreboard with its golden model and match tables, the coverage collector, the stimulus generator. Those are exactly the actors a UVM flow cannot put on hardware. So the second worked example synthesizes a complete verification loop, the scoreboard included, and runs it. It lives at appG_firesim_substrate_swap.

Four actors are wired into one fabric: a stimulus actor (a 16-bit Galois LFSR emitting a deterministic message stream), an accumulator DUT, a scoreboard actor (an independent golden accumulator, an expected-value FIFO, and a comparator with checks and fails counters), and a coverage actor (eight buckets keyed on the low payload bits). The same four actors are authored once and run two ways: as C++ software objects (the demo_actors.h rendering), and as the synthesizable RTL fabric (tb_fabric.sv) under Verilator. Both report identical results — 256 transactions, zero mismatches, full coverage — and in the hardware run the host does nothing but reset the fabric, clock it, and read the final counters. The scoreboard and coverage are on the fabric, not on the host.

Yosys lowers each actor to gates (make synth); Table E.1 gives the per-actor result.

Table E.1: Every actor in a verification loop maps to gates — the scoreboard, with its golden model, FIFO, and counters, included. Yosys generic synthesis, the substrate-swap example.
Actor Flip-flops (state) Cells
stimulus_actor (LFSR + counter) 26 49
accumulate_actor (DUT) 33 197
scoreboard_actor (golden model + FIFO + comparator) 231 869
coverage_actor (8 buckets) 8 55
whole loop \(\approx \)300

The scoreboard is the point. Its 231 flip-flops carry the golden accumulator (32), the four-deep expected-value FIFO (4\(\times \)32), the read/write pointers and occupancy, and the checks and fails counters; its 869 cells include the comparator that flags a mismatch. This is a verification actor — a checker with internal reference state — synthesized to real gates, lint-clean, and producing bit-identical results to its software rendering. The claim that “the verification environment synthesizes” is, after this example, demonstrated rather than asserted.

The same fabric onto FireSim. The example’s ./firesim/ scaffold takes the identical tb_fabric onto the open FireSim platform as a FAME-1 target, with the host reading status over a peek/poke bridge. In FireSim metasimulation the whole fabric runs in Verilator with no FPGA, bit-/cycle-exactly reproducible against an FPGA run (Appendix G §G.10). This is the runnable, laptop-provable core of Chapter 7’s automatic substrate swap: the whole graph, synthesized, on the fabric, verified before any bitstream exists.

E.5 The rendering is mechanical and cycle-exact

The two examples translate the class form to RTL by hand. That the translation can be automated, and that the automated result is provably faithful, is what lets Chapter 7 call the substrate swap automatic and Appendix D call the derivation a proof.

An actor in the synthesizable form is a primitive latency-insensitive bounded dataflow network (LI-BDN) node in the sense the open Golden Gate compiler uses (Magyar/Biancolin et al., ICCAD 2019): local state, ready/valid channels, the partial-implementation, self-cleaning, and no-extraneous-dependencies properties. Golden Gate takes arbitrary FIRRTL and emits a cycle-exact hardware model automatically, with a formal guarantee — the partial-implementation property, machine-checked per optimized model — that the emitted model’s output token history matches the source cycle for cycle. The by-hand lowering this appendix performs on a counter, and on a scoreboard, is exactly what Golden Gate performs on arbitrary RTL: a structural lowering, not a re-authoring. Where an actor models something with no direct gate form (a golden reference, a memory’s timing), the FASED precedent applies (Biancolin et al., FPGA 2019): write the model as target-time RTL and apply the same transform, binding host services behind the same handshake — which is what scoreboard_actor’s golden model already is.

The consequences ripple outward. Because the rendering is a compiler pass, “the whole graph synthesizes” is a build target, not a manual port (Chapter 7 §7.17; Appendix G §G.2). Because every channel is ready/valid (Rule 5), the ‘WIRE graph is a latency-insensitive partition plan a multi-FPGA partitioner places at near-zero cost (FireAxe, ISCA 2024). And because the lowering carries a cycle-exactness guarantee, the derivation can stand in for the equivalence proof in verification by construction (Appendix D §D.8). The same guarantee is what lets AI-driven authoring move up the flow without putting the silicon at risk: a human or a model authors the actor, and a proven compiler — not a hopeful hand-port — turns it into gates, so the creative work rises to the specification while the lowering below stays mechanical (Appendix H §H.3).

E.6 What this demonstrates and what it does not

What it demonstrates. The actor methodology is not merely a software-side verification convenience. The same authored pattern produces real hardware when rendered to the synthesizable form — and not only for design blocks: the verification actors synthesize too, the scoreboard with its golden model and FIFO included (§E.4). A team that writes architecture exploration in actors carries the structure forward into RTL, not as a port to a different language but as the same actor rendered with timing specified. Verification artifacts written against the architecture model continue to apply against the synthesizable form because the channels and message types are unchanged.

What it does not demonstrate. The examples are deliberately small. A production verification IP — a TileLink master, an alert handler, a deep register layer — in synthesizable form would be larger and would use FIFO primitives instead of single-element registers for buffering. But example E.4 retires the substantive doubt: the hard case, a stateful checker with a golden model and a comparator, is shown to synthesize, so what remains is scale, not shape. The five rules and the same handshake discipline apply to a deeper FSM and a parameterized FIFO unchanged. The end-to-end bring-up of a full SoC’s actor graph on a commercial emulator or a large FPGA board is bounded engineering on this basis (Chapter 7 §7.17), not architectural novelty.

What is on the methodology roadmap. Two adjacent steps stand out. The first is an actor-aware DSL or HLS preset (Catapult or Stratus tuned for the actor pattern) that emits the synthesizable form from the class-based description automatically; the mapping in this appendix is the specification such a tool would implement, and Golden Gate (§E.5) is the existence proof that the emission is mechanical and cycle-exact. The second is direct synthesis of the SystemC actor framework (Appendix J) through existing SystemC HLS tooling; because Catapult and Stratus already accept synthesizable SystemC, the SystemC actor port is a near-term path to RTL without new tool development, and the five rules transfer to SystemC unchanged.

The artifacts this appendix ships — a single actor synthesized and placed-and-routed on iCE40 at 126.6 MHz, and a whole verification loop (the scoreboard included) synthesized to gates and run against its software rendering with identical results — are the proof that the methodology produces real hardware, verification environment and all. Everything beyond is engineering scale, not architectural novelty.

E.7 Files

  • appE_synth/RULES.md — the five rules and the class-to-module translation table.

  • appE_synth/examples/counter_actor.sv — one synthesizable actor, 100 lines of SystemVerilog.

  • appE_synth/examples/counter_chain.sv — two-stage composition demonstrating the ‘WIRE equivalent.

  • appE_synth/Makefilemake lint runs Verilator, make synth runs Yosys, make fpga runs the iCE40 flow.

  • appG_firesim_substrate_swap/ — the whole verification loop (stimulus, DUT, scoreboard, coverage) as synthesizable actors; make synth prints the per-actor gate counts of Table E.1, make run checks the synthesized fabric against the software rendering, and ./firesim/ carries it onto FireSim.