Emergent Hardware Verification

Appendix D From Spec to Silicon: One Model, Many Languages

Chapter 6 named the deeper claim: hardware is structurally an actor system, and choosing the same model of computation in software collapses the four parallel codebases the industry carries for one design — specification, architecture model, RTL implementation, and verification environment — into one continuous flow. This appendix unfolds the seven-step continuum concretely, identifies what is runnable today and what is straight-line additional work, grounds the synthesis and emulation steps in the now-detailed Chapter 7 and Appendix G, and answers a question the chapter only implied: if the model of computation is what matters, what about the host language? SystemVerilog is one answer; C++ and SystemC are equally valid, SystemC in some respects stronger.

The continuum’s one-sentence thesis is the book’s: one authored artifact — the actor graph — crosses every boundary from specification to silicon as a re-rendering, never a rewrite, so the verification investment compounds across the flow instead of restarting at each boundary. Chapter 7 proved this at the hardest boundary, simulation to emulation, where the rewrite the rest of the industry pays is a multi-week specialist port (§7.17). This appendix shows the same property holds at every boundary of the flow, and that the property is a consequence of picking the model of computation once.

D.1 The rebuild tax the continuum removes

The status quo carries one design as a chain of artifacts, and crosses a boundary between each: an English specification becomes an architecture model becomes RTL becomes a verification testbench becomes — for emulation — a hand-written synthesizable bus-functional model and transactor, and finally a silicon bring-up harness. Each boundary is a manual rebuild in a different language with a different model of computation, and each rebuild is a fresh interpretation of intent with its own bugs and its own verification debt. The interpretation gap between these artifacts is the dominant source of late-cycle bugs: an ambiguity discovered during implementation, fixed in one artifact, forgotten in another.

The actor methodology carries one artifact across every boundary. The same act() bodies and the same `WIRE graph are re-rendered (as a behavioral model, as RTL, as an emulator image, as on-silicon instrumentation) by tools, not re-authored by people. Table D.1 maps each boundary’s status-quo rebuild to the actor re-rendering and names where this book demonstrates it.

Table D.1: The rebuild tax at each spec-to-silicon boundary, and the re-rendering that replaces it.

Boundary	Status-quo rebuild	Actor re-rendering	Shown in
Spec \(\to \) architecture model	English document re-coded as a C/SystemC model	the spec is the executable actor model	App. C; Steps 1–2
Architecture \(\to \) RTL	the model re-implemented as Verilog by hand	the act() FSM synthesized (Five Rules; Golden Gate is the existence proof)	App. E; Step 4
RTL \(\to \) verification	a UVM testbench built from scratch in class-based OOP	the same actor graph wraps the RTL; no separate testbench	App. C dv/; Step 3
Simulation \(\to \) emulation	testbench split, BFM hand-written, SCE-MI hand-plumbed, partition hand-hinted	the whole graph re-rendered to the fabric; the seam generated	Ch. 7, App. G; Step 5
Emulation \(\to \) silicon	a new bring-up / debug harness against the part	the same instrumentation over JTAG / UART transports	Step 5; App. G
Every respin	re-verify the changed RTL from scratch	the derivation is the proof (Golden Gate PI / SEC)	Step 7

The pattern across the rows is one claim made six times: at each boundary the status quo re-authors an artifact, and the actor methodology re-renders one. Re-authoring is a translation a human performs, with the interpretation gap and the verification debt that follow; re-rendering is a compiler pass or a graph walk over a declaration that already exists. The rest of the appendix walks the seven steps that make this concrete.

D.2 Step 1 – Specification as Executable Actor Code

A specification today is two things separated by an inevitable interpretation gap: a document describing externally visible behavior, and a set of register descriptions and interface protocols. The document is read by humans and approximated by RTL designers and verification engineers, each of whom rebuilds the intent in code in incompatible shapes. This is the interpretation gap the appendix opened with, at the boundary where it does the most damage.

In an actor-based methodology the specification is the actor code. The externally visible behavior of each block is the actor’s act() method written at functional level: receive this message, advance this state, publish that message. Interface contracts are the typed message structs. Cross-block protocols (alert-to-reset, escalation-to-lifecycle, watchdog-to-reset) are `WIRE edges between specific actors. The register-access surface is the symbolic-name RAL, generated from the same Hjson descriptions tools like reggen already consume. The specification is now executable, type-checked, and runnable end-to-end at the architectural level the day it is written.
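A minimal sketch of the shape described above, written in C++ so that it compiles standalone rather than in the book's SystemVerilog. Every name here (AlertMsg, ResetReq, Port, AlertHandler, ResetManager, wire) is hypothetical and is not the framework's API; wire() merely stands in for a WIRE edge, and the escalation rule is an invented example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Typed messages: the interface contract between blocks (illustrative fields).
struct AlertMsg { std::uint8_t source_id; bool fatal; };
struct ResetReq { std::uint8_t cause; };

// An output port; each connected sink corresponds to one wire edge.
template <typename Msg>
class Port {
public:
    void connect(std::function<void(const Msg&)> sink) { sinks_.push_back(std::move(sink)); }
    void publish(const Msg& m) const { for (const auto& s : sinks_) s(m); }
private:
    std::vector<std::function<void(const Msg&)>> sinks_;
};

// Functional-level spec of a hypothetical alert handler:
// receive an alert, advance state, publish a reset request if it is fatal.
class AlertHandler {
public:
    Port<ResetReq> reset_out;
    int alerts_seen = 0;
    void act(const AlertMsg& m) {
        ++alerts_seen;
        if (m.fatal) reset_out.publish(ResetReq{m.source_id});
    }
};

// Functional-level spec of a hypothetical reset manager.
class ResetManager {
public:
    int resets = 0;
    std::uint8_t last_cause = 0;
    void act(const ResetReq& r) { ++resets; last_cause = r.cause; }
};

// Stand-in for a wire edge: route a producer port into a consumer's act().
template <typename Msg, typename Consumer>
void wire(Port<Msg>& out, Consumer& consumer) {
    out.connect([&consumer](const Msg& m) { consumer.act(m); });
}
```

Wiring `wire(handler.reset_out, manager)` and then calling `handler.act(AlertMsg{7, true})` delivers one ResetReq to the manager. The cross-block protocol is a single declared edge rather than prose in a document.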

The OpenTitan example in Appendix C demonstrates this: model/ip/uart/uart_actor.sv (~150 lines) is a specification of how UART externally behaves — not how the OpenTitan UART RTL implements it. The same actor accepts inbound bytes, drives the wire, samples the wire, decodes 8-N-1 frames, and publishes each received byte. Reading that file is reading the spec. Running it is running the spec.
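To make "reading the file is reading the spec" concrete for the 8-N-1 framing mentioned above, here is a small functional-level sketch in C++. It is not the content of uart_actor.sv; the names Frame, encode_8n1 and decode_8n1 are assumptions made for illustration, and baud-rate timing is deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// One 8-N-1 frame: start bit (0), eight data bits LSB first, no parity, stop bit (1).
using Frame = std::array<bool, 10>;

Frame encode_8n1(std::uint8_t byte) {
    Frame f{};
    f[0] = false;                        // start bit
    for (int i = 0; i < 8; ++i)
        f[1 + i] = (byte >> i) & 1u;     // data bits, least significant first
    f[9] = true;                         // stop bit
    return f;
}

// Returns the decoded byte, or nothing when the start or stop bit is malformed.
std::optional<std::uint8_t> decode_8n1(const Frame& f) {
    if (f[0] || !f[9]) return std::nullopt;   // framing error
    std::uint8_t byte = 0;
    for (int i = 0; i < 8; ++i)
        if (f[1 + i]) byte |= static_cast<std::uint8_t>(1u << i);
    return byte;
}
```

The two functions describe what a conforming UART must put on and take off the wire, and say nothing about how an RTL implementation samples or shifts. That is the specification-versus-implementation distinction the paragraph draws.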

D.3 Step 2 – Architecture Exploration

With the specification executable, architectural questions become measurements rather than arguments.

• Latency budgets. Wire actors with timestamps; let the chip scoreboard report observed end-to-end latency for any message chain. “Alert assertion to chip-level reset takes how many cycles?” is no longer an estimate but an empirical reading.
• Throughput constraints. An actor reports its mailbox occupancy, drop count under try_put backpressure, and time-spent-blocked. Architectural bottlenecks surface immediately.
• Power envelope. Add a power-cost field to each actor’s published events; let a power-tracking actor sum activity by clock domain and by IP. Pre-RTL power estimates ground in an executable model rather than spreadsheet projections — and the same instrumentation rides the substrate swap onto the emulator at workload scale, where it becomes the activity-based power flow of Chapter 7 §7.11.
• Topology trade-offs. “What if alerts route through a hierarchical aggregator instead of flat broadcast?” is a one-actor change; rerun in seconds against the same scoreboards. The OpenTitan reset supervisor pattern (Appendix C) was prototyped this way before being committed.
• Failure injection. Disabling one actor and watching the chip scoreboard’s report tells you the system’s resilience to that subsystem failing. Every actor is a fault-injection point; no separate “fault-injection mode” to write — and the same point becomes the ISO 26262 fault-campaign harness on the emulator (Chapter 7 §7.14).
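The throughput bullet above can be sketched as a bounded mailbox that counts what it observes. This is a hedged C++ illustration, not the framework's mailbox: the class name InstrumentedMailbox and its accessors are assumptions, and time-spent-blocked is omitted because it needs a notion of simulated time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>

template <typename Msg>
class InstrumentedMailbox {
public:
    explicit InstrumentedMailbox(std::size_t capacity) : capacity_(capacity) {}

    // Non-blocking put: returns false and counts a drop when the mailbox is full.
    bool try_put(Msg m) {
        if (q_.size() >= capacity_) { ++drops_; return false; }
        q_.push_back(std::move(m));
        if (q_.size() > high_water_) high_water_ = q_.size();
        return true;
    }

    // Non-blocking get: empty optional when nothing is pending.
    std::optional<Msg> try_get() {
        if (q_.empty()) return std::nullopt;
        Msg m = std::move(q_.front());
        q_.pop_front();
        return m;
    }

    std::size_t occupancy() const { return q_.size(); }     // current depth
    std::size_t high_water() const { return high_water_; }  // peak depth seen
    std::size_t drops() const { return drops_; }            // rejected puts

private:
    std::size_t capacity_;
    std::deque<Msg> q_;
    std::size_t high_water_ = 0;
    std::size_t drops_ = 0;
};
```

A mailbox whose high-water mark sits at capacity while its drop count climbs is the bottleneck signature the bullet refers to; reading the counters requires no change to the actor that owns the mailbox.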

The model/ side of the OpenTitan example runs all of Earl Grey — twenty-eight peripheral IP types plus the Ibex core — as actors with the full verification framework attached, and the chip scoreboard reports cross-IP causality (alert\(\rightarrow \)reset, bite\(\rightarrow \)reset) along with per-register access counts. This is architecture exploration with verification turned on from day one.
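A cross-IP causality check of the alert-to-reset kind can be sketched in a few lines of C++. This is an illustration under stated assumptions, not the chip scoreboard of Appendix C: the class name CausalityScoreboard is hypothetical, and the rule that each effect is matched to the oldest outstanding cause is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Pairs each cause event (for example an alert) with the next effect event
// (for example a reset) and records the latency in cycles between them.
class CausalityScoreboard {
public:
    void on_cause(std::uint64_t cycle) { pending_.push_back(cycle); }

    // An effect with no outstanding cause is counted as unexplained.
    void on_effect(std::uint64_t cycle) {
        if (pending_.empty()) { ++unexplained_; return; }
        latencies_.push_back(cycle - pending_.front());  // oldest cause first
        pending_.pop_front();
    }

    std::size_t matched() const { return latencies_.size(); }
    std::size_t outstanding() const { return pending_.size(); }  // causes with no effect yet
    std::size_t unexplained() const { return unexplained_; }

    std::uint64_t worst_latency() const {
        std::uint64_t worst = 0;
        for (std::uint64_t l : latencies_) if (l > worst) worst = l;
        return worst;
    }

private:
    std::deque<std::uint64_t> pending_;
    std::vector<std::uint64_t> latencies_;
    std::size_t unexplained_ = 0;
};
```

Subscribing such an object to the alert and reset message streams turns "how many cycles from alert to reset?" into a reading of worst_latency(), and a nonzero unexplained() count flags a reset that no alert accounts for.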

D.4 Step 3 – Verification at Both Phases

The verification environment built in step 1 verifies the specification as it is written. When RTL arrives, the same testbench topology — scoreboards, coverage actors, RAL, supervisors — wraps the RTL block instead of the behavioral actor. Nothing else in the verification code changes.

The appC_earlgrey/dv/ip_uart/uart_dv_tb_top.sv testbench demonstrates this concretely: it drives real OpenTitan UART RTL through the framework’s symbolic-name RAL — the same RAL definitions (define_uart_ral() in uart_ral_defs.sv) used by the model side. Three writes, two reads, ten observed UART tx pin edges, all checked by the same scoreboard. The model side and the RTL side share one verification artifact; only the DUT swaps. This is the RTL\(\to \)verification row of Table D.1: where the status quo builds a UVM testbench from scratch, the actor flow reuses the architecture-exploration graph unchanged.
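The "only the DUT swaps" property can be illustrated in C++ with one register test run against two interchangeable DUTs through a symbolic-name register layer. All names below (Dut, BehavioralDut, WordIndexedDut, Ral, define_demo_ral, run_register_test) are hypothetical, and the register names and offsets are invented for the sketch rather than taken from the OpenTitan UART.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// The only thing that differs between phases: what answers register accesses.
struct Dut {
    virtual ~Dut() = default;
    virtual void write(std::uint32_t addr, std::uint32_t data) = 0;
    virtual std::uint32_t read(std::uint32_t addr) = 0;
};

// Stand-in for the behavioral actor (model side).
class BehavioralDut : public Dut {
public:
    void write(std::uint32_t a, std::uint32_t d) override { regs_[a] = d; }
    std::uint32_t read(std::uint32_t a) override { return regs_[a]; }
private:
    std::map<std::uint32_t, std::uint32_t> regs_;
};

// Stand-in for an RTL wrapper: a 16-word register file indexed by word address.
class WordIndexedDut : public Dut {
public:
    void write(std::uint32_t a, std::uint32_t d) override { words_[(a >> 2) & 0xFu] = d; }
    std::uint32_t read(std::uint32_t a) override { return words_[(a >> 2) & 0xFu]; }
private:
    std::array<std::uint32_t, 16> words_{};
};

// Symbolic-name register layer: tests name registers, never addresses.
class Ral {
public:
    explicit Ral(Dut& dut) : dut_(dut) {}
    void define(const std::string& name, std::uint32_t addr) { map_[name] = addr; }
    void write(const std::string& name, std::uint32_t d) { dut_.write(map_.at(name), d); }
    std::uint32_t read(const std::string& name) { return dut_.read(map_.at(name)); }
private:
    Dut& dut_;
    std::map<std::string, std::uint32_t> map_;
};

// One register definition shared by both sides (illustrative names and offsets).
void define_demo_ral(Ral& ral) {
    ral.define("CTRL", 0x0);
    ral.define("DATA", 0x4);
}

// The test body: written once, run unchanged against either DUT.
bool run_register_test(Dut& dut) {
    Ral ral(dut);
    define_demo_ral(ral);
    ral.write("CTRL", 0x3);
    ral.write("DATA", 0x55);
    return ral.read("CTRL") == 0x3 && ral.read("DATA") == 0x55;
}
```

Calling run_register_test on a BehavioralDut and then on a WordIndexedDut exercises the same definitions and the same checks; nothing in the test body knows which one it was handed.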

For chip-scale runs on open-source simulators, the framework’s coroutine-driven scheduler hits Verilator’s documented --timing throughput limit (§D.10). On commercial simulators (VCS, Xcelium), this limit does not exist; chip-scale simulation runs at native speed with the same actor testbench unchanged. OpenTitan’s own DV methodology documents commercial simulators as required for class-based DV¹; the actor framework inherits the same boundary cleanly, and the emulation step (Step 5) escapes it entirely, because on the emulator the testbench is RTL, not class-based software.

¹ OpenTitan, Design Verification Setup (doc/getting_started/setup_dv.md): “The use of advanced verification constructs such as SystemVerilog classes (on which UVM is based on [sic]) requires commercial simulators.”

D.5 Step 4 – Synthesis

Each actor’s act() state machine is a synthesizable structural FSM — the same kind RTL designers write by hand — when restricted to the synthesizable form (Appendix E). The structural shape is preserved; only the timing axis sharpens.

• Typed messages become packed structs on a fixed-width wire bundle.
• Mailboxes become FIFOs with the same bounded capacity the actor declared.
• `WIRE edges become elaboration-time signal connections.
• Pub/sub fan-out becomes wire fan-out.
• Backpressure (the try_publish return value) becomes a ready/valid handshake.
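The last two mappings in the list above (mailbox to FIFO, backpressure to a ready/valid handshake) can be shown with a cycle-level C++ model. This is a sketch under assumptions, not the translation Appendix E performs: the class name ReadyValidFifo is hypothetical, the payload is fixed at 32 bits, and the FIFO is modeled without a combinational path from out_ready to in_ready.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Cycle-level model of a bounded mailbox rendered as a FIFO with ready/valid ports.
class ReadyValidFifo {
public:
    explicit ReadyValidFifo(std::size_t depth) : depth_(depth) { assert(depth > 0); }

    // Outputs as seen before the clock edge.
    bool in_ready() const { return q_.size() < depth_; }    // not full
    bool out_valid() const { return !q_.empty(); }          // not empty
    std::uint32_t out_data() const { return q_.front(); }   // meaningful only when out_valid()

    // One clock edge. A transfer happens on a port only when valid && ready,
    // both evaluated on the pre-edge state. The return value says whether the
    // producer's word was accepted, which plays the role of a try_publish result.
    bool clock(bool in_valid, std::uint32_t in_data, bool out_ready) {
        const bool push = in_valid && in_ready();
        const bool pop = out_ready && out_valid();
        if (pop) q_.pop_front();
        if (push) q_.push_back(in_data);
        return push;
    }

    std::size_t occupancy() const { return q_.size(); }

private:
    std::size_t depth_;
    std::deque<std::uint32_t> q_;
};
```

A producer that sees clock() return false simply presents the same word again on the next edge, which is the hardware rendering of retrying a rejected publish. The declared mailbox capacity becomes the FIFO depth and nothing else about the actor's contract changes.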

The translation is mechanical, and existence proofs for automating it already exist. Bluespec SystemVerilog performs this from rules to RTL, Chisel from Scala generators to Verilog, Amaranth from Python generators to RTL; none is actor-based, but each shows that high-level concurrent specifications synthesize to gates. The stronger evidence is the open FireSim/Golden Gate stack: an actor in the synthesizable form is a primitive latency-insensitive bounded dataflow network node, and Golden Gate (ICCAD 2019) takes arbitrary FIRRTL and emits a cycle-exact hardware model automatically, with a formal partial-implementation guarantee that the emitted model matches the source cycle for cycle (Appendix E §E.5; Appendix G §G.2). Much of the actor framework’s discipline (no shared state, no recursion, fixed-cardinality fan-out, statically bounded mailboxes) is exactly the discipline synthesizability demands; the synthesizable form is mostly the framework minus its simulation-only conveniences (Appendix E).

What is shipped. This step is no longer a roadmap promise. Appendix E synthesizes a 32-bit counter actor through Yosys and places and routes a two-stage chain of it on a Lattice iCE40 HX8K at 126.6 MHz: real silicon, from a mechanical class-to-RTL translation. appG_firesim_substrate_swap carries the claim onto the testbench: a stimulus actor, an accumulator DUT, a scoreboard actor (golden model, expected-value FIFO, comparator), and a coverage actor are synthesized to gates (roughly 300 flip-flops for the whole loop) and run as one fabric, with results identical to the software rendering. What remains additional work is a standardized actor-to-RTL pass (an actor-DSL or a SystemC HLS preset, §D.12) that automates the translation Appendix E does by hand; the gap is the automation, not the feasibility.

D.6 Step 5 – FPGA Emulation

Synthesized actor RTL becomes an FPGA bitstream or an emulator image that runs at MHz speeds, and — this is the step Chapter 7 and Appendix G develop in full — the conversion from the simulation flow is automatic: the same authored graph re-renders onto the hardware substrate with no manual rewrite (the simulation\(\to \)emulation row of Table D.1). The same actor framework runs at four substrate levels, and the topology and scoreboards are identical across all four:

• Software simulation against behavioral actor DUTs (architecture exploration; Verilator at full speed).
• Software simulation against real RTL DUTs (verification; commercial simulators at full speed, Verilator at the documented --timing limit).
• Hardware execution of the synthesized actor graph: a commercial emulator (Palladium / ZeBu / Veloce) reached across a generated SCE-MI transactor, or FireSim across a generated FAME bridge (real-time performance and software bring-up; fabric clock 50–200 MHz; the effective rate is the fabric clock divided by the FPGA-cycle-to-model-cycle ratio, FMR).
• Real silicon (deployment; the same actor instrumentation observed via JTAG / UART / SPI / Ethernet through the distributed-transport bridges).
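As a worked reading of the rate stated in the hardware-execution bullet, write \(f_{\text{fabric}}\) for the fabric clock and \(R\) for the ratio it is divided by. The numbers below are illustrative only and are not measurements from this book.

```latex
f_{\text{eff}} = \frac{f_{\text{fabric}}}{R},
\qquad
\text{e.g. } f_{\text{fabric}} = 100\ \text{MHz},\ R = 2
\;\Longrightarrow\;
f_{\text{eff}} = \frac{100\ \text{MHz}}{2} = 50\ \text{MHz}.
```

A ratio of 1 means the modeled design advances one cycle per fabric cycle; anything that stalls the fabric while the host or a bridge catches up raises \(R\) and lowers the effective rate proportionally.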

The conversion is provable without hardware. FireSim metasimulation runs the FAME-transformed graph under Verilator with no FPGA, bit- and cycle-exactly reproducible against an FPGA run. The substrate-swap example’s ./firesim/ scaffold is exactly this: the whole verification loop, validated before any bitstream exists (Appendix G §G.10). The substrate underneath swaps across all four levels; the model does not, and the swap onto hardware needs no human step — only the back-end compiler and the generated seam adapter differ between a commercial emulator and FireSim (Chapter 7 §7.18; Appendix G §G.4). That seam adapter is the universal seam primitive in its hardware guise — one TransportBridgeActor declaration whose carrier (a SCE-MI transactor, a FireSim token channel, or a ZMQ link in distributed regression) is the only part that changes (Appendix G §G.6; Appendix L §L.4).
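The claim that the carrier is the only part of the seam that changes can be sketched in C++ as a bridge with a fixed serialization and a pluggable carrier. This is an illustration, not the TransportBridgeActor declaration of Appendix G: the names Carrier, LoopbackCarrier, TransportBridge and CounterSample are hypothetical, and the in-process loopback stands in for a SCE-MI transactor, a token channel, or a network link.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// The carrier is the only substrate-specific part of the seam.
struct Carrier {
    virtual ~Carrier() = default;
    virtual void send(const Bytes& b) = 0;
    virtual std::optional<Bytes> receive() = 0;
};

// In-process stand-in; a real carrier would wrap a transactor, channel or socket.
class LoopbackCarrier : public Carrier {
public:
    void send(const Bytes& b) override { q_.push_back(b); }
    std::optional<Bytes> receive() override {
        if (q_.empty()) return std::nullopt;
        Bytes b = std::move(q_.front());
        q_.pop_front();
        return b;
    }
private:
    std::deque<Bytes> q_;
};

// Illustrative message type carried across the seam.
struct CounterSample { std::uint32_t cycle; std::uint32_t value; };

// Bridge with a fixed little-endian wire format; swapping substrates means
// handing it a different Carrier, nothing else.
class TransportBridge {
public:
    explicit TransportBridge(Carrier& c) : carrier_(c) {}

    void publish(const CounterSample& s) {
        Bytes b;
        store32(b, s.cycle);
        store32(b, s.value);
        carrier_.send(b);
    }

    // Returns the next sample, or nothing if none is pending or the frame is malformed.
    std::optional<CounterSample> poll() {
        auto b = carrier_.receive();
        if (!b || b->size() != 8) return std::nullopt;
        return CounterSample{load32(*b, 0), load32(*b, 4)};
    }

private:
    static void store32(Bytes& b, std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            b.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu));
    }
    static std::uint32_t load32(const Bytes& b, std::size_t off) {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(b[off + i]) << (8 * i);
        return v;
    }
    Carrier& carrier_;
};
```

The actors on either side see only publish() and poll(). A second Carrier subclass is all that a different substrate would add in this sketch, which mirrors the paragraph's point that the declaration stays put while the carrier varies.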

D.7 Step 6 – AI-Driven Design

AI-generated RTL today works from English specifications to Verilog. The translation crosses an enormous specification gap: English is unstructured; Verilog is structurally concurrent. The AI must reconstruct the concurrent structure from prose. Output is often broken because reconstruction is error-prone, and the prose itself is ambiguous.

When the input is actor code, the gap is substantially smaller: actor code is already structurally concurrent and already typed. The translation is shape-preserving: one structured concurrent representation to another. State machines map to state machines; mailboxes to FIFOs; `WIRE edges to signal connections. The AI is no longer inventing structure; it is producing the deterministic implementation of an explicit specification. This is tractable today, and as AI-RTL tooling matures around structured inputs it becomes the path of least resistance; Appendix H works the pipeline.

More importantly, AI participation shifts up the flow. Today AI is asked to do the hardest, most error-prone step (English \(\rightarrow \) Verilog) on the most ambiguous input (English specs). With actor specifications, AI assists at every step: refining the spec, exploring architecture variants, suggesting optimizations to the synthesizable mapping, generating coverage points from observed traces. The AI’s role becomes that of a collaborating engineer, not a transpiler hoping to guess right.

D.8 Step 7 – Verification by Construction

If RTL is mechanically derived (synthesized or AI-generated) from the actor specification, the derivation is the equivalence proof — provided the derivation tool itself is trusted. The verification effort spent in step 3 verified the specification, not the implementation; subsequent RTL changes cannot violate the specification because they were generated from it. This is the every-respin row of Table D.1: the status quo re-verifies each changed RTL from scratch; the actor flow inherits the verification through the derivation.

This is the same property a maintained equivalence flow offers: every RTL revision is re-proven against the reference, so the reference’s verification carries forward. The actor methodology achieves it by construction at synthesis time rather than by per-revision proof. The verification investment is amortized across every silicon respin — one specification verified, every implementation derived from it inherits the verification.

When the derivation already carries the proof. The “by construction” framing is strongest when the derivation tool is itself proven. Two concrete cases now exist. With Kami (Coq-to-Bluespec extraction) the machine-checked guarantee reaches the Bluespec level, the compiler below it trusted. And with Golden Gate (§D.5), the FIRRTL-to-hardware-model lowering carries the latency-insensitive bounded dataflow network’s partial-implementation guarantee — a formal, machine-checked statement that the derived model matches the source cycle for cycle. Where the derivation is one of these, the equivalence is not re-proved; it is inherited. Appendix E §E.5 details the Golden Gate lowering as exactly this kind of self-certifying derivation, applied per actor node.

Where SEC still earns its place. For pragmatic derivations (a hand-written actor-DSL transpiler, a SystemC HLS pass tuned for the actor pattern, or AI-assisted RTL generation), the trusted-tool assumption does not yet hold, and Sequential Equivalence Checking (SEC) between the actor specification’s reference behavior and the post-derivation RTL is still the industrial-strength check (Chapter 3 §3.5). SEC stays in the flow until the derivation pipeline is itself formally verified; the actor framework reduces the SEC workload by shrinking the gap between the two artifacts being equivalence-checked — the reference and the implementation are the same shape — but does not eliminate it. Appendix H §H.7 works this out as three regimes: the AI authors the actor and a proven compiler lowers it (testbench plus proof, no SEC needed); the AI applies the lowering itself but the result is model-checkable per node; or the AI generates RTL directly, where SEC is the anchor and the actor framework gives it the smallest possible problem.
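To show what "same shape" buys an equivalence check, here is a toy C++ sketch that compares a reference counter with a gate-style derivation of it. It is not SEC: a real sequential equivalence checker proves equality over all input sequences, whereas this harness only enumerates every enable sequence up to a fixed depth from reset. The machines and the names RefCounter, GateCounter, BuggyCounter and equivalent_up_to are all invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reference behavior: a modulo-4 counter with enable, written arithmetically.
struct RefCounter {
    unsigned count = 0;
    unsigned step(bool en) {
        if (en) count = (count + 1) % 4;
        return count;
    }
};

// "Derived" implementation: two flip-flops with explicit next-state logic.
struct GateCounter {
    bool q0 = false, q1 = false;
    unsigned step(bool en) {
        const bool n0 = q0 ^ en;            // bit 0 toggles when enabled
        const bool n1 = q1 ^ (en && q0);    // bit 1 toggles on carry out of bit 0
        q0 = n0;
        q1 = n1;
        return (q1 ? 2u : 0u) | (q0 ? 1u : 0u);
    }
};

// Deliberately broken variant: the carry into bit 1 ignores the enable.
struct BuggyCounter {
    bool q0 = false, q1 = false;
    unsigned step(bool en) {
        const bool n0 = q0 ^ en;
        const bool n1 = q1 ^ q0;            // bug: missing "en &&"
        q0 = n0;
        q1 = n1;
        return (q1 ? 2u : 0u) | (q0 ? 1u : 0u);
    }
};

// Drive both machines from reset with every enable sequence of the given
// length and compare outputs on every cycle. Bounded, not a proof.
template <typename Ref, typename Impl>
bool equivalent_up_to(unsigned depth) {
    assert(depth < 32);
    for (std::uint32_t seq = 0; seq < (1u << depth); ++seq) {
        Ref ref;
        Impl impl;
        for (unsigned t = 0; t < depth; ++t) {
            const bool en = (seq >> t) & 1u;
            if (ref.step(en) != impl.step(en)) return false;
        }
    }
    return true;
}
```

Because the reference and the implementation share the same state-machine shape, the comparison is a cycle-by-cycle output match with no abstraction mapping to invent. That is the sense in which the actor framework is said to hand SEC a smaller problem, even though it does not remove the need for the formal tool.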

D.9 Implementation Languages: SystemVerilog, C++, SystemC

The actor framework presented in Chapter 6 and used throughout the OpenTitan example is in SystemVerilog because that is the language the verification community uses today. SystemVerilog is one host language for the actor pattern; it is not the only one, nor necessarily the best.

C++. A pure C++ actor framework is straightforward and has industrial precedent (libcaf, the C++ Actor Framework). Each actor is a class with a typed message handler and a thread-safe queue. Concurrency is via std::thread or fibers; messages are values passed by move semantics. C++ has the broadest ecosystem of any host language: linear algebra libraries, machine-learning models, network stacks, GUI frameworks, profiling tools. A C++ actor specification is directly callable from production firmware, host-side test infrastructure, regression dashboards, and CI runners. The drawback for our use case is that C++ alone has no built-in timing model; for hardware modeling, timing must be added on top. (Appendix K §K.5 ships the pure-C++ tier; the substrate-swap example’s verification actors are written in it and reused unchanged across substrates.)
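The C++ shape described above (a class, a typed handler, a thread-safe queue, messages passed by move) can be sketched as follows. This is neither CAF's API nor the Appendix K tier; the names ByteBurst and UartSinkActor are hypothetical, and the sketch lets the caller drive run_once() so that it stays free of thread-library dependencies. In a real deployment a std::thread or fiber would call run_once() in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// A typed message; it is moved, never shared, between sender and actor.
struct ByteBurst { std::string payload; };

class UartSinkActor {
public:
    // Callable from any thread: the message is moved into the mailbox under a lock.
    void send(ByteBurst m) {
        std::lock_guard<std::mutex> lock(mu_);
        mailbox_.push_back(std::move(m));
    }

    // Take everything pending in one locked swap, then run the handler outside
    // the lock. Returns the number of messages handled.
    std::size_t run_once() {
        std::deque<ByteBurst> batch;
        {
            std::lock_guard<std::mutex> lock(mu_);
            batch.swap(mailbox_);
        }
        const std::size_t handled = batch.size();
        for (auto& m : batch) handle(std::move(m));
        return handled;
    }

    // Actor-private state; only the thread running run_once() may touch it.
    const std::string& received() const { return received_; }

private:
    void handle(ByteBurst m) { received_ += m.payload; }

    std::mutex mu_;
    std::deque<ByteBurst> mailbox_;
    std::string received_;
};
```

The lock protects only the mailbox. The actor's own state needs no lock because exactly one execution context ever runs the handler, which is the no-shared-state discipline expressed in plain C++. What the sketch lacks is any notion of simulated time, which is the gap the paragraph identifies.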

SystemC. SystemC provides exactly the missing piece: a discrete-event kernel with delta cycles, wait-based blocking primitives, sc_event for signaling, and sc_fifo for buffered communication. A SystemC actor framework maps remarkably cleanly:

• Each actor is an sc_module with local state (private members of the module class).
• Inbound mailbox is an sc_fifo; put/get are blocking with built-in scheduling.
• The actor’s run() task is an SC_THREAD that the SystemC kernel schedules.
• `WIRE edges are sc_port connections established at before_end_of_elaboration.
• Pub/sub fan-out uses analysis-port-style multi-export, which SystemC supports natively.
• Distributed transports (the C++ ZMQ binding, NATS C++ client, libfabric) attach without going through DPI — same process, native C++.

This mapping is so clean that SystemC users who structure their code with sc_module as the unit of computation, sc_fifo as the means of communication, and no shared module-to-module state are already writing actor code; they just have not been calling it that. The actor methodology gives a name and a discipline to a pattern many SystemC codebases approach informally.

Why SystemC may be the stronger host. For hardware-related actor work, SystemC has structural advantages over SystemVerilog:

• Native timing. SystemC’s kernel handles concurrency, blocking, and timing without the --timing coroutine overhead Verilator pays for SystemVerilog fork/mailbox/process. Chip-scale simulation with class-based actor testbenches runs at full speed on Verilator (Verilator generates SystemC modules with --sc and the SystemC kernel drives them).
• Synthesis-ready ecosystem. SystemC HLS (Catapult, Stratus) accepts a synthesizable subset of SystemC and emits RTL. A synthesizable SystemC actor maps to RTL through the existing tool chain rather than waiting for an actor-DSL — and the same FIRRTL back-end that Golden Gate uses is reachable in principle from a SystemC-to-FIRRTL path, which would close Steps 4 and 5 with one tool chain (Appendix J §J.5).
• Free interop with software. A SystemC actor specification calls into and is called from production C++. Firmware drivers can drive the same actor; the same actor can run on the host as a virtual prototype.
• Mature distributed ecosystem. ZMQ, NATS, gRPC, Apache Arrow are first-class C++ libraries. The actor_distributed_pkg bridges become normal library calls.
• Concurrency without simulator flags. SystemC schedules SC_THREAD coroutines natively — cooperative and single-threaded by the standard, but built into the language rather than enabled by a simulator-specific flag such as Verilator’s --timing.

The implications are stronger than “you could also use SystemC.” They are: a SystemC implementation of the actor framework would not need SystemVerilog actors at all for hardware verification, because the same SystemC actors that drive verification synthesize to RTL through HLS, run as virtual prototypes for firmware bring-up, and connect to distributed regression infrastructure through native C++ libraries. SystemVerilog’s role narrows to what it is uniquely good at: cycle-accurate RTL coding and SVA assertions on synthesizable designs.

Why SystemVerilog still earns its place. The current implementation in this book is in SystemVerilog for two reasons. First, it is the language the audience knows and the language the existing verification flows already use; meeting verification engineers where they are matters for the methodology to be adoptable. Second, SystemVerilog actors run inside a single simulator alongside the RTL DUT without bridges, which simplifies the worked example. A substrate-level SystemC port is worked out in Appendix J; a full parallel implementation across the sibling packages is on the methodology roadmap, not as a replacement but as the upper-end deployment that brings the synthesis and HLS continuity into reach.

The point, restated. The model of computation is the methodology contribution. The host language is an engineering choice. C++, SystemC, and SystemVerilog all support the same actor pattern; SystemC particularly so because its kernel removes the timing overhead and its HLS path closes the synthesis gap. A team whose first concern is end-to-end design flow continuity should pick SystemC; a team whose first concern is fitting the existing SystemVerilog verification ecosystem should pick SystemVerilog. The framework’s claims (typed messages, `WIRE topology, RAL by symbolic name, distributed transport, lifecycle and supervision discipline) hold in both.

D.10 Methodology on Verilator: An Honest Boundary

The OpenTitan example’s chip-scale RTL DV demonstrates a real boundary worth stating explicitly. SystemVerilog fork/mailbox/process::self/process::kill require Verilator’s --timing mode; --timing pays a coroutine-scheduling overhead per resumption that is invisible at IP scale (where the DUT’s per-cycle evaluation cost is small) and dominant at chip scale (where the DUT’s per-cycle cost is huge). OpenTitan’s own DV methodology document is explicit on this boundary: class-based verification constructs require commercial simulators (footnote, §D.4).

The pure-actor SystemVerilog testbench at chip scale (dv/chip/chip_actor_tb.sv) compiles cleanly against the real OpenTitan chip RTL — hundreds of modules, a few-minute build, a tens-of-megabytes simulator binary. The runtime throughput on Verilator with --timing is the documented constraint, not a framework defect. The same testbench runs at native speed on VCS or Xcelium where coroutines have first-class scheduler support, and on a SystemC host where the kernel handles scheduling without the SystemVerilog-specific overhead. The boundary also closes as the continuum advances: at Step 5 the testbench is synthesized RTL on a fabric, so the --timing cost — a property of class-based software scheduling — does not apply at all.

D.11 What the Methodology Replaces

Stated as a contrast against the four-codebase status quo:

Status-quo artifact	Model of computation (MoC)	Actor-methodology equivalent
Specification document (English)	natural language	Executable actor code; act(), message types
C++/SystemC behavioral model	sequential / event-driven	Same actor code, different substrate
RTL implementation (Verilog/SV)	cycle-accurate FSMs	Synthesized actor form (HLS or actor-DSL)
UVM verification environment (SV)	OOP class hierarchy	Same actor code, plus scoreboard / coverage / RAL actors

The status quo column contains four artifacts in three languages, each with a different model of computation, and a manual rebuild at every boundary between them (Table D.1). The actor-methodology column has one artifact with one model of computation; the substrate underlying it changes across phases (architecture-level interpretation, RTL synthesis, FPGA emulation, silicon), but the artifact itself does not. Verification is no longer a separate codebase — it is the spec, run with checkers attached.

D.12 The Methodology Roadmap

Concretely, the steps from the framework as it ships today to the full continuum:

1. Today. Steps 1–3 are runnable end-to-end in SystemVerilog: the OpenTitan example demonstrates 28-IP architectural exploration (model/) and IP-scale RTL DV (dv/ip_uart/), and distributed regression via actor_distributed_pkg is operational across machines. Step 4 is demonstrated by Appendix E’s iCE40 counter and the substrate-swap example’s whole-loop synthesized fabric; Step 5 is demonstrated in metasimulation by the substrate-swap example and scaffolded onto FireSim.
2. Near-term. A SystemC parallel implementation of actor_pkg brings native timing, full-speed chip-scale Verilator coexistence, and the C++ ecosystem. A synthesizable subset of the SystemC framework plus a SystemC HLS path standardizes Step 4’s translation; an end-to-end FPGA-board bring-up of the appG_firesim_substrate_swap fabric closes Step 5 on real hardware.
3. Mid-term. An actor-DSL (or a SystemC HLS preset tuned for actor patterns) standardizes the synthesis path. Commercial emulators and FireSim accept actor specifications through the generated seam adapters (Appendix G); the full-SoC bring-up of an actor graph onto a commercial emulator is engineered.
4. Long-term. AI design tools accept actor specifications as their primary input format (Step 6). RTL generation becomes shape-preserving translation rather than English-to-Verilog reconstruction. Verification by construction (Step 7) becomes the default, with proven derivation pipelines (Golden Gate, Kami) carrying the equivalence guarantee and SEC covering the pragmatic remainder.

The framework as shipped serves as the foundation for all four phases; each successive phase adds tooling on top of the same model of computation, never replacing it. Choosing the model of computation correctly once carries forward through every subsequent step — which is the whole of the continuum’s claim, and the reason the verification investment compounds across the flow rather than restarting at each boundary.