
under a unified 3D PDK and evaluation environment. Our Multi-
Plugin-Pin3D-Flow is implemented as a set of Python/Tcl scripts
that automate the full SP&R flow. Files serve as the interface between
SP&R stages and between tools.
The primary objective of Multi-Plugin-Pin3D-Flow is to enable
fair, reproducible, and tool-agnostic implementation and benchmark-
ing of 3D ICs using existing 2D physical design infrastructures.
Looking ahead, we plan to integrate three key enhancements: (1)
heterogeneous technology integration, which enables multiple nodes
such as Nangate45 and ASAP7 to coexist across stacked tiers; (2)
backside power delivery network (BSPDN) modeling, which provides
a pathway to implement modern backside power architectures; and
(3) hybrid bonding terminal (HBT) modeling, which includes both
fast geometric-level abstractions using VIA-equivalent models and
detailed timing-aware models based on physical-only “buffer cells”
with library-based delay characterization. We invite the community to
contribute to this open initiative. Key opportunities include extending
flow stages, supporting new PDKs, refining BSPDN and HBT models,
and developing true-3D optimization plug-ins (see Section III-A).
With sustained collaborative effort, Multi-Plugin-Pin3D-Flow can
evolve into a robust, extensible foundation for both academic research
and industrial exploration of 3D IC design methodologies.
2) ORFS-agent: ORFS-agent [8] [56] seeks to automate a con-
crete, measurable part of the OpenROAD system – the OR-
AutoTuner – which determines and sets hyperparameters such as
core density for a given (PDK, circuit) pair when ORFS is run. These
hyperparameters directly determine final QoR, and a fundamental
task is to find best-possible settings within an available number of
iterations. Across two major PDKs, ORFS-agent improves over OR-
AutoTuner in terms of both Routed Wirelength and Effective Clock
Period, using 40% fewer iterations. Similar superiority is seen in
multi-objective and constrained optimization scenarios.
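The multi-objective and constrained settings mentioned above can be illustrated with a minimal sketch. It assumes that both Routed Wirelength and Effective Clock Period are minimized; the `Run` record, the run names, and all numeric values are hypothetical and are not taken from ORFS-agent or its reported results.

```python
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    name: str
    wirelength: float  # routed wirelength, lower is better (illustrative units)
    ecp: float         # effective clock period, lower is better (illustrative units)


def dominates(a: Run, b: Run) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.wirelength <= b.wirelength and a.ecp <= b.ecp
    better = a.wirelength < b.wirelength or a.ecp < b.ecp
    return no_worse and better


def pareto_front(runs: List[Run]) -> List[Run]:
    """Multi-objective view: keep only runs that no other run dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


def best_under_constraint(runs: List[Run], max_wirelength: float) -> Optional[Run]:
    """Constrained view: minimize ECP among runs within a wirelength budget."""
    feasible = [r for r in runs if r.wirelength <= max_wirelength]
    return min(feasible, key=lambda r: r.ecp) if feasible else None


# Made-up data, for illustration only.
runs = [
    Run("a", 100.0, 2.0),
    Run("b", 90.0, 2.2),
    Run("c", 110.0, 2.1),
    Run("d", 95.0, 1.9),
]
front = pareto_front(runs)
choice = best_under_constraint(runs, max_wirelength=92.0)
```

In this toy data, runs "b" and "d" are mutually non-dominated, and the wirelength budget of 92.0 leaves "b" as the only feasible choice.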
ORFS-agent uses the Claude Sonnet 3.5 class of models. It calls
tools to inspect and analyze the results of flow runs, and to model
those results in order to pick the hyperparameters for the next
iteration of flow runs. After a sufficient number of such iterations,
ORFS-agent can determine the best set of hyperparameters
for a given (PDK, circuit) pair and a given QoR metric goal. ORFS-agent
is model-agnostic and modular, with no fine-tuning: any LLM
can be slotted in, so a newly available language model
can be exploited immediately. We may view ORFS-agent as a first
step toward OpenROAD-agent, an envisioned end-to-end coding agent
within the OpenROAD ecosystem that is detailed in Section IV-C2
below. To be specific: ORFS-agent, as part of ORFS-Research, will
likely be employed as a tool to verify and optimize atop code-
level changes implemented by OpenROAD-agent, which will reside
within OpenROAD-Research.
Operationally, the agent follows a simple loop: observe, analyze,
propose, apply. It launches a diverse batch of runs, consolidates
artifacts into a table of inputs (the chosen knob values) and outputs
(the observed metrics), and then reasons with context. Here, “context”
includes PDK/testcase metadata, legal ranges and scales for each
variable, historical priors, and partial proxies when end-to-end metrics
are unavailable (e.g., using early-stage indicators when detailed
routing times out). The analysis step mirrors the workflow of a careful
data scientist: inspect distributions and trade-offs, state a hypothesis
about what variables are binding in the current regime, and turn that
hypothesis into a small set of proposed next candidates that balance
promise and diversity. The apply step materializes decisions as
concrete edits to config.mk and SDC so that runs are reproducible
inside or outside the agent loop.
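The observe, analyze, propose, apply loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the ORFS-agent implementation: `run_flow` is a made-up cost function standing in for a real flow run, `propose` is a simple perturbation heuristic standing in for the LLM controller, and the knob names and ranges in `KNOBS` are illustrative choices written in the style of ORFS Makefile variables.

```python
import random

# Hypothetical knobs and legal ranges (assumptions for illustration).
KNOBS = {"CORE_UTILIZATION": (20.0, 60.0), "PLACE_DENSITY": (0.3, 0.9)}


def run_flow(cfg):
    """Toy stand-in for a flow run; returns a made-up cost (lower is better)."""
    u = (cfg["CORE_UTILIZATION"] - 45.0) / 10.0
    d = (cfg["PLACE_DENSITY"] - 0.65) * 10.0
    return u * u + d * d


def propose(history, rng, batch=4):
    """Stand-in for the controller: sample widely first, then perturb the best run."""
    if not history:
        return [{k: rng.uniform(lo, hi) for k, (lo, hi) in KNOBS.items()}
                for _ in range(batch)]
    best_cfg, _ = min(history, key=lambda h: h[1])
    candidates = []
    for _ in range(batch):
        cand = {}
        for k, (lo, hi) in KNOBS.items():
            step = 0.1 * (hi - lo) * rng.uniform(-1.0, 1.0)
            cand[k] = min(hi, max(lo, best_cfg[k] + step))  # respect legal range
        candidates.append(cand)
    return candidates


def render_config_mk(cfg):
    """Apply step: materialize a decision as config.mk-style export lines."""
    return "\n".join(f"export {k} = {v:.2f}" for k, v in sorted(cfg.items()))


def tune(iterations=5, seed=0):
    rng = random.Random(seed)
    history = []  # table of (inputs, observed metric)
    for _ in range(iterations):
        for cfg in propose(history, rng):         # analyze + propose
            history.append((cfg, run_flow(cfg)))  # apply + observe
    return min(history, key=lambda h: h[1])


best_cfg, best_cost = tune()
config_text = render_config_mk(best_cfg)
```

Because each decision is rendered as plain text, the same configuration can be rerun without the loop, which is the reproducibility property the paragraph describes.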
Three design choices are central. (1) Context first. Decisions are
conditioned on the semantics of each variable and on process/design
realities, avoiding “blind” proposals. (2) Emulation of human experts.
The agent explains itself through an evidence trail – what was
observed, why a region of the space looks promising, and exactly
which knobs were changed – so results can be reviewed and rolled
back. (3) Modularity. The LLM is a replaceable controller, and
analysis and selection routines are swappable; this future-proofs
the approach as models, surrogates and heuristics improve. Indeed,
customizability of ORFS-agent is arguably its greatest asset: the
behavior of the agent is tunable at the level of prompts and tooling,
so that if, e.g., a superior black-box BO tool arrives
that can be encoded as a function call, the agent is augmented
rather than made obsolete.
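The modularity and evidence-trail choices can be illustrated with a minimal sketch in which the proposer is just a function that can be swapped at run time. All names here (`Agent`, `llm_stub`, `bo_stub`, the `density` knob, and the toy `evaluate` function) are hypothetical and are not part of ORFS-agent.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
History = List[Tuple[Config, float]]
Proposer = Callable[[History], Config]


class Agent:
    """Thin operator layer: the proposer is a replaceable function."""

    def __init__(self, proposer: Proposer):
        self.proposer = proposer
        self.history: History = []
        self.trail: List[str] = []  # evidence trail for review and rollback

    def step(self, evaluate: Callable[[Config], float]) -> float:
        cfg = self.proposer(self.history)
        score = evaluate(cfg)
        self.history.append((cfg, score))
        self.trail.append(f"{self.proposer.__name__}: tried {cfg} -> {score:.3f}")
        return score


def llm_stub(history: History) -> Config:
    """Placeholder for an LLM-backed controller (assumption)."""
    return {"density": 0.5}


def bo_stub(history: History) -> Config:
    """Placeholder for a black-box BO tool exposed as a function call (assumption)."""
    if not history:
        return {"density": 0.6}
    best_cfg, _ = min(history, key=lambda h: h[1])
    return {"density": round(best_cfg["density"] + 0.05, 2)}


def evaluate(cfg: Config) -> float:
    """Toy objective, lower is better; not a real QoR metric."""
    return abs(cfg["density"] - 0.7)


agent = Agent(llm_stub)
first = agent.step(evaluate)
agent.proposer = bo_stub  # swap in a different proposer; history and trail are kept
second = agent.step(evaluate)
```

The point of the design is that replacing the proposer leaves the history and the evidence trail intact, so a new tool extends the agent instead of replacing it.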
Within the broader ML EDA context, ORFS-agent is not a new
optimizer so much as a thin, model-agnostic operator layer. Modern
RTL-to-GDS flows expose many coupled controls whose effects
depend on process, libraries and design style; exhaustive sweeps
are costly, and one-size-fits-all recipes are rarely the best fit. In this
setting, value accrues from systems that can read evidence, respect
domain constraints, and produce small, well-motivated changes to
configuration. ORFS-agent plays that role. It is compatible with
learned surrogates, Bayesian search, or GPU-accelerated exploration,
but does not require any particular choice; the controller can be any
capable LLM that can inspect artifacts, reason about context, and
write files.
As an artifact in ORFS-Research, ORFS-agent provides a trans-
parent, reproducible baseline for LLM-in-the-loop autotuning. It
complements specialized learning components by turning raw flow
outputs into actionable next steps, and it frames evaluation in terms
that match real resource budgets: QoR versus iterations, constraint
satisfaction rate, and time-to-first acceptable layout. Furthermore,
while OpenROAD-Research contains tools that will produce the
next generation of OpenROAD-based RTL-to-GDS flows, ORFS-agent
and similar tools in ORFS-Research provide the harness that
ensures, at the level of hyperparameters, that these code changes ship
promptly and with maximal impact. In other words, the goal of
OpenROAD-agent would be to produce code diffs that impact the
RTL-to-GDS pipeline; the goal of ORFS-agent and similar tools
would be to smooth the adaptation of existing tools such as Ray
Tune to the optimization problem at hand.
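The budget-oriented evaluation terms named above (QoR versus iterations, constraint satisfaction rate, and time-to-first acceptable layout) can be computed from a run log as in the following minimal sketch. It assumes a lower-is-better QoR metric and measures time in iterations; the function names and the sample values are hypothetical. For simplicity, `best_so_far` ignores feasibility.

```python
from typing import List, Optional


def best_so_far(qor: List[float]) -> List[float]:
    """QoR versus iterations: running minimum of a lower-is-better metric."""
    out, best = [], float("inf")
    for q in qor:
        best = min(best, q)
        out.append(best)
    return out


def constraint_satisfaction_rate(feasible: List[bool]) -> float:
    """Fraction of runs that met all constraints."""
    return sum(feasible) / len(feasible) if feasible else 0.0


def first_acceptable(qor: List[float], feasible: List[bool],
                     threshold: float) -> Optional[int]:
    """1-based iteration of the first feasible run with QoR at or below threshold."""
    for i, (q, ok) in enumerate(zip(qor, feasible), start=1):
        if ok and q <= threshold:
            return i
    return None


# Made-up run log, for illustration only.
qor = [5.0, 4.2, 4.6, 3.9]
feasible = [True, False, True, True]

curve = best_so_far(qor)
rate = constraint_satisfaction_rate(feasible)
hit = first_acceptable(qor, feasible, threshold=4.5)
```

In this sample log the second run meets the QoR threshold but violates a constraint, so the first acceptable layout appears only at the fourth iteration.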
IV. SERVING AN ML EDA COMMONS
The landscape of ML EDA has been rapidly growing through
efforts in dataset generation [28][29], benchmarking [31], con-
tests [24][65], leaderboards [31], open-source initiatives [67], and
cyberinfrastructure for chip design [32]. However, these efforts re-
main fragmented, often duplicating one another and facing recurring
challenges such as a lack of incentives and coordination. In this
context, the need for an ML EDA Commons has become increasingly
evident [66][68][69]. An ML EDA Commons focuses on advancing
ML EDA applications by providing shared resources (datasets, tools,
flows, benchmarks, metrics). A vision for an ML EDA Commons was
outlined in [6], encompassing three key pillars: (1) maturing existing
EDA infrastructure to better support ML research; (2) establishing
standards for benchmarks, metrics, and data formats through gover-
nance that involves key stakeholders; and (3) improving accessibility
and reproducibility by providing open data, tools, models, and
workflows with cloud resources – thereby lowering barriers to entry
and promoting robust research practices through artifact evaluations,
canonical evaluators, and integration pipelines.
This section reviews efforts that serve as components of such a
vision, including new datasets, benchmarks, contests, hackathons,