

# Xyce Parallel Circuit Simulator

**Eric Keiter**, and the Xyce development team, et al  
Sandia National Laboratories

DARPA IDEA Workshop  
Arlington, VA  
April 13, 2017



Eric Keiter, Sandia National Laboratories  
2017 DARPA IDEA Workshop, Arlington, VA



# Acknowledgements

Xyce team



[xyce.sandia.gov](http://xyce.sandia.gov)

- Jason Verley (PI)
- Eric Keiter (Research Lead)
- Tom Russo
- Heidi Thornquist
- Rich Schiek
- Ting Mei
- Aadithya Karthik
- Sivasankaran Rajamanickam

Trilinos team

[trilinos.sandia.gov](http://trilinos.sandia.gov)



Dakota team

[dakota.sandia.gov](http://dakota.sandia.gov)





# Xyce Parallel Circuit Simulator



- Xyce: Massively Parallel circuit simulator:
  - Distributed Memory Parallel (MPI-based)
  - Unique solver algorithms
  - SPICE “Compatible”
  - Industry standard models (BSIM, PSP, EKV, VBIC, etc)
  - ADMS model compiler
- Analysis types
  - DC, TRAN, AC
  - Harmonic Balance (HB)
  - Multi-time PDE (MPDE)
  - Model order reduction (MOR)
  - Direct and Adjoint sensitivity analysis
- Sandia-specific models
  - Prompt Photocurrent
  - Prompt Neutron
  - Thermal
- Other, non-traditional models
  - Neuron/synapse
  - Reaction network
  - TCAD (PDE-based)
- Xyce Release 6.7 pending
  - Open Source!
  - GPL v3 license



<http://xyce.sandia.gov>

**Open Source Releases (starting in 2013):**  
**Versions 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6**  
**>1000 unique external downloads since 6.0.**  
**Next release (v6.7) ~May 2017**



Eric Keiter, Sandia National Laboratories  
2017 DARPA IDEA Workshop, Arlington, VA



# What Xyce Is, and Is Not

- **Xyce is: “True Spice”**

- Large, monolithic, single Jacobian matrix.
- Accurate.
- Known parallel linear solvers don’t scale perfectly.



Circuit Matrix spy plot



- **Xyce is not (currently): “Fast Spice”**

- Loosely coupled separate blocks
  - Implicit solver methods within blocks
  - Explicit methods used to couple blocks
- Table models
- Model order reduction
- Exploits circuit hierarchy
- Effective primarily for digital circuits
- less accurate than “true spice”

BTF+Hypergraph



4 processors



Sandia National Laboratories



# “True Spice” Xyce Circuit Simulation Challenges



- Need to efficiently and seamlessly simulate circuits that have a wide range of devices ( $10-10^6$ ) and complexity
  - Advanced preconditioning techniques for iterative solvers
  - Must support direct solvers for smaller circuits
- Internal Customer simulation efforts inspired many of the past and current advancements for circuit simulation



~25K unknowns

~600K unknowns\*

~680K unknowns

~430K unknowns

~2M unknowns



Eric Keiter, Sandia National Laboratories  
2017 DARPA IDEA Workshop, Arlington, VA



# Xyce

# Simulation Flow

- Parsing
  - Convert netlist file syntax to equivalent devices and network/circuit connectivity
  - Distribute devices over multiple processors
  - Determine global ordering and communication
- Device Evaluation
  - Loop through all devices for state evaluation and matrix loading
- Linear Solve
  - Sparse linear algebra and solvers used to solve linearized system
- Advanced Analysis Methods
  - Sampling: Monte Carlo, LHS (DAKOTA)



# New Linear Solver Achieves 19x Speedup for ASIC Simulation



- ShyLU is a sparse linear solver framework, based on Schur complements

(S. Rajamanickam, E. Boman, M. Heroux)

- Incorporates both direct and iterative methods
- Coarse-scale (multi-processor) and fine-scale (multi-threaded) parallelism
- Can be a subdomain solver / preconditioner or stand-alone linear solver

- In-house ASIC: 1.6M total devices, ~ 2M unknowns:
  - Xyce w/ KLU solver takes ~ **2 weeks**, w/ ShyLU solver takes ~ **1 day**
  - ShyLU: Optimal # partitions = 64; number of rows in S = 1854 (4 MPI procs)



Strong scaling of Xyce's simulation time and ShyLU linear solve time for different configurations of MPI Tasks X Threads per node on TLCC



# Xyce Model Support with ADMS

ADMS = *Automatic Device Model Synthesizer*

Verilog-A: industry standard format for new models: e.g. VBIC, Mextram, EKV, HiCUM, etc

ADMS translates Verilog-A to Xyce-compliant C/C++ code;

API automatically handles data structures, matrices, tedious details.

```

1 // Series RLC
2 // Version 1a, 1 June 04
3 // Ken Kundert
4 //
5 // Downloaded from The Designer's Guide Community
6 // (www.designers-guide.org).
7 // Taken from "The Designer's Guide to Verilog-AMS"
8 // by Kundert & Zinke. Chapter 3, Listing 14.
9
10 "include "disciplines.vams"
11
12 module series_rlc2 (p, n);
13     parameter real r=1000;
14     parameter real l=1e-9;
15     parameter real c=1e-6;
16     inout p, n;
17     electrical p, n, i;
18     branch (p, i) rl, (i, n) cap;
19
20     analog begin
21         V(rl) <+ r*I(rl);
22         V(rl) <+ ddt(l*I(rl));
23         I(cap) <+ ddt(c*V(cap));
24     end
25 endmodule

```

Run admsXyce

Verilog-A



Activities via Sacado automatic differentiation  
and to include Stochastic Expansions via Stokhos.

```

// -- code converted from analog/code block// I(
((V(p,internal1)/R))staticContributions[admsNodeID
((probeVars[admsProbeID_V_p_internal1])/instanceP
deID_internal1] -=
((probeVars[admsProbeID_V_p_internal1])/instanceP
((probeVars[admsProbeID_V_internal1_internal2])*i
internal1,internal2) <+
(CapacitorCharge))dynamicContributions[admsNodeID
(CapacitorCharge);dynamicContributions[admsNodeID
(CapacitorCharge);InductorCurrent = (probeVars[ad
V(internal2,n) <+
((L*ddt(InductorCurrent)))dynamicContributions[ad
(instancPar_L*(InductorCurrent));

```

C++ code snippet  
(actual Xyce file is 1500 lines)

# Tool Flow



# Tool Flow (More Detail, ASIC)



Tying to the theme of  
the workshop: RTL to  
GDS flow





# Extra Slides



Eric Keiter, Sandia National Laboratories  
2017 DARPA IDEA Workshop, Arlington, VA



# Linear Solver Strategy Comparison: 100K Transistor IC Problem

Original



ParMETIS+AMD



BTF+Hypergraph



4 processors

BTF+Hypergraph



8 processors

| Strategy | Method                        | Residual  | GMRES Iters | Solver Time (seconds) |
|----------|-------------------------------|-----------|-------------|-----------------------|
| 1        | Local AMD<br>ILUT<br>ParMETIS | 3.425e-01 | 500         | 302.573               |
| 2        | BTF<br>KLU<br>Hypergraph      | 3.473e-10 | 3           | 0.139                 |

Strategy 2 Scaled Speedup



# Optimization and Uncertainty Quantification (UQ)

- Xyce has a number of capabilities to support uncertainty quantification (UQ) and/or optimization.
- Mostly these come via the DAKOTA framework, or from ROL.
- General idea: given uncertainty on parameter inputs, how to estimate that on circuit outputs.
- Most simulators support “brute force” approaches based on sampling.
- Many much more sophisticated methods exist, including stochastic collocation methods, etc.



# Embedded sensitivity calculations

- Many advanced UQ techniques can use parameter derivatives.
  - Failure analysis methods
  - Regression-based Polynomial Chaos (PCE)
  - Gradient-based optimization
- Embedded sensitivities in an application code is better than relying on finite differences:
  - much faster (single forward solve)
  - much more accurate.
- Sensitivities can be useful by themselves. Consider an SEE (Single Event Effects) example:
  - Want to apply SEE mitigations (redundant circuitry) to a large IC.
  - Want to apply them selectively, only where absolutely necessary.
  - Which transistors of a circuit are most sensitive to SEE?
  - An adjoint method can determine this for you, in a single calculation → determines sensitivity of an output with respect to currents applied at each transistor.



# Parameter sensitivities

- Xyce (v6.5) now supports:
  - Direct and Adjoint steady state (.DC)
  - Direct and Adjoint transient (.TRAN)
- Parameter sensitivities (steady state):

$$\frac{dO}{dp} = \frac{\partial O}{\partial x} \left( \frac{\partial F}{\partial x} \right)^{-1} \frac{\partial F}{\partial p}$$

Handled by the Xyce expression library

Xyce Jacobian Matrix (inverse)

Provided by device models (analytic or finite difference)



$$\left( \frac{\partial F}{\partial x} \right) \frac{\partial x}{\partial p} = \frac{\partial F}{\partial p}$$

**Direct**

$$\frac{dO}{dp} = \frac{\partial O}{\partial x} \left[ \left( \frac{\partial F}{\partial x} \right)^{-1} \frac{\partial F}{\partial p} \right]$$

**Adjoint**

$$\frac{dO}{dp} = \left[ \frac{\partial O}{\partial x} \left( \frac{\partial F}{\partial x} \right)^{-1} \right] \frac{\partial F}{\partial p}$$

$$\left[ \frac{\partial O}{\partial x} \left( \frac{\partial F}{\partial x} \right)^{-1} \right] = \left( \frac{\partial F}{\partial x} \right)^{-T} \frac{\partial O}{\partial x}$$

O=objective function (scalar)  
 p=parameter (scalar)

F=residual vector

X=solution vector

$$\left( \frac{\partial F}{\partial x} \right)^T \theta^* = \frac{\partial O}{\partial x}$$

$$\theta^* = \frac{\partial O}{\partial F}$$

# Xyce devices



$$\frac{d}{dt} \vec{q}(\vec{x}(t)) + \vec{f}(\vec{x}(t), \vec{u}(t)) = \vec{0}$$



# Development Status and Plans

```

1 // Series RLC
2 // Version 1a, 1 June 04
3 // Ken Kundert
4 //
5 // Downloaded from The Designer's Guide Community
6 // (www.designers-guide.org).
7 // Taken from "The Designer's Guide to Verilog-AMS"
8 // by Kundert & Zinke. Chapter 3, Listing 14.
9
10 `include "disciplines.vams"
11
12 module series_rlc2 (p, n);
13     parameter real r=1000;           // resistance (Ohms)
14     parameter real l=1e-9;          // inductance (H)
15     parameter real c=1e-6;          // capacitance (F)
16     inout p, n;
17     electrical p, n, i;
18     branch (p, i) r1, (i, n) cap;
19
20     analog begin
21     V(r1) <+ r*I(r1);
22     V(r1) <+ ddt(l*I(r1));
23     I(cap) <+ ddt(c*V(cap));
24     end
25 endmodule

```

Verilog-A



Verilog-A (high level modeling language) Input

Run admsXyce

Ready-to-compile

C++ code



Sandia National Laboratories

NextGen (ATDM)

Kokkos



Embedded UQ

Stokhos

$$f(\theta) = \sum_{k=0}^{\infty} F_{pk} \Phi_k(\epsilon)$$

Dynamic Linking

\*.so files

Param Sensitivities

Sacado

$$\frac{d\vec{f}(\vec{x})}{dp}, \frac{d\vec{q}(\vec{x})}{dp}$$

DAE support

Sacado (Jacobian)

$$\frac{d}{dt} \vec{q}(\vec{x}(t)) + \vec{f}(\vec{x}(t), \vec{u}(t)) = \vec{0}$$

Planned

Implemented