# Selected Inversion for Vectorless Power Grid Verification by Exploiting Locality

Jianlei Yang Yici Cai Qiang Zhou Wei Zhao Tsinghua National Laboratory for Information Science and Technology Department of Computer Science and Technology Tsinghua University, 100084, Beijing, China yjl09@mails.tsinghua.edu.cn zhouqiang@tsinghua.edu.cn weakest3@gmail.com

Abstract-Vectorless power grid verification is a practical approach for early stage safety check without input current patterns. The power grid is usually formulated as a linear system and requires intensive matrix inversion and numerous linear programming, which is extremely time-consuming for large scale power grid verification. In this paper, the power grid is represented in the manner of domain-decomposition approach, and we propose a selected inversion technique to reduce the computation cost of matrix inversion for vectorless verification. The locality existence among power grids is exploited to decide which blocks of matrix inversion should be computed while remaining blocks are not necessary. The vectorless verification could be purposefully performed by this manner of selected inversion while previous direct approaches are required to perform full matrix inversion and then discard small entries to reduce the complexity of linear programming. Meanwhile, constraint locality is proposed to improve the verification accuracy. Experimental results show that the proposed approach could achieve significant speedups compared to previous approaches while still guaranteeing the quality of solution accuracy.

Keywords—Selected Inversion, Locality, Vectorless Power Grid Verification, IR Drop, Signal Integrity.

## I. INTRODUCTION

With the continuous scaling of process technology, a trend of decreasing supply voltages, increasing power density and tighter noise margins has been observed to have a critical impact on voltage integrity which is becoming a big concern for high performance integrated circuits design. Excessive voltage fluctuation on the power grids can result in longer circuit delays and lead to functional failures. Consequently, the safety check of the voltage integrity on power grids has become crucial in reliable chip design.

Power grid verification is traditionally performed by *simulation* approach, which requires the information of the current excitations by the circuit loadings. Once the power grid is simulated and the voltage noise at every node is determined, it is easy to check the grid safety. Total safety check using this approach requires the simulation of a comprehensive set of current patterns which will be extremely time-consuming. Furthermore, it is impracticable to verify the power grid at an early stage with uncertainty mode while the circuit behaviors are unknown. To overcome these issues, the *vectorless* verification is first proposed as a formal approach to verify power grid with uncertainty mode [1], and later many contributions

are made to further improvements. It is based on the concept of *current constraints* to capture the uncertainty of the circuit behaviors. The current constraints are a set of upper bounds on each current drawn and can be obtained from the knowledge of the overall power dissipation on each circuit block. Under these constraints, power grid verification becomes a problem for computing the worst-case voltage fluctuations at each node subject to current constraints which can cover all possible current waveforms.

There are several further research topics among the recent works. On the aspect of power grid modeling method, DC model is first considered in [1], RC model is introduced to perform transient verification in [2], and RLC model is adopted to verify power grid circuits operating at high frequencies in [3]. On the aspect of current constraints, local constraints and global constraints are first proposed in [1], transient current constrains are introduced for practical transient noise predictions in [4], and hierarchical current and power constraints are adopted for more realistic verification in [5]. Due to the extremely high complexity of the verification problem, many contributions are made to reduce the problem size for efficient verification. They include several efficient sparse matrix inverse methods, such as SPAI technique [6], AINV method [7] and  $\mathcal{H}$  Matrix approach [8]. In [9], the authors proposed a hierarchical matrix inversion algorithm to speed up the computation of each row of the inverse. And [10] reduces the number of LP problems by examining dominance relations among node voltage drops and branch currents. In [11], a node elimination approach is utilized to systematically reduce the grid size and accurately compute the upper bounds while the node safety criteria is still guaranteed. In [12], model order reduction (MOR) was adopted to reduce the complexity of formulating LP problems. Meanwhile, a convex dual algorithm was proposed to solve LP problems more efficiently in [13].

Despite significant improvements among these approaches, it is still extremely expensive to verify a large scale power grid by vectorless approach because the number of LP problems to be solved and the size of each LP problem are proportional to the size of the power grid. Therefore, more efficient approaches should be proposed to perform vectorless verification for practical use. In this paper, we propose a novel *selected inversion* technique for efficient vectorless verification by exploiting *grid locality* and *constraint locality*. This is a sufficient method that takes advantage of *locality* across the power grid. To verify each node, it can capture the current sources that are most influential on this node, and subsequently

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No.61274031.

it selectively computes the matrix inverse to formulate a reduced-size optimization problem while still guaranteeing the quality of solution accuracy. Experimental results show that the proposed approach could achieve significant improvements in runtime compared to previous approaches.

The rest of this paper is organized as follows. A brief introduction to the problem formulation and the previous approaches are provided in Section II. The details of the proposed selected inversion technique are presented in Section III. Experimental results are illustrated in Section IV, and concluding remarks are given in Section V.

## II. PRELIMINARIES

### A. Vectorless Power Grid Verification

Power grid analysis is usually performed by *Kirchoffs Law* which builds the R/RC/RCL network as linear systems. For clarity, only DC model is presented for static verification in this paper. And without loss of generality, RC and RCL model can also be formulated for transient verification. Considering a *n*-node power grid with purely resistive model, static analysis can be formulated using traditional modified nodal analysis method as the following linear system of equations [14]:

$$\mathcal{G}\mathbf{v} = \mathbf{i} \tag{1}$$

where  $\mathcal{G} \in \mathbb{R}^{n \times n}$  is the grid conductance matrix,  $\mathbf{v} \in \mathbb{R}^{n \times 1}$  is the vector of node voltage, and  $\mathbf{i} \in \mathbb{R}^{n \times 1}$  is the vector of current sources representing the underlying circuitry. By properly tackling the voltage sources, the above system can be reformulated as a revised system equation which can be solved directly to obtain the voltage drop values [1]. From the revised system equation, the vector  $\mathbf{v}$  is to represent the voltage drops of the circuit nodes.

The vectorless verification approach is proposed for early verification on power grids when the details of the underlying circuitry may not be known. Current constraints provide a way to capture the uncertainty about circuit behavior, which can be obtained from the knowledge of the design specifications, such as chip area, power budget and engineering judgments. Under this formulation of the grid, the verification problem becomes a linear optimization problem where the vector of voltage drops is maximized subject to the current constraints.

Following the previous works [1][9], we use two types of constraints: local constraints and global constraints. Local constraints are upper bounds on the individual current excitations and represent the maximum current drawn by individual cells or blocks which are connected to the grid nodes. They can be represented as

$$0 \le \mathbf{i} \le \mathbf{I}_L \tag{2}$$

where  $\mathbf{I}_L \in \mathbb{R}^{n \times 1}$  is a vector of fixed current values. Global constraints are introduced to provide an upper bound on the sum of currents drawn by groups of current sources. They are typically chosen based on some knowledge of the peak power dissipation of a group of circuit blocks. Assuming there are m global constraints, and then can be expressed as matrix form

$$0 \le \mathcal{U}\mathbf{i} \le \mathbf{I}_G \tag{3}$$

where  $\mathcal{U} \in \mathbb{R}^{m \times n}$  is an Boolean matrix that indicates which current sources are present in each global constraint, and  $\mathbf{I}_{G} \in$ 

 $\mathbb{R}^{m \times 1}$  is a vector of upper bounds on the sum of currents included in each circuit block.

## B. Problem Formulation

The vectorless verification is interested in finding the worstcase voltage variations at all the nodes, that is, obtaining upper bounds on the maximum voltage drops of each node under all the possible current patterns that satisfy the local and global constraints. Thus, the verification problem has been formulated as an optimization problem of maximum voltage noise under linear current constraints in [13]. Consider an RC power grid of *n* nodes with conductance matrix  $\mathcal{G}$  and capacitance matrix  $\mathcal{C}$ . Let  $A = \mathcal{G}$  for DC analysis mode or  $A = \mathcal{G} + \frac{\mathcal{C}}{\Delta t}$  for transient analysis mode with time step  $\Delta t$ . Given local and global current constraints with parameters  $\mathbf{I}_L$ ,  $\mathbf{I}_G$  and  $\mathcal{U}$ , solve for each node  $1 \le j \le n$ ,

$$\begin{array}{ll} \text{Maximize } v_j & \text{s.t.} \\ A\mathbf{v} = \mathbf{i}, & 0 \le \mathbf{i} \le \mathbf{I}_L, & 0 \le \mathcal{U}\mathbf{i} \le \mathbf{I}_G \end{array} \tag{4}$$

where i is the decision variables of current sources, v is the vector of corresponding voltage noises, and  $v_j$  is the *j*th element of v.

Since A is known to be an  $n \times n$  symmetric positive definite  $\mathcal{M}$  matrix, it is invertible while  $A^{-1}$  is also symmetric and satisfies  $A^{-1} \ge 0$ . Subsequently, the optimization problem can be decomposed into two sub-problems as follows [13]:

I : Compute 
$$\mathbf{c}_j$$
 by solving  $A\mathbf{x} = \mathbf{e}_j$   
II : Maximize  $v_j = \mathbf{c}_j^T \mathbf{i}$  s.t. (5)  
 $0 \le \mathbf{i} \le \mathbf{I}_L, \ 0 \le \mathcal{U}\mathbf{i} \le \mathbf{I}_G$ 

where  $\mathbf{c}_j \in \mathbb{R}^{n \times 1}$  is a vector of coefficients, and  $\mathbf{e}_j \in \mathbb{R}^{n \times 1}$ is a Boolean vector with 0s except for its *j*th element being 1. Computing  $c_i$  is equivalent to picking up the *j*th column of  $A^{-1}$ , which can be obtained by solving linear equations shown in sub-problem I. It can be considered as the sensitivity of the *j*th node to the current sources attached to the circuit nodes across the grid. The sub-problem I has been studied in other similar researches, such as equivalent resistance analysis between each two circuit nodes, and electric static discharge (ESD) analysis [15]. The problem decomposition (5) provides a friendly verification approach [13]. For verifying each node, it requires to solve linear equations once and LP problem once while both of the problem size is n. The total complexity of full grid verification is proportional to the number of nodes in the grid, which is still very challenging or even prohibitive for practical use.

## C. Previous Approaches

The primary goal is to reduce the problem size or improve the solving efficiency of sub-problem I and sub-problem II. Sparse approximate inverse (SPAI) technique is utilized to compute an approximate  $c_j$  with a small number of nonzero entries using least square methods, and then the involved linear programming problem can be much simplified and solved efficiently [6]. In the later work [9], convex dual algorithms are adopted to solve the LP problem more efficiently, and a preconditioned conjugate gradient (PCG) method is utilized to solve sub-problem I. But as we have observed, the used SPAI technique and PCG method are still relatively inefficient in verifying large scale power grids. Since it is obvious that direct linear solvers are more efficient than iterative solvers for transient simulation, such as Cholmod [16], we realize that direct solvers should also be more efficient when utilized to solve sub-problem I. Consequently, it is of great necessity to investigate novel approaches for more potential improvements in large scale verification.

#### III. SELECTED INVERSION

The idea of selected inversion originally arose from several scientific applications which need to calculate a subset of the entries of the inverse for a given matrix. Examples include examining inverse covariance matrices in uncertainty quantification [17], finding a rational approximation for Fermi-Dirac functions in the electron density function theory [18], etc. From a computational viewpoint, it is natural to develop algorithms for selected inversion that are faster than inverting the whole matrix. Especially for some model problems, when A is obtained from a finite difference discretization of a Laplacian operator or from lattice models in statistical or quantum mechanics with a local Hamiltonian [19], the specified diagonal entries of the matrix inverse can be efficiently extracted by selected inversion. Similarly, it is of great interest to take the advantage of selected inversion for verifying power grids because it is also formulated by Laplacian discretization. In this section, a selected inversion implemented on block level is demonstrated for efficient vectorless power grid verification.

## A. Methodology

The standard approach for computing  $A^{-1}$  is to first perform Cholesky factorization  $A = \mathcal{L}\mathcal{L}^T$ , where  $\mathcal{L}$  is a lower triangular matrix. Subsequently,  $A^{-1} = (\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n)$  can be obtained by solving a number of triangular systems

$$\mathcal{L}\mathbf{y} = \mathbf{e}_j, \quad \mathcal{L}^T \mathbf{x}_j = \mathbf{y}$$
 (6)

for j = 1, 2, ..., n, where  $\mathbf{e}_j$  is the *j*th column of the identity matrix. The computational cost of such *direct inversion* is generally  $\mathcal{O}(n^3)$ , with *n* being the dimension of *A*.

However, when A is sparse symmetric positive and definite, we can exploit the sparsity structure of  $A^{-1}$  to perform selected inversion on block level by exploiting grid locality. The inverse of a non-singular sparse matrix is dense, especially for a matrix that results from a mesh structure; its inverse is almost full. However, the values of the entries among the inverse are observed to decay exponentially as one moves away from the diagonal when A is diagonally dominant and symmetric positive definite. This phenomenon can be confirmed from the sensitivity analysis among the grid. And it can be utilized to construct a specific sparsity pattern for approximate representation. Especially for the power grid with Flip-Chip package, the locality effect [20] can preserve this sparsity pattern to be more significant.

Locality means that the absorbing current sources can only trigger voltage drop within a local area around them. And beyond this area that is called *grid shell* in [20], the triggered voltage drop will attenuate to zero very fast. That is, for such a grid node, its voltage drop sensitivity is mainly influenced by the current sources attached to the nodes within the grid shell. Take a small grid with flip-chip package as an example, a strong current source is attached to the center node of a



Fig. 1. Locality effect on Flip-Chip (a strong current source attached to the center node of a sample  $100 \times 120$  power grid with  $6 \times 6$  pads array).

 $100 \times 120$  grid with uniformly distributed  $6 \times 6$  pads array while small current sources are attached to other nodes. Fig. 1 shows the voltage drop distribution triggered by this strong current source. The area that contains dominant voltage drop is around the center part of the grid. Meanwhile, the voltage drop decreases dramatically with the increasing distance from the center node. Especially for the area outside the four nearest pads array, the voltage drop is affected slightly by the strong current source. And for the area outside the second nearest pads array, the voltage drop is even much smaller. Hence, the area bounded by the nearest pads array can be considered as a local area to exploit sparsity representation.

Considering a power grid with Flip-Chip package, the conductance matrix A can be reformulated as a domaindecomposition form. As shown in Fig. 2, the power grid is partitioned into several subgrids according to the pad distribution. For each subgrid, the nodes inside the subgrid are denoted as internal nodes, and the nodes on the subgrid boundaries are referred as interface nodes or global nodes. Assuming through possible row and column permutations that the full grid is partitioned into p subgrids  $\{\Omega_k\}$  for  $k = 1, 2, \ldots, p$ , and then A has the following form

$$A = \begin{pmatrix} B_1 & & & F_1 \\ & B_2 & & F_2 \\ & & \ddots & & \vdots \\ & & & B_p & F_p \\ F_1^T & F_2^T & \cdots & F_p^T & G \end{pmatrix} := \begin{pmatrix} B & F \\ F^T & G \end{pmatrix}$$

where  $B_k \in \mathbb{R}^{n_k \times n_k}$  is the conductance matrix of kth subgrid  $\Omega_k$ ,  $F_k \in \mathbb{R}^{n_k \times n_G}$  is a negative matrix representing the conductance links between internal nodes and global interface nodes,  $G \in \mathbb{R}^{n_G \times n_G}$  is the conductance matrix representing the relationship between all global interface nodes, for  $k = 1, 2, \ldots, p$ , and  $n_G + n_B = n$  with  $n_B := \sum_{k=1}^p n_k$ , while  $B_k$  and  $F_k$  are denoted as

$$B := diag (B_1, B_2, \dots, B_p)$$
$$F^T := (F_1^T, F_2^T, \dots, F_p^T)$$

for intuitive representation. Each  $F_k$  represents a set of connections from the internal nodes of subgrid k to its global interface nodes. Define  $n_{F_k}$  as the number of nonzero columns of  $F_k$ ,



Fig. 2. Illustration of Power Grid Partitioning.

consequently it is determined by the number of these global interface nodes that are adjacent to subgrid  $\Omega_k$ , and it satisfies  $\sum_{k=1}^{p} n_{F_k} \ll n_G$  for a relatively small p.

Before we present the selected inversion approach, it will be helpful to first review the major operations involved in the block level  $\mathcal{LDL}^T$  factorization

$$A = \begin{pmatrix} I \\ H^T & I \end{pmatrix} \begin{pmatrix} B \\ G - F^T B^{-1} F \end{pmatrix} \begin{pmatrix} I & H \\ I \end{pmatrix}$$

where  $H := B^{-1}F$ , and  $S := G - F^T B^{-1}F = G - F^T H$ is known as the *Schur complement*. Taking the inverse of the above matrix yields

$$A^{-1} = \begin{pmatrix} B^{-1} + B^{-1}FS^{-1}F^{T}B^{-1} & -B^{-1}FS^{-1} \\ -S^{-1}F^{T}B^{-1} & S^{-1} \end{pmatrix}$$
$$A^{-1} := \begin{pmatrix} B^{-1} + HS^{-1}H^{T} & -HS^{-1} \\ -S^{-1}H^{T} & S^{-1} \end{pmatrix}$$
(7)

Since B is a block diagonal matrix,  $H \in \mathbb{R}^{n_B \times n_G}$  can be obtained by solving p sequences of  $n_G$  linear systems

$$B_k H_k = F_k, \quad k = 1, 2, \dots, p \tag{8}$$

where  $H^T := (H_1^T, H_2^T, \dots, H_p^T)$  and  $H_k \in \mathbb{R}^{n_k \times n_G}$ . Notice that each  $H_k$  can be interpreted as a *transformed* influence of the local inverse  $B_k^{-1}$  on the global interface nodes, and has at least  $n_{F_k}$  nonzero columns. Accordingly, the number of linear equations to be solved in (8) can be reduced from  $n_G$  to  $n_{F_k}$ for each k.

The global Schur complement S can be computed by

$$S = G - \sum_{k=1}^{p} F_k^T H_k \tag{9}$$

where each term of the sum can be regarded as a local contribution, and the sparsity of  $F_k$  can be further exploited. If the dimension  $n_G$  of Schur complement is not relatively large, the directly computing  $S^{-1}$  can be efficient because it is known that S can be generally well-approximated by a sparse matrix.

After that, the term  $HS^{-1}H^T$  in the upper-diagonal block of (7) can be obtained by representing it as

$$A^{-1} := \begin{pmatrix} B^{-1} + HL & -U \\ -L & S^{-1} \end{pmatrix}$$
(10)



Fig. 3. Illustration of the entries in  $HS^{-1}H^T$ .

where  $U = L^T$ , and thus it requires to solve a sequence of  $n_B$ linear equations of the form  $SL = H^T$  with  $L \in \mathbb{R}^{n_G \times n_B}$ , which can be carried out locally by distributing S to each subgrid and solving

$$SL_k = H_k^T, \ k = 1, 2, \dots, p$$
 (11)

by a direct method when  $n_G$  is not relatively large, where  $L^T := (L_1^T, L_2^T, \dots, L_p^T)$  and  $L_k \in \mathbb{R}^{n_G \times n_k}$ . Subsequently, the resulted HL can be similarly obtained by locally computing  $H_k L_k$  for  $k = 1, 2, \dots, p$ .

## B. Grid Locality and Constraint Locality

If the number of global nodes  $n_G$  is relatively small,  $S^{-1}$  can be formed explicitly and the sparsity can be considered to perform a well-approximate inverse. Notice that the term  $B^{-1} + HS^{-1}H^T$  in the upper-diagonal block in (7) dominates the most dimensions of  $A^{-1}$ , while  $B^{-1}$  can be obtained by inverting the conductance of each subgrid. The internal details of  $HS^{-1}H^T$  can be represented as

$$HS^{-1}H^{T} = \begin{pmatrix} H_{1}S^{-1}H_{1}^{T} & H_{1}S^{-1}H_{2}^{T} & \cdots & H_{1}S^{-1}H_{p}^{T} \\ H_{2}S^{-1}H_{1}^{T} & H_{2}S^{-1}H_{2}^{T} & \cdots & H_{2}S^{-1}H_{p}^{T} \\ \vdots & \vdots & \ddots & \vdots \\ H_{p}S^{-1}H_{1}^{T} & H_{p}S^{-1}H_{2}^{T} & \cdots & H_{p}S^{-1}H_{p}^{T} \end{pmatrix}$$

where each block  $H_k S^{-1} H_t^T$  can indicate the relationship between subgrid  $\Omega_k$  and  $\Omega_t$  for  $1 \le k, t \le p$ . And the entries in this term dominate the total number of the entries of the full inverse. But as we have observed, the entries of the most of the blocks in this term are significantly small. This phenomenon has confirmed the existence of the locality effect among the grid. Taking a power grid with 9 partitions as an example, the entries of  $HS^{-1}H^T$  can be illustrated in Fig. 3. The corresponding entries representing subgrid  $\Omega_k$  and subgrid  $\Omega_t$ is known as  $H_k S^{-1} H_t^T$ . If subgrid  $\Omega_k$  is far enough away from subgrid  $\Omega_t$ , the influence between them is significantly small, which is called grid locality and can be exploited for selected inversion. As shown in Fig. 3, if only the influence between the neighbored subgrids is considered, the entries  $H_k S^{-1} H_t^T$ of corresponding blocks are kept, otherwise there is no need to figure them out.

Even though the values of ignored entries are significantly small, the total number of ignored entries may be very large. Generally, their accumulative effects cannot be simply ignored and subsequently more considerations should be taken to improve the verification accuracy. From the viewpoint of verification, the problem can be expressed as

$$\begin{pmatrix} B & F \\ F^T & G \end{pmatrix} \begin{pmatrix} \mathbf{v}_{inner} \\ \mathbf{v}_{inter} \end{pmatrix} = \begin{pmatrix} \mathbf{i}_{inner} \\ \mathbf{i}_{inter} \end{pmatrix}$$
(12)

where  $\mathbf{v}_{inner}$  and  $\mathbf{v}_{inter}$  are the grid nodes to be verified for internal nodes of each subgrid and global interface nodes, respectively; and  $\mathbf{i}_{inner}$  and  $\mathbf{i}_{inter}$  are the current sources which are attached to internal nodes and interface nodes, respectively. By employing the represented inverse in (7), the verification problem can be reformulated as

$$\begin{pmatrix} \mathbf{v}_{inner} \\ \mathbf{v}_{inter} \end{pmatrix} = \begin{pmatrix} B^{-1} + HS^{-1}H^T & -HS^{-1} \\ -S^{-1}H^T & S^{-1} \end{pmatrix} \begin{pmatrix} \mathbf{i}_{inner} \\ \mathbf{i}_{inter} \end{pmatrix}$$

that is,  $\mathbf{v}_{inner}$  and  $\mathbf{v}_{inter}$  can be verified independently by

$$\mathbf{v}_{inner} = B^{-1}\mathbf{i}_{inner} + HS^{-1}H^{T}\mathbf{i}_{inner} - HS^{-1}\mathbf{i}_{inter}$$
$$\mathbf{v}_{inter} = -S^{-1}H^{T}\mathbf{i}_{inner} + S^{-1}\mathbf{i}_{inter}$$



Fig. 4. Constraint Locality Demonstration.

Since there are many entries of blocks in  $HS^{-1}H^{T}$ that are not kept due to the grid locality, it will result in significant errors when verifying the voltage noise. In this work, a concept of constraint locality is proposed to take this phenomenon into account for efficient verification. That is, the sensitivity of voltage drop has a locality characteristic for global constraint. To verify the internal nodes in subgrid  $\Omega_k$ , consider if the subgrid  $\Omega_t$  is far away enough from it, the corresponding entries  $H_k S^{-1} H_t^T$  will be ignored due to the grid locality. Aiming to improve the verification accuracy, the current sources attached to the internal nodes of subgrid  $\Omega_t$  are considered to be removed and attached on the interface nodes around them. The constraint locality is that, regardless of whether the current sources are attached to the internal nodes in subgrid  $\Omega_t$  or attached to the interface nodes around them, the influence of them to verify subgrid  $\Omega_k$  is almost the same. As shown in Fig. 4, the current sinks  $i_{inner}$  attached to the internal nodes are properly moved to the interface nodes around them. The most exact way to mapping them to the interface nodes is to compute  $H_k S^{-1} H_t^T$ , but as we have observed, it should be sufficient to roughly assign them as  $\hat{i}_{inner}$  on average since the constraint locality can preserve the verification accuracy.

## C. Implementation

Based on the grid locality and constraint locality, the resulted selected inversion technique is implemented as shown in Algorithm 1. The major computation cost of selected inversion is first to perform inversion for each subgrid, and subsequently

to compute the resulted H, U and X by matrix multiplication. However, their sparsity pattern should be carefully further exploited with a proper threshold for efficient implementation. The term  $\langle k \mid t \rangle$  in line 10 is to decide whether the corresponding entries of  $H_k S^{-1} H_t^T$  between subgrid  $\Omega_k$  and subgrid  $\Omega_t$  are computed or ignored. Define a parameter sense\_level to control the depth number of neighbored subgrids to be considered, that is, if abs(k-t) is no more than *sense\_level*, then  $H_k S^{-1} H_t^T$  cannot be ignored, otherwise ignored. It is worth mentioning that there is actually no need to explicitly figure out all of the entries at once and then solve the involved LP problems, which will require extremely large scale memory resources. A preferable approach is to first verify the global interface nodes, and subsequently to compute the entries of the corresponding inverse for each subgrid one by one while the involved LP problems are solved one by one.



# D. Complexity Analysis

Similar to SelInv in [18], we also present the computation advantage of the proposed selected inversion technique compared to the direct inversion method. Without loss of generality, suppose for a power grid with  $n \times n$  mesh, the total number of grid nodes is  $N = n \times n$ . And suppose that a  $m \times m$ pads array is uniformly distributed throughout the power grid, where  $m \ll n$  and denote  $M = m \times m$ . It is easy to know that the power grid can be partitioned into  $p = (m - 1)^2$  subgrids. The number of global interface nodes is  $(2mn - m^2)$ , that is, the matrix dimension of the Schur complement satisfies  $S \in \mathbb{R}^{(2mn-m^2) \times (2mn-m^2)}$ . And the matrix dimension of each subgrid is  $\frac{n^2 - (2mn - m^2)}{p} = \frac{(n-m)^2}{p} = \left(\frac{n-m}{m-1}\right)^2$ .

Suppose that the complexity of computing the Cholesky decomposition is  $\mathcal{O}(N^3)$ , and substitution of the triangular matrix is  $\mathcal{O}(N^2)$ , the total complexity of directly computing the inverse is about  $\mathcal{O}(N^3)$ . For inversion of all subgrids, the total complexity of Cholesky factorization is about  $\frac{(n-m)^6}{p^2} \propto \frac{N^3}{p^2}$ , and the total complexity of substitution is about  $\frac{(n-m)^4}{p} \propto \frac{N^2}{p}$ . For computing the inversion of Schur

TABLE I. PERFORMANCE RESULTS OF THE PROPOSED SELECTED INVERSION COMPARED WITH THE DIRECT INVERSION APPROACH. THE RUNTIME IS ILLUSTRATED IN SECOND. THE VOLTAGE DROP IS ILLUSTRATED IN mV.

| Benchmarks |     | Direct Inversion |            |               | Selected Inversion |            |               |           | Speedup               |                     |
|------------|-----|------------------|------------|---------------|--------------------|------------|---------------|-----------|-----------------------|---------------------|
| N          | M   | $T^d_{inv}$      | $T^d_{LP}$ | $v_{\rm max}$ | $T^s_{inv}$        | $T^s_{LP}$ | $E_{\rm max}$ | $E_{avg}$ | $T^d_{inv}/T^s_{inv}$ | $T^d_{LP}/T^s_{LP}$ |
| 7200       | 12  | 34.49            | 9.73       | 50.41         | 1.64               | 10.55      | 0.67          | 0.08      | 21                    | 1                   |
| 27000      | 25  | 688.94           | 142.78     | 49.71         | 26.23              | 79.65      | 3.15          | 1.16      | 26                    | 2                   |
| 34200      | 42  | 898.42           | 260.75     | 46.71         | 42.15              | 78.64      | 3.07          | 1.36      | 21                    | 3                   |
| 50600      | 56  | 2157.10          | 721.33     | 48.46         | 94.70              | 104.38     | 4.61          | 2.49      | 23                    | 7                   |
| 90000      | 100 | 6040.58          | 4003.01    | 47.18         | 329.73             | 115.18     | 2.19          | 1.50      | 18                    | 35                  |
| 140000     | 144 | 17954.62         | 14903.61   | 46.12         | 881.09             | 175.31     | 5.47          | 2.73      | 20                    | 85                  |
| 560000     | 484 | 307204.93        | 3963075.20 | 45.10         | 17162.19           | 858.86     | 4.43          | 1.07      | 18                    | 4614                |

complement, the total complexity of Cholesky factorization is about  $(2mn - m^2)^3 \propto (MN)^{1.5}$ , and the total complexity of substitution is about  $(2mn - m^2)^2 \propto MN$ . The total computation cost of matrix multiplication in step 5, step 6, step 12 and step 13 can be presented as

$$\left[\frac{(n-m)^4 (2mn-m^2)}{p} + (n-m)^2 (2mn-m^2)^2\right] \propto \frac{N^{2.5}}{p}$$

It can be observed that, the major computation cost of the selected inversion is to perform matrix inversion of each subgrid. Consequently, the total complexity of selected inversion can be roughly estimated as  $\frac{N^3}{p^2}$ . In practice, the sparsity pattern of the matrix  $B_k$ ,  $F_k$ ,  $H_k$  and  $U_k$  has been further utilized for efficient implementation to make the actual computation costs to be less than such theoretical bounds. Accordingly, the complexity of direct inversion still remains greater than selected approach. Consequently, the proposed selected inversion will be much more efficient than the direct approach when the grid locality and constraint locality can be properly exploited.

#### IV. EXPERIMENTAL RESULTS

## A. Experimental Setup

Various experiments are carried out to validate the proposing performance of the proposed selected inversion technique. Experiments are carried out on a 64-bit Linux server with Intel Xeon E5-2650 CPU@2.00GHz and 128GB RAM. Several structured power grids are generated according to the typical electrical parameters from industrial designs. The voltage suppliers/pads array are uniformly distributed throughout the power grids (Flip-Chip package), while all of the grid nodes are connected to current sources. Local current constraints are assigned according to the typical value from industrial power grids, and global current constraints are generated by scaling down the total amount of current drawn by groups of current sources. For each power grid, we specify 9 global constraints in our experiments.

The proposed selected inversion technique has been implemented by Matlab scripts for evaluation. The direct inversion approach is to compute an accurate inverse to obtain the exact upper bounds of voltage noises. The popular direct solver Cholmod [16] is employed to solve all of the involved linear equations through Cholesky factorization. And for selected inversion approach, a threshold of  $10^{-3}$  is chosen to drop the extremely small entries of the resulted solution for exploiting its sparsity pattern. Gurobi optimizer [21] is invoked to solve all of the involved LP problems for verifying the voltage noises. It should be noticed that a significant reduction in computation cost can be achieved by keeping the returned *vbasis* and *cbasis* after each LP optimization and then pass them to the next LP optimization as an advanced warm start. Since the constraints information of each LP model is the same as others, the variable basis and constraint basis could be reused for verifying different nodes while it is possible to simply update the objective coefficients.

## B. Performance Results

The comprehensive performance results of the proposed selected inversion are listed in Table I when compared with the direct inversion approach. N is the number of synthetic grid nodes. M is the number of voltage suppliers/pads.  $T_{inv}^d$ and  $T_{LP}^d$  are the runtime of direct inversion and involved LP optimization, respectively.  $T_{inv}^s$  and  $T_{LP}^s$  are the runtime of selected inversion and involved LP optimization, respectively. Since the verification accuracy is decided by sense\_level, more considerations should be taken for the implementations of selected inversion. As we have observed, it is accurate enough to set *sense\_level* as 2 for the synthetic benchmarks. Meanwhile, the drop tolerance is chosen as  $10^{-3}$  for sparsity pattern exploiting which is also accurate enough. The runtime is separately reported by inversion time and LP time for comprehensive comparison.  $v_{\rm max}$  is the exact solution of maximum voltage drop across the grid. This golden solution is computed by the direct inversion approach and listed in column 5.  $E_{\text{max}}$  and  $E_{avg}$  are the maximum solution error and the average solution error respectively between the above two approaches. As shown in Table I, the solution errors of selected inversion approach are about several mV which can be adopted for practical verification. The proposed selected inversion approach can improve the runtime efficiency both on matrix inversion and LP problem solving. For computing the matrix inversion, it can achieve about 20X speedups compared with the direct inversion approach, while both of them use Cholmod [16] as internal linear solver. And for LP problem solving, it can achieve about tens or hundreds of speedups since the computed coefficients are much more sparse while the solution accuracy is still guaranteed by constraint locality.

In addition, the proposed selected inversion is compared with the hierarchical inversion approach [9] and the performance results are listed in Table II. N is the number of synthetic grid nodes.  $\hat{N}$  is the number of grid nodes used in [9].  $T_{inv}^h$  and  $T_{LP}^h$  are the runtime of hierarchical inversion and involved LP optimization respectively which are reported in [9]. Due to different benchmarks used for them, we compare the runtime of verifying each thousand nodes while the nodes number is almost same for each pair of benchmarks.

TABLE II. PERFORMANCE RESULTS OF THE PROPOSED SELECTED INVERSION COMPARED WITH THE HIERARCHICAL INVERSION APPROACH [9]. THE RUNTIME IS ILLUSTRATED IN SECOND.

| Selected Inversion |                  |                 | Hierarchical Inversion [9] |             |            |                  |                 | Speedup                         |                               |  |
|--------------------|------------------|-----------------|----------------------------|-------------|------------|------------------|-----------------|---------------------------------|-------------------------------|--|
| N                  | $T^s_{inv,perK}$ | $T^s_{LP,perK}$ | $\hat{N}$                  | $T^h_{inv}$ | $T^h_{LP}$ | $T^h_{inv,perK}$ | $T^h_{LP,perK}$ | $T^h_{inv,perK}/T^s_{inv,perK}$ | $T^h_{LP,perK}/T^s_{LP,perK}$ |  |
| 7200               | 0.23             | 1.47            | 5875                       | 10.31       | 9.91       | 1.75             | 1.69            | 8                               | 1                             |  |
| 27000              | 0.97             | 2.95            | 22939                      | 123.60      | 106.20     | 5.39             | 4.63            | 6                               | 2                             |  |
| 34200              | 1.23             | 2.30            | 35668                      | 304.20      | 233.40     | 8.53             | 6.54            | 7                               | 3                             |  |
| 50600              | 1.87             | 2.06            | 51195                      | 636.60      | 495.00     | 12.43            | 9.67            | 7                               | 5                             |  |
| 90000              | 3.66             | 1.28            | 90643                      | 2062.80     | 1573.20    | 22.76            | 17.36           | 6                               | 14                            |  |
| 140000             | 6.29             | 1.25            | 141283                     | 5184.00     | 3816.00    | 36.69            | 27.01           | 6                               | 22                            |  |
| 560000             | 30.65            | 1.53            | 562363                     | 103680.00   | 69120.00   | 184.36           | 122.91          | 6                               | 80                            |  |

 $T^s_{inv,perK}$  and  $T^s_{LP,perK}$  are the runtime for verifying per thousand nodes by direct inversion and involved LP optimization, respectively. Accordingly,  $T^h_{inv,perK}$  and  $T^h_{LP,perK}$  are the runtime for verifying per thousand nodes by hierarchical approach, respectively. Even though disadvantage in hardware platforms and programming language, the proposed selected approach still obtains significant improvements. For matrix inverse computation, the selected approach can achieve several speedups than hierarchical approach which are listed in column 9. And for LP problem solving, the selected approach still can achieve about scores of speedups for the largest case. Consequently, it is worth noting that the proposed approach should be more efficient in verifying larger power grids, which make it scalable for large scale power grid verification.

The major advantage of the proposed selected approach is that it can significantly reduce the problem size by selectively computing the matrix inverse while the verification accuracy can still be ensured by constraint locality. For early stage power grid verification, the proposed approach can be very efficient for roughly estimating the hotspot regions as a prediction and subsequently other exact approaches can be further utilized on these regions for more accurate verification.

## V. CONCLUSIONS

In this paper, we presented a selected inversion approach for efficient vectorless verification of power grids. The original power grid is formulated in the manner of domaindecomposition approach, and grid locality is exploited to selectively compute the matrix inverse while the constraint locality is utilized to guarantee the verification accuracy. Experimental results have confirmed the efficiency of the proposed approach. The selected inversion technique has further improved the runtime efficiency which will ensure that vectorless verification can be practical for large scale power grids. Additionally, since both the proposed selected inversion algorithm and the involved LP problems can be performed in parallel at different levels, the verification problem will be conducted by exploiting parallelism in the future.

#### REFERENCES

- D. Kouroussis and F. N. Najm, "A static pattern-independent technique for power grid voltage integrity verification," in *Proc. ACM/IEEE Design Automation Conference (DAC)*, 2003, pp. 99–104.
- [2] M. Nizam, F. N. Najm, and A. Devgan, "Power grid voltage integrity verification," in *Proc. IEEE International Symposium on Low Power Electronics and Design (ISLPED)*, 2005, pp. 239–244.
- [3] N. H. A. Ghani and F. N. Najm, "Handling inductance in early power grid verification," in *Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, 2006, pp. 127–134.

- [4] X. Xiong and J. Wang, "Vectorless verification of RLC power grids with transient current constraints," in *Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, 2011, pp. 548–554.
- [5] C.-K. Cheng, P. Du, A. B. Kahng, G. K. H. Pang, Y. Wang, and N. Wong, "More realistic power grid verification based on hierarchical current and power constraints," in *Proc. ACM International Symposium* on *Physical Design (ISPD)*, 2011, pp. 159–166.
- [6] N. H. A. Ghani and F. N. Najm, "Fast vectorless power grid verification using an approximate inverse technique," in *Proc. ACM/IEEE Design Automation Conference (DAC)*, 2009, pp. 184–189.
- [7] M. Avci and F. N. Najm, "Early P/G grid voltage integrity verification," in Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2010, pp. 816–823.
- [8] W. Zhao, Y. Cai, and J. Yang, "A multilevel H-matrix-based approximate matrix inversion algorithm for vectorless power grid verification," in *Proc. 18th Asia and South Pacific Design Automation Conference (ASP-DAC)*, 2013, pp. 163–168.
- [9] X. Xiong and J. Wang, "A hierarchical matrix inversion algorithm for vectorless power grid verification," in *Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, 2010, pp. 543–550.
- [10] N. H. A. Ghani and F. N. Najm, "Power grid verification using node and branch dominance," in *Proc. ACM/IEEE Design Automation Conference* (*DAC*), 2011, pp. 682–687.
- [11] A. Goyal and F. N. Najm, "Efficient RC power grid verification using node elimination," in *Proc. Design Automation and Test in Europe* (*DATE*), 2011, pp. 257–260.
- [12] Y. Wang, X. Hu, C.-K. Cheng, G. K. H. Pang, and N. Wong, "A realistic early-stage power grid verification algorithm based on hierarchical constraints," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)*, vol. 31, no. 1, pp. 109–120, 2012.
- [13] X. Xiong and J. Wang, "An efficient dual algorithm for vectorless power grid verification under linear current constraints," in *Proc. ACM/IEEE Design Automation Conference (DAC)*, 2010, pp. 837–842.
- [14] T.-H. Chen and C. C.-P. Chen, "Efficient large-scale power grid analysis based on preconditioned Krylov-subspace iterative methods," in *Proc. ACM/IEEE Design Automation Conference (DAC)*, 2001, pp. 559–562.
- [15] J. Rommes and W. H. A. Schilders, "Efficient methods for large resistor networks," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)*, vol. 29, no. 1, pp. 28–39, 2010.
- [16] Timothy A. Davis, Cholmod in Suitesparse 4.1.2. [Online]. Available: http://www.cise.ufl.edu/research/sparse/SuiteSparse
- [17] C. Bekas, A. Curioni, and I. Fedulova, "Low cost high performance uncertainty quantification," in *Proc. 2nd Workshop on High Performance Computational Finance*, 2009, pp. 1–8.
- [18] L. Lin, C. Yang, J. C. Meza, J. Lu, L. Ying, and W. E, "SelInv—An algorithm for selected inversion of a sparse symmetric matrix," ACM *Transactions on Mathematical Software*, vol. 37, no. 4, 2011.
- [19] J. M. Tand and Y. Saad, "Domain-decomposition-type methods for computing the diagonal of a matrix inverse," *SIAM Journal on Scientific Computing*, vol. 33, no. 5, pp. 2813–2847, 2011.
- [20] E. Chiprout, "Fast flip-chip power grid analysis via locality and grid shells," in *Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, 2004, pp. 485–488.
- [21] Gurobi Optimizer 5.0.2, Gurobi Optimization Inc. [Online]. Available: http://www.gurobi.com/download/gurobi-optimizer