# Alleviating Through-Silicon-Via Electromigration for 3-D Integrated Circuits Taking Advantage of Self-Healing Effect

Yuanqing Cheng, Member, IEEE, Aida Todri-Sanial, Member, IEEE, Jianlei Yang, Member, IEEE, and Weisheng Zhao, Senior Member, IEEE

Abstract—Three-dimensional integration is considered to be a promising technology to tackle the global interconnect scaling problem for terascale integrated circuits (ICs). Three-dimensional ICs typically employ through-silicon-vias (TSVs) to vertically connect planar circuits. Due to its immature fabrication process, several defects, such as void, misalignment, and dust contamination, may be introduced. These defects can significantly increase current densities within TSVs and cause severe electromigration (EM) effects, which can degrade the reliability of 3-D ICs considerably. In this paper, we propose an effective framework to mitigate EM effect of the defective TSV. At first, we analyze various possible TSV defects and their impacts on EM reliability. Based on the observation that EM can be significantly alleviated by self-healing effect, we design an EM mitigation module to protect defective TSVs from EM. To guarantee EM mitigation efficiency, we propose two defective TSV protection schemes, i.e., neighbor sharing and global sharing. Experimental results show that the global-sharing scheme performs the best and can improve the EM mean time to failure by more than 70× on average with only 0.7% area overhead and less than 0.5% performance degradation compared with naked design without any EM protection.

*Index Terms*—3-D integrated circuits (ICs), electromigration (EM), reliability, self-healing effect, through-silicon-via (TSV).

## I. INTRODUCTION

WITH the continuous technology scaling, chip integration density keeps increasing sharply. Billions of transistors can be built within a single chip. As a consequence, on-chip power consumption also rises steeply. At the same time, supply voltage decreases gradually with each technology generation. Thus, on-chip current density rises quickly. High current density may induce significant electromigration (EM) effect, which severely threatens the chip operation reliability [1], [2].

Manuscript received September 8, 2015; revised December 21, 2015 and January 31, 2016; accepted March 9, 2016. Date of publication April 20, 2016; date of current version October 21, 2016. This work was supported in part by the Beijing Natural Science Foundation under Grant 4154076, in part by the Beijing Municipal of Science and Technology under Grant D15110300320000, and in part by the National Natural Science Foundation of China under Grant 61401008.

Y. Cheng and W. Zhao are with the School of Electrical and Information Engineering, Beihang University, Beijing 100191, China (e-mail: yuanqing@ieee.org; weisheng.zhao@buaa.edu.cn).

A. Todri-Sanial is with the Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier 34095, France (e-mail: aida.todri@lirmm.fr).

J. Yang is with the Electrical and Computer Engineering Department, University of Pittsburgh, Pittsburgh, PA 15260 USA (e-mail: jiy64@pitt.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2016.2543260

EM is caused by mass transport within metal interconnects. When current flows in the metal line, electrons collide with metal atoms and drag them away from their original positions. As a result, voids form in the region where metal atoms are dragged away, while hillocks form where the atoms accumulate. A void introduces an open defect, and a hillock causes a short with the neighboring interconnects.

For traditional metal interconnects, the following equation is usually used for the estimation of interconnect mean time to failure (MTTF) due to EM [1]:

$$MTTF = A \cdot J^{-n} \cdot e^{\frac{Q}{k \cdot T}} \tag{1}$$

where A is a constant that depends on the interconnect fabrication technology, J is the interconnect current density, n is the current density exponent, Q is the activation energy for EM, k is the Boltzmann constant, and T is the temperature in Kelvin. As the technology node enters the deep submicrometer regime, EM is becoming a severe challenge for VLSI designers due to the rapidly rising current density.
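To make the dependence in (1) concrete, the following minimal sketch evaluates the equation numerically. The values n = 2 and Q = 0.9 eV, and the function name `black_mttf`, are illustrative assumptions and are not taken from this paper.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k in eV/K


def black_mttf(a, j, n, q_ev, temp_k):
    """Evaluate (1): MTTF = A * J^(-n) * exp(Q / (k * T))."""
    return a * j ** (-n) * math.exp(q_ev / (BOLTZMANN_EV_PER_K * temp_k))


# Illustrative values only: n = 2 and Q = 0.9 eV are assumed, not from the paper.
baseline = black_mttf(a=1.0, j=1.0, n=2.0, q_ev=0.9, temp_k=350.0)
doubled_j = black_mttf(a=1.0, j=2.0, n=2.0, q_ev=0.9, temp_k=350.0)

# A, Q, and T cancel in the ratio, so only the factor 2 ** -n remains.
relative_mttf = doubled_j / baseline
print(f"MTTF after doubling J: {relative_mttf:.2f} of baseline")
```

Because only the ratio matters when comparing two interconnects built in the same technology and operated at the same temperature, the constant A and the exponential term drop out, which is why normalized MTTF values are sufficient for the comparisons in Section II.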

On the other hand, as semiconductor feature size continues to shrink, global interconnects become major performance bottlenecks as they cannot scale at the same rate as transistors. The widely investigated 3-D integration is considered to be one of the most promising techniques to mitigate this problem [3]. By stacking planar dies and connecting them with vertical through-silicon-vias (TSVs), the chip performance and form factor can be improved dramatically [4], [5]. Furthermore, 3-D integrated circuits (ICs) enable disparate technologies, such as phase change random access memory, magnetic random access memory, and CMOS, to be integrated together without major changes to the fabrication process [6], [7].

However, 3-D ICs also face several challenging issues. Among them, EM occurrence on a TSV threatens the reliability of 3-D ICs [8], [9]. It is caused by several factors. First, due to the mismatch between the thermal expansion coefficients of the TSV filling material (e.g., copper) and the surrounding oxide layer, TSVs may suffer from stress and strain and break down during repetitive thermal cycling [8]. Second, current densities of 3-D ICs are much larger than their 2-D counterparts as integration density and power consumption increase [10]. High current density imposes high current flows in TSVs, which can lead to EM issues. Furthermore, as TSV fabrication is immature, defects, such as TSV void, misalignment, and bonding surface contamination, may be introduced during fabrication [11], [12]. Unlike failed TSVs, defective TSVs, which remain functional because the signal path is not cut off completely, are usually not replaced by redundant TSVs in the testing procedure. However, compared with normal TSVs, they suffer a much more severe EM effect and are prone to fail within a short lifetime, as shown in Section II. More importantly, those defective TSVs determine the whole chip MTTF due to bucket effect (i.e., the whole chip reliability is determined by the lifetime of the weakest part). Therefore, an in-depth investigation of EM's impact on TSV lifetime is needed, together with EM mitigation techniques that enhance TSV and chip reliability, especially for defective TSVs.

1063-8210 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Some papers focus on TSV EM modeling and mitigation techniques, such as [9], [13], and [14]. However, most of them focus on EM modeling and its impact on the electrical and mechanical properties of TSVs, and lack a holistic method to mitigate EM effect. On the other hand, although there are already many research efforts on EM mitigation for 2-D interconnects, such as [1] and [15]–[17], they cannot be applied directly due to the different electrical characteristics and fabrication mechanisms of 2-D interconnects and TSVs.

In this paper, we propose a framework to alleviate EM effects of defective TSVs at circuit and architecture level. The framework takes advantage of the self-healing effect observed in [18]: bidirectional currents alleviate EM effect better than dc currents because current flows in alternating directions cancel out the EM effect caused by each other. Therefore, the reliability of the metal line can be enhanced effectively [17]. We explore using this effect to mitigate EM on defective TSVs. First, we analyze the relationship between various TSV defects and EM-induced TSV MTTF degradation, and observe that defective TSVs can suffer from more severe EM than normal ones. Then, we propose a framework to protect defective TSVs and improve EM MTTF by balancing their current flow directions, such that EM can be mitigated by self-healing effect. It consists of two stages: offline defective TSV identification and online EM mitigation. To improve the effectiveness of defective TSV protection, TSVs are divided into groups and EM mitigation circuits are allocated to each group. Then, we propose three different defective TSV protection schemes, i.e., static protection, neighbor sharing, and global sharing. The first scheme protects defective TSVs with local EM mitigation modules in the group, while the latter two share spare EM mitigation modules among groups through a switching network to reduce hardware overhead. When a defective TSV is connected to a module, the module alternates the direction of current flow within the TSV. Therefore, TSV immunity to EM can be enhanced. Experimental results show that our proposed framework can improve TSV and whole chip EM MTTF dramatically, especially when global sharing is adopted, which can improve EM MTTF by more than $70 \times$ with 0.7% area overhead and less than 0.5% performance degradation on average. The experimental results validate the effectiveness and efficiency of our framework.



Fig. 1. TSV defects: void, misalignment, and bonding surface contamination.

The rest of this paper is organized as follows. Section II investigates the relationship between TSV defects and EM-induced MTTF, which motivates this paper. Section III presents the framework to identify defective TSVs and mitigate their EM effects by three protection schemes. Experimental results are shown in Section IV. Section V presents the related work and Section VI concludes this paper.

## II. ELECTROMIGRATION IMPACT ON DEFECTIVE TSVs

### A. EM Effect on TSVs With Different Functionalities

Since normal TSVs have strong immunity to EM effect due to their larger sizes compared with metal interconnects (several micrometers versus several tens of nanometers), we consider only defective TSVs for EM protection in this paper. In addition, EM effect on defective TSVs strongly depends on the TSVs' functionalities. For a single-ended TSV, the load capacitor can only be charged or discharged through one end, so the current injected naturally balances the current flowing out. Therefore, this type of TSV interconnection will not suffer from EM effect. In contrast, a bidirectional TSV data bus can suffer from EM effect. We use an example to illustrate this point. As shown in Fig. 2, we assume that A and B are two ends of a TSV signal line and the initial state of the TSV is 0. Then, if A sends 1 to B, the current will flow from A to B to charge the TSV. Next, if B sends 0 to A, current will keep flowing from A to B to discharge the TSV. As a result, for a bidirectional TSV data bus, the current flow direction depends on the data patterns transmitted, and the bus can suffer from EM effect. As bidirectional TSV data buses are widely used for data transmission between different cache levels or between cache and memory [19], [20], it is imperative to mitigate EM effect on this kind of bus. In addition to bidirectional TSV data buses, power delivery TSVs can also suffer from EM due to the large dc current flowing through them continuously. However, power TSVs are usually wider than signal TSVs, which makes them more robust against EM [21], so the EM mitigation of power delivery TSVs is out of the scope of this paper.

As shown in (1), EM MTTF is strongly dependent on current density. The current density J of TSV can be computed as follows:

$$J = \frac{C \cdot V_{\rm dd}}{S} \cdot f \cdot p \tag{2}$$

where C is the TSV capacitance, $V_{\rm dd}$ is the supply voltage, S is the cross-sectional area of TSV, f is the clock frequency, and *p* is the signal's switching activity. Equations (1) and (2) are used to derive EM MTTF of defective TSVs in our following evaluations. We assume that the TSV diameter is 5 $\mu$m and its aspect ratio is 5:1 according to [22].<sup>1</sup> The filling material is copper, and silicon dioxide isolates copper from the substrate. The size of the bonding pad is assumed to be 6 $\mu$m × 6 $\mu$m [22]. The TSV is fabricated by a via-last process. Three common types of TSV defects, i.e., TSV void, misalignment of TSV bonding pads, and dust contamination on the bonding surface shown in Fig. 1, are examined.

Fig. 2. EM effect on bidirectional TSV data bus. (a) Initial status of TSV data bus. (b) A sends 1 to B. (c) B sends 0 to A.

Fig. 3. Normalized TSV EM MTTF and TSV conductive area variations with different void sizes.

### B. EM Occurrence Due to Void

During TSV filling process, due to anisotropic metal deposition on the sidewall and center of TSV, void may be formed in the TSV [11]. It reduces the effective cross-sectional area of TSV and increases current density according to (2). Increased current density can elevate EM effect suffered by TSV. Therefore, MTTF of TSV with voids may degrade according to (1).

Using the aforementioned TSV feature size, we calculate TSV MTTF under different void sizes. The result is plotted in Fig. 3. The *x*-axis denotes void size, the left *y*-axis denotes the corresponding MTTF value, which is normalized to that of a TSV without void, and the right *y*-axis represents TSV conductive area. Fig. 3 indicates that TSV MTTF decreases rapidly as void size increases. For instance, when void size exceeds 3.5 $\mu$m, MTTF drops by more than 50%. Thus, a void defect can degrade TSV immunity to EM significantly.

Fig. 4. Normalized TSV EM MTTF versus misalignment error.

Fig. 5. Right-angle current is formed due to misalignment error. (a) Without misalignment. (b) With misalignment.
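The void analysis of Section II-B can be approximated with a short calculation that combines (1) and (2): with all other parameters fixed, J is inversely proportional to the conductive area S, so the normalized MTTF scales as the area ratio raised to the power n. The sketch below is only an illustration under two assumptions that the paper does not state: the void is modeled as a coaxial cylinder whose "size" is its diameter, and the exponent n is a free parameter (n = 1 is used in the loop).

```python
import math


def conductive_area_um2(tsv_diameter_um, void_diameter_um):
    """Cross section left for conduction, assuming a coaxial cylindrical void."""
    if not 0 <= void_diameter_um < tsv_diameter_um:
        raise ValueError("void must be smaller than the TSV")
    return math.pi / 4 * (tsv_diameter_um ** 2 - void_diameter_um ** 2)


def normalized_mttf(tsv_diameter_um, void_diameter_um, n):
    """MTTF relative to a void-free TSV: J ~ 1/S by (2), MTTF ~ J^(-n) by (1)."""
    full = conductive_area_um2(tsv_diameter_um, 0.0)
    left = conductive_area_um2(tsv_diameter_um, void_diameter_um)
    return (left / full) ** n


# 5 um TSV diameter as assumed in Section II-A; n = 1 is an illustrative choice.
for void in (0.0, 1.0, 2.0, 3.0, 3.5, 4.0):
    ratio = normalized_mttf(5.0, void, n=1.0)
    print(f"void {void:.1f} um -> normalized MTTF {ratio:.2f}")
```

Under these assumptions, a 3.5 $\mu$m void leaves 51% of the cross section, so the normalized MTTF is about 0.51 for n = 1, which is in line with the roughly 50% loss read from Fig. 3. A larger n would make the degradation steeper.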

### C. EM Occurrence Due to Bonding Pad Misalignment

To stack tiers together, a bonding process is required for 3-D ICs. In order to guarantee bonding yield, a pad is attached to each TSV. TSVs on different tiers can be bonded together using bonding pads. Taking the face-to-back bonding style as an example, bonding pads of TSVs in the upper tier are matched with those in the bottom tier, as shown in Fig. 1. However, the bonding process may incur some errors due to the limited alignment accuracy of the bonding equipment. As a result, a TSV bonding pad from the upper tier may not be aligned accurately with that of the bottom tier [12]. Misalignment can reduce the conductive area of TSV, as shown in Fig. 1. Therefore, current density increases and EM MTTF decreases according to (1) and (2). We calculate TSV EM MTTF values for different misalignment errors. The result is shown in Fig. 4, where the x(y)-axis denotes the misalignment error in the X(Y) direction in micrometers and the z-axis represents the MTTF value normalized to that of a TSV without misalignment. As shown in Fig. 4, MTTF decreases rapidly as misalignment error increases. Note that the curved surface becomes discontinuous when the misalignment error exceeds the TSV diameter due to the formation of right-angle current flow, as shown in Fig. 5, which can aggravate EM effect abruptly [16]. Therefore, MTTF degradation induced by TSV misalignment should also be considered for reliability enhancement.

<sup>1</sup>However, the effectiveness of our work does not depend on specific TSV feature size. As the technology node scales, we believe that defective TSVs will face more severe reliability issues. The TSV size used here is only for illustration purposes.



Fig. 6. Normalized TSV EM MTTF/TSV conductive area versus dust size.



Fig. 7. Workflow of our proposed TSV EM mitigation framework.

### D. EM Occurrence Due to Contamination

During TSV fabrication, dust particles in the environment may contaminate the TSV bonding surface, which reduces the effective cross-sectional area of TSV and degrades TSV MTTF, as indicated by (1) and (2). We plot the normalized TSV EM MTTF and TSV conductive area for different dust sizes in Fig. 6. It shows that MTTF decreases quickly as dust size increases. When dust size exceeds 4.5 $\mu$m in our case, MTTF drops by more than 50%.

In summary, defective TSVs can suffer a more severe EM effect than normal TSVs. Due to bucket effect, 3-D chip lifetime is determined by those defective TSVs instead of normal ones. Therefore, it is imperative to protect them from EM, such that chip reliability can meet the design specification.

## III. OUR PROPOSED FRAMEWORK TO ALLEVIATE ELECTROMIGRATION EFFECT FOR DEFECTIVE TSVs

### A. Overview of Our Framework

In this section, we propose an EM mitigation framework to protect defective TSVs, taking advantage of the self-healing mechanism. The workflow of our framework, shown in Fig. 7, contains two stages, i.e., offline identification of defective TSVs and online EM mitigation. In the first stage, we can take advantage of existing research efforts to identify defective TSVs and obtain the defect map. Then, an on-chip switching network connects defective TSVs to EM mitigation modules. In the second stage, the EM mitigation modules monitor current flows within these TSVs and balance current flow directions in time to alleviate EM by self-healing effect. Section III-B discusses the defective TSV identification. Section III-C presents the EM mitigation module circuit design. Section III-D illustrates the design of the switching network connecting EM mitigation modules and defective TSVs.

Fig. 8. Resistance variations caused by different TSV defects. (a) TSV resistance versus TSV void size. (b) TSV resistance versus misalignment error. (c) TSV resistance versus dust size.

### B. Defective TSV Identification

As mentioned in Section II, defects reduce the effective conducting cross-sectional area of TSV, which increases current density and aggravates EM. On the other hand, defects also introduce resistance variations due to variations of the conducting area of TSV, as shown in Fig. 8. Since all these defects increase TSV resistance, we can identify them easily by detecting TSV resistance variations. In this paper, we identify a TSV as defective if its resistance is more than $5 \times$ larger than that of a normal TSV, which implies that EM MTTF of the defective TSV falls by more than $10 \times$ compared with the normal TSV, referring to Figs. 3, 4, and 6. We adopt the TSV test structure proposed in [23] for defective TSV identification, which is shown in Fig. 9. In Fig. 9, $V_{\rm ref}$ is set at a threshold voltage, according to the normal TSV resistance value. Then, we can apply the voltage dividing principle to sense the potential difference between the TSV under test and $V_{\rm ref}$. If it exceeds the threshold voltage, the test result indicating a defective TSV will be latched to a scan register. Then, we can determine the TSV defect map, which will be used in the EM mitigation stage.

TABLE I Relationship Between Current Flow and Data Pattern [17]

| Original Bus State | Data Transmission     | Current Flow      |
|--------------------|-----------------------|-------------------|
| '0'                | '1' A $\rightarrow$ B | A $\rightarrow$ B |
|                    | '1' A $\leftarrow$ B  | A $\leftarrow$ B  |
|                    | '0'                   | -                 |
| '1'                | '0' A $\rightarrow$ B | A $\leftarrow$ B  |
|                    | '0' A $\leftarrow$ B  | A $\rightarrow$ B |
|                    | '1'                   | -                 |
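The resistance-threshold rule of Section III-B can be summarized as a simple software model. This is only a behavioral sketch: the actual test in [23] is an on-chip analog comparison against $V_{\rm ref}$, and the function names and resistance values below are hypothetical.

```python
def is_defective(r_tsv_ohm, r_nominal_ohm, ratio_threshold=5.0):
    """Flag a TSV whose resistance exceeds ratio_threshold x the nominal value."""
    return r_tsv_ohm > ratio_threshold * r_nominal_ohm


def build_defect_map(resistances_ohm, r_nominal_ohm, ratio_threshold=5.0):
    """Return one boolean per TSV, mimicking the bits latched into scan registers."""
    return [is_defective(r, r_nominal_ohm, ratio_threshold) for r in resistances_ohm]


# Hypothetical measurements in ohms; the nominal value is illustrative only.
measured = [0.05, 0.06, 0.31, 0.05, 0.24]
defect_map = build_defect_map(measured, r_nominal_ohm=0.05)
print(defect_map)
```

With a nominal resistance of 0.05 Ω, only the third TSV (0.31 Ω) exceeds the 5× threshold of 0.25 Ω, so it is the only entry marked defective in the resulting map.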

### C. TSV EM Mitigation Module Structure

The EM mitigation circuit monitors current flows within defective TSVs and tries to balance their directions, such that EM effect can be alleviated by self-healing effect [24]. Assume two ends of a TSV are A and B, respectively, as shown in Fig. 2. Depending on the data transmitted previously and data to be transmitted, the current flow direction within TSV can be derived based on Table I. The circuit used to identify the current flow direction of a TSV can be implemented by several logic gates and a decoder [17], as shown in the left part of Fig. 10. The current direction balance circuit is also shown in Fig. 10. When the chip is powered ON, the counter loads a preset value, which is the midvalue of its counting range.

Referring to Table I, if the current direction is from A to B (in our case, this means that current flows from the top tier to the bottom tier), the counter increases by 1. If the current flows in the opposite direction, the counter decreases by 1. When the counter overflows or reaches zero, it indicates that the current has flowed along a specific direction for a long time and needs to be reversed for self-healing. Then, the output of the OR gate enables the inverter to change the signal value, such that the current is prevented from flowing along that direction again. The sender signal controls whether the sending data path or the receiving data path is used. The same circuit module resides on the other layer. If the send buffer goes through the inverter path in the top tier, the signal also goes through the inverter path in the bottom tier, so the inverted signal can be recovered. The clock signal from the on-chip clock network is used to provide synchronization between EM mitigation modules residing on both tiers.

Fig. 10. Online EM mitigation circuit structure.

Fig. 11. Switching network connecting the defective TSV to EM mitigation circuit.

Since the counter is used to monitor current flows within the TSV, the number of bits in the counter determines the time interval between reversals of the current flow direction (i.e., activations of self-healing). If the counter has few bits, the circuit is activated very often and can balance current flows within a short time interval, but it consumes much power. Conversely, more counter bits reduce the frequency of invoking current balancing, which results in lower power consumption but incurs larger area overhead. In our case, we find that a 10-bit counter achieves the best tradeoff. In Section IV, we will discuss the power and area overhead of the EM mitigation module in detail.
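The behavior of the EM mitigation module in Section III-C can be illustrated with a small behavioral model: a decoder that implements Table I and an up/down counter preset to its midvalue. This is a sketch under assumptions rather than the circuit of Fig. 10. In particular, the counter here saturates at its limits instead of wrapping, and what happens to the count after a reversal is requested is not modeled because the text does not specify it.

```python
A_TO_B, B_TO_A, NO_CURRENT = 1, -1, 0


def current_direction(bus_state, bit, sender):
    """Decode Table I: bus_state and bit are 0 or 1, sender is 'A' or 'B'."""
    if bit == bus_state:
        return NO_CURRENT                        # no charge or discharge
    if bit == 1:                                 # the sender charges the TSV
        return A_TO_B if sender == "A" else B_TO_A
    return B_TO_A if sender == "A" else A_TO_B   # bit 0: TSV discharges into sender


class DirectionCounter:
    """Up/down counter preset to mid-range; hitting a limit requests a reversal."""

    def __init__(self, bits=10):
        self.max_value = (1 << bits) - 1
        self.value = (self.max_value + 1) // 2   # midvalue loaded at power-on

    def update(self, direction):
        """Count one transfer; return True when the inverter path should be enabled."""
        self.value = min(self.max_value, max(0, self.value + direction))
        return self.value in (0, self.max_value)


counter = DirectionCounter(bits=4)   # small counter so the limit is reached quickly
state = 0
flags = []
for _ in range(8):
    # A sends 1, then B sends 0: both transfers push current from A to B.
    for bit, sender in ((1, "A"), (0, "B")):
        flags.append(counter.update(current_direction(state, bit, sender)))
        state = bit
print(flags.index(True))
```

With a 4-bit counter preset to 8, seven consecutive A-to-B transfers drive the count to 15, so the first reversal request appears on the seventh transfer (index 6). A 10-bit counter preset to 512 would tolerate a correspondingly longer one-directional imbalance before acting, which is the interval-versus-overhead tradeoff discussed above.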

### D. Interconnections Between EM Mitigation Modules and Defective TSVs

1) Allocating EM Mitigation Module Statically for Defective TSVs: To protect defective TSVs from EM, an intuitive method was proposed in [25]. The working procedure is described as follows. During the chip design stage, TSVs are divided into several groups. One or more EM mitigation circuit modules are allocated to each group. Once the defect map derived from testing indicates which TSV is defective, an EM module is connected to it through a crossbar switch within the group, as shown in Fig. 11. Note that the TSV group size and the number of EM mitigation modules available within the group depend on the TSV defective rate,<sup>2</sup> and represent a tradeoff between area overhead and EM reliability enhancement. If the TSV group size is large and only a few EM modules are allocated to each group, it is highly likely that some defective TSVs will not be protected due to the lack of EM mitigation modules within the group. On the other hand, if the TSV group size is small and more EM modules are allocated to each group, defective TSVs within each group are more likely to be protected, but hardware overhead will be larger than in the previous case. In the experimental results, we take this scheme as a baseline and compare it with the neighbor-sharing and global-sharing schemes, which will be introduced shortly, in terms of MTTF enhancement, hardware overhead, and so on.

2) EM Module Sharing Among Neighboring TSV Groups: Although the above solution can protect defective TSVs within a single group, it fails if the number of EM mitigation modules in a group is smaller than the defective TSV count of that group. This problem can be solved by allocating more EM mitigation modules to each TSV group. However, doing so introduces large hardware overhead. In general, since defective TSVs account for only a very small fraction of the total TSV count, there should be some spare EM mitigation modules in other TSV groups. Consequently, we can share EM mitigation modules among groups. Then, the probability of all defective TSVs being protected increases without incurring unacceptable hardware overhead. As shown in Fig. 12(a), all TSV groups can be organized in a network-on-chip-like topology. Therefore, not only modules within the group but also those in the neighborhood can be used for protecting defective TSVs in the group. This is a kind of module-sharing scheme, and we call it neighbor sharing.

The working procedure of neighbor sharing is described as follows. First, defective TSVs are protected by EM mitigation modules within the group, as stated above. If there are more defective TSVs requiring protection, the direct neighboring groups are considered. For ease of implementation, the search order is fixed as north $\rightarrow$ east $\rightarrow$ south $\rightarrow$ west. A spare EM mitigation module in the corresponding neighbor group will be used to protect the defective TSV. In order to support intergroup sharing, the switching network in each group should be revised slightly, as shown in Fig. 12(b). Neighbor sharing is expected to achieve a higher protection rate than the first scheme.
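The neighbor-sharing procedure can be sketched as follows. The grid representation, the raster order in which groups are visited, and the function name are assumptions made for illustration. Only the local-first rule and the fixed north, east, south, west search order come from the description above.

```python
def neighbor_sharing(defects, modules):
    """Return the number of protected defective TSVs on a grid of TSV groups.

    defects[r][c] and modules[r][c] give the defective-TSV count and the
    EM-module count of the group at row r, column c.
    """
    rows, cols = len(defects), len(defects[0])
    spare = [[0] * cols for _ in range(rows)]
    pending = [[0] * cols for _ in range(rows)]
    protected = 0

    # Step 1: static protection inside each group.
    for r in range(rows):
        for c in range(cols):
            local = min(defects[r][c], modules[r][c])
            protected += local
            spare[r][c] = modules[r][c] - local
            pending[r][c] = defects[r][c] - local

    # Step 2: borrow from direct neighbors in the fixed order N, E, S, W.
    search_order = [(-1, 0), (0, 1), (1, 0), (0, -1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in search_order:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    borrowed = min(pending[r][c], spare[nr][nc])
                    spare[nr][nc] -= borrowed
                    pending[r][c] -= borrowed
                    protected += borrowed
    return protected


# Hypothetical 2 x 2 layout: four defects in one corner group, one module per group.
print(neighbor_sharing([[4, 0], [0, 0]], [[1, 1], [1, 1]]))
```

In this hypothetical layout, three of the four defective TSVs are protected: one by the local module and two by the east and south neighbors. The spare module in the diagonal group is out of reach, which is the kind of limitation that motivates a wider search.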

3) EM Mitigation Module Global-Sharing Scheme: The drawback of the neighbor-sharing strategy is that it is a locally optimal solution without global awareness of available EM modules, and it may not be effective if there is no EM module available in the direct neighborhood. We give an example to illustrate this point. Fig. 13 shows the solution using neighbor sharing. Because of the fixed search order, EM mitigation module A is first used for protecting defective TSV 1. Then, TSV 3 is protected by module C. Although there are still two spare modules B and D left, they cannot be used to protect TSVs 2 and 4.

Fig. 12. (a) Switching network architecture for EM mitigation module sharing. (b) Detailed EM mitigation module-sharing architecture in each TSV group.

Fig. 13. Illustrative example showing the drawback of neighbor-sharing scheme. Defective TSVs 2 and 4 cannot be repaired in this case.

In order to improve protection rate, an intuitive method should permit any spare EM modules in any group to protect defective TSVs in any other group. However, this new method will introduce a new problem. Using EM modules from the group further away will introduce extra wire delay that may degrade data transmission latency. Therefore, how many hops can be tolerated should be set as a constraint to avoid too large a data transmission latency. The problem can be described as follows.

<sup>2</sup>We define TSV defective rate as (defective TSV count)/(total TSV count).



Fig. 14. (a) Bipartite graph constructed from example in Fig. 13 using global sharing scheme. (b) The optimal solution of EM protection when hop constraint is set to 2.

Assume that there are M TSV groups and each group has g TSVs. Group i (i = 1...M) has $e_i$ spare EM mitigation modules after all defective TSVs within the group have been protected using static protection. In addition, group i has $d_i$ defective TSVs left to be protected after all EM mitigation modules within the group have been consumed. Note that $e_i$ and $d_i$ satisfy the following relationship: if $e_i = 0$, $d_i \ge 0$; if $e_i > 0$, $d_i = 0$. The optimization objective is to protect as many defective TSVs as possible, subject to a constraint on the hop count between each EM mitigation module and the defective TSV it protects.

This problem can be converted into a maximum bipartite graph matching problem. First, static protection is performed, and a new defect map is derived. Then, we construct a bipartite graph: each remaining defective TSV is represented by a vertex in the left part of the graph, and each spare EM mitigation module is represented by a vertex in the right part. If the number of hops between a left vertex and a right vertex is smaller than the specified constraint (determined by the performance degradation tolerance), an edge is drawn between the two vertices. Once the bipartite graph is established, we can use the Hungarian algorithm to solve it [26]. The solution determines the connections between defective TSVs and EM mitigation modules.

We use the example shown in Fig. 13 to explain the working procedure of global sharing. The defective TSVs with indexes 1–4 are put into the left vertex set. EM mitigation modules with labels A–D are put into the right vertex set. Then, depending on the hop constraint (set to 2 in this example), edges meeting it are added to the graph. Then, we can find the maximum matching solution, as shown in Fig. 14(a), with matching edges drawn in red. The number on each edge denotes the hop count between the EM mitigation module and the defective TSV protected. The final solution is shown in Fig. 14(b). Therefore, to maximize protection rate, TSVs 1–4 should connect to modules B, A, C, and D, respectively. In Section IV, we will compare the three different defective TSV protection schemes in terms of EM reliability enhancement, area, and performance overhead.
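The matching step of global sharing can be sketched with a standard augmenting-path search for maximum bipartite matching. This stands in for the Hungarian algorithm cited in [26] and is not claimed to be the authors' implementation. The group coordinates, the use of Manhattan distance as the hop count, and the inclusive treatment of the hop limit are assumptions for illustration, and the layout in the example is hypothetical rather than the one in Fig. 13.

```python
def global_sharing(defect_groups, spare_groups, max_hops):
    """Match pending defective TSVs to spare EM modules under a hop limit.

    defect_groups[i] and spare_groups[j] are (row, col) coordinates of the
    group holding defective TSV i and spare module j. Returns {tsv: module}.
    """
    def hops(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])   # Manhattan distance (assumed)

    # Bipartite graph: an edge exists when the module is within the hop limit.
    edges = [
        [j for j, s in enumerate(spare_groups) if hops(d, s) <= max_hops]
        for d in defect_groups
    ]
    module_owner = {}                                # module index -> TSV index

    def try_assign(tsv, visited):
        """Augmenting-path search starting from one unmatched TSV."""
        for module in edges[tsv]:
            if module in visited:
                continue
            visited.add(module)
            if module not in module_owner or try_assign(module_owner[module], visited):
                module_owner[module] = tsv
                return True
        return False

    for tsv in range(len(defect_groups)):
        try_assign(tsv, set())
    return {tsv: module for module, tsv in module_owner.items()}


# Hypothetical layout: TSV 0 can reach both modules, TSV 1 can reach only module 0.
assignment = global_sharing(
    defect_groups=[(1, 1), (0, 0)],
    spare_groups=[(0, 1), (1, 2)],
    max_hops=1,
)
print(assignment)
```

In this example, TSV 0 is first given module 0, the first module in its edge list. When TSV 1 is processed, the augmenting path moves TSV 0 to module 1 so that both TSVs end up protected. A fixed-order greedy assignment would have left TSV 1 unprotected.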

## IV. EXPERIMENTAL RESULTS

### A. Experimental Setup

As shown in Fig. 15, the simulation target consists of two dies bonded together using the 3-D integration technology. The top tier contains the CPU and L1 cache, and the bottom one is the L2 cache. Communications between the L1 and L2 caches go through TSV bundles. The main memory is assumed to be off-chip. A heat sink is attached to the top CPU tier to facilitate heat dissipation. Cache block size is assumed to be 128 bytes and the L1/L2 TSV bus width is set to 1024, such that a data block can be transferred within one memory access period. The details of the architecture parameters are listed in Table II. To evaluate the effectiveness of our method, we simulate TSV data bus traffic between the L1 and L2 caches by revising the SimpleScalar [27] simulator. The SPEC2000 benchmark suite is used for our evaluations.

Fig. 15. Illustration of the 3-D target platform.

TABLE II PARAMETERS OF THE SIMULATED 3-D PLATFORM

| Parameter         | Value                                                                                                    |
|-------------------|----------------------------------------------------------------------------------------------------------|
| CPU               | Alpha 21264 1.33GHz                                                                                      |
| Predictor         | Bimodal predictor, using a BTB with 2-bit counter, 3 cycles misprediction penalty                        |
| IFQ Size/LSQ Size | 4/8                                                                                                      |
| L1 D\$/I\$        | 32KB, 128B block size, 2-way associative, LRU replacement, write-back policy, 1 cycle hit latency        |
| L2 Unified \$     | 512KB, 128B block size, 2-way associative, LRU replacement, write-back policy, 6 cycles hit latency      |
| ITLB/DTLB         | 64 entries/128 entries                                                                                   |
| TSV Bus Width     | 1024                                                                                                     |
| TSV Feature Size  | 5 $\mu$m diameter, 25 $\mu$m depth                                                                       |
| TSV Pad Size      | 6 $\mu$m $\times$ 6 $\mu$m                                                                               |

### B. Tradeoff Between Defective TSV EM Protection Rate and Hardware Overhead

1) Protection Rate Evaluation: According to Section III, TSV EM protection rate depends on several parameters: TSV group size, EM mitigation modules available in each group, and defective rate of TSV. It is also closely related to which protection scheme is adopted. To evaluate the effectiveness of each defective TSV protection scheme, we perform extensive simulations with different configurations (including TSV group size, EM mitigation modules in each group, and defective rate). In each configuration, we randomly distribute defective TSVs among all 1024 TSVs with the specified defective rate, and repeat the simulation 10000 times for each case. Then, we take the mean value as the protection rate.



Fig. 16. TSV EM protection rates with various configurations when using static-protection scheme. (a) TSV defective rate = 1%. (b) TSV defective rate = 3%. (c) TSV defective rate = 5%.



Fig. 17. TSV EM protection rate in three different cases. Case 1: TSV group size = 4, EM modules/group = 1, and TSV defective rate = 1%. Case 2: TSV group size = 16, EM modules/group = 1, and TSV defective rate = 3%. Case 3: TSV group size = 64, EM modules/group = 1, and TSV defective rate = 5%.

The evaluation results for the static-protection scheme are shown in Fig. 16. In Fig. 16, the x-axis represents the group size, the y-axis denotes the number of EM mitigation modules available in each group, and the z-axis is the protection rate in percent. The protection rate increases as the group size decreases and as the number of available EM mitigation modules increases. On the other hand, the hardware overhead also increases (which will be discussed in Section IV-B2). Taking Fig. 16(c) as an example, the protection rate is 69.9% when each group has one EM mitigation module and the TSV group size is 16. Next, we take the static-protection scheme as a baseline and compare its protection rate with those of the neighbor-sharing and global-sharing schemes, as shown in Fig. 17. In case 1, group size is 16, defective rate is 1%, and one EM mitigation module is available in each group. In case 2, defective rate is 3% and the other configuration parameters remain the same. In case 3, defective rate becomes 5% and the others remain the same. We set the hop count constraint to 3 when evaluating the global-sharing scheme (the impact of hop count on data transmission delay will be discussed shortly). Fig. 17 shows that both neighbor sharing and global sharing improve the protection rate significantly. For example, the neighbor-sharing scheme improves the protection rate from 70% to 91.5% in case 3 compared with static protection.

The global-sharing scheme performs the best and can achieve 100% protection rate in all the three cases.

2) Hardware Overhead Evaluation: As stated in Section III, connecting defective TSVs with EM mitigation modules in the same group requires a switching network within the group. Assume that each group has $g$ TSVs and $e$ EM mitigation modules. Then, the switching network demands $2ge$ switching points. Each switching point includes two transmission gates, one connecting to the normal working path and one to the EM mitigation path. As a result, the switching network has $4ge$ transmission gates in total. Assume that both the pMOS and nMOS transistors in a transmission gate have minimum size. Then, the gate area is about $9F^2$, where $F$ is the feature size of the fabrication technology node. If the TSV data bus is divided into $M$ groups, the total area of the switching network is $4Mge \times 9F^2$. To share EM mitigation modules with other groups in the neighbor-sharing or global-sharing scheme, each intergroup sharing of an EM mitigation module requires $2g$ extra transmission gates, as shown in Fig. 12(b).
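As a rough worked example of the switching network estimate above, the following sketch evaluates $4Mge \times 9F^2$ and adds $2g$ gates per shared module. The function name is hypothetical, and counting the intergroup gates as $2g$ per shared module per group is an assumption about how the sharing overhead accumulates.

```python
def switching_network_area_um2(num_groups, group_size, modules_per_group,
                               feature_nm, shared_modules_per_group=0):
    """Transmission-gate area of the TSV switching network in square microns."""
    feature_um = feature_nm / 1000.0
    gate_area_um2 = 9.0 * feature_um ** 2           # one minimum-size gate: 9 F^2
    # 2ge switching points per group, two transmission gates each: 4ge.
    intra_gates = 4 * num_groups * group_size * modules_per_group
    # Assumption: every shared module costs 2g extra gates in every group.
    inter_gates = 2 * group_size * shared_modules_per_group * num_groups
    return (intra_gates + inter_gates) * gate_area_um2


if __name__ == "__main__":
    # 1024 TSVs in 64 groups of 16, one module per group, 90-nm node.
    print(switching_network_area_um2(64, 16, 1, 90))
```

For 64 groups of 16 TSVs with one module per group at 90 nm, the intragroup network comes to roughly 300 $\mu$m<sup>2</sup>, which is small compared with the area of a single EM mitigation module reported in the text that follows.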

Next, we consider the area overhead of the EM mitigation modules. Since the TSV resistance testing structure and scan registers can be reused from design-for-testability circuitry, we do not take their area overheads into account. The proposed EM mitigation circuit only introduces several primary logic gates, pass transistors, and a counter. We use the ST 90-nm technology library and Synopsys DC Compiler for circuit synthesis [28]. The area of one EM mitigation module is 264.6 $\mu$m<sup>2</sup>. Although configuring more EM mitigation modules in each TSV group increases the protection rate, it also increases the area overhead. The benefit of the neighbor-sharing and global-sharing schemes is that a higher protection rate can be achieved by sharing spare modules from other groups without increasing the module count in each group. Take a group size of 16 TSVs and a TSV defective rate of 5% as an example. To achieve a 99.9% protection rate, the static-protection scheme [25] needs three EM mitigation modules in each group, neighbor sharing needs two, and global sharing with a 3-hop constraint needs only one. However, to achieve the desired protection rate, the neighbor-sharing scheme needs to support at most 2 EM mitigation modules shared from other groups. Global sharing requires at most



Fig. 18. Area overhead comparisons of static protection [25], neighbor-sharing, and global-sharing schemes to achieve 99.9% protection rate, assuming that the TSV group size is 16 and the TSV defective rate is 5%.

3 modules shared from other groups. Although the sharing schemes incur some switching transmission gate overhead as stated above, they save EM mitigation modules, whose area dominates the overall area overhead. Fig. 18 compares the area overheads, including both EM mitigation module area and switching network area, of the different TSV protection schemes. As shown in Fig. 18, the EM mitigation module area is reduced significantly by neighbor sharing and global sharing, and the total area is reduced accordingly. The neighbor-sharing scheme reduces area overhead by 32.8%, while global sharing reduces it by 65.5%. Considering that the chip area of our simulation target is $1.6 \text{ mm} \times 1.6 \text{ mm}$ in the Taiwan Semiconductor Manufacturing Company (TSMC) 90-nm technology, the area overhead incurred by the global-sharing scheme is only 0.7%, while that incurred by the static-protection scheme [25] is 2%. We also use Synopsys PrimeTime PX [29] in vectorless mode to estimate the power of the EM mitigation circuit. The average power of each module is 7.13 $\mu$W. The total power consumption of each scheme is also shown in Fig. 18 and follows a trend similar to that of the area overhead. Global sharing consumes only 0.46 mW, while the static-protection scheme consumes 1.37 mW.
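The module-level figures quoted above can be checked with a few lines of arithmetic. The sketch below counts EM mitigation modules only and ignores the switching network, so its relative savings (about 33% and 67%) are slightly larger than the 32.8% and 65.5% reported for the total area. The constant and function names are illustrative.

```python
MODULE_AREA_UM2 = 264.6          # area of one EM mitigation module
MODULE_POWER_UW = 7.13           # average power of one module
CHIP_AREA_UM2 = 1600.0 * 1600.0  # 1.6 mm x 1.6 mm
NUM_GROUPS = 1024 // 16          # 1024 TSVs in groups of 16


def module_overhead(modules_per_group):
    """Area and power of the EM mitigation modules alone (no switching network)."""
    n_modules = NUM_GROUPS * modules_per_group
    area = n_modules * MODULE_AREA_UM2
    return {
        "modules": n_modules,
        "area_um2": area,
        "area_percent": 100.0 * area / CHIP_AREA_UM2,
        "power_mw": n_modules * MODULE_POWER_UW / 1000.0,
    }


if __name__ == "__main__":
    # Modules per group needed for 99.9% protection, as stated in the text.
    for name, per_group in (("static", 3), ("neighbor", 2), ("global", 1)):
        o = module_overhead(per_group)
        print(f"{name:8s} {o['area_um2']:9.1f} um^2 "
              f"{o['area_percent']:.2f}% {o['power_mw']:.2f} mW")
```

With one module per group, the module area is about 16 934 $\mu$m<sup>2</sup> (roughly 0.7% of the chip) and the power is about 0.46 mW. With three modules per group, the corresponding values are roughly 2% and 1.37 mW, in line with the numbers given for global sharing and static protection.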

Since defective TSVs can also be protected from EM by allocating redundant TSVs, we compare the hardware overhead of that approach with ours to show the effectiveness of our proposed method. In this paper, we adopt the fault tolerance scheme proposed in [12] and assume that the same method can be used to protect defective TSVs as well. We first describe that method briefly. By connecting the input and output terminals of TSVs with two-input multiplexers, defective TSVs can be replaced by redundant TSVs, as shown in Fig. 19. In [12], TSVs are grouped into clusters to improve repair effectiveness. Each cluster has 38 TSVs and is allocated 2, 3, 4, 7, 11, or 38 redundant TSVs to achieve different repair rates. We take the configuration with the lowest area overhead, i.e., 2 redundant TSVs per cluster, for comparison. It can tolerate a defective rate of $2/38 \approx 5\%$, which coincides with our assumption in this paper. According to [30], for the 5-$\mu$m TSV size used in this paper, the TSV pitch is 20 $\mu$m to achieve fine-pitch TSV integration. Then, the number of redundant TSVs required is $1024/38 \times 2 \approx 54$.



Fig. 19. TSV fault tolerant scheme proposed in [12] to be compared with our proposed method.

Assume that they are organized into a $6 \times 9$ array. Then, the redundant TSV area overhead is $(\text{TSV pitch})^2 \times 5 \times 8 = 16\,000$ $\mu$m<sup>2</sup>. As shown in Fig. 19, to shift out defective TSVs, each cluster needs 38 two-input multiplexers on the input side and 37 two-input multiplexers on the output side. In total, this requires $1024/38 \times 75 \approx 2021$ two-input multiplexers. The minimum area of a two-input multiplexer in the TSMC 90-nm technology library is 6.35 $\mu$m<sup>2</sup>. The total multiplexer area is therefore 12 833.35 $\mu$m<sup>2</sup>, and the total area, including both multiplexers and redundant TSVs, is 28 833.35 $\mu$m<sup>2</sup>, i.e., 1.13% of the whole chip area. If a more flexible scheme, such as group sharing or global sharing, is used, more area overhead will be incurred. In contrast, the global-sharing scheme in this paper incurs only 16 934 $\mu$m<sup>2</sup> of area overhead, as shown in Fig. 18, which is 41% less than the redundant TSV fault tolerance scheme. Furthermore, as mentioned in [12], TSV size cannot keep the same scaling pace as CMOS transistors. As the technology node keeps scaling down, the area gap between TSVs and transistors will become more prominent, and our proposed EM mitigation scheme will show a larger benefit in terms of hardware overhead.
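The redundant-TSV comparison above is a chain of simple products, sketched below for reference. The sketch follows the text in rounding the spare and multiplexer counts to the nearest integer and in charging a $6 \times 9$ spare array as $5 \times 8$ pitch cells. The function and parameter names are illustrative.

```python
def redundant_tsv_overhead(bus_width=1024, cluster_size=38, spares_per_cluster=2,
                           pitch_um=20.0, array_rows=6, array_cols=9,
                           mux_area_um2=6.35):
    """Area of the redundant-TSV scheme used for comparison, in square microns."""
    clusters = bus_width / cluster_size
    n_spares = round(clusters * spares_per_cluster)            # about 54
    # A rows x cols array spans (rows - 1) x (cols - 1) pitch cells.
    tsv_area = pitch_um ** 2 * (array_rows - 1) * (array_cols - 1)
    # 38 input-side plus 37 output-side multiplexers per cluster.
    n_mux = round(clusters * (2 * cluster_size - 1))           # about 2021
    mux_area = n_mux * mux_area_um2
    return n_spares, n_mux, tsv_area + mux_area


if __name__ == "__main__":
    spares, muxes, total = redundant_tsv_overhead()
    chip_area = 1600.0 * 1600.0
    global_sharing = 16934.0   # global-sharing overhead quoted from Fig. 18
    print(spares, muxes, round(total, 2))
    print(f"{100 * total / chip_area:.2f}% of chip area")
    print(f"global sharing saves {100 * (1 - global_sharing / total):.0f}%")
```

Running the sketch gives 54 spare TSVs, 2021 multiplexers, and a total of about 28 833 $\mu$m<sup>2</sup>, which corresponds to about 1.13% of the chip and a saving of about 41% for global sharing.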

# C. EM Reliability Enhancements by Three Different Protection Schemes

Using the evaluation method stated above, we perform extensive design space exploration and choose a TSV group size of 16, one EM mitigation module per group, and a hop count constraint of 3 for global sharing to achieve an optimal tradeoff between hardware overhead and protection effectiveness. In the following experiments, we use this configuration to compare the EM reliability enhancements of the three protection schemes. We assume that the defects are distributed randomly within the TSV bundle with a 5% defective rate. Note that our proposed technique can be applied to any TSV defective rate. The only modification is that the group size should be tuned accordingly to make sure that all defective TSVs can be protected.

At the beginning of the simulation, we fast forward ten million instructions to warm up the cache. Then, we run 100 million instructions for cycle-accurate simulation. During every L1 cache read/write miss or write back from L1 cache to L2 cache, we trace data patterns between them. The current direction in every clock cycle can be derived based on Table I. Subsequently, we can calculate the duty cycle of current flowing through each TSV. After that, the effective current



Fig. 20. Current flow direction differences and current density comparisons. (a) SPEC2000 integer benchmarks. (b) SPEC2000 floating point benchmarks.



Fig. 21. MTTF comparisons on (a) SPEC2000 integer benchmarks and (b) SPEC2000 floating point benchmarks.

density can be derived by the method proposed in [15], and MTTF of each TSV can be obtained by (1). The minimum MTTF value among all 1024 data bus TSVs determines the lifetime of the whole chip.
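The evaluation flow above (duty cycle, effective current density, per-TSV MTTF, chip lifetime as the minimum) can be outlined as below. Equation (1) and the exact method of [15] are not reproduced in this section, so the sketch makes two loud assumptions: the effective current density is taken as the net of the two directions (full healing), and the MTTF follows the standard form of Black's equation with placeholder values for the constant, exponent, activation energy, and temperature. All names are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def effective_current_density(j_peak, duty_forward, duty_reverse):
    """Assumed form: net average density when opposite currents fully heal."""
    return j_peak * abs(duty_forward - duty_reverse)


def black_mttf(j_eff, a=1.0, n=2.0, ea_ev=0.9, temp_k=350.0):
    """Black's equation, MTTF = A * J^-n * exp(Ea / (k T)); parameters are placeholders."""
    if j_eff <= 0.0:
        return math.inf        # perfectly balanced current: no net EM stress
    return a * j_eff ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV * temp_k))


def chip_mttf(j_eff_per_tsv, **black_params):
    """The weakest TSV on the bus determines the lifetime of the whole chip."""
    return min(black_mttf(j, **black_params) for j in j_eff_per_tsv)


if __name__ == "__main__":
    unbalanced = effective_current_density(1.0, 0.9, 0.1)
    balanced = effective_current_density(1.0, 0.55, 0.45)
    print(black_mttf(balanced) / black_mttf(unbalanced))
```

Because only ratios of MTTF values are compared in the evaluation, the placeholder constant and temperature cancel out. The exponent is the parameter that matters for the normalized results.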

First, we assume that all defective TSVs can be protected in order to evaluate the self-healing protection mechanism qualitatively. Fig. 20 compares the differences between current flow directions. As shown in Fig. 20, the current flow direction is well balanced by our proposed EM mitigation technique for both SPEC2000 integer and floating point benchmarks. Fig. 20 also shows the current densities normalized to those without EM mitigation. The results show that the current density is dramatically reduced by the self-healing effect for most applications.

To evaluate the protection effectiveness of different protection schemes, we define a metric  $MTTF_{eff}$  called effective



Fig. 22. Differences of reversing current directions within TSVs. (a) bzip2. (b) ammp.

EM reliability enhancement as follows:

$$MTTF_{eff} = MTTF_n \times (1 - p) + p \times MTTF_p \qquad (3)$$

where $MTTF_n$ is the MTTF without any EM mitigation technique, $MTTF_p$ is the MTTF when an EM mitigation technique is used, and $p$ is the protection rate of defective TSVs. This metric reflects the reliability enhancement actually achieved by a scheme, since the protection rate heavily impacts the final reliability because of the bucket effect (the weakest TSV determines the chip lifetime). According to our configuration, the protection rates of the three schemes are 70%, 91%, and 100%, as shown in Fig. 17. TSV EM MTTF comparisons are shown in Fig. 21. Fig. 21(a) shows the simulation results for SPEC2000 integer benchmarks, and Fig. 21(b) shows those for SPEC2000 floating point benchmarks. The y-axis denotes the EM MTTF achieved by the different protection methods, normalized to that without EM mitigation. The results indicate that the EM MTTF of the chip is improved dramatically for both benchmark suites, which implies that our proposed method is effective for most applications. Static protection [25] improves EM MTTF by $83 \times$ for SPEC2000 integer benchmarks and $51.9 \times$ for SPEC2000 floating point benchmarks on average. Neighbor sharing and global sharing achieve better reliability enhancements because of flexible EM mitigation module sharing. Among the three schemes, global sharing performs the best, improving EM MTTF for SPEC2000 integer and floating point benchmarks by $119.6 \times$ and $73.8 \times$ on average, respectively.
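Equation (3) is a protection-rate-weighted average, which the following short sketch evaluates. The example normalizes $MTTF_n$ to 1 and, as an assumption made only for illustration, uses the average floating point improvement of global sharing ($73.8 \times$ at 100% protection) as $MTTF_p$.

```python
def mttf_eff(mttf_n, mttf_p, p):
    """Effective EM reliability enhancement, equation (3)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("protection rate p must lie in [0, 1]")
    return mttf_n * (1.0 - p) + p * mttf_p


if __name__ == "__main__":
    # Normalized MTTF for protection rates of 70%, 91%, and 100%.
    for scheme, rate in (("static", 0.70), ("neighbor", 0.91), ("global", 1.00)):
        print(scheme, round(mttf_eff(1.0, 73.8, rate), 2))
```

With a 70% protection rate the sketch gives about 52, which is close to the $51.9 \times$ reported for static protection on the floating point benchmarks.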

Fig. 21 also shows that the effectiveness of our method depends on the characteristics of the application. Taking bzip2 and ammp as examples, the MTTF is improved by less than $20 \times$ for bzip2, while it is improved by $70 \times$ for ammp. To explain this difference, we trace the differences between the two current flow directions within the TSV bus for the two applications and show them in Fig. 22. The maximum difference between the two current directions for bzip2 is much smaller than that for ammp (max. $1.8 \times 10^4$ versus max. $5 \times 10^4$), which means that bzip2



Fig. 23. Application execution performance comparisons between the case with EM mitigation and that without EM mitigation. (a) SPEC2000 integer benchmarks. (b) SPEC2000 floating point benchmarks.

has better current flow balance property than ammp. Therefore, the effectiveness of EM mitigation is more significant for ammp.

Finally, we evaluate the performance overhead incurred by EM mitigation. To evaluate the timing impact of the three schemes, we use Cadence Spectre [31] for interconnect delay simulation. The interconnect geometry is derived from the predictive technology model for the 90-nm technology node [32]. We construct an RLC ladder for both neighbor sharing and global sharing to estimate the timing overhead of the intergroup switching network. Neighbor sharing can be treated as the special case with a hop count of 1; its switching network introduces 90 ps of extra delay. In addition, as shown in Fig. 10, since the EM mitigation circuit works in parallel with signal transmission, only two inverters are connected in series with the send buffer and lie on the critical path. The propagation delay of each inverter is 25 ps. Therefore, the total added delay is 140 ps. For the global-sharing scheme, if the hop count is limited to 3, the incurred delay, including both switching network and inverters, is 320 ps. In either case, we simply add one extra clock cycle to the L2 cache access time to evaluate the performance overhead. The performance comparisons are plotted in Fig. 23. In Fig. 23, the x-axis denotes the benchmarks used for performance evaluation, and the y-axis represents execution performance in instructions per cycle. The results show that our proposed TSV EM mitigation framework introduces <1% performance degradation for both SPEC2000 integer and floating point benchmarks.
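The delay figures above are consistent with a simple additive model, sketched here. The assumption that the switching network delay grows linearly at 90 ps per hop is an inference from the two quoted data points (140 ps at one hop, 320 ps at three hops), not something the text states. The clock period passed to `extra_cycles` is a hypothetical parameter.

```python
import math

HOP_DELAY_PS = 90.0             # switching network delay per hop (assumed linear)
INVERTER_DELAY_PS = 25.0        # propagation delay of one inverter
INVERTERS_ON_CRITICAL_PATH = 2


def added_delay_ps(hop_count):
    """Extra delay on the L1/L2 path for a given hop count constraint."""
    return (hop_count * HOP_DELAY_PS
            + INVERTERS_ON_CRITICAL_PATH * INVERTER_DELAY_PS)


def extra_cycles(hop_count, clock_period_ps):
    """Whole clock cycles needed to absorb the added delay."""
    return math.ceil(added_delay_ps(hop_count) / clock_period_ps)


if __name__ == "__main__":
    print(added_delay_ps(1))          # neighbor sharing
    print(added_delay_ps(3))          # global sharing, 3-hop constraint
    print(extra_cycles(3, 500.0))     # e.g. a 2-GHz clock (hypothetical)
```

Under this model, a single extra cycle covers the added delay whenever the clock period is at least 320 ps, which is compatible with charging one additional cycle to the L2 access time for both schemes.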

#### V. RELATED WORK

As early as the 1960s, several researchers had already observed EM in aluminum metal lines. Blech and Meieran [24] observed the electrotransport phenomenon in aluminum metal lines. In their experiments, voids or holes formed where electrons flowed in the direction of increasing temperature, while hillocks formed where electrons flowed in the direction of decreasing temperature. Black [1] proposed the famous MTTF formula considering EM effect. For the detailed failure mechanism of EM, readers can refer to [33], which is a good survey of EM in aluminum metal lines. Starting from the 1980s, copper gradually replaced aluminum as the on-chip interconnect material. Due to its high melting point, EM effect is alleviated significantly. However, as the current density increases sharply for deep submicrometer interconnects, EM effect becomes a challenging issue again and is believed to become more severe as the technology node scales further. Hau-Riege [34] investigated the EM phenomenon in copper metal lines and compared it with that in aluminum metal lines. Gonzalez and Rubio [16] explored the relationship between metal line shape and EM effect.

Because EM threatens chip lifetime significantly, many techniques have been proposed in the literature to mitigate it at various levels. Dasgupta and Karri [35] considered EM-related reliability issues at the register transfer level. By judiciously mapping the control data flow graph onto data buses, the MTTF considering EM effect can be improved significantly. Teng et al. [36] proposed a hierarchical analysis method for EM diagnosis. Liew et al. [15] observed that different current waveforms may cause different EM effects even if their magnitudes are the same, which implies that metal interconnects have self-healing abilities under alternating current flows. Todri et al. [37] investigated the EM effect caused by the power-gating technique on power grid interconnects and proposed a grid line sizing algorithm to solve this problem. Li et al. [2] concentrated on the EM effect on power grid interconnect vias and explored the tradeoff between power signal integrity and area overhead. Jain et al. [38] proposed a methodology to evaluate the EM effect on systems-on-chip by separately characterizing individual current components, so that reliability specifications can be retargeted across different markets or block levels within a chip. Chen et al. [13] considered varying temperature and current requirements at run time for EM effect analysis. Jain et al. [38] also emphasized the importance of considering EM reliability across the whole workflow, from foundry fabrication up to system design. Guan and Marek-Sadowska [39] analyzed the EM effect on the reliability of signal lines, which carry ac current, and proposed a theoretical model to quantify the healing effect due to ac currents. Chen et al. [40] investigated multibranch interconnects suffering from EM effect. By applying first-principles theory to wire stress evaluation, they built a theoretical model that can accurately predict EM-induced reliability issues. Abella et al. [17] proposed a refueling microarchitecture to alleviate EM effect and enhance the reliability of metal interconnects. However, because of the unique fabrication process and electrical characteristics of TSVs, these techniques cannot be applied to TSVs directly.

When IC design moves from 2-D to 3-D, the EM problem can still occur on a TSV because of the high thermomechanical stress gradient between the TSV and the bonding pad. Tan *et al.* [8] investigated TSV EM performance and evaluated different possible EM occurrence sites on TSVs. Pak *et al.* [9] evaluated the EM impact on TSVs from a layout perspective and provided some guidelines for EM-robust TSV design. Frank *et al.* [14] explored the impact of EM on the resistance of TSVs. Zhao *et al.* [41] investigated the EM impact on power delivery networks and showed that EM can cause a large IR drop on power grid interconnects. However, most of these works focus on EM modeling of the TSV without proposing any holistic EM mitigation methodology, especially for defective TSVs. Cheng *et al.* [25] proposed to take advantage of the self-healing effect to mitigate EM on defective TSVs, but their method introduces a large area overhead and cannot protect all defective TSVs from EM effectively.

## VI. CONCLUSION

As integration density rises sharply with every technology generation, interconnect optimization becomes more intractable for 2-D ICs. Consequently, 3-D technology has emerged as an effective way to continue Moore's law. The reliability of 3-D ICs, however, requires investigation to improve fabrication yield and chip lifetime. Among the reliability issues, EM is of great concern. In this paper, we investigate the defect-induced EM of TSVs and analyze the relationship between defects and the EM reliability of 3-D ICs. Then, we propose a framework to enhance TSV reliability by balancing current flows within TSVs. Furthermore, we propose two defective TSV protection schemes (i.e., neighbor sharing and global sharing) to achieve a desirable tradeoff between EM MTTF enhancement and the resulting hardware and timing overhead. Through extensive experiments, we show the effectiveness of our proposed method, especially when the global-sharing protection scheme is adopted, which improves EM reliability significantly with negligible hardware and timing overhead.

#### ACKNOWLEDGMENT

The authors would like to thank Prof. L. Zhang from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, for the insightful suggestions and discussions on electromigration mitigation module design and optimization.

#### REFERENCES

- [1] J. R. Black, "Electromigration failure modes in aluminum metallization for semiconductor devices," *Proc. IEEE*, vol. 57, no. 9, pp. 1587–1594, Sep. 1969.
- [2] D.-A. Li, M. Marek-Sadowska, and S. R. Nassif, "A method for improving power grid resilience to electromigration-caused via failures," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 23, no. 1, pp. 118–130, Jan. 2015.
- [3] W. R. Davis et al., "Demystifying 3D ICs: The pros and cons of going vertical," *IEEE Des. Test Comput.*, vol. 22, no. 6, pp. 498–510, Nov./Dec. 2005.
- [4] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, "3-D ICs: A novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration," *Proc. IEEE*, vol. 89, no. 5, pp. 602–633, May 2001.
- [5] G. H. Loh, Y. Xie, and B. Black, "Processor design in 3D die-stacking technologies," *IEEE Micro*, vol. 27, no. 3, pp. 31–48, May/Jun. 2007.

- [6] G. Sun, X. Dong, Y. Xie, J. Li, and Y. Chen, "A novel architecture of the 3D stacked MRAM L2 cache for CMPs," in *Proc. IEEE 15th Int. Symp. High Perform. Comput. Archit. (HPCA)*, Raleigh, NC, USA, Feb. 2009, pp. 239–249.
- [7] W. Zhao and G. Prenat, Eds., *Spintronics-Based Computing*. Cham, Switzerland: Springer, 2015.
- [8] Y. C. Tan, C. M. Tan, X. W. Zhang, T. C. Chai, and D. Q. Yu, "Electromigration performance of through silicon via (TSV)—A modeling approach," *Microelectron. Rel.*, vol. 50, nos. 9–11, pp. 1336–1340, 2010.
- [9] J. Pak, M. Pathak, S. K. Lim, and D. Z. Pan, "Modeling of electromigration in through-silicon-via based 3D IC," in *Proc. IEEE 61st Electron. Compon. Technol. Conf. (ECTC)*, Lake Buena Vista, FL, USA, May/Jun. 2011, pp. 1420–1427.
- [10] D. Sekar *et al.*, "A 3D-IC technology with integrated microchannel cooling," in *Proc. IEEE Int. Interconnect Technol. Conf. (IITC)*, Burlingame, CA, USA, Jun. 2008, pp. 13–15.
- [11] B. Kim, C. Sharbono, T. Ritzdorf, and D. Schmauch, "Factors affecting copper filling process within high aspect ratio deep vias for 3D chip stacking," in *Proc. IEEE 56th Electron. Compon. Technol. Conf. (ECTC)*, San Diego, CA, USA, May/Jun. 2006, pp. 838–843.
- [12] I. Loi, S. Mitra, T. H. Lee, S. Fujita, and L. Benini, "A low-overhead fault tolerance scheme for TSV-based 3D network on chip links," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, San Jose, CA, USA, Nov. 2008, pp. 598–602.
- [13] Z. Chen, Z. Lv, X. Wang, Y. Liu, and S. Liu, "Modeling of electromigration of the through silicon via interconnects," in *Proc. IEEE 11th Int. Conf. Electron. Packag. Technol., High Density Packag. (ICEPT-HDP)*, Guilin, China, Aug. 2010, pp. 1221–1225.
- [14] T. Frank *et al.*, "Resistance increase due to electromigration induced depletion under TSV," in *Proc. IEEE Int. Rel. Phys. Symp. (IRPS)*, Piscataway, NJ, USA, Apr. 2011, pp. 3F.4.1–3F.4.6.
- [15] B.-B. Liew, N. W. Cheung, and C. Hu, "Projecting interconnect electromigration lifetime for arbitrary current waveforms," *IEEE Trans. Electron Devices*, vol. 37, no. 5, pp. 1343–1351, May 1990.
- [16] J. L. Gonzalez and A. Rubio, "Shape effect on electromigration in VLSI interconnects," *Microelectron. Rel.*, vol. 37, no. 7, pp. 1073–1078, 1997.
- [17] J. Abella, X. Vera, O. S. Unsal, O. Ergin, A. González, and J. W. Tschanz, "Refueling: Preventing wire degradation due to electromigration," *IEEE Micro*, vol. 28, no. 6, pp. 37–46, Nov./Dec. 2008.
- [18] J. Tao, N. W. Cheung, and C. Hu, "Metal electromigration damage healing under bidirectional current stress," *IEEE Electron Device Lett.*, vol. 14, no. 12, pp. 554–556, Dec. 1993.
- [19] F. Li, C. Nicopoulos, T. Richardson, Y. Xie, V. Narayanan, and M. Kandemir, "Design and management of 3D chip multiprocessors using network-in-memory," in *Proc. IEEE/ACM 33rd Int. Symp. Comput. Archit. (ISCA)*, Boston, MA, USA, Jun. 2006, pp. 130–141.
- [20] T. Zhang, C. Xu, K. Chen, G. Sun, and Y. Xie, "3D-SWIFT: A high-performance 3D-stacked wide IO DRAM," in *Proc. IEEE/ACM 24th Great Lakes Symp. VLSI (GLVLSI)*, Houston, TX, USA, May 2014, pp. 51–56.
- [21] M. B. Healy and S. K. Lim, "A novel TSV topology for many-tier 3D power-delivery networks," in *Proc. IEEE/ACM Design, Autom. Test Eur. Conf. Exhibit. (DATE)*, Grenoble, France, Mar. 2011, pp. 1–4.
- [22] J. Van Olmen *et al.*, "3D stacked IC demonstrator using hybrid collective die-to-wafer bonding with copper through silicon vias (TSV)," in *Proc. IEEE Int. 3D Syst. Integr. Conf. (3DIC)*, San Francisco, CA, USA, Sep. 2009, pp. 1–5.
- [23] M. Cho, C. Liu, D. H. Kim, S. K. Lim, and S. Mukhopadhyay, "Design method and test structure to characterize and repair TSV defect induced signal degradation in 3D system," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, San Jose, CA, USA, Nov. 2010, pp. 694–697.
- [24] I. A. Blech and E. S. Meieran, "Direct transmission electron microscope observation of electrotransport in aluminum thin films," *Appl. Phys. Lett.*, vol. 11, no. 8, pp. 263–266, 1967.
- [25] Y. Cheng *et al.*, "A novel method to mitigate TSV electromigration for 3D ICs," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Natal, Brazil, Aug. 2013, pp. 121–126.
- [26] T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson, *Introduction to Algorithms*, 2nd ed. New York, NY, USA: McGraw-Hill, 2001.
- [27] T. Austin, E. Larson, and D. Ernst, "SimpleScalar: An infrastructure for computer system modeling," *Computer*, vol. 35, no. 2, pp. 59–67, Feb. 2002.
- [28] Synopsys. DC Compiler. [Online]. Available: http://www.synopsys.com
- [29] Synopsys. PrimeTime. [Online]. Available: http://www.synopsys.com

- [30] A. Dembla, Y. Zhang, and M. S. Bakir, "Fine pitch TSV integration in silicon micropin-fin heat sinks for 3D ICs," in *Proc. IEEE Interconnect Technol. Conf. (IITC)*, San Jose, CA, USA, Jun. 2012, pp. 1–3.
- [31] Cadence. Spectre. [Online]. Available: http://www.cadence.com
- [32] PTM. Interconnect. [Online]. Available: http://ptm.asu.edu
- [33] F. M. d'Heurle, "Electromigration and failure in electronics: An introduction," *Proc. IEEE*, vol. 59, no. 10, pp. 1409–1418, Oct. 1971.
- [34] C. S. Hau-Riege, "An introduction to Cu electromigration," *Microelectron. Rel.*, vol. 44, no. 2, pp. 195–205, 2004.
- [35] A. Dasgupta and R. Karri, "Electromigration reliability enhancement via bus activity distribution," in *Proc. 33rd Annu. Design Autom. Conf. (DAC)*, New York, NY, USA, Jun. 1996, pp. 353–356.
- [36] C.-C. Teng, Y.-K. Cheng, E. Rosenbaum, and S.-M. Kang, "Hierarchical electromigration reliability diagnosis for VLSI interconnects," in *Proc. 33rd Annu. Design Autom. Conf. (DAC)*, Las Vegas, NV, USA, Jun. 1996, pp. 752–757.
- [37] A. Todri, S.-C. Chang, and M. Marek-Sadowska, "Electromigration and voltage drop aware power grid optimization for power gated ICs," in *Proc. IEEE/ACM Int. Symp. Low Power Electron. Design (ISLPED)*, Aug. 2007, pp. 391–394.
- [38] P. Jain, S. S. Sapatnekar, and J. Cortadella, "A retargetable and accurate methodology for logic-IP-internal electromigration assessment," in *Proc. 20th Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Tokyo, Japan, Jan. 2015, pp. 346–351.
- [39] Z. Guan and M. Marek-Sadowska, "Atomic flux divergence-based AC electromigration model for signal line reliability assessment," in *Proc. IEEE 65th Electron. Compon. Technol. Conf. (ECTC)*, San Diego, CA, USA, May 2015, pp. 2155–2161.
- [40] H. B. Chen, S. X. D. Tan, V. Sukharev, X. Huang, and T. Kim, "Interconnect reliability modeling and analysis for multi-branch interconnect trees," in *Proc. 52nd ACM/EDAC/IEEE Design Autom. Conf. (DAC)*, San Francisco, CA, USA, Jun. 2015, pp. 1–6.
- [41] X. Zhao, Y. Wan, M. Scheuermann, and S. K. Lim, "Transient modeling of TSV-wire electromigration and lifetime analysis of power distribution network for 3D ICs," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, Florence, Italy, Oct. 2013, pp. 363–370.



Aida Todri-Sanial (M'03) received the B.S. degree in electrical engineering from Bradley University, Peoria, IL, USA, in 2001, the M.S. degree in electrical engineering from Long Beach State University, Long Beach, CA, USA, in 2003, and the Ph.D. degree in electrical and computer engineering from the University of California at Santa Barbara, Santa Barbara, CA, USA, in 2009.

She was an R&D Engineer with the Fermi National Accelerator Laboratory, Batavia, IL, USA. She has held visiting positions with Mentor Graphics, Wilsonville, OR, USA, Cadence Design Systems, San Jose, CA, USA, STMicroelectronics, Geneva, Switzerland, and the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA. She is currently a Research Scientist with the French National Center of Scientific Research, Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, University of Montpellier, Montpellier, France.

Dr. Todri-Sanial was a recipient of John Bardeen Fellow in Engineering in 2009.



**Jianlei Yang** (S'12–M'15) received the B.S. degree in microelectronics from Xidian University, Xi'an, China, in 2009, and the Ph.D. degree in computer science and technology from Tsinghua University, Beijing, China, in 2014.

He was a Research Intern with Intel Laboratories China, Intel Corporation, Beijing, China, from 2013 to 2014. He is currently a Post-Doctoral Researcher with the Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA. His current research interests include numerical algorithms for VLSI power grid analysis and verification, spintronics, and neuromorphic computing.

Dr. Yang was a recipient of the first place prize in the TAU Power Grid Simulation Contest in 2011, and the second place prize in the TAU Power Grid Transient Simulation Contest in 2012. He was also a recipient of the IEEE International Conference on Computer Design Best Paper Award in 2013, and the Association for Computing Machinery Great Lakes Symposium on VLSI Best Paper Nomination in 2015.



Yuanqing Cheng (S'11–M'13) received the Ph.D. degree from the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.

He spent a year in post-doctoral study with the Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France. He joined Beihang University, Beijing, as an Assistant Professor. His current research interests include VLSI design for 3-D integrated circuits considering thermal and defect issues, and spintronics computing system architecture design.

Dr. Cheng is a member of the Association for Computing Machinery.



Weisheng Zhao (M'06–SM'14) received the Ph.D. degree in physics from the University of Paris-Sud, Orsay, France, in 2007.

He investigated spintronic devices-based logic circuits and designed a prototype for a hybrid spintronic/CMOS (90 nm) chip in cooperation with STMicroelectronics, Geneva, Switzerland, from 2004 to 2008. He has been with the French National Center of Scientific Research, Paris, France, as a Tenured Research Scientist, since 2009. He became a Youth 1000 Plan Distinguished Professor with Beihang University, Beijing, China, in 2014. He has authored or co-authored over 150 scientific papers in journals such as *Advanced Materials*, *Nature Communications*, and the IEEE TRANSACTIONS, and he is also the Principal Inventor of four international patents. His current research interests include hybrid integration of nanodevices with CMOS circuits and new nonvolatile memory (40-nm technology node and below), such as MRAM circuit and architecture design.

Dr. Zhao is the Associate Editor of the IEEE TRANSACTIONS ON NANOTECHNOLOGY.