

### **Device Challenges and Opportunities**

#### **Prof. Tsu-Jae King Liu**

*Electrical Engineering and Computer Sciences Department, University of California at Berkeley*



#### March 24, 2016

DOE Workshop on Energy Efficient Electronics

## Improving CMOS Energy Efficiency



- To reduce power consumption, the chip operating voltage must be reduced – but this results in slower circuit operation.
- → Parallelism (multi-core processing) is used today to improve system throughput, within power constraints.

#### **Game Over for CMOS**

• When each core operates at the minimum energy, increasing chip performance requires more power.



• The energy-delay tradeoff for a CMOS logic circuit can be understood by considering a cascade of inverters:



**Fig. 1.2** Inverter-based model for combinational logic energy and performance. E. Alon, Ch.1, CMOS and Beyond: Logic Switches for Terascale Integrated Circuits, Cambridge University Press, 2015.

- The clock frequency of the microprocessor is limited by the delay of the combinational logic between the clocked registers.
- The logic path has L<sub>d</sub> stages, of which only one is actively switching at any time (the other stages are static).
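A first-order way to write down the inverter-chain picture above is sketched here. The symbols $C$ (switched capacitance per stage) and $\alpha$ (activity factor), and the factor of 2 in the delay, are assumptions made for illustration; the cited chapter may use a different exact form.

$$
t_{\text{delay}} \approx L_d \,\frac{C\,V_{DD}}{2\,I_{ON}}, \qquad
E_{\text{op}} \approx \underbrace{\alpha\,L_d\,C\,V_{DD}^{2}}_{\text{dynamic}} \;+\; \underbrace{L_d\,I_{OFF}\,V_{DD}\,t_{\text{delay}}}_{\text{leakage}}
$$

Substituting the delay into the leakage term gives

$$
E_{\text{op}} \approx L_d\,C\,V_{DD}^{2}\left(\alpha + \frac{L_d}{2\,(I_{ON}/I_{OFF})}\right),
$$

so under these assumptions the leakage share of the energy is set by the ON/OFF current ratio rather than by either current alone.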

#### **CMOS Energy per Operation**



#### **New Logic Switch Requirement**



- Higher I<sub>ON</sub>/I<sub>OFF</sub> → lower Energy/op
- $\rightarrow$  Much steeper switching behavior is needed!
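The link between I<sub>ON</sub>/I<sub>OFF</sub> and energy per operation can be illustrated numerically. The sketch below assumes a simple inverter-chain model (dynamic plus leakage energy) and an effective swing `swing` such that I<sub>ON</sub>/I<sub>OFF</sub> = 10^(V<sub>DD</sub>/swing). The model, the parameter values, and all function names are illustrative assumptions, not taken from the slides.

```python
def energy_per_op(vdd, swing, alpha=0.01, stages=30, cap=1e-15):
    """Energy per operation (J) for a chain of `stages` inverters.

    Assumed first-order model (hypothetical, for illustration only):
      dynamic = alpha * stages * cap * vdd**2
      leakage = stages * I_OFF * vdd * t_delay,
                with t_delay = stages * cap * vdd / (2 * I_ON)
      I_ON / I_OFF = 10 ** (vdd / swing), swing in V/decade
    """
    on_off_ratio = 10.0 ** (vdd / swing)
    dynamic = alpha * stages * cap * vdd ** 2
    leakage = stages ** 2 * cap * vdd ** 2 / (2.0 * on_off_ratio)
    return dynamic + leakage


def minimum_energy_point(swing, v_lo=0.1, v_hi=1.0, steps=901):
    """Grid-search V_DD in [v_lo, v_hi] for the lowest energy per operation."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    best_v = min(grid, key=lambda v: energy_per_op(v, swing))
    return best_v, energy_per_op(best_v, swing)


if __name__ == "__main__":
    # A smaller swing means a steeper switch, i.e. a higher ON/OFF ratio at a given V_DD.
    for swing in (0.090, 0.060):
        v_opt, e_min = minimum_energy_point(swing)
        print(f"swing = {swing * 1e3:.0f} mV/dec: "
              f"V_DD,opt = {v_opt:.3f} V, E_min = {e_min * 1e18:.1f} aJ")
```

In this toy model a steeper switch moves the minimum-energy point to a lower V<sub>DD</sub> and lowers the minimum energy itself, which is the trend the bullets describe.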

#### **Approaches to Facilitate Voltage Scaling**

 To operate with lower V<sub>DD</sub> without sacrificing circuit performance (*i.e.* maintaining high ON-state current) for a given I<sub>OFF</sub> specification, the MOSFET ON/OFF current ratio must be improved:



### **3-D Transistor ("Tri-gate" or "FinFET")**



- Superior gate control
- $\rightarrow$  steeper switching
- → Lower V<sub>DD</sub> for target I<sub>ON</sub>
- Multiple fins can be connected in parallel to achieve higher drive current.
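How parallel fins add drive current can be sketched with the usual tri-gate width estimate, in which each fin contributes two sidewalls plus its top surface. The fin dimensions, the per-micron current, and the function names below are hypothetical, and the linear scaling of current with width is an assumption.

```python
def finfet_effective_width(n_fins, fin_height_nm, fin_width_nm):
    """Effective channel width (nm): two sidewalls plus the top surface, per fin."""
    if n_fins < 1:
        raise ValueError("n_fins must be at least 1")
    return n_fins * (2.0 * fin_height_nm + fin_width_nm)


def drive_current_ua(n_fins, fin_height_nm, fin_width_nm, ion_per_um_ua):
    """Drive current (uA), assuming I_ON scales linearly with effective width."""
    w_eff_um = finfet_effective_width(n_fins, fin_height_nm, fin_width_nm) / 1000.0
    return ion_per_um_ua * w_eff_um


if __name__ == "__main__":
    # Hypothetical fin: 40 nm tall, 8 nm wide, 1000 uA per um of effective width.
    for n in (1, 2, 4):
        print(n, "fin(s):", drive_current_ua(n, 40.0, 8.0, 1000.0), "uA")
```

Because the width is quantized in whole fins, drive strength in this picture can only be adjusted in discrete steps.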

### **Advanced Channel Materials**

- High-mobility semiconductor materials can potentially provide improved performance:
  - Ge for PMOS
  - (In)GaAs for NMOS

|                                            | Si    | Ge    | GaAs  |
|--------------------------------------------|-------|-------|-------|
| Electron mobility (cm²/V·s)                | 1500  | 3900  | 8500  |
| Hole mobility (cm²/V·s)                    | 450   | 1900  | 400   |
| Lattice constant (Å)                       | 5.431 | 5.646 | 5.653 |
| Band gap (eV)                              | 1.12  | 0.66  | 1.424 |
| Dielectric constant                        | 12    | 16    | 13    |
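Two quantities follow directly from the table: the mobility gain over Si, and the lattice mismatch that makes growth on a Si substrate difficult. The short sketch below computes both from the tabulated values; the dictionary layout and function names are illustrative.

```python
# Values copied from the table above (mobilities in cm^2/V.s, lattice constant in angstroms).
MATERIALS = {
    "Si":   {"mu_n": 1500, "mu_p": 450,  "a": 5.431},
    "Ge":   {"mu_n": 3900, "mu_p": 1900, "a": 5.646},
    "GaAs": {"mu_n": 8500, "mu_p": 400,  "a": 5.653},
}


def lattice_mismatch_percent(film, substrate="Si"):
    """Relative lattice mismatch of `film` grown on `substrate`, in percent."""
    a_film = MATERIALS[film]["a"]
    a_sub = MATERIALS[substrate]["a"]
    return 100.0 * (a_film - a_sub) / a_sub


def mobility_gain(film, carrier, reference="Si"):
    """Ratio of bulk mobility in `film` to that in `reference` for 'electron' or 'hole'."""
    key = "mu_n" if carrier == "electron" else "mu_p"
    return MATERIALS[film][key] / MATERIALS[reference][key]


if __name__ == "__main__":
    print(f"Ge on Si mismatch:   {lattice_mismatch_percent('Ge'):.2f} %")
    print(f"GaAs on Si mismatch: {lattice_mismatch_percent('GaAs'):.2f} %")
    print(f"Ge hole mobility gain:       {mobility_gain('Ge', 'hole'):.1f}x")
    print(f"GaAs electron mobility gain: {mobility_gain('GaAs', 'electron'):.1f}x")
```

Both Ge and GaAs are roughly 4% mismatched to Si, while their bulk mobility advantage is about 4x for holes in Ge and more than 5x for electrons in GaAs.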

• Selective epitaxial growth directly on Si is facilitated by the use of a corrugated substrate:



J.-S. Park et al., Appl. Phys. Lett. 90 052113, 2007





J. Z. Li et al., Appl. Phys. Lett. 91 021114, 2007

#### **Heterogeneous CMOS Integration**

M. Heyns (IMEC), EuroNanoForum 2013



Demonstration of CMOS Ge/InP virtual substrate by ART (Aspect Ratio Trapping)

## **Remaining Issues with ART**

N. Waldron (IMEC), ISTDM 2012

#### "Perpendicular" view





Efficient defect necking effect

Effective double step formation on the "rounded-Ge" surface

APB observed only with an almost flat Ge surface

#### "Parallel" view

High defect density in parallel view

Twins / stacking faults / APBs

APBs originate from single steps along [110]?



#### **Challenges for FinFET Architecture**

N. Waldron (IMEC), ISTDM 2012



## Si vs. In<sub>0.7</sub>Ga<sub>0.3</sub>As FinFETs

N. Xu (UC Berkeley), unpublished



- Narrower fin width is required for InGaAs FinFET vs. Si FinFET
- $V_{TH}$  is more sensitive to  $W_{fin}$  for InGaAs FinFETs

## **Outlook for III-V MOSFETs**

courtesy V. Moroz (Synopsys, Inc.)



- Any new technology should last for at least 2 technology nodes
- Si<sub>1-x</sub>Ge<sub>x</sub> channel is easier to manufacture

→ III-V channel materials have a narrow window of opportunity (?)

### **Sources of Variability**

- Sub-wavelength lithography:
  - Resolution enhancement techniques are costly and increase process sensitivity



- Layout-dependent transistor performance:
  - Process-induced stress is dependent on layout

courtesy Mike Rieger (Synopsys, Inc.)

- Random dopant fluctuations (RDF):
  - Atomistic effects become significant in nanoscale FETs



A. Brown et al., IEEE Trans. Nanotechnology, p. 195, 2002



# Impact of Misalignment



*Figure: 6-T SRAM cell layout (labels: PG, PD, BLB).*



#### Actual layout w/ vertical misalignment (channel width variations due to active jogs)



#### Impact of Variability on SRAM

• $V_{\text{TH}}$ mismatch results in reduced static noise margin, which lowers cell yield and limits $V_{\text{DD}}$ scaling.



Y. Tsukamoto et al., Proc. IEEE/ACM ICCAD, p. 398, 2005

→ Immunity to short-channel effects (SCE), narrow-width effects, and RDF effects is needed to achieve high SRAM cell yield.

## **Double Patterning of Gate**

*Figure: 6-T SRAM cell layout (labels: PG, BLB).*



#### **Future Device Requirements**

- Low operating voltage $\rightarrow$ Low active power
- Zero leakage $\rightarrow$ Zero standby power
- Robust to variations $\rightarrow$ Low cost

#### **Micro-Electro-Mechanical Switch**

#### OFF State:



- Zero OFF-state current (I<sub>OFF</sub>); abrupt switching
  - Turns on by electrostatic force ( $F_{elec}$ ) when $|V_{GS}| \ge V_{PI}$ (the pull-in voltage)
  - Turns off by spring restoring force ( $F_{spring}$ ) when $|V_{GS}| \le V_{RL}$ (the release voltage)
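The two switching conditions above imply hysteresis: between the release and pull-in voltages the switch keeps whatever state it was in. A minimal state-machine sketch follows. The class name, the example voltages, and the requirement that the release voltage lies below the pull-in voltage are assumptions for illustration.

```python
class RelaySwitch:
    """Idealized electrostatic relay: zero OFF current, abrupt switching, hysteresis."""

    def __init__(self, v_pull_in, v_release):
        # Assumption: the release voltage is below the pull-in voltage.
        if not 0 <= v_release < v_pull_in:
            raise ValueError("expected 0 <= v_release < v_pull_in")
        self.v_pull_in = v_pull_in
        self.v_release = v_release
        self.is_on = False

    def apply(self, v_gs):
        """Apply a gate-source voltage and return the resulting ON/OFF state."""
        magnitude = abs(v_gs)          # the actuation force depends only on |V_GS|
        if magnitude >= self.v_pull_in:
            self.is_on = True          # electrostatic force pulls the contact closed
        elif magnitude <= self.v_release:
            self.is_on = False         # spring restoring force opens the contact
        return self.is_on              # in between: the previous state is retained


def sweep(switch, voltages):
    """Apply a sequence of voltages and record the state after each one."""
    return [switch.apply(v) for v in voltages]


if __name__ == "__main__":
    relay = RelaySwitch(v_pull_in=1.0, v_release=0.6)  # hypothetical values, in volts
    print(sweep(relay, [0.0, 0.8, 1.0, 0.8, 0.6, 0.8]))
```

The same 0.8 V input yields OFF on the way up and ON on the way down, which is the hysteresis window between the two thresholds.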

### **Surface Micromachining Process**

#### **Cross-sectional View**



- Mechanical structures can be made using conventional microfabrication techniques
- Structures are freed by selective removal of sacrificial layer(s)

## **Relay Design for Digital Logic**



### **Logic Relay Structure & Operation**



#### **NEM Relay Switching Energy-Delay**

C. Qian et al., 2015 International Electron Devices Meeting (Paper 18.1)



### **IC Technology Advancement**



- Advanced back-end-of-line (BEOL) processes have air-gapped interconnects
- $\rightarrow$  can be adapted for fabrication of compact NEMS!



D. C. Edelstein, 214th ECS Meeting, Abstract #2073, 2008



#### **BEOL NEM Switch**

N. Xu et al. (UC Berkeley), 2014 IEEE International Electron Devices Meeting



courtesy of Dr. Kimihiko Kato (UC Berkeley)

- A relay can be implemented using multiple metal layers.
- Vias can be used for electrical connection and as torsional elements for lower $k_{eff}$.

- Actuation electrodes on opposite sides of movable electrode structure
  - → 2 stable states (contacting D0 or D1)
- Low-voltage (<1 V) operation can be achieved with small footprint (< 0.1 μm<sup>2</sup>).

#### **Non-Volatile NEMory Cell Structure**

K. Kato et al., IEEE Electron Device Letters, Vol. 37, pp. 31-34, 2016



#### **In-Memory Computing**

K. Kato et al., IEEE Electron Device Letters, Vol. 37, pp. 31-34, 2016

• NV-NEMory cell array for memory-based super-parallel data searching



#### Data Search Step 1: Match "0"

K. Kato et al., IEEE Electron Device Letters, Vol. 37, pp. 31-34, 2016

#### Reference Data: 1100



#### Data Search Step 2: Match "1"

K. Kato et al., IEEE Electron Device Letters, Vol. 37, pp. 31-34, 2016

#### Reference Data: 1100
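The two search steps can be mimicked in software as two filtering passes over the stored rows. This is only a logical analogy under the assumption that each step eliminates rows that disagree with the reference at the bit positions being tested; it does not model the circuit operation, and the function name and stored words are hypothetical.

```python
def two_step_search(rows, reference):
    """Return indices of stored words equal to `reference`, using two passes."""
    if any(len(word) != len(reference) for word in rows):
        raise ValueError("all stored words must have the same length as the reference")

    zero_positions = [i for i, bit in enumerate(reference) if bit == "0"]
    one_positions = [i for i, bit in enumerate(reference) if bit == "1"]

    # Step 1 (Match "0"): keep rows that store 0 wherever the reference is 0.
    candidates = [r for r, word in enumerate(rows)
                  if all(word[i] == "0" for i in zero_positions)]

    # Step 2 (Match "1"): of those, keep rows that store 1 wherever the reference is 1.
    return [r for r in candidates
            if all(rows[r][i] == "1" for i in one_positions)]


if __name__ == "__main__":
    stored = ["1100", "1010", "1110", "0100", "1100"]  # hypothetical array contents
    print(two_step_search(stored, "1100"))
```

In the analogy every row is tested in each pass, so the number of passes stays at two regardless of how many rows are stored.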



### **Energy and Delay for Data Search**

K. Kato et al., IEEE Electron Device Letters, Vol. 37, pp. 31-34, 2016

Cells involved (column labels): 1 column × 256 rows; 1 column × 256 rows; 256 columns × 1 row

| Operation                                | Energy        | Delay    |
|------------------------------------------|---------------|----------|
| Program ( $V_{\rm prog} = 2.5 \, \rm V$ ) | 15 fJ; 2.0 pJ | < 10 ns  |
| Match "0" or Match "1"                   | 1.2 pJ        | < 0.2 ns |

#### 256 × 256 NV-NEMory Array

- The location of a data string can be found in <0.5 ns with less than 2.5 pJ.
- → For a die size of 42 mm<sup>2</sup> (same as DDR4 DRAM) at F = 20 nm and cell density of 65% (similar to DRAM), an NV-NEMory chip would have a capacity of 8 Gb and would consume only 300 nJ to find a match across the whole chip.
- $\rightarrow$ In comparison, a CPU+DRAM system would need ~90 mJ and 80 ms for the same task.
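The chip-level figures can be checked with a few lines of arithmetic. The sketch below uses the die area, feature size, and 65% cell density quoted above, the 8 F<sup>2</sup> NV-NEMory cell area listed in the cell-level comparison table, and the 2.5 pJ upper bound per 256 × 256 array search. Treating the chip as a set of independent 256 × 256 arrays that are each searched once is an assumption, as are the function names.

```python
def chip_capacity_bits(die_area_mm2, feature_nm, cell_area_f2, cell_density):
    """Number of cells that fit on the die, given the cell area in units of F^2."""
    cell_area_nm2 = cell_area_f2 * feature_nm ** 2
    usable_area_nm2 = die_area_mm2 * 1e12 * cell_density  # 1 mm^2 = 1e12 nm^2
    return usable_area_nm2 / cell_area_nm2


def chip_search_energy_j(capacity_bits, array_rows, array_cols, energy_per_array_j):
    """Energy to search every array on the chip once (assumes independent arrays)."""
    n_arrays = capacity_bits / (array_rows * array_cols)
    return n_arrays * energy_per_array_j


if __name__ == "__main__":
    capacity = chip_capacity_bits(die_area_mm2=42.0, feature_nm=20.0,
                                  cell_area_f2=8.0, cell_density=0.65)
    energy = chip_search_energy_j(capacity, 256, 256, energy_per_array_j=2.5e-12)
    print(f"capacity      ~ {capacity / 1e9:.1f} Gb")
    print(f"search energy ~ {energy * 1e9:.0f} nJ (upper bound)")
    print(f"ratio to the quoted ~90 mJ for CPU+DRAM: {90e-3 / energy:.1e}")
```

Under these assumptions the capacity comes out between 8 and 9 Gb and the whole-chip search energy at a few hundred nanojoules, consistent with the rounded figures in the bullets.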

Relatively fast read speed & low power consumption make NV-NEMory technology well-suited for real-time data searching applications!

#### Summary

- **Challenges**:
  - CMOS technology has a fundamental limit in energy efficiency, due to non-zero transistor OFF-state current.
  - $\rightarrow$ New logic switch designs are needed to overcome this limit!
    - Steeply switching with zero I<sub>OFF</sub>
    - Robust to process-induced variations
- **Opportunities**:
  - 2-D semiconductor materials, negative capacitance FETs, ...
    - Semiconductor device designs which do not utilize doping
  - Nanomanufacturing innovations to lower cost per function
  - Collaboration across domains of expertise to co-optimize device technology, circuit/system architecture, algorithms
    - Examples: Reconfigurable specializers, communication-avoiding and write-avoiding algorithms

#### **Cell-Level Comparison of Emerging NVM Technologies**

K. Kato et al., IEEE Electron Device Letters, Vol. 37, pp. 31-34, 2016

|                 | NAND Flash                       | PCM                     | Redox RRAM                | STT-MRAM                                            | NV-NEMory               | Stand-alone DRAM        |
|-----------------|----------------------------------|-------------------------|---------------------------|-----------------------------------------------------|-------------------------|-------------------------|
| Cell area       | 2.5 <i>F</i><sup>2</sup>         | 6 <i>F</i><sup>2</sup>  | 5-8 <i>F</i><sup>2</sup>  | 20-40 <i>F</i><sup>2</sup>                          | 8 <i>F</i><sup>2</sup>  | 6 <i>F</i><sup>2</sup>  |
| Program voltage | 18-20 V                          | 3 V                     | 0.5 V                     | 1.8 V                                               | ~2 V                    | 1.5 V                   |
| Program time    | > 10 µs                          | 50 ns                   | 5 ns                      | 100 ns                                              | < 10 ns                 | < 10 ns                 |
| Program current | n/a                              | 100 µA                  | 0.4 µA                    | 100 µA                                              | zero                    | n/a                     |
| Program energy  | > 1 fJ                           | 2 pJ                    | 1 fJ                      | 4 pJ                                                | ~50 aJ                  | 2 fJ                    |
| Read voltage    | 0.1-0.5 V                        | 3 V                     | 0.2 V                     | 0.5 V                                               | < 0.1 V                 | 1.5 V                   |
| Read time       | 15-50 µs                         | 60 ns                   | 10 ns                     | 10-20 ns                                            | < 0.1 ns                | < 10 ns                 |
| Endurance       | 10<sup>4</sup>-10<sup>5</sup>    | 10<sup>15</sup>         | 10<sup>16</sup>           | 2×10<sup>12</sup> @10ns<br>2×10<sup>6</sup> @10ms   | >10<sup>16</sup>        | N/A                     |

 NV-NEMory technology offers much lower programming energy per bit and fast read access time as compared with other NVM technologies.