

# Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations

Loïc FRANCE, Florent BRUGUIER, David NOVO, Maria MUSHTAQ and Pascal BENOIT





### DRAM architecture






### Rowhammer








[1] Lipp, Moritz, et al. "Nethammer: Inducing rowhammer faults through network requests." *EuroS&PW*, 2020.

[2] Kwong, Andrew, et al. "Rambleed: Reading bits in memory without accessing them." *SP*, 2020.

[3] Seaborn, Mark, and Thomas Dullien. "Exploiting the DRAM rowhammer bug to gain kernel privileges." *Black Hat*, 2015.

[4] Gruss, Daniel, Clémentine Maurice, and Stefan Mangard. "Rowhammer.js: A remote software-induced fault attack in JavaScript." *DIMVA*, 2016.

[5] de Ridder, Finn, et al. "SMASH: Synchronized Many-sided Rowhammer Attacks from JavaScript." *USENIX Security*, 2021.




### Rowhammer mitigation

#### In software:

- Isolate sensitive data from unsafe programs in memory [5]
- Read performance counters to detect attack signatures and stop the offending processes [6]

Highly modular, high performance cost

#### In hardware:

- Randomly refresh neighbors of activated rows [1]
- Detect the most activated rows with counters [2][3][4]

Low performance cost, limited modularity, silicon area overhead

[1] Kim, Yoongu, et al. "Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors." *ISCA*, 2014.

[2] Park, Yeonhong, et al. "Graphene: Strong yet lightweight row hammer protection." *MICRO*, 2020.

[3] Yağlikçi, A. Giray, et al. "BlockHammer: Preventing rowhammer at low cost by blacklisting rapidly-accessed DRAM rows." *HPCA*, 2021.

[4] Lee, Eojin, et al. "TWiCe: Time window counter based row refresh to prevent row-hammering." *CAL*, 2017.

[5] Konoth, Radhesh Krishnan, et al. "ZebRAM: Comprehensive and Compatible Software Protection Against Rowhammer Attacks." *OSDI*, 2018.

[6] Alam, Manaar, et al. "Performance counters to rescue: A machine learning based safeguard against micro-architectural side-channel-attacks." *Cryptology ePrint Archive*, 2017.


## Hardware mitigations principle





## Counter-based hardware mitigations principle



| Row       | Count |
|-----------|-------|
| 0x1010    | 5     |
| 0x2020    | 7     |
| 0x3030    | 3     |
| Spillover | 2     |















### Misra-Gries Algorithm:



$$N_{entry} = \left\lfloor\frac{W}{T_{RH} \div 4}\right\rfloor$$

$W$: maximum number of ACTs during $t_{REFW}$; $T_{RH}$: Rowhammer corruption threshold.
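The counter table with a spillover entry can be sketched in a few lines of Python. This is a minimal illustration of a Misra-Gries summary, not the hardware design of any specific mitigation: the class and method names are hypothetical, the refresh trigger (a count reaching a multiple of the per-entry threshold $T_{RH} \div 4$) is an assumption, and `T_RH = 32768` is an assumed value that is not stated in the slides.

```python
import math


def n_entries(w, t_rh):
    """Table size from the formula above, rounding down."""
    return math.floor(w / (t_rh / 4))


class MisraGriesTracker:
    """Tracks the most activated rows with a bounded table (illustrative sketch)."""

    def __init__(self, n_entries, threshold):
        self.n_entries = n_entries  # number of (row, count) entries
        self.threshold = threshold  # per-entry trigger, e.g. T_RH / 4 (assumption)
        self.counts = {}            # row address -> estimated ACT count
        self.spillover = 0          # shared count for rows that are not tracked

    def activate(self, row):
        """Record one ACT; return True when the row's neighbors should be refreshed."""
        if row in self.counts:
            self.counts[row] += 1
        elif len(self.counts) < self.n_entries:
            # A free entry is available: start tracking the row.
            self.counts[row] = self.spillover + 1
        else:
            # Table full: reuse an entry whose count equals the spillover count.
            victim = next(
                (r for r, c in self.counts.items() if c == self.spillover), None
            )
            if victim is None:
                # No replaceable entry: only the spillover counter advances.
                self.spillover += 1
                return False
            del self.counts[victim]
            self.counts[row] = self.spillover + 1
        return self.counts[row] % self.threshold == 0


T_RH = 32768  # assumed corruption threshold, chosen only for illustration
tracker = MisraGriesTracker(n_entries=n_entries(1_330_000, T_RH), threshold=T_RH // 4)
for _ in range(T_RH // 4):
    needs_refresh = tracker.activate(0x1010)
print("entries:", tracker.n_entries, "refresh neighbors of 0x1010:", needs_refresh)
```

Because a tracked count never underestimates the true number of ACTs to a row, a row cannot reach the threshold without being noticed as long as the table holds $N_{entry}$ entries.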

### BlockHammer

### Counting Bloom Filter (CBF):








$$P_{FP} = \left(1 - \sum_{l < N_{BL}} {\binom{kW}{l}} \left(\frac{1}{m}\right)^l \left(1 - \frac{1}{m}\right)^{kW-l}\right)^k$$
$$P_{FP} \propto \frac{W}{m} \implies m \propto \frac{W}{P_{FP}} (k \text{ const.})$$

$W$: maximum number of ACTs during $t_{REFW}$; $N_{BL}$: Rowhammer detection threshold; $k$: number of hash functions; $m$: number of counters.
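The sketch below shows a generic counting Bloom filter and a direct numerical evaluation of the $P_{FP}$ expression above. It is an illustration under stated assumptions, not BlockHammer's hardware design: the class, the function names, and the salted BLAKE2b hash family are hypothetical choices made for the example.

```python
import hashlib
import math


class CountingBloomFilter:
    """Estimates per-row ACT counts with m shared counters (illustrative sketch)."""

    def __init__(self, m, k, n_bl):
        self.m = m                # number of counters
        self.k = k                # number of hash functions
        self.n_bl = n_bl          # detection (blacklisting) threshold
        self.counters = [0] * m

    def _indexes(self, row):
        # Hypothetical hash family: BLAKE2b salted with the hash-function index.
        data = row.to_bytes(8, "little")
        for i in range(self.k):
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "little") % self.m

    def activate(self, row):
        """Record one ACT by incrementing the k counters selected by the hashes."""
        for idx in self._indexes(row):
            self.counters[idx] += 1

    def estimate(self, row):
        """Upper bound on the row's ACT count: the smallest of its k counters."""
        return min(self.counters[idx] for idx in self._indexes(row))

    def is_blacklisted(self, row):
        return self.estimate(row) >= self.n_bl


def false_positive_probability(k, w, m, n_bl):
    """Evaluate the P_FP formula above in log space (requires m >= 2)."""
    n = k * w
    log_p, log_q = math.log(1 / m), math.log1p(-1 / m)
    below = sum(
        math.exp(
            math.lgamma(n + 1) - math.lgamma(l + 1) - math.lgamma(n - l + 1)
            + l * log_p + (n - l) * log_q
        )
        for l in range(min(n_bl, n + 1))
    )
    return max(0.0, 1.0 - below) ** k


cbf = CountingBloomFilter(m=64, k=3, n_bl=5)
for _ in range(5):
    cbf.activate(0x1010)
print("blacklisted:", cbf.is_blacklisted(0x1010))
print("P_FP:", false_positive_probability(k=1, w=100, m=256, n_bl=2))
```

Counters are shared between rows, so the estimate can only overshoot the true count. That overshoot is the source of false positives, and it shrinks as $m$ grows relative to $W$.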






### How many counters?

Graphene and BlockHammer: bank-level implementation, size $= K \times W$

$W$: maximum number of ACTs during $t_{REFW}$ at bank level

$W_R$: maximum number of ACTs during $t_{REFW}$ at rank level. Is $W_R = 16 \times W$?

## Memory bandwidth at different levels

| Parameter  | Description              | Value    |
|------------|--------------------------|----------|
| $t_{RC}$   | Same-bank ACT interval   | 45.8 ns  |
| $t_{REFW}$ | Refresh cycle duration   | 64 ms    |
| $t_{REFI}$ | Refresh interval         | 7.8 µs   |
| $t_{RFC}$  | Refresh command duration | 350 ns   |
| $t_{FAW}$  | Four-activate window     | 21.67 ns |



#### Bank level:

$$W = \left\lfloor\frac{t_{REFW}\left(1 - \frac{t_{RFC}}{t_{REFI}}\right)}{t_{RC}}\right\rfloor \approx 1.33\text{M}$$

#### Rank level:

$$W_{R} = \left\lfloor\frac{t_{REFW}\left(1 - \frac{t_{RFC}}{t_{REFI}}\right)}{t_{FAW} \div 4}\right\rfloor \approx 11.3\text{M} \neq 16 \times W$$
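The two bounds can be checked numerically from the timing table. The short script below is a sketch that plugs the listed values into the formulas, rounding down; the constant and function names are hypothetical, and 16 banks per rank is taken from the $16 \times W$ comparison.

```python
import math

# Timing parameters from the table above, all in nanoseconds.
T_RC = 45.8      # same-bank ACT interval
T_REFW = 64e6    # refresh cycle duration (64 ms)
T_REFI = 7.8e3   # refresh interval (7.8 us)
T_RFC = 350.0    # refresh command duration
T_FAW = 21.67    # four-activate window
BANKS_PER_RANK = 16


def max_activations(min_act_interval_ns):
    """Maximum number of ACTs in one refresh window, excluding refresh time."""
    available_ns = T_REFW * (1 - T_RFC / T_REFI)
    return math.floor(available_ns / min_act_interval_ns)


W = max_activations(T_RC)         # one bank: one ACT every t_RC
W_R = max_activations(T_FAW / 4)  # one rank: four ACTs every t_FAW

print(f"W        = {W / 1e6:.2f}M")
print(f"16 x W   = {BANKS_PER_RANK * W / 1e6:.2f}M")
print(f"W_R      = {W_R / 1e6:.2f}M")
print(f"W_R / (16 x W) = {W_R / (BANKS_PER_RANK * W):.2f}")
```

The rank-level bound is set by $t_{FAW}$, which limits the whole rank to four ACTs per window. All 16 banks therefore cannot be hammered at their individual maximum rate at the same time, which is why $W_R$ is well below $16 \times W$.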

### Reduction in considered ACTs

Bank-level implementation (figure: 16 banks, each with its own mitigation instance)

Total considered ACTs: $16 \times W = 21.28$M

Rank-level implementation (figure)

Total considered ACTs: $W_R = 11.3$M



## Consequences for Graphene & BlockHammer

|             | Bank-level implementation | Rank-level implementation | Reduction |
|-------------|---------------------------|---------------------------|-----------|
| Graphene    | 162 entries<br>Entry size: 30 bits\*<br>Total size: $16 \times 162 \times 30$ bits = **9.61 KiB** | 1377 entries<br>Entry size: 34 bits\*\*<br>Total size: $1377 \times 34$ bits = **5.79 KiB** | **-40%** |
| BlockHammer | 2048 counters<br>13 bits / counter<br>Total size: $16 \times 2048 \times 13$ bits = **52 KiB** | 16384 counters\*\*\*<br>13 bits / counter<br>Total size: $16384 \times 13$ bits = **26 KiB** | **-50%** |

\*: row address – 16 bits, counter – 13 bits, overflow bit – 1 bit

\*\*: row address – 20 bits, counter – 13 bits, overflow bit – 1 bit

\*\*\*: keeps the same $P_{FP}$ as for the bank-level implementation
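The sizes in the table follow from simple bit counting, sketched below with hypothetical helper names. The entry and counter counts are taken as given in the table. The 4 extra address bits at rank level are read here as the bank index ($\log_2 16$), which is an interpretation of the footnotes rather than something the slides state.

```python
import math

BITS_PER_KIB = 8 * 1024
BANKS = 16
COUNTER_BITS = 13
OVERFLOW_BITS = 1
ROW_ADDRESS_BITS = 16


def kib(bits):
    return bits / BITS_PER_KIB


def reduction(bank_level_bits, rank_level_bits):
    return 1 - rank_level_bits / bank_level_bits


# Graphene: each entry stores an address, a counter and an overflow bit.
entry_bits_bank = ROW_ADDRESS_BITS + COUNTER_BITS + OVERFLOW_BITS
# Assumption: a rank-level entry also stores the bank index (log2(16) = 4 bits).
entry_bits_rank = entry_bits_bank + int(math.log2(BANKS))

graphene_bank = BANKS * 162 * entry_bits_bank  # one 162-entry table per bank
graphene_rank = 1377 * entry_bits_rank         # one 1377-entry table per rank

# BlockHammer: plain arrays of 13-bit counters.
blockhammer_bank = BANKS * 2048 * COUNTER_BITS
blockhammer_rank = 16384 * COUNTER_BITS

for name, bank, rank in [
    ("Graphene", graphene_bank, graphene_rank),
    ("BlockHammer", blockhammer_bank, blockhammer_rank),
]:
    print(f"{name}: {kib(bank):.2f} KiB -> {kib(rank):.2f} KiB "
          f"(-{100 * reduction(bank, rank):.0f}%)")
```

With these inputs the BlockHammer sizes come out at exactly 52 KiB and 26 KiB. The Graphene sizes come out at about 9.5 KiB and 5.7 KiB, slightly below the table's figures, so the slides may round intermediate values differently. The relative reduction is about 40% either way.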

### Reduction in storage requirements

|                                         | Bank level | Rank level | Reduction |
|-----------------------------------------|------------|------------|-----------|
| Considered ACTs ($16 \times W$ / $W_R$) | 21.28M     | 11.3M      | -47%      |
| Graphene                                | 9.61 KiB   | 5.79 KiB   | -40%      |
| BlockHammer                             | 52 KiB     | 26 KiB     | -50%      |




Storage reduction: 40% to 50%

### Reduction in storage requirements – DDR5

|                                         | Bank level | Rank level | Reduction |
|-----------------------------------------|------------|------------|-----------|
| Considered ACTs ($32 \times W$ / $W_R$) | 21.15M     | 8M         | -62%      |
| Graphene                                | 9.38 KiB   | 4.05 KiB   | -57%      |
| BlockHammer                             | 52 KiB     | 19.5 KiB   | -62.5%    |
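As a quick consistency check, the reduction column can be recomputed from the bank-level and rank-level columns. The snippet below is a small sketch with a hypothetical helper name. It uses only the DDR5 values printed in the table.

```python
def reduction_percent(bank_level, rank_level):
    """Relative saving of the rank-level value over the bank-level one, in percent."""
    return 100 * (1 - rank_level / bank_level)


# (bank level, rank level) pairs copied from the DDR5 table above.
ddr5 = {
    "Considered ACTs (M)": (21.15, 8.0),
    "Graphene (KiB)": (9.38, 4.05),
    "BlockHammer (KiB)": (52.0, 19.5),
}

for name, (bank_level, rank_level) in ddr5.items():
    print(f"{name}: -{reduction_percent(bank_level, rank_level):.1f}%")
```

The storage savings track the reduction in considered ACTs closely, as expected when the size of each structure scales with the number of ACTs it must account for.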

Bank-level implementation

Total considered ACTs: $32 \times W = 21.15$M

Rank-level implementation

Total considered ACTs: $W_R = 8$M



### Does it still work as it should?



France, Loïc, et al. "Implementing Rowhammer Memory Corruption in the gem5 Simulator." 32nd International Workshop on Rapid System Prototyping (RSP). IEEE, 2021.



### Thank you for your attention

