

#### Fully asynchronous QDI implementation of DES in FPGA

#### Jan Bělohoubek, Jan Schmidt

belohja4@fit.cvut.cz, jan.schmidt@fit.cvut.cz

Czech Technical University in Prague, Faculty of Information Technology

12th CryptArchi Workshop – Annecy 2014

This work was supported by the Hlávka Foundation.



Aim of the work

Asynchronous design methodology on FPGA

- Explore the FPGA architecture with regard to asynchronous circuit implementation
- Select suitable SW and HW (preferably available in our department)
- Implement an asynchronous DES cipher
- Review the DPA resistance of the proposed design



# Used technologies

FPGA and development tools selection

- Aim: use standard and widely available tools
- VHDL: ghdl + GTKWave, ModelSim
- Synthesis: ISE WebPACK
- Nexys3 board (Spartan-6)
  - 6-LUT
  - size # slices
  - available in DDD FIT CTU





# QDI and C-element


QDI: quasi delay insensitive

- QDI is an asynchronous circuit design methodology
  - $\rightarrow$  the use of C-elements allows the circuit to tolerate arbitrary delay variations
  - $\rightarrow$  equal delays are required only in *isochronic forks*
- C-element
  - $\rightarrow$  an asynchronous *hazard-free* sequential circuit
  - $\rightarrow$  holds the last common value of its inputs
- Traditional C-element implementation and symbol:
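The hold behavior of the C-element described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the VHDL used in the design; the class name `CElement` and its interface are assumptions made for illustration.

```python
class CElement:
    """Behavioral model of a Muller C-element (hypothetical helper class)."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        if not inputs:
            raise ValueError("a C-element needs at least one input")
        if all(v == 1 for v in inputs):
            self.out = 1          # all inputs agree on 1: output follows
        elif all(v == 0 for v in inputs):
            self.out = 0          # all inputs agree on 0: output follows
        # otherwise the inputs disagree and the previous output is held
        return self.out


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

The output changes only when both inputs have reached the same value, which is what lets the element wait for the slower of its inputs.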



#### Note:

Proposed design is not fully QDI by definition.





- C-element and its correct behavior model
- C-element FPGA implementation
- Possible sizes of C-element:
  - 6-LUT
    - $\rightarrow$  5-inputs or 4-inputs and reset
  - 4-LUT
    - $\rightarrow$  3-inputs or 2-inputs and reset
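The sizes listed above are consistent with a mapping in which the C-element output is fed back to one LUT input, so a k-input LUT leaves k-1 inputs for data (or k-2 plus a reset). The sketch below enumerates the truth table of such a LUT under that assumption. The function name `c_element_lut`, the address bit layout, and the active-high reset are hypothetical choices for illustration, not the layout of any particular tool or primitive.

```python
def c_element_lut(n_inputs, with_reset=False):
    """Return (lut_size, truth_table) for an n-input C-element in one LUT.

    Assumed address layout: bits 0..n-1 are the data inputs, bit n is the
    fed-back output, bit n+1 (if present) is an active-high reset.
    Bit `addr` of the returned integer is the LUT output for that address.
    """
    lut_size = n_inputs + 1 + (1 if with_reset else 0)
    all_ones = (1 << n_inputs) - 1
    table = 0
    for addr in range(1 << lut_size):
        data = addr & all_ones
        prev = (addr >> n_inputs) & 1
        reset = (addr >> (n_inputs + 1)) & 1 if with_reset else 0
        if reset:
            out = 0
        elif data == all_ones:
            out = 1
        elif data == 0:
            out = 0
        else:
            out = prev            # inputs disagree: hold the fed-back value
        table |= out << addr
    return lut_size, table


for n, rst in [(5, False), (4, True), (3, False), (2, True)]:
    size, table = c_element_lut(n, rst)
    print(f"{n} inputs, reset={rst}: {size}-LUT, table=0x{table:X}")
```

For two inputs without reset the table is `0xE8`, the three-input majority function, which is the usual way to express a C-element with feedback.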





#### Note:

The described C-elements are implemented correctly by the automated synthesis tools.






How to implement the *delay line*?

Timing

- difficult on FPGA: don't use delay lines if possible
- if necessary: bring the signals out and create the delay line outside the FPGA
- QDI: how to reach the same delay in isochronic forks?
  - $\rightarrow$  use clock buffers




Signalling, essential hazard



- It is difficult to set up delays on FPGAs
  - $\rightarrow$  *QDI* is the solution
- Data signalling (*dual-rail logic*) and completion detection circuits
- *QDI*: which forks are isochronic?
  - $\rightarrow$  forks driving one group of flip-flops
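Dual-rail signalling, mentioned above, carries each bit on two wires so that the data itself announces its arrival. A minimal sketch follows; the rail order (true rail first) and the helper names are assumptions, while the 00 idle state matches the state used elsewhere in these slides.

```python
SPACER = (0, 0)   # idle state: no data on the pair


def encode(bit):
    """Dual-rail code word for one bit, as (true_rail, false_rail)."""
    return (1, 0) if bit else (0, 1)


def is_valid(pair):
    """A pair carries data when exactly one rail is high."""
    return pair in ((1, 0), (0, 1))


def decode(pair):
    if not is_valid(pair):
        raise ValueError(f"pair {pair} carries no data")
    return pair[0]


def word_complete(pairs):
    """Completion in the simplest sense: every pair of the word is valid."""
    return all(is_valid(p) for p in pairs)


word = [encode(b) for b in [1, 0, 1]]
print(word)                                  # [(1, 0), (0, 1), (1, 0)]
print(word_complete(word))                   # True
print(word_complete([SPACER, (1, 0)]))       # False
```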



Completion detection




**a)** completion detection sensitive to the slowest ($\uparrow$) and the fastest ($\downarrow$) signal

**b)** completion detection sensitive to the fastest ($\uparrow$) and the slowest ($\downarrow$) signal

**c)** *DI* completion detection (sensitive to the slowest signal $\rightarrow$ *delay insensitive*)
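The difference between the three variants can be made concrete by computing when the completion signal would rise and fall for given per-signal timings. The sketch below ignores gate delays. The function name and the example times are hypothetical, and the gate types named in the comments (AND-like, OR-like, C-element) are one plausible reading of the slide, not a confirmed description of the circuits.

```python
def detection_times(valid_at, empty_at, variant):
    """Return (rise_time, fall_time) of the completion signal.

    valid_at[i]: time at which signal i becomes valid (leaves 00)
    empty_at[i]: time at which signal i returns to 00
    """
    if variant == "a":    # AND-like: waits for the last rise, drops on the first fall
        return max(valid_at), min(empty_at)
    if variant == "b":    # OR-like: fires on the first rise, waits for the last fall
        return min(valid_at), max(empty_at)
    if variant == "c":    # C-element: waits for the slowest signal on both edges
        return max(valid_at), max(empty_at)
    raise ValueError(f"unknown variant {variant!r}")


valid_at = [3, 5, 4]      # hypothetical arrival times
empty_at = [12, 10, 11]   # hypothetical return-to-zero times
for v in "abc":
    print(v, detection_times(valid_at, empty_at, v))
# a (5, 10)
# b (3, 12)
# c (5, 12)
```

Only variant c) depends on the slowest signal for both edges, so its result does not rely on any assumption about relative delays.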








Vulnerability


Side channel attacks






Vulnerability


Hamming distance $d_H(a,b)$ of two vectors (arrows in the original figure mark the differing bit positions):

$$a = 0\;1\;0\;1\;1\;\ldots\;0$$

$$b = 0\;1\;1\;1\;0\;\ldots\;1$$
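For reference, the Hamming distance $d_H(a,b)$ counts the positions in which two equally long vectors differ. A minimal sketch, using shortened example vectors that keep only the bits visible on the slide (the elided middle part is left out, so the vectors are illustrative only):

```python
def hamming_distance(a, b):
    """Number of positions in which two equally long bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


a = [0, 1, 0, 1, 1, 0]   # illustrative, shortened
b = [0, 1, 1, 1, 0, 1]   # illustrative, shortened
print(hamming_distance(a, b))  # 3
```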



???



## Vulnerability


Global and local synchronization in the datapath





#### Note:

Global synchronization means holding all outputs in state 00 until all inputs have left state 00.
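The rule in the note can be written down as a small stateless sketch. The function name `global_sync` is hypothetical, and the model covers only the release condition stated in the note; how the outputs return to 00 in the real circuit is not described here and is not modelled.

```python
SPACER = (0, 0)


def global_sync(inputs):
    """Pass the dual-rail inputs through only once every pair has left 00.

    Until then, all outputs are held in state 00.
    """
    if all(pair != SPACER for pair in inputs):
        return list(inputs)
    return [SPACER] * len(inputs)


print(global_sync([(1, 0), (0, 0), (0, 1)]))  # [(0, 0), (0, 0), (0, 0)]
print(global_sync([(1, 0), (0, 1), (0, 1)]))  # [(1, 0), (0, 1), (0, 1)]
```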





Measurement results - local synchronization of datapaths
















Measurement results - global synchronization of datapaths





Measurement interpretation

- Inverse and similar vectors have similar power traces $\rightarrow$ dual-rail logic is automatically mapped relatively symmetrically
- Global synchronization leads to worse correlation (the period is the Achilles heel)
  - $\rightarrow$  synchronization is derived from the fastest or the slowest signals
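One idealized way to see why inverse vectors can produce similar traces: in dual-rail logic that starts from state 00, exactly one rail per bit rises for any data word, so the number of switching wires does not depend on the data. The sketch below checks this under the assumption that both rails of a pair cost the same, which real placement and routing only approximate; the function name and the example vector are hypothetical.

```python
def rising_rails(bits):
    """Count rails that rise when a word goes from all-00 to its code word."""
    word = [(1, 0) if b else (0, 1) for b in bits]   # (true_rail, false_rail)
    return sum(t + f for t, f in word)


v = [0, 1, 0, 1, 1, 0]            # illustrative vector
inverse = [1 - b for b in v]
print(rising_rails(v), rising_rails(inverse))  # 6 6
```

What still differs between the two words is which rail of each pair switches and when, which is consistent with timing being the remaining source of leakage.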







- Completion detection reimplementation using C-elements
  - $\rightarrow$  robust, but sensitive to the slowest signal
  - $\rightarrow$  larger overhead
- Completion detection reimplementation using ⊕ and ⊕ linked together by a C-element
  - $\rightarrow$  attempt to move closer to the average period in all cases (the period will be influenced by the slowest signal from one group and the fastest signal from the other)
  - $\rightarrow$  a delay line can mask delay variations (as in the proposed implementation)
- Review the *fault-attack* vulnerability
  - $\rightarrow$  is the natural *QDI* resistance sufficient?





- Asynchronous DES was successfully implemented on FPGA
- Automated synthesis tools can be used when the appropriate methods are followed
- If two power traces are similar, it cannot be concluded that the processed data are similar too → we expect trivial attacks to fail, but we cannot exclude that advanced DPA techniques succeed
- Global synchronization of datapaths will probably increase the vulnerability
  - $\rightarrow$  the vulnerability is caused by deriving the synchronization pulses from the datapaths
  - ⇒ The main vulnerability is in the timing area (of course, timing influences the power traces)