CryptArchi 2009

Abstracts

Implementing SHA1 and SHA2 Standards on the Eve of SHA3 Competition

Marcin Rogawski, Xin Xin, Ekawat Homsirikamol, David Hwang, and Kris Gaj

In spite of recent attacks against SHA-1, which have cast doubt on the security of the entire line of NSA-developed SHA standards, SHA-1 and SHA-2 are likely to remain the most widely deployed cryptographic hash functions for at least the next several years.

Since October 2008, 51 candidate algorithms have been competing for the right to replace the SHA families as the next American, and de facto worldwide, standard. As the contest for the Advanced Encryption Standard (AES) in 1997-2000 clearly demonstrated, the hardware efficiency of the competing algorithms is likely to be one of the decisive factors in choosing a winner, especially if the security and software efficiency evaluations prove inconclusive.

In this talk, we present the results of our recent project on the comprehensive evaluation of the hash functions SHA-1, SHA-256, and SHA-512 using reconfigurable hardware. Each function has been implemented using four different architectures reported earlier in the literature: basic iterative (with possible rescheduling and precomputation), quasi-pipelined, unrolled, and quasi-pipelined unrolled. For the unrolled architectures, several different unrolling factors were used. The efficiency of optimization techniques such as unrolling and quasi-pipelining, as a function of the implemented algorithm, is defined, measured, and carefully analyzed.
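As a rough illustration of how throughput is commonly estimated for iterative and unrolled hash architectures, the sketch below uses the usual first-order model: block size times clock frequency, divided by the number of clock cycles needed per message block. This is an assumption for illustration, not the efficiency definition used in the talk. The function names, the `extra_cycles` parameter, and the clock frequencies in the example are hypothetical; only the block sizes and round counts come from the SHA-1 and SHA-2 standards.

```python
import math

# (block size in bits, number of rounds) as defined by the SHA-1/SHA-2 standards.
SHA_PARAMS = {
    "SHA-1": (512, 80),
    "SHA-256": (512, 64),
    "SHA-512": (1024, 80),
}


def cycles_per_block(rounds, unroll=1, extra_cycles=0):
    """Clock cycles per message block when `unroll` rounds share one cycle.

    `extra_cycles` is a hypothetical allowance for overhead such as
    initialization or pipeline fill; it is not taken from the source.
    """
    return math.ceil(rounds / unroll) + extra_cycles


def throughput_mbps(algorithm, clock_mhz, unroll=1, extra_cycles=0):
    """First-order throughput estimate in Mbit/s for a single message stream."""
    block_bits, rounds = SHA_PARAMS[algorithm]
    return block_bits * clock_mhz / cycles_per_block(rounds, unroll, extra_cycles)


if __name__ == "__main__":
    # Made-up clock frequencies, chosen only to show the trade-off:
    # unrolling halves the cycle count but lengthens the critical path.
    print(throughput_mbps("SHA-256", clock_mhz=100.0))           # basic iterative
    print(throughput_mbps("SHA-256", clock_mhz=60.0, unroll=2))  # unrolled by 2
```

In this model, unrolling pays off only when the clock frequency drops by less than the unrolling factor, which is why the benefit depends on both the algorithm and the target device.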

All architectures have been implemented using three families of Xilinx FPGAs used in previous studies: Virtex, Virtex II Pro, and Virtex 5. Our results for the basic iterative architecture match the best results reported by Chaves et al. at CHES 2006 and in IEEE TVLSI in August 2008. At the same time, our results for the quasi-pipelined architecture show throughput improvements over the iterative architecture used by Chaves ranging from 8% to 15% for SHA-1, from -1% to 22% for SHA-256, and from 3% to 31% for SHA-512, with the exact improvement depending on the family of Xilinx FPGAs used in the comparison. Quasi-pipelining and unrolling have been combined in the hybrid architecture for a total throughput gain over the Chaves architecture of up to 66% for SHA-256 and 92% for SHA-512. All our architectures assume processing of only one stream of data (a single message) at a time.
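The percentages above can be read as relative throughput gains. One conventional way to write such a gain is given below; this is an assumed formulation, and the symbols $T_{\text{new}}$ and $T_{\text{base}}$ are introduced here for illustration, not taken from the talk.

```latex
\[
  \text{gain} \;=\; \frac{T_{\text{new}} - T_{\text{base}}}{T_{\text{base}}} \times 100\%
\]
```

Here $T_{\text{new}}$ is the throughput of the evaluated architecture and $T_{\text{base}}$ is that of the baseline iterative architecture on the same FPGA family. Under this reading, a negative value, such as the -1% lower bound quoted for SHA-256, means the evaluated architecture was slightly slower than the baseline on that device.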

We believe that our implementations can serve as reference implementations for comparing the efficiency of new SHA-3 candidates with that of the existing SHA standards. Our analysis provides guidelines for implementers of SHA-3 algorithms regarding which hardware architectures should be analyzed, implemented, and reported for each SHA-3 algorithm.