CryptArchi 2008

Abstracts

Implementation and Comparative Analysis of Selected Modern Hardware Architectures for Montgomery Multiplication

K. Gaj, R. Sumner, M. Huang

This presentation focuses on a comparative analysis of several classical and modern hardware architectures for Montgomery Multiplication. One of the first scalable hardware architectures for the Montgomery Multiplication algorithm was proposed by Tenca and Koç in 1999. Two more recent architectures, proposed by Harris and by Huang, both seek to reduce the latency of this architecture by reducing the total number of clock cycles required to complete the multiplication. Other designs, proposed by McIvor et al., take a different approach based on performing partial operations on full-size operands in carry-save form. In this project, a fair and balanced evaluation method has been developed to rank these competing designs impartially. Each design was implemented on the same Xilinx Virtex 2 family of FPGAs so that the target platform stays the same for every architecture. All architectures have been evaluated in terms of maximum clock frequency (equivalently, minimum clock period), overall area (in CLB slices), and total latency, in order to highlight the strengths and weaknesses of each solution.
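For readers unfamiliar with the operation these architectures implement, the following is a minimal software sketch of radix-2 (bit-serial) Montgomery multiplication. It is a generic textbook formulation, not a model of any of the designs compared here; the function name `mont_mul` and its parameters are illustrative assumptions.

```python
def mont_mul(x: int, y: int, m: int, n: int) -> int:
    """Return x * y * 2^(-n) mod m (illustrative sketch).

    Assumes m is odd and 0 <= x, y < m < 2^n.
    """
    s = 0
    for i in range(n):
        # Add y if bit i of x is set.
        if (x >> i) & 1:
            s += y
        # Make s even by adding the odd modulus, then halve exactly.
        if s & 1:
            s += m
        s >>= 1
    # At this point s < 2m, so one conditional subtraction suffices.
    if s >= m:
        s -= m
    return s


if __name__ == "__main__":
    m, n = 97, 7              # odd modulus with m < 2^7
    x, y = 45, 76
    r = mont_mul(x, y, m, n)
    # Defining property: r * 2^n is congruent to x * y modulo m.
    print((r * 2**n) % m == (x * y) % m)
```

Each loop iteration corresponds to one step of the bit-serial recurrence. Roughly speaking, hardware designs of this kind differ in how they schedule, pipeline, or widen that step, which is what determines the clock cycle count and the area.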