Implementation and Comparative Analysis of Selected Modern Hardware Architectures for Montgomery Multiplication
K. Gaj, R. Sumner, M. Huang
This presentation focuses on a comparative analysis of several classical and modern hardware architectures for
Montgomery Multiplication. A method for performing a fair comparison among the competing designs has been
developed.
One of the first scalable hardware architectures for the Montgomery Multiplication algorithm was proposed by
Tenca and Koç in 1999. Two more recent architectures, proposed by Harris and Huang respectively, both
seek to reduce the latency of this architecture by reducing the total number of clock cycles required to complete
the multiplication. Other designs, proposed by McIvor et al., take a different approach based on performing
partial operations on full-size operands in carry-save form.
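For orientation, the operation that all of these architectures compute can be sketched in software. The block below is a minimal bit-serial (radix-2) model of Montgomery multiplication, assuming an odd modulus and operands already reduced below it. It is an illustration only and does not reproduce any of the cited hardware designs; the names `mont_mul`, `to_mont`, `from_mont`, and `n_bits` are hypothetical.

```python
def mont_mul(x: int, y: int, m: int, n_bits: int) -> int:
    """Return x * y * 2^(-n_bits) mod m (radix-2 Montgomery product).

    Assumes m is odd, m < 2^n_bits, and 0 <= x, y < m.
    """
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    s = 0
    for i in range(n_bits):
        x_i = (x >> i) & 1      # scan x one bit at a time, LSB first
        s += x_i * y            # conditional addition of y
        if s & 1:               # make the partial sum even ...
            s += m              # ... by adding the (odd) modulus
        s >>= 1                 # exact division by 2
    if s >= m:                  # partial sum stays below 2m, so one
        s -= m                  # conditional subtraction is enough
    return s


def to_mont(a: int, m: int, n_bits: int) -> int:
    """Convert a to the Montgomery domain: a * 2^n_bits mod m."""
    r2 = pow(2, 2 * n_bits, m)  # R^2 mod m, with R = 2^n_bits
    return mont_mul(a, r2, m, n_bits)


def from_mont(a_bar: int, m: int, n_bits: int) -> int:
    """Convert back from the Montgomery domain: a_bar * 2^(-n_bits) mod m."""
    return mont_mul(a_bar, 1, m, n_bits)
```

The loop runs once per bit of the operand. In a hardware realization, the iteration count and the work done per iteration together determine the clock-cycle count and the clock period, which are the quantities the compared architectures trade against area.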
In this project, a fair and balanced evaluation method targeting these Montgomery Multiplication architectures has
been developed to impartially rank several known designs. Each design was implemented on the same Xilinx
Virtex 2 family of FPGAs to provide a common, controlled target platform. All architectures have been evaluated
in terms of maximum clock frequency, overall area (in CLB slices), and total latency, in order to highlight the
strengths and weaknesses of each solution.
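The three metrics are related: total latency is the number of clock cycles multiplied by the clock period, and the maximum clock frequency is the reciprocal of the clock period. The following is a minimal sketch of how such figures might be tabulated and ranked. The `DesignResult` record and the `rank_by` helper are hypothetical, and the numbers in the usage example are placeholders, not measurements from this work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DesignResult:
    name: str
    slices: int              # area in CLB slices
    clock_period_ns: float   # minimum clock period after place and route
    clock_cycles: int        # cycles needed for one multiplication

    @property
    def max_frequency_mhz(self) -> float:
        return 1000.0 / self.clock_period_ns

    @property
    def latency_ns(self) -> float:
        return self.clock_cycles * self.clock_period_ns


def rank_by(results: List[DesignResult],
            key: Callable[[DesignResult], float]) -> List[str]:
    """Return design names ordered from best (smallest key) to worst."""
    return [r.name for r in sorted(results, key=key)]


# Placeholder values for illustration only, not results from the study.
results = [
    DesignResult("design_a", slices=1000, clock_period_ns=10.0, clock_cycles=200),
    DesignResult("design_b", slices=1500, clock_period_ns=8.0, clock_cycles=150),
]
by_latency = rank_by(results, key=lambda r: r.latency_ns)
by_area = rank_by(results, key=lambda r: r.slices)
```

Ranking on each metric separately, rather than on a single combined score, keeps the trade-off between area and latency visible.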