TY - JOUR
T1 - 50-GFLOPS Floating-Point Adder and Multiplier Using Gate-Level-Pipelined Single-Flux-Quantum Logic with Frequency-Increased Clock Distribution
AU - Nagaoka, Ikki
AU - Kashima, Ryota
AU - Tanaka, Masamitsu
AU - Kawakami, Satoshi
AU - Tanimoto, Teruo
AU - Yamashita, Taro
AU - Inoue, Koji
AU - Fujimaki, Akira
N1 - Publisher Copyright:
© 2002-2011 IEEE.
PY - 2023/6/1
Y1 - 2023/6/1
N2 - We demonstrate the functioning of a high-throughput, gate-level-pipelined floating-point adder and multiplier over 50 GHz. The gate-level-pipelined floating-point adder and multiplier requires dedicated circuit blocks to wait until other circuit blocks complete calculations because of the dependence between their sign, exponent, and significand parts. We revealed that the resultant delay difference of the waiting circuit blocks hinders high-frequency operation if the predesigned circuit blocks with the fixed clock distribution are connected in a simple manner. We showed that clock distribution needs to synchronize with every pipeline stage regardless of the circuit blocks to minimize the delay difference between the circuit blocks for circuits containing the waiting circuit blocks (e.g., the floating-point adder and multiplier). We designed a 5-bit floating-point adder and multiplier to demonstrate the effectiveness of the clock distribution experimentally. The test chips were fabricated using AIST 10-kA/cm$\boldsymbol{^{2}}$ Advanced Process 2. We verified the high-speed operation at over 50 GHz in the floating-point adder and multiplier. The maximum clock frequency and throughput of the floating-point adder were 56 GHz and 56 GFLOPS, respectively. The corresponding values for the floating-point multiplier were 63 GHz and 63 GFLOPS, respectively.
AB - We demonstrate the functioning of a high-throughput, gate-level-pipelined floating-point adder and multiplier over 50 GHz. The gate-level-pipelined floating-point adder and multiplier requires dedicated circuit blocks to wait until other circuit blocks complete calculations because of the dependence between their sign, exponent, and significand parts. We revealed that the resultant delay difference of the waiting circuit blocks hinders high-frequency operation if the predesigned circuit blocks with the fixed clock distribution are connected in a simple manner. We showed that clock distribution needs to synchronize with every pipeline stage regardless of the circuit blocks to minimize the delay difference between the circuit blocks for circuits containing the waiting circuit blocks (e.g., the floating-point adder and multiplier). We designed a 5-bit floating-point adder and multiplier to demonstrate the effectiveness of the clock distribution experimentally. The test chips were fabricated using AIST 10-kA/cm$\boldsymbol{^{2}}$ Advanced Process 2. We verified the high-speed operation at over 50 GHz in the floating-point adder and multiplier. The maximum clock frequency and throughput of the floating-point adder were 56 GHz and 56 GFLOPS, respectively. The corresponding values for the floating-point multiplier were 63 GHz and 63 GFLOPS, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85149467963&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149467963&partnerID=8YFLogxK
U2 - 10.1109/TASC.2023.3250614
DO - 10.1109/TASC.2023.3250614
M3 - Article
AN - SCOPUS:85149467963
SN - 1051-8223
VL - 33
JO - IEEE Transactions on Applied Superconductivity
JF - IEEE Transactions on Applied Superconductivity
IS - 4
M1 - 1302711
ER -