TY - GEN
T1 - SuperNPU
T2 - 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
AU - Ishida, Koki
AU - Byun, Ilkwon
AU - Nagaoka, Ikki
AU - Fukumitsu, Kosuke
AU - Tanaka, Masamitsu
AU - Kawakami, Satoshi
AU - Tanimoto, Teruo
AU - Ono, Takatsugu
AU - Kim, Jangwoo
AU - Inoue, Koji
N1 - Funding Information:
This work was supported by JST-Mirai Program Grant Number JPMJMI18E1 and JSPS KAKENHI Grant Numbers JP19H01105, JP18H05211, and JP18J21274. The circuit was designed with the support of VDEC of the University of Tokyo in collaboration with Cadence Design Systems, Inc., and fabricated in the CRAVITY of AIST. We also appreciate the support from the National Research Foundation of Korea (NRF) grant funded by the Korean Government (NRF-2019R1A5A1027055, NRF-2020M3H6A1084857).
PY - 2020/10
Y1 - 2020/10
N2 - The superconductor single-flux-quantum (SFQ) logic family has been recognized as a highly promising solution for the post-Moore era, thanks to its ultra-fast and low-power switching characteristics. Researchers have therefore put tremendous effort into promoting the technology and automating its circuit design process (e.g., low-cost fabrication, design tool development). However, there has been no progress in designing a convincing SFQ-based architectural unit, because architects lack an understanding of the technology's potential and limitations at the architecture level. In this paper, we present how to architect an SFQ-based architectural unit by providing design principles with an extreme-performance neural processing unit (NPU). To achieve this goal, we first implement an architecture-level simulator to model an SFQ-based NPU accurately. We validate this model using our die-level prototypes, design tools, and logic cell library. The simulator accurately measures the NPU's performance, power consumption, area, and cooling overheads. Next, driven by the modeling, we identify key architectural challenges for designing a performance-effective SFQ-based NPU (e.g., expensive on-chip data movements and buffering). Lastly, we present SuperNPU, our example SFQ-based NPU architecture, which effectively resolves the challenges. Our evaluation shows that the proposed design outperforms a conventional state-of-the-art NPU by 23 times. With free cooling provided, as is done in quantum computing, the performance per chip power increases up to 490 times. Our methodology can also be applied to other architecture designs with SFQ-friendly characteristics.
UR - http://www.scopus.com/inward/record.url?scp=85097329849&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097329849&partnerID=8YFLogxK
U2 - 10.1109/MICRO50266.2020.00018
DO - 10.1109/MICRO50266.2020.00018
M3 - Conference contribution
AN - SCOPUS:85097329849
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 58
EP - 72
BT - Proceedings - 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
PB - IEEE Computer Society
Y2 - 17 October 2020 through 21 October 2020
ER -