Benchmarking DNN Processors

To enable comparison, we recommend that designs report benchmarking metrics for widely used state-of-the-art DNNs (e.g., AlexNet, VGG, GoogLeNet, ResNet) with inputs from well-known datasets such as ImageNet. We aim to summarize the results on this website.

DNN models can be downloaded here.

Please submit benchmarking metrics using this form.

Explanation of Metrics

Measure energy and off-chip (e.g., DRAM) access relative to number of non-zero MACs and bit-width of MACs

Account for impact of sparsity in weights and activations
To compute the off-chip access, assume the DNN processor is a stand-alone chip. The off-chip access should account for all accesses needed to complete all the layers listed, including reading the initial inputs from and writing the final outputs to an off-chip device (e.g., DRAM). The goal is to compare the off-chip access at steady state, so accesses during ramp-up/ramp-down do not need to be included (e.g., loading configuration parameters, or loading weights *if* all weights can be stored on chip).

Energy Efficiency of Design

pJ/non-zero MAC

External Memory Bandwidth

Off-chip access (in Bytes)/non-zero MAC

Area Efficiency

Total chip area (mm²) per multiplier and on-chip storage capacity per multiplier
The storage figure accounts for on-chip memory
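
As an illustration of how the normalized metrics above relate to raw measurements, here is a minimal Python sketch. The `Measurement` class, its field names, and the example values are hypothetical and chosen only for this illustration; they are not part of the submission form. It assumes chip energy is average power multiplied by run time, and that the MAC count and off-chip byte count cover the same set of layers and the same batch.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical container for one benchmark run (illustrative names only)."""
    chip_power_mw: float      # average chip power during the run, in mW
    run_time_ms: float        # run time for the whole batch, in ms
    nonzero_macs: float       # non-zero MACs executed over the whole batch
    offchip_bytes: float      # steady-state off-chip traffic for the batch, in bytes
    chip_area_mm2: float      # total chip area, in mm^2
    onchip_memory_kb: float   # total on-chip storage, in kB
    num_multipliers: int      # number of multipliers on the chip


def normalized_metrics(m: Measurement) -> dict:
    """Normalize raw measurements per non-zero MAC and per multiplier."""
    # mW * ms = microjoules; 1 microjoule = 1e6 pJ.
    energy_pj = m.chip_power_mw * m.run_time_ms * 1e6
    return {
        "pj_per_nonzero_mac": energy_pj / m.nonzero_macs,
        "offchip_bytes_per_nonzero_mac": m.offchip_bytes / m.nonzero_macs,
        "area_mm2_per_multiplier": m.chip_area_mm2 / m.num_multipliers,
        "memory_kb_per_multiplier": m.onchip_memory_kb / m.num_multipliers,
    }


if __name__ == "__main__":
    # Made-up numbers, not taken from any published design.
    example = Measurement(
        chip_power_mw=100.0,
        run_time_ms=10.0,
        nonzero_macs=1e8,
        offchip_bytes=2e6,
        chip_area_mm2=10.0,
        onchip_memory_kb=100.0,
        num_multipliers=100,
    )
    for name, value in normalized_metrics(example).items():
        print(f"{name}: {value:.4g}")
```

Because both energy and off-chip access are divided by the non-zero MAC count, a design that skips zero-valued weights or activations is credited only for the work it actually performs.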

More details at Tutorial on Hardware Architectures for DNN

Summary

Note: All energy and off-chip access values are normalized relative to the number of non-zero multiply-and-accumulate operations (MACs).

Processor Specifications

Name [Publication]	Process Technology	Power Supply Voltage (V)	Clock Frequency (MHz)	Number of multipliers	Peak Performance (GMACs/sec)	Total Core area / Total number of multipliers (mm²)	Total On-Chip memory / Total number of multipliers (kB)	Measured or Simulated
Eyeriss [ISSCC 2016]	65nm	1.0	200	168 (16-bit)	33.6	0.073	1.14	Measured
KU Leuven [VLSI 2016]	40nm	0.85 - 0.9	204	256 (16-bit)	52.2	0.0094	0.58	Measured
Envision [ISSCC 2017]	28nm	0.65 - 1.0	200	256* (16-bit) *changes with bitwidth	52.2	0.0074	0.58	Measured
EIE [ISCA 2016]	45nm	1.0	800	64 (16-bit)	51.2	0.638	162	Simulated (PnR)
…	…	…	…	…	…	…	…	…
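
The peak performance column can be cross-checked against the multiplier count and clock frequency. The sketch below assumes that each multiplier completes one MAC per clock cycle, which is an assumption made for this illustration rather than something the table states; the function name is likewise illustrative. Envision is left out because its multiplier count is marked as changing with bitwidth.

```python
def peak_gmacs_per_sec(num_multipliers: int, clock_mhz: float) -> float:
    """Peak throughput in GMACs/sec, assuming one MAC per multiplier per cycle."""
    # multipliers * MHz gives millions of MACs per second; divide by 1e3 for GMACs/sec.
    return num_multipliers * clock_mhz / 1e3


# (multipliers, clock in MHz, listed peak in GMACs/sec), copied from the table above.
rows = {
    "Eyeriss": (168, 200, 33.6),
    "KU Leuven": (256, 204, 52.2),
    "EIE": (64, 800, 51.2),
}

if __name__ == "__main__":
    for name, (mults, mhz, listed) in rows.items():
        computed = peak_gmacs_per_sec(mults, mhz)
        print(f"{name}: computed {computed:.1f}, listed {listed}")
```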

AlexNet

Name [Publication]	Dense/Sparse	Supported Layers	Batch Size	Bits per Weight	Bits per Input Activation	Chip Power (mW)	Chip Energy per non-zero MAC (pJ)	Run Time (ms)	Multiplier Utilization vs Peak (%)	Off-chip accesses per non-zero MAC (Bytes)
Eyeriss [ISSCC 2016]	Dense	CONV [all]	4	16	16	278	21.7	115.3	41	0.010
KU Leuven [VLSI 2016]	Dense [WACV 2016]	CONV [all]	1	7,7,8,9,9	4,7,9,8,8	78	10.7	21	14	0.066
Envision [ISSCC 2017]	Dense [WACV 2016]	CONV [all]	1	7,7,8,9,9	4,7,9,8,8	44	6.0	21	14	0.055
EIE [ISCA 2016]	Sparse [ICLR 2016]	FC [all]	1	16	16	579	14.5	0.05	76	0.009
…	…	…	…	…	…	…	…	…	…	…
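
For readers who want per-frame figures, the chip power, run time, and batch size columns can be combined as sketched below. This assumes that the reported run time covers the whole batch and that chip power is the average over that run; neither is stated explicitly in the table, and the function name is illustrative. Energy per non-zero MAC cannot be reproduced this way, because the non-zero MAC count is not listed.

```python
def per_frame_figures(chip_power_mw: float, run_time_ms: float, batch_size: int):
    """Return (frames per second, energy per frame in mJ) for one table row."""
    frames_per_sec = batch_size / (run_time_ms * 1e-3)
    # mW * ms = microjoules for the whole batch; 1e-3 converts microjoules to mJ.
    energy_per_frame_mj = chip_power_mw * run_time_ms * 1e-3 / batch_size
    return frames_per_sec, energy_per_frame_mj


if __name__ == "__main__":
    # Eyeriss AlexNet row: 278 mW, 115.3 ms, batch size 4.
    fps, mj = per_frame_figures(278, 115.3, 4)
    print(f"{fps:.1f} frames/s, {mj:.2f} mJ/frame")
```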

VGG-16

Name [Publication]	Dense/Sparse	Supported Layers	Batch Size	Bits per Weight	Bits per Input Activation	Chip Power (mW)	Chip Energy per non-zero MAC (pJ)	Run Time (ms)	Multiplier Utilization vs Peak (%)	Off-chip accesses per non-zero MAC (Bytes)
Eyeriss [ISSCC 2016]	Dense	CONV [all]	3	16	16	236	52.0	4309.4	13	0.016
Envision [ISSCC 2017]	Dense [WACV 2016]	CONV [all]	1	5	4 (first), 6 (other layers)	26	4.4	596.5	12	0.028
EIE [ISCA 2016]	Sparse [ICLR 2016]	FC [all]	1	16	16	610	22.6	0.05	49	0.036
…	…	…	…	…	…	…	…	…	…	…

Detailed summary of results here.

Feedback and questions are welcome at eyeriss at mit dot edu