Benchmarking DNN Processors




To enable comparison across designs, we recommend that designs report benchmarking metrics for widely used state-of-the-art DNNs (e.g., AlexNet, VGG, GoogLeNet, ResNet) with input from well-known datasets such as ImageNet. We aim to summarize the results on this website.

DNN models can be downloaded here.

Please submit benchmarking metrics using this form.


Explanation of Metrics

  • Measure energy and off-chip (e.g., DRAM) access relative to the number of non-zero MACs and the bit-width of the MACs
    • Account for impact of sparsity in weights and activations
    • To compute the off-chip access, assume the DNN processor is a stand-alone chip. The off-chip access should account for all accesses needed to complete all the layers listed, including reading the initial inputs from and writing the final outputs to an off-chip device (e.g., DRAM). The goal is to compare the off-chip access at steady state, so accesses during ramp-up/ramp-down do not need to be included (e.g., loading configuration parameters, or loading weights *if* all weights can be stored on chip).
  • Energy Efficiency of Design
    • pJ/non-zero MAC
  • External Memory Bandwidth
    • Off-chip access (in Bytes)/non-zero MAC
  • Area Efficiency
    • Total chip area (mm²)/multiplier and storage capacity/multiplier
    • Accounts for on-chip memory
More details are available in the Tutorial on Hardware Architectures for DNN.
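The normalizations listed above can be sketched as follows. This is a minimal illustration, not the site's submission tooling: the `Measurement` class, its field names, and `normalized_metrics` are hypothetical, and a "non-zero MAC" is assumed here to mean a MAC whose weight and activation are both non-zero.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Raw numbers for one design on one DNN (hypothetical field names)."""
    chip_energy_joules: float  # chip energy for the benchmarked layers
    offchip_bytes: float       # steady-state off-chip traffic, incl. inputs and outputs
    nonzero_macs: float        # assumed: MACs with non-zero weight and activation
    chip_area_mm2: float       # area that includes on-chip memory
    onchip_memory_kb: float    # total on-chip storage
    num_multipliers: int       # total number of multipliers


def normalized_metrics(m: Measurement) -> dict:
    """Return the per-MAC and per-multiplier metrics described in the list above."""
    if m.nonzero_macs <= 0 or m.num_multipliers <= 0:
        raise ValueError("nonzero_macs and num_multipliers must be positive")
    return {
        # Energy efficiency: joules -> picojoules, then per non-zero MAC
        "pj_per_nonzero_mac": m.chip_energy_joules * 1e12 / m.nonzero_macs,
        # External memory bandwidth: bytes per non-zero MAC
        "offchip_bytes_per_nonzero_mac": m.offchip_bytes / m.nonzero_macs,
        # Area efficiency: area and storage per multiplier
        "area_mm2_per_multiplier": m.chip_area_mm2 / m.num_multipliers,
        "onchip_kb_per_multiplier": m.onchip_memory_kb / m.num_multipliers,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only; not taken from any real chip.
    example = Measurement(
        chip_energy_joules=2e-3,
        offchip_bytes=1e6,
        nonzero_macs=1e8,
        chip_area_mm2=12.0,
        onchip_memory_kb=200.0,
        num_multipliers=100,
    )
    for name, value in normalized_metrics(example).items():
        print(f"{name}: {value:g}")
```

Normalizing by non-zero MACs rather than total MACs is what lets dense and sparse designs appear in the same table: a design that skips zero operands is charged only for the work it actually performs.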

Summary

Note: All energy and off-chip access values are normalized relative to the number of non-zero multiply-and-accumulate operations (MACs).

Processor Specifications

                       
| Name [Publication] | Process Technology | Total Core Area / Total Number of Multipliers (mm²) | Total On-Chip Memory / Total Number of Multipliers (kB) | Measured or Simulated |
|---|---|---|---|---|
| Eyeriss [ISSCC 2016] | 65nm LP TSMC (1.0V) | 0.073 | 1.14 | Measured |
| EIE [ISCA 2016] | 45nm (1.0V) | 0.638 | 162 | Simulated (PnR) |

AlexNet

                                        
| Name [Publication] | Dense/Sparse | Supported Layers | Batch Size | Bits per Weight | Bits per Input Activation | Chip Energy per Non-Zero MAC (pJ) | Off-Chip Accesses per Non-Zero MAC (Bytes) | Run Time (ms) | Chip Power (mW) |
|---|---|---|---|---|---|---|---|---|---|
| Eyeriss [ISSCC 2016] | Dense | CONV1, CONV2, CONV3, CONV4, CONV5 | 4 | 16 | 16 | 21.7 | 0.01 | 115.3 | 278 |
| EIE [ISCA 2016] | Sparse [ICLR 2016] | FC1, FC2, FC3 | 1 | 16 | 16 | 14.5 | 0.0093 | 0.05 | 579 |

VGG-16

                                        
| Name [Publication] | Dense/Sparse | Supported Layers | Batch Size | Bits per Weight | Bits per Input Activation | Chip Energy per Non-Zero MAC (pJ) | Off-Chip Accesses per Non-Zero MAC (Bytes) | Run Time (ms) | Chip Power (mW) |
|---|---|---|---|---|---|---|---|---|---|
| Eyeriss [ISSCC 2016] | Dense | CONV1-1, CONV1-2, CONV2-1, CONV2-2, CONV3-1, CONV3-2, CONV3-3, CONV4-1, CONV4-2, CONV4-3, CONV5-1, CONV5-2, CONV5-3 | 3 | 16 | 16 | 52.0 | 0.016 | 4309.4 | 236 |
| EIE [ISCA 2016] | Sparse [ICLR 2016] | FC1, FC2, FC3 | 1 | 16 | 16 | 22.6 | 0.0359 | 0.05 | 610 |
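As a rough consistency check on the AlexNet and VGG-16 tables, the reported chip power and run time can be combined into an energy per run, and dividing that by the reported energy per non-zero MAC gives the implied number of non-zero MACs. The sketch below rests on two assumptions that this page does not state: that the run time covers the whole batch, and that the chip power is the average over that run. The function names are hypothetical; the input values are copied from the tables.

```python
def energy_per_run_mj(power_mw: float, run_time_ms: float) -> float:
    """Energy of one run in millijoules (mW * ms = microjoules)."""
    return power_mw * run_time_ms / 1000.0


def implied_nonzero_macs(power_mw: float, run_time_ms: float, pj_per_mac: float) -> float:
    """Non-zero MACs implied by power, run time, and energy per non-zero MAC."""
    energy_pj = energy_per_run_mj(power_mw, run_time_ms) * 1e9  # mJ -> pJ
    return energy_pj / pj_per_mac


# (design, DNN, batch size, chip power in mW, run time in ms, pJ per non-zero MAC)
ROWS = [
    ("Eyeriss", "AlexNet", 4, 278.0, 115.3, 21.7),
    ("EIE", "AlexNet", 1, 579.0, 0.05, 14.5),
    ("Eyeriss", "VGG-16", 3, 236.0, 4309.4, 52.0),
    ("EIE", "VGG-16", 1, 610.0, 0.05, 22.6),
]

if __name__ == "__main__":
    for design, dnn, batch, power_mw, run_ms, pj in ROWS:
        energy = energy_per_run_mj(power_mw, run_ms)
        macs = implied_nonzero_macs(power_mw, run_ms, pj)
        # Assumption: run time covers the full batch, so divide by batch size
        print(f"{design:8s} {dnn:8s} {energy / batch:.4g} mJ/input, "
              f"{macs / batch:.3g} implied non-zero MACs/input")
```

Note that the two designs are benchmarked on different layers (CONV for Eyeriss, FC for EIE), so the per-input figures are not directly comparable across rows; the per-MAC normalization in the tables exists for that reason.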


Detailed summary of results here.


Feedback and questions are welcome at eyeriss at mit dot edu