SHA-3
SHA-3 Secure Hash Crypto Engine

The SHA-3 core is a high-throughput, area-efficient hardware accelerator for the SHA-3 cryptographic hashing functions, compliant with NIST’s FIPS 180-4 and FIPS 202 standards.
The accelerator core requires no assistance from a host processor and uses standard AMBA® AXI4-Stream interfaces for input and output data. An AXI4-Stream to AXI4 Memory Mapped bridge, with or without DMA capabilities, can be used with the core and is available separately from CAST. A single instance of the core implements all fixed-length and extendable-output hash functions. The cryptographic function and the length of the extendable output (up to 2GB) are chosen at run time via AXI4-Stream side-band signals and can be different for every input message.
The SHA-3 core is also highly configurable at synthesis time, to ease integration in systems with different requirements. Both the data-bus width of the input and output interfaces and the number of SHA-3 permutation rounds per clock cycle are set at synthesis time, the latter allowing users to trade silicon resources for throughput. In its minimum configuration of one permutation round per cycle, the core processes up to 50 bits per cycle, depending on the hashing function. Throughput scales by implementing 2, 3, or 4 permutation rounds per cycle, enabling throughputs in excess of 100Gbps in modern ASIC technologies.
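To illustrate why the per-cycle figure depends on the hashing function, the sketch below derives each function's rate (the message bits absorbed per 24-round Keccak permutation) from the FIPS 202 capacity values. The helper names are illustrative and not part of the core's interface. The per-cycle number is only an idealized upper bound that ignores any data-transfer or control cycles, so the figures quoted in this datasheet are lower.

```python
# Idealized model of SHA-3 absorb throughput (illustrative only; this is
# not a description of the core's internal architecture).

STATE_BITS = 1600   # Keccak-f[1600] state width (FIPS 202)
ROUNDS = 24         # rounds per Keccak-f[1600] permutation (FIPS 202)

# Capacity in bits for each function, as defined in FIPS 202.
CAPACITY_BITS = {
    "SHA3-224": 448,
    "SHA3-256": 512,
    "SHA3-384": 768,
    "SHA3-512": 1024,
    "SHAKE128": 256,
    "SHAKE256": 512,
}


def rate_bits(function: str) -> int:
    """Message bits absorbed per permutation: rate = state - capacity."""
    return STATE_BITS - CAPACITY_BITS[function]


def ideal_bits_per_cycle(function: str, rounds_per_cycle: int = 1) -> float:
    """Upper bound on bits per clock cycle, assuming zero overhead cycles."""
    if rounds_per_cycle not in (1, 2, 3, 4):
        raise ValueError("rounds_per_cycle must be 1, 2, 3, or 4")
    cycles_per_permutation = -(-ROUNDS // rounds_per_cycle)  # ceiling division
    return rate_bits(function) / cycles_per_permutation


if __name__ == "__main__":
    for name in CAPACITY_BITS:
        print(f"{name}: rate = {rate_bits(name)} bits, "
              f"bound = {ideal_bits_per_cycle(name):.1f} bits/cycle")
```

For instance, SHA3-512 absorbs 576 bits per permutation and SHAKE128 absorbs 1344, which is why the same hardware delivers different throughput for different functions.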
The core is designed for ease of use and integration and adheres to industry-best coding and verification practices. Technology mapping and timing closure are trouble-free, as the core contains no multi-cycle or false paths, and uses only rising-edge-triggered D-type flip-flops, no tri-states, and a single clock/reset domain.

Applications

The SHA-3 IP core can be used to ensure data integrity and/or verify authenticity in a wide range of applications, including IPsec and TLS/SSL protocol engines, secure boot engines, encrypted data storage, e-commerce, and financial transaction systems.

Sample implementation results for a limited set of the SHA-3 core configurations are provided in the following table.

Target Technology	Input / Output Bitwidth	Rounds per Cycle	Number of Buffers	Logic Resources	Memory Resources	Freq. (MHz)
TSMC 7nm	32	1	0	32,803 Gates	–	1,600
TSMC 7nm	64	1	1	48,665 Gates	–	1,700
TSMC 7nm	64	1	2	60,598 Gates	–	1,700
TSMC 7nm	128	2	2	97,787 Gates	–	1,300
TSMC 7nm	512	4	2	149,687 Gates	–	700

Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.

The SHA-3 IP core can be implemented in any Altera FPGA, provided sufficient resources are available. Sample implementation results for a limited set of configurations are provided in the following table. Please note that the list of configurations is not exhaustive and that the indicated clock frequency is not the highest possible.

FPGA Device	Input / Output Bitwidth	Rounds per Cycle	Number of Buffers	Logic Resources	Memory Resources	Freq. (MHz)
Cyclone V (-7)	32	1	0	3,291 ALMs	–	100
Arria 10 GX (-1)	64	1	1	3,852 ALMs	–	300
Agilex (-1)	64	1	1	4,087 ALMs	–	500
Agilex (-1)	128	2	2	6,357 ALMs	–	250
Agilex (-1)	512	4	2	14,218 ALMs	–	125

The SHA-3 IP core can be implemented in any AMD FPGA, provided sufficient resources are available. Sample implementation results for a limited set of configurations are provided in the following table. Please note that the list of configurations is not exhaustive and that the indicated clock frequency is not the highest possible.

FPGA Device	Input / Output Bitwidth	Rounds per Cycle	Number of Buffers	Logic Resources	Memory Resources	Freq. (MHz)
Spartan-7 (-1)	64	1	1	4,767 LUTs	–	125
Artix-7 (-1)	64	1	1	4,772 LUTs	–	150
Kintex UltraScale+ (-1)	64	1	1	4,808 LUTs	–	375
Kintex UltraScale+ (-1)	128	2	2	10,340 LUTs	–	200
Kintex UltraScale+ (-1)	512	4	2	20,672 LUTs	–	100

The SHA-3 IP core can be implemented in any Microchip FPGA, provided sufficient resources are available. Sample implementation results for a limited set of configurations are provided in the following table. Please note that the list of configurations is not exhaustive and that the indicated clock frequency is not the highest possible.

FPGA Device	Input / Output Bitwidth	Rounds per Cycle	Number of Buffers	Logic Resources	Memory Resources	Freq. (MHz)
RTG4 rt4g150-std	32	1	0	7,162 4LUT	–	85
RTG4 rt4g150-std	64	1	1	9,302 4LUT	–	85
RTG4 rt4g150-std	64	1	2	9,687 4LUT	–	80
RTG4 rt4g150-std	128	2	2	15,185 4LUT	–	65

Standards Support

FIPS 202: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions
FIPS 180-4: Secure Hash Standard (limited to SHA-3 use)
All four fixed-length SHA-3 Hash Functions:
- SHA3-224
- SHA3-256
- SHA3-384
- SHA3-512
Both SHA-3 Extendable Output Functions (XOF):
- SHAKE128
- SHAKE256
NIST-Validated
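
As a rough illustration of the six supported functions, the sketch below uses Python's standard `hashlib` module as a software reference model, for example to generate expected digests when checking results produced by the core. The function name `reference_digest` and its parameters are hypothetical and do not correspond to any signal or register of the core. The output length for the extendable-output functions is passed per message, mirroring the run-time selection described above.

```python
import hashlib
from typing import Optional

# Fixed-length SHA-3 hash functions (FIPS 202).
FIXED = {
    "SHA3-224": hashlib.sha3_224,
    "SHA3-256": hashlib.sha3_256,
    "SHA3-384": hashlib.sha3_384,
    "SHA3-512": hashlib.sha3_512,
}

# Extendable-output functions (FIPS 202).
XOF = {
    "SHAKE128": hashlib.shake_128,
    "SHAKE256": hashlib.shake_256,
}


def reference_digest(function: str, message: bytes,
                     xof_len_bytes: Optional[int] = None) -> bytes:
    """Return the software reference digest for one message.

    `xof_len_bytes` is required for SHAKE128/SHAKE256 and ignored otherwise.
    """
    if function in FIXED:
        return FIXED[function](message).digest()
    if function in XOF:
        if xof_len_bytes is None:
            raise ValueError("an output length is required for XOFs")
        return XOF[function](message).digest(xof_len_bytes)
    raise ValueError(f"unsupported function: {function}")


if __name__ == "__main__":
    msg = b"example message"
    print(reference_digest("SHA3-256", msg).hex())
    print(reference_digest("SHAKE128", msg, xof_len_bytes=48).hex())
```

Because each message carries its own function name and output length, consecutive calls can freely mix fixed-length and extendable-output functions.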

Performance

User-selectable (1 to 4) permutation rounds per clock cycle, resulting in a throughput of:
- Up to 50 Mbits/MHz for one permutation per cycle
- Up to 150 Mbits/MHz for four permutations per cycle
Intelligent buffer management optionally allows receiving new input while processing the previous message
Optional dynamic control of the number of permutation rounds
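
The per-MHz figures above convert to peak throughput by multiplying by the clock frequency. The sketch below applies this to two of the sample TSMC 7nm results. The helper name `peak_gbps` is illustrative, and only the two rounds-per-cycle settings with a quoted figure are covered, since no per-MHz values are given here for 2 or 3 rounds per cycle.

```python
# "Up to" throughput figures quoted in the Performance list, in Mbits/MHz,
# keyed by the number of permutation rounds per clock cycle.
MBITS_PER_MHZ = {1: 50, 4: 150}


def peak_gbps(freq_mhz: float, rounds_per_cycle: int) -> float:
    """Peak throughput in Gbps for a given clock frequency in MHz."""
    if rounds_per_cycle not in MBITS_PER_MHZ:
        raise ValueError("no quoted figure for this rounds-per-cycle setting")
    # Mbits/MHz times MHz gives Mbits per second; divide by 1000 for Gbps.
    return MBITS_PER_MHZ[rounds_per_cycle] * freq_mhz / 1000


if __name__ == "__main__":
    # TSMC 7nm, 512-bit interface, 4 rounds per cycle, 700 MHz.
    print(peak_gbps(700, 4))
    # TSMC 7nm, 64-bit interface, 1 round per cycle, 1,700 MHz.
    print(peak_gbps(1700, 1))
```

Under these figures, the 700 MHz four-round configuration reaches up to 105 Gbps, consistent with the claim of throughputs in excess of 100Gbps.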