SHA-3 Secure Hash Crypto Engine
The SHA-3 core is a high-throughput, area-efficient hardware accelerator for the SHA-3 cryptographic hashing functions, compliant with NIST’s FIPS 180-4 and FIPS 202 standards.
The accelerator core requires no assistance from a host processor and uses standard AMBA® AXI4-Stream interfaces for input and output data. An AXI4-Stream to AXI4 Memory Mapped bridge, with or without DMA capabilities, can be used with the core and is separately available from CAST. A single instance of the core implements all fixed-length and extendable-output hash functions. The cryptographic function and, for the extendable-output functions, the output length (up to 2 GB) are chosen at run time via AXI4-Stream side-band signals and can be different for every input message.
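As a software point of comparison for the per-message function and output-length selection described above, the sketch below uses Python's standard `hashlib` module to compute reference digests for the six functions. The `reference_digest` helper and its `xof_bytes` parameter are illustrative names, not part of the core's interface, and the function names follow the FIPS 202 spelling.

```python
import hashlib

# Fixed-length SHA-3 functions: the digest size is implied by the function.
FIXED = {
    "SHA3-224": hashlib.sha3_224,
    "SHA3-256": hashlib.sha3_256,
    "SHA3-384": hashlib.sha3_384,
    "SHA3-512": hashlib.sha3_512,
}

# Extendable-output functions: the caller chooses the output length.
XOF = {
    "SHAKE128": hashlib.shake_128,
    "SHAKE256": hashlib.shake_256,
}


def reference_digest(function: str, message: bytes, xof_bytes: int = 32) -> bytes:
    """Return a software reference digest (hypothetical helper for comparison)."""
    if function in FIXED:
        return FIXED[function](message).digest()
    return XOF[function](message).digest(xof_bytes)


if __name__ == "__main__":
    msg = b"example message"
    for name in list(FIXED) + list(XOF):
        print(name, reference_digest(name, msg).hex())
```

A shorter XOF output is always a prefix of a longer one for the same message, which is a convenient property when comparing hardware output of different requested lengths against one reference run.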
The SHA-3 core is also highly configurable at synthesis time, to ease integration in systems with different requirements. The data-bus width of the input and output interfaces is configurable, as is the number of SHA-3 permutation rounds per clock cycle, allowing users to trade throughput for silicon resources. In its minimum configuration of one permutation round per cycle, the core processes up to 50 bits per cycle, depending on the hashing function. Throughput scales by implementing 2, 3, or 4 permutation rounds per cycle, enabling throughputs in excess of 100 Gbps in modern ASIC technologies.
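The throughput claim can be checked with simple arithmetic: peak throughput is roughly bits per cycle multiplied by clock frequency. The sketch below is an estimate only. It takes the 50 and 150 bits-per-cycle figures from this page (the latter from the feature list), pairs them with two of the sample TSMC 7nm frequencies from the implementation tables, and ignores bus-width limits and protocol overhead. The function names are illustrative.

```python
def peak_throughput_gbps(bits_per_cycle: float, freq_mhz: float) -> float:
    """Upper-bound throughput in Gbps: bits per cycle times cycles per second."""
    return bits_per_cycle * freq_mhz / 1000.0


def cycles_per_permutation(rounds_per_cycle: int, total_rounds: int = 24) -> int:
    """Clock cycles needed for one 24-round permutation (ceiling division)."""
    return -(-total_rounds // rounds_per_cycle)


if __name__ == "__main__":
    # 64-bit bus, 1 round per cycle, 1,700 MHz sample result.
    print(peak_throughput_gbps(50, 1700), "Gbps")
    # 512-bit bus, 4 rounds per cycle, 700 MHz sample result.
    print(peak_throughput_gbps(150, 700), "Gbps")
    for n in (1, 2, 3, 4):
        print(n, "rounds/cycle ->", cycles_per_permutation(n), "cycles")
```

Under these assumptions the four-round configuration at 700 MHz comes out slightly above 100 Gbps, which is consistent with the figure quoted above.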
The core is designed for ease of use and integration and adheres to industry-best coding and verification practices. Technology mapping and timing closure are trouble-free, as the core contains no multi-cycle or false paths and uses only rising-edge-triggered D-type flip-flops, no tri-states, and a single clock/reset domain.
Applications
The SHA-3 IP core can be used to ensure data integrity and/or verify authenticity in a wide range of applications, including IPsec and TLS/SSL protocol engines, secure boot engines, encrypted data storage, e-commerce, and financial transaction systems.
Sample implementation results for a limited set of the SHA-3 core configurations are provided in the following table.
Target Technology | Input / Output Bitwidth | Rounds per Cycle | Number of Buffers | Logic Resources | Memory Resources | Freq. (MHz) |
---|---|---|---|---|---|---|
TSMC 7nm | 32 | 1 | 0 | 32,803 Gates | – | 1,600 |
TSMC 7nm | 64 | 1 | 1 | 48,665 Gates | – | 1,700 |
TSMC 7nm | 64 | 1 | 2 | 60,598 Gates | – | 1,700 |
TSMC 7nm | 128 | 2 | 2 | 97,787 Gates | – | 1,300 |
TSMC 7nm | 512 | 4 | 2 | 149,687 Gates | – | 700 |
Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.
The SHA-3 IP core can be implemented in any Altera FPGA provided sufficient resources are available. Sample implementation results for a limited set of configurations are provided in the following table. Please note that the list of configurations is not exhaustive and that the indicated clock frequency is not the highest possible.
FPGA Device | Input / Output Bitwidth | Rounds per Cycle | Number of Buffers | Logic Resources | Memory Resources | Freq. (MHz) |
---|---|---|---|---|---|---|
Cyclone V (-7) | 32 | 1 | 0 | 3,291 ALMs | – | 100 |
Arria 10 GX (-1) | 64 | 1 | 1 | 3,852 ALMs | – | 300 |
Agilex (-1) | 64 | 1 | 1 | 4,087 ALMs | – | 500 |
Agilex (-1) | 128 | 2 | 2 | 6,357 ALMs | – | 250 |
Agilex (-1) | 512 | 4 | 2 | 14,218 ALMs | – | 125 |
The SHA-3 IP core can be implemented in any AMD FPGA provided sufficient resources are available. Sample implementation results for a limited set of configurations are provided in the following table. Please note that the list of configurations is not exhaustive and that the indicated clock frequency is not the highest possible.
FPGA Device | Input / Output Bitwidth | Rounds per Cycle | Number of Buffers | Logic Resources | Memory Resources | Freq. (MHz) |
---|---|---|---|---|---|---|
Spartan-7 (-1) | 64 | 1 | 1 | 4,767 LUTs | – | 125 |
Artix-7 (-1) | 64 | 1 | 1 | 4,772 LUTs | – | 150 |
Kintex UltraScale+ (-1) | 64 | 1 | 1 | 4,808 LUTs | – | 375 |
Kintex UltraScale+ (-1) | 128 | 2 | 2 | 10,340 LUTs | – | 200 |
Kintex UltraScale+ (-1) | 512 | 4 | 2 | 20,672 LUTs | – | 100 |
The SHA-3 IP core can be implemented in any Microchip FPGA provided sufficient resources are available. Sample implementation results for a limited set of configurations are provided in the following table. Please note that the list of configurations is not exhaustive and that the indicated clock frequency is not the highest possible.
FPGA Device | Input / Output Bitwidth | Rounds per Cycle | Number of Buffers | Logic Resources | Memory Resources | Freq. (MHz) |
---|---|---|---|---|---|---|
RTG4 rt4g150-std | 32 | 1 | 0 | 7,162 4LUT | – | 85 |
RTG4 rt4g150-std | 64 | 1 | 1 | 9,302 4LUT | – | 85 |
RTG4 rt4g150-std | 64 | 1 | 2 | 9,687 4LUT | – | 80 |
RTG4 rt4g150-std | 128 | 2 | 2 | 15,185 4LUT | – | 65 |
Engineered by Beyond Semiconductor.
Features List
Standards Support
- FIPS 202: SHA-3 - Permutation-Based Hash and Extendable-Output Function
- FIPS 180-4: Secure Hash Functions (limited to SHA-3 use)
- All four fixed-length SHA-3 Hash Functions:
  - SHA3-224
  - SHA3-256
  - SHA3-384
  - SHA3-512
- Both SHA-3 Extendable Output Functions (XOF):
  - SHAKE128
  - SHAKE256
- NIST-Validated
Performance
- User-selectable (1 to 4) permutation rounds per clock cycle, resulting in a throughput of:
  - Up to 50 Mbps/MHz for one permutation round per cycle
  - Up to 150 Mbps/MHz for four permutation rounds per cycle
- Intelligent buffer management optionally allows the core to receive new input while processing the previous message
- Optional dynamic control of the number of permutation rounds
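To illustrate what "permutation rounds per clock cycle" means, the following is a minimal software model of the 24-round Keccak-f[1600] permutation in which rounds are grouped into batches of `rounds_per_cycle`. It is a behavioural sketch only and says nothing about how the core is built internally. The cycle count it reports is idealized (permutation rounds only, no I/O or control overhead), and all function names are illustrative.

```python
import hashlib

MASK64 = (1 << 64) - 1
ROUNDS = 24


def rol64(value: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((value << n) | (value >> (64 - n))) & MASK64


def round_constants() -> list:
    """Derive the 24 iota round constants with an 8-bit LFSR."""
    consts, r = [], 1
    for _ in range(ROUNDS):
        rc = 0
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                rc ^= 1 << ((1 << j) - 1)
        consts.append(rc)
    return consts


RC = round_constants()


def keccak_round(a: list, rc: int) -> list:
    """Apply one round (theta, rho, pi, chi, iota) to a 5x5 array of lanes a[x][y]."""
    # theta: mix each column parity into the neighbouring columns
    c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
    d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
    a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
    # rho and pi: rotate each lane and move it to its new position
    x, y = 1, 0
    current = a[x][y]
    for t in range(24):
        x, y = y, (2 * x + 3 * y) % 5
        current, a[x][y] = a[x][y], rol64(current, (t + 1) * (t + 2) // 2)
    # chi: non-linear step along each row
    for y in range(5):
        row = [a[x][y] for x in range(5)]
        for x in range(5):
            a[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
    # iota: add the round constant to lane (0, 0)
    a[0][0] ^= rc
    return a


def keccak_f1600(state: bytes, rounds_per_cycle: int = 1):
    """Permute a 200-byte state; return (new_state, idealized cycle count)."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    cycles = 0
    for first in range(0, ROUNDS, rounds_per_cycle):
        # Every round in this batch is assumed to complete in one modelled cycle.
        for i in range(first, min(first + rounds_per_cycle, ROUNDS)):
            lanes = keccak_round(lanes, RC[i])
        cycles += 1
    out = b"".join(lanes[x][y].to_bytes(8, "little")
                   for y in range(5) for x in range(5))
    return out, cycles


if __name__ == "__main__":
    # Padded empty message for SHA3-256 (rate 136 bytes): suffix 0x06, final bit 0x80.
    block = bytearray(200)
    block[0] ^= 0x06
    block[135] ^= 0x80
    for n in (1, 2, 3, 4):
        out, cycles = keccak_f1600(bytes(block), n)
        assert out[:32] == hashlib.sha3_256(b"").digest()
        print(f"{n} round(s) per cycle: {cycles} cycles per permutation")
```

The result is identical for every grouping; only the number of modelled cycles changes (24, 12, 8, and 6 for one to four rounds per cycle), which is the throughput versus area trade-off the configuration option exposes.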
Interfaces
- AMBA® AXI4-Stream
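For readers less familiar with AXI4-Stream, the sketch below shows how a message might be split into bus-width beats using the standard `TDATA`, `TKEEP`, and `TLAST` signals. This follows generic AXI4-Stream conventions and is an assumption about usage, not a description of this core's exact signalling. In particular, the side-band encoding that selects the hash function is not modelled, the handling of empty messages is a guess, and the `Beat` type and `to_beats` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Beat:
    tdata: bytes  # one bus word, zero-filled beyond the valid bytes
    tkeep: int    # bit i set means byte i of tdata is valid
    tlast: bool   # marks the final beat of the message


def to_beats(message: bytes, bus_bits: int) -> List[Beat]:
    """Split a message into AXI4-Stream-style beats (illustrative model)."""
    width = bus_bits // 8
    beats = []
    # An empty message is modelled as one beat with no valid bytes (assumption).
    for offset in range(0, max(len(message), 1), width):
        chunk = message[offset:offset + width]
        beats.append(Beat(
            tdata=chunk.ljust(width, b"\x00"),
            tkeep=(1 << len(chunk)) - 1,
            tlast=offset + width >= len(message),
        ))
    return beats


if __name__ == "__main__":
    for beat in to_beats(bytes(range(10)), bus_bits=64):
        print(beat.tdata.hex(), f"{beat.tkeep:08b}", beat.tlast)
```

With a 64-bit bus, a 10-byte message becomes two beats: a full word, then a final word with only the two lowest `TKEEP` bits set and `TLAST` asserted.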
Fully Autonomous Operation
- Requires no assistance from the host processor
- Automatic padding insertion
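The padding that the core inserts automatically is defined by FIPS 202: a domain-separation suffix followed by the pad10*1 rule. For byte-aligned messages this reduces to appending one suffix byte (0x06 for the SHA3 functions, 0x1F for SHAKE), zero-filling up to a multiple of the rate, and setting the top bit of the last byte. The following is a software sketch of that rule for byte-aligned input only. The `pad` helper is an illustrative name, not part of the deliverables.

```python
SHA3_SUFFIX = 0x06   # domain bits "01" followed by the first padding bit
SHAKE_SUFFIX = 0x1F  # domain bits "1111" followed by the first padding bit


def pad(message: bytes, rate_bits: int, suffix: int) -> bytes:
    """Pad a byte-aligned message to a whole number of rate-sized blocks."""
    rate = rate_bits // 8
    padded = bytearray(message) + bytes([suffix])
    padded += bytes(-len(padded) % rate)  # zero-fill to the block boundary
    padded[-1] |= 0x80                    # final '1' bit of pad10*1
    return bytes(padded)


if __name__ == "__main__":
    # SHA3-256 has a rate of 1088 bits (136 bytes).
    block = pad(b"abc", 1088, SHA3_SUFFIX)
    print(len(block), block[:4].hex(), hex(block[-1]))
```

When only one byte of space is left in the block, the suffix and the final bit share that byte (0x86 or 0x9F). A message that exactly fills a block gains a full extra block of padding.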
Configuration Options
- Hashing function (bit rate, capacity, number of permutation rounds)
- Input & output bus bit-width
- Number of input buffers
- Number of permutations per cycle
- Enable/disable dynamic control of permutation rounds
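The bit rate and capacity mentioned under the hashing-function option are fixed per function by FIPS 202: the two always sum to the 1600-bit state width, and the capacity is twice the security strength. The sketch below tabulates them. The `SpongeParams` type is an illustrative name, and how these parameters map onto the core's actual synthesis options is not specified here.

```python
from dataclasses import dataclass

STATE_BITS = 1600  # width of the Keccak-f[1600] state


@dataclass(frozen=True)
class SpongeParams:
    name: str
    capacity_bits: int
    output_bits: int  # 0 means extendable (caller-chosen) output length

    @property
    def rate_bits(self) -> int:
        """Message bits absorbed per permutation call."""
        return STATE_BITS - self.capacity_bits


PARAMS = [
    SpongeParams("SHA3-224", 448, 224),
    SpongeParams("SHA3-256", 512, 256),
    SpongeParams("SHA3-384", 768, 384),
    SpongeParams("SHA3-512", 1024, 512),
    SpongeParams("SHAKE128", 256, 0),
    SpongeParams("SHAKE256", 512, 0),
]

if __name__ == "__main__":
    for p in PARAMS:
        out = p.output_bits or "variable"
        print(f"{p.name}: rate={p.rate_bits} capacity={p.capacity_bits} output={out}")
```

A larger rate means more message bits absorbed per permutation, which is why throughput depends on the hashing function selected.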
Deliverables
- Verilog RTL source code or targeted FPGA netlist
- Integration testbench
- Simulation & synthesis scripts
- Bit-accurate C model
- User documentation