Silicon IP Cores
AES-P
Programmable Advanced Encryption Standard Engine
The AES-P encryption IP core implements Rijndael encryption and decryption in hardware, in compliance with the NIST Advanced Encryption Standard (FIPS 197). It processes 128-bit blocks and is programmable for 128-, 192-, and 256-bit key lengths.
Two architectural versions are available to suit system requirements. The Standard version (AES-P-S) is more compact, using a 32-bit datapath and requiring 44, 52, or 60 clock cycles per data block (for 128-, 192-, or 256-bit cipher keys, respectively). The Fast version (AES-P-F) achieves higher throughput, using a 128-bit datapath and requiring 11, 13, or 15 clock cycles per data block. The core can be programmed to use any of the following cipher modes: ECB, CBC, CFB, OFB, and CTR.
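The cycle counts above set the peak data rate of each version. The following is a minimal Python sketch, assuming blocks are processed back to back with no idle cycles between them (an assumption; the datasheet does not state this). It computes throughput as block size × clock frequency ÷ cycles per block; the dictionary and function names are hypothetical.

```python
BLOCK_BITS = 128  # AES always processes 128-bit blocks

# Clock cycles per data block for each key length, as quoted for each version
CYCLES_PER_BLOCK = {
    "AES-P-S": {128: 44, 192: 52, 256: 60},  # Standard: 32-bit datapath
    "AES-P-F": {128: 11, 192: 13, 256: 15},  # Fast: 128-bit datapath
}


def throughput_mbps(version: str, key_bits: int, fclk_mhz: float) -> float:
    """Peak throughput in Mbit/s, assuming a new block starts every
    CYCLES_PER_BLOCK cycles with no idle time in between."""
    return BLOCK_BITS * fclk_mhz / CYCLES_PER_BLOCK[version][key_bits]


for version in CYCLES_PER_BLOCK:
    for key_bits in (128, 192, 256):
        rate = throughput_mbps(version, key_bits, 500)
        print(f"{version}, {key_bits}-bit key @ 500 MHz: {rate:,.0f} Mbps")
```

For example, at 500 MHz with a 128-bit key this gives about 1,455 Mbps for AES-P-S and 5,818 Mbps for AES-P-F, the same values as the 1.455 and 5.818 Gbps listed in the ASIC implementation tables.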
The core works with either a pre-expanded key or optional key expansion logic.
The AES-P core is a fully synchronous design. It has been evaluated in a variety of technologies and is available optimized for either ASICs or FPGAs.
An AES encryption operation transforms a 128-bit block into a block of the same size. The encryption key can be 128, 192, or 256 bits long, and it is expanded into a set of round keys that are used during the cryptographic operations.
The AES algorithm consists of a series of steps repeated a number of times (rounds). Because the block size is fixed at 128 bits, the number of rounds depends only on the key size, which is selected by KSIZE as shown below. The intermediate cipher result is known as the state.
Key size | 128-bit (KSIZE = 00) | 192-bit (KSIZE = 01) | 256-bit (KSIZE = 10) |
---|---|---|---|
Rounds | 10 | 12 | 14 |
Initially, the incoming data and the key are added together (a bitwise XOR) in the AddRoundKey module. The result is stored in the State Storage area.
The state is then retrieved, and the ByteSub, ShiftRow, MixColumn, and AddRoundKey functions are performed on it in that order. At the end of each round, the new state is stored in the State Storage area. These operations are repeated once for each round.
The final round differs from the others in that the MixColumn step is skipped. The ciphertext is output after the final round.
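The round sequence described above can be modeled in software. The following Python sketch is an illustration only, not the AES-P RTL or the C model delivered with the core. It assumes the FIPS 197 byte layout (the 16-byte state stored column by column), computes the S-box instead of tabulating it, and uses hypothetical function names. It follows the same order as the hardware: an initial AddRoundKey, then ByteSub, ShiftRow, MixColumn, and AddRoundKey in every round, with MixColumn skipped in the final round.

```python
def xtime(a: int) -> int:
    """Multiply by 0x02 in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result


def rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF


def sbox_entry(x: int) -> int:
    """ByteSub value: GF(2^8) inverse (x^254, with 0 -> 0), then the affine map."""
    inv, base, e = 1, x, 254
    while e:
        if e & 1:
            inv = gmul(inv, base)
        base, e = gmul(base, base), e >> 1
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]
ROUNDS = {16: 10, 24: 12, 32: 14}  # key length in bytes -> number of rounds


def expand_key(key: bytes) -> list[list[int]]:
    """FIPS 197 KeyExpansion, returned as Nr + 1 round keys of 16 bytes each."""
    nk, nr = len(key) // 4, ROUNDS[len(key)]
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = list(words[i - 1])
        if i % nk == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        elif nk == 8 and i % nk == 4:
            temp = [SBOX[b] for b in temp]  # extra SubWord for 256-bit keys
        words.append([w ^ t for w, t in zip(words[i - nk], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(nr + 1)]


# The state is a list of 16 bytes: state[r + 4*c] holds row r, column c.
def add_round_key(state: list[int], round_key: list[int]) -> list[int]:
    return [s ^ k for s, k in zip(state, round_key)]


def byte_sub(state: list[int]) -> list[int]:
    return [SBOX[b] for b in state]


def shift_row(state: list[int]) -> list[int]:
    # Row r is rotated left by r positions.
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_column(state: list[int]) -> list[int]:
    out = []
    for c in range(4):
        a0, a1, a2, a3 = state[4 * c:4 * c + 4]
        out += [gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
                a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
                a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
                gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2)]
    return out


def encrypt_block(block: bytes, key: bytes) -> bytes:
    round_keys = expand_key(key)
    nr = len(round_keys) - 1
    state = add_round_key(list(block), round_keys[0])  # initial AddRoundKey
    for rnd in range(1, nr + 1):
        state = shift_row(byte_sub(state))
        if rnd < nr:  # the final round skips MixColumn
            state = mix_column(state)
        state = add_round_key(state, round_keys[rnd])
    return bytes(state)
```

As a check, the FIPS 197 Appendix C example encrypts the block 00112233445566778899aabbccddeeff with the 128-bit key 000102030405060708090a0b0c0d0e0f to 69c4e0d86a7b0430d8cdb78070b4c55a; the same function handles 192- and 256-bit keys by running 12 or 14 rounds.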
Supported modes
This AES-P core supports both encryption and decryption in ECB, CBC, CFB, OFB, and CTR modes. ECB stands for Electronic CodeBook mode. This is the basic AES algorithm as described in FIPS 197, applied to each data block independently.
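ECB can be sketched in a few lines of Python. Here `encrypt_block` stands for one AES-P block operation with a fixed key; because no AES implementation is assumed, the hypothetical `toy_encrypt` function (which is NOT a secure cipher) takes its place. The usage shows a well-known property of ECB: identical plaintext blocks produce identical ciphertext blocks.

```python
from typing import Callable

BLOCK = 16  # AES block size in bytes


def ecb_encrypt(encrypt_block: Callable[[bytes], bytes], data: bytes) -> bytes:
    """ECB: every 16-byte block is encrypted independently with the same key."""
    if len(data) % BLOCK:
        raise ValueError("ECB input must be a whole number of 16-byte blocks")
    return b"".join(encrypt_block(data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def toy_encrypt(block: bytes) -> bytes:
    """Hypothetical stand-in for the AES core: NOT a secure cipher."""
    return bytes(b ^ 0x5A for b in block[1:] + block[:1])


ciphertext = ecb_encrypt(toy_encrypt, b"A" * 32)
print(ciphertext[:16] == ciphertext[16:])  # True: repeated blocks stay visible
```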
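CBC stands for Cipher Block Chaining mode. Chaining adds a feedback mechanism to a block cipher: before encryption, each incoming data block is XORed with the result of the previous encryption operation. An initialization vector (IV) takes the place of that previous result for the first block. Decryption reverses these operations: each block is decrypted and then XORed with the previous ciphertext block (or with the IV for the first block).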
The figure below shows the data flow during encryption (left) and decryption (right) in CBC mode.
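The CBC data flow can be sketched in Python as follows. The functions `cbc_encrypt`, `cbc_decrypt`, and `xor` are hypothetical, and the invertible `toy_encrypt`/`toy_decrypt` pair is a placeholder for the AES core's encryption and decryption directions (NOT a secure cipher).

```python
from typing import Callable

BLOCK = 16  # AES block size in bytes
Cipher = Callable[[bytes], bytes]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(encrypt_block: Cipher, iv: bytes, data: bytes) -> bytes:
    if len(data) % BLOCK:
        raise ValueError("CBC input must be a whole number of 16-byte blocks")
    out, prev = bytearray(), iv  # the IV stands in for the "previous ciphertext"
    for i in range(0, len(data), BLOCK):
        prev = encrypt_block(xor(data[i:i + BLOCK], prev))  # XOR, then encrypt
        out += prev
    return bytes(out)


def cbc_decrypt(decrypt_block: Cipher, iv: bytes, data: bytes) -> bytes:
    if len(data) % BLOCK:
        raise ValueError("CBC input must be a whole number of 16-byte blocks")
    out, prev = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        out += xor(decrypt_block(block), prev)  # decrypt, then XOR
        prev = block
    return bytes(out)


# Hypothetical invertible stand-ins for the AES core (NOT secure ciphers).
def toy_encrypt(block: bytes) -> bytes:
    return bytes(b ^ 0x5A for b in block[1:] + block[:1])


def toy_decrypt(block: bytes) -> bytes:
    x = bytes(b ^ 0x5A for b in block)
    return x[-1:] + x[:-1]


iv = bytes(BLOCK)
message = b"A" * 32
ciphertext = cbc_encrypt(toy_encrypt, iv, message)
assert cbc_decrypt(toy_decrypt, iv, ciphertext) == message
```

Unlike ECB, the two identical plaintext blocks in this example encrypt to different ciphertext blocks, because the second one is first XORed with the first ciphertext block.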
CFB stands for Cipher FeedBack mode. In this mode, each ciphertext block is fed back to the input of the AES core to produce the value used to encrypt the next block. An initialization vector (IV) is used for the first iteration.
Input data is encrypted by XORing it with the output of the AES encryption operation. Decryption also runs the AES core in the encryption direction: its output is XORed with the ciphertext to recover the original data.
The figure below shows the block diagram of the AES in CFB mode.
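The following Python sketch of full-block CFB shows that the ciphertext, not the raw AES output, is what gets fed back, and that decryption uses the encryption direction of the block cipher. The function names are hypothetical, and `toy_encrypt` is a placeholder for the AES core (NOT a secure cipher). A short final block is simply XORed with a truncated AES output.

```python
from typing import Callable

BLOCK = 16  # AES block size in bytes
Cipher = Callable[[bytes], bytes]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def cfb_encrypt(encrypt_block: Cipher, iv: bytes, data: bytes) -> bytes:
    out, feedback = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        cipher_block = xor(data[i:i + BLOCK], encrypt_block(feedback))
        out += cipher_block
        feedback = cipher_block  # the ciphertext is fed back
    return bytes(out)


def cfb_decrypt(encrypt_block: Cipher, iv: bytes, data: bytes) -> bytes:
    out, feedback = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        cipher_block = data[i:i + BLOCK]
        out += xor(cipher_block, encrypt_block(feedback))  # AES still runs forward
        feedback = cipher_block
    return bytes(out)


def toy_encrypt(block: bytes) -> bytes:
    """Hypothetical stand-in for the AES core: NOT a secure cipher."""
    return bytes(b ^ 0x5A for b in block[1:] + block[:1])


iv = bytes(range(BLOCK))
message = b"CFB feeds the ciphertext back into AES."
ciphertext = cfb_encrypt(toy_encrypt, iv, message)
assert cfb_decrypt(toy_encrypt, iv, ciphertext) == message
```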
OFB stands for Output FeedBack mode. In this mode, the output of the AES encryption operation itself (before it is XORed with any data) is fed back to the input of the AES core. An initialization vector (IV) is used for the first iteration.
Input data is encrypted by XORing it with the output of the AES encryption operation. Decryption also runs the AES core in the encryption direction: its output is XORed with the ciphertext to recover the original data.
The figure below shows the block diagram of the AES in OFB mode.
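Because OFB feeds back the AES output rather than the ciphertext, the keystream depends only on the key and the IV, and encryption and decryption are the same operation. The following Python sketch illustrates this; `ofb_crypt` and `xor` are hypothetical names, and `toy_encrypt` is a placeholder for the AES core (NOT a secure cipher).

```python
from typing import Callable

BLOCK = 16  # AES block size in bytes
Cipher = Callable[[bytes], bytes]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def ofb_crypt(encrypt_block: Cipher, iv: bytes, data: bytes) -> bytes:
    """OFB: the same function both encrypts and decrypts."""
    out, feedback = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        feedback = encrypt_block(feedback)  # the AES output is fed back directly
        out += xor(data[i:i + BLOCK], feedback)
    return bytes(out)


def toy_encrypt(block: bytes) -> bytes:
    """Hypothetical stand-in for the AES core: NOT a secure cipher."""
    return bytes(b ^ 0x5A for b in block[1:] + block[:1])


iv = bytes(range(BLOCK))
message = b"OFB feeds the AES output itself back."
ciphertext = ofb_crypt(toy_encrypt, iv, message)
assert ofb_crypt(toy_encrypt, iv, ciphertext) == message
```

Since the keystream never depends on the data, it could in principle be computed ahead of time, before the data arrives.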
CTR stands for Counter mode. In this mode, the value of a counter is the input to the AES core, and the counter is incremented for each data block. An initialization vector (IV) is used to initialize the counter.
Input data is encrypted by XORing it with the output of the AES encryption operation. Decryption also runs the AES core in the encryption direction: its output is XORed with the ciphertext to recover the original data.
The figure below shows the block diagram of the AES in CTR mode.
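The following Python sketch shows CTR operation. The datasheet does not specify how the AES-P counter is incremented. Here it is assumed that the whole 128-bit IV is treated as a big-endian integer that wraps at 2^128. `ctr_crypt` and `xor` are hypothetical names, and `toy_encrypt` is a placeholder for the AES core (NOT a secure cipher).

```python
from typing import Callable

BLOCK = 16  # AES block size in bytes
Cipher = Callable[[bytes], bytes]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_crypt(encrypt_block: Cipher, iv: bytes, data: bytes) -> bytes:
    """CTR: encrypt successive counter values and XOR them with the data."""
    counter = int.from_bytes(iv, "big")  # the IV initializes the counter
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        keystream = encrypt_block(counter.to_bytes(BLOCK, "big"))
        out += xor(data[i:i + BLOCK], keystream)
        counter = (counter + 1) % (1 << (8 * BLOCK))  # assumed 128-bit wraparound
    return bytes(out)


def toy_encrypt(block: bytes) -> bytes:
    """Hypothetical stand-in for the AES core: NOT a secure cipher."""
    return bytes(b ^ 0x5A for b in block[1:] + block[:1])


iv = bytes(range(BLOCK))
message = b"CTR encrypts a counter, never the data."
ciphertext = ctr_crypt(toy_encrypt, iv, message)
assert ctr_crypt(toy_encrypt, iv, ciphertext) == message
```

Because each block depends only on its own counter value, blocks can be processed independently of one another, unlike in CBC, CFB, or OFB.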
Key Expansion
The AES algorithm requires an expanded key for encryption or decryption. The KEXP AES key expander core is available as an AES-P core option.
During encryption, the key expander can produce the expanded key on the fly while the AES-P core consumes it. For decryption, however, the key must be pre-expanded and stored in an appropriate memory before the AES-P core uses it, because the core uses the expanded key in reverse order during decryption.
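The difference between the two directions can be illustrated with a Python generator. The following sketch is a software model, not the KEXP core's interface. It produces FIPS 197 round keys one at a time, in the order that encryption consumes them. Decryption needs the last round key first, so it has to collect the whole schedule before it can begin. All function names are hypothetical.

```python
from itertools import islice
from typing import Iterator


def xtime(a: int) -> int:
    """Multiply by 0x02 in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result


def rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF


def sbox_entry(x: int) -> int:
    """AES S-box: GF(2^8) inverse (x^254, with 0 -> 0), then the affine map."""
    inv, base, e = 1, x, 254
    while e:
        if e & 1:
            inv = gmul(inv, base)
        base, e = gmul(base, base), e >> 1
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def key_words(key: bytes) -> Iterator[list[int]]:
    """Yield expanded-key words in order, one at a time (on-the-fly expansion).
    The list keeps the full history for simplicity; only the last Nk words
    are actually needed to compute the next one."""
    nk = len(key) // 4
    nr = {4: 10, 6: 12, 8: 14}[nk]
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    yield from words
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = list(words[-1])
        if i % nk == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        elif nk == 8 and i % nk == 4:
            temp = [SBOX[b] for b in temp]  # extra SubWord for 256-bit keys
        words.append([w ^ t for w, t in zip(words[-nk], temp)])
        yield words[-1]


def round_keys(key: bytes) -> Iterator[bytes]:
    """Group words into 16-byte round keys in the order encryption uses them."""
    words = key_words(key)
    while chunk := list(islice(words, 4)):
        yield bytes(b for word in chunk for b in word)


def decryption_round_keys(key: bytes) -> list[bytes]:
    """Decryption starts with the last round key, so the whole schedule
    must be expanded and stored before the first decryption round."""
    return list(round_keys(key))[::-1]
```

For example, `next(round_keys(key))` is available immediately for the first encryption round, whereas `decryption_round_keys(key)[0]` can only be returned after all 11 (128-bit key), 13 (192-bit), or 15 (256-bit) round keys have been generated.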
In some cases a key expander is not required. This might be the case when the key does not need to be changed (and so it can be stored in its expanded form) or when the key does not change very often (and thus it can be expanded more slowly in software).
The core has been verified through extensive synthesis, place-and-route, and simulation runs. It has also been embedded in several products and is proven in both ASIC and FPGA technologies.
Support
The core as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.
Deliverables
The core is available in ASIC (RTL) or FPGA (netlist) form and includes everything required for a successful implementation. The ASIC version includes:
- HDL RTL source
- Sophisticated HDL Testbench (self-checking)
- C Model & test vector generator
- Simulation script, vectors & expected results
- Synthesis script
- User documentation
The AES-P can be mapped to any ASIC technology or FPGA device (provided sufficient silicon resources are available). The following are sample ASIC pre-layout results reported from synthesis with a silicon vendor design kit under typical conditions, with all core I/Os assumed to be routed on-chip. The figures provided do not represent the highest speed or smallest area achievable for the core. Please contact CAST to get characterization data for your target configuration and technology.
AES-P Standard Core ASIC Implementation Results
ASIC Technology | Number of eq. gates | Fmax (MHz) | Throughput (Gbps) |
---|---|---|---|
TSMC 16nm | 9,149 | 500 | 1.455 |
TSMC 28nm HPM | 9,564 | 500 | 1.455 |
TSMC 40nm G | 12,231 | 500 | 1.455 |
Throughput for a 128-bit key size
AES-P Fast Core ASIC Implementation Results
ASIC Technology | Number of eq. gates | Fmax (MHz) | Throughput (Gbps) |
---|---|---|---|
TSMC 16nm | 27,598 | 500 | 5.818 |
TSMC 28nm HPM | 28,313 | 500 | 5.818 |
TSMC 40nm G | 37,075 | 500 | 5.818 |
Throughput for a 128-bit key size
The AES-P can be mapped to any ASIC technology or FPGA device (provided sufficient silicon resources are available). The following are sample Intel® results with all core I/Os assumed to be routed on-chip. The figures provided do not represent the highest speed or smallest area achievable for the core. Please contact CAST to get characterization data for your target configuration and technology.
AES-P Standard Core Intel Implementation Results
Family (speed grade) | Logic | Memory | Freq. (MHz) | Throughput (Mbps) |
---|---|---|---|---|
Cyclone V (-7) | 238 ALMs | 8 RAM Block | 160 | 465 |
Stratix V (-1) | 229 ALMs | 4 RAM Block | 380 | 1,105 |
MAX 10 (-7) | 546 LEs | 16 M9K | 140 | 145 |
Throughput for a 128-bit key size
AES-P Fast Core Intel Implementation Results
Family (speed grade) | Logic | Memory | Freq. (MHz) | Throughput (Mbps) |
---|---|---|---|---|
Cyclone V (-7) | 1,040 ALMs | 32 RAM Block | 110 | 1,280 |
Stratix V (-1) | 1,014 ALMs | 16 RAM Block | 290 | 3,375 |
MAX 10 (-7) | 1,248 LEs | 64 M9K | 80 | 683 |
Throughput for a 128-bit key size
The AES-P can be mapped to any ASIC technology or FPGA device (provided sufficient silicon resources are available). The following are sample AMD® results with all core I/Os assumed to be routed on-chip. The figures provided do not represent the highest speed or smallest area achievable for the core. Please contact CAST to get characterization data for your target configuration and technology.
AES-P Standard Core AMD Implementation Results
Family (speed grade) | LUTs | BRAMs | Freq. (MHz) | Throughput (Mbps) |
---|---|---|---|---|
Zynq-7000 (-3) | 354 | 2 | 250 | 727 |
Kintex-7 (-3) | 354 | 2 | 326 | 945 |
Virtex-7 (-3) | 355 | 2 | 300 | 873 |
Kintex UltraScale (-3) | 351 | 2 | 425 | 1,236 |
Kintex UltraScale+ (-3) | 353 | 2 | 550 | 1,600 |
Versal (-2) | 383 | 2 | 425 | 1,236 |
Throughput for a 128-bit key size
AES-P Fast Core AMD Implementation Results
Family (speed grade) | LUTs | BRAMs | Freq. (MHz) | Throughput (Mbps) |
---|---|---|---|---|
Zynq-7000 (-3) | 998 | 8 | 175 | 2,036 |
Kintex-7 (-3) | 1,007 | 8 | 250 | 2,909 |
Virtex-7 (-3) | 1,011 | 8 | 250 | 2,909 |
Kintex UltraScale (-3) | 1,029 | 8 | 350 | 4,073 |
Kintex UltraScale+ (-3) | 1,033 | 8 | 475 | 5,527 |
Versal (-2) | 1,098 | 8 | 325 | 3,782 |
Throughput for a 128-bit key size
Engineered by Ocean Logic.