Silicon IP Cores
ZipAccel-C
GZIP/ZLIB/Deflate Data Compressor
ZipAccel-C is a custom hardware implementation of a lossless data compression engine that complies with the Deflate, GZIP, and ZLIB compression standards.
The core receives uncompressed input files and produces compressed files. No post-processing of the compressed files is required, as the core encapsulates the compressed data payload with the proper headers and footers. Input files can be segmented, and segments from different files can be interleaved at the core’s input.
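For comparison with software, the three formats differ only in the wrapper placed around a Deflate payload. The minimal sketch below uses Python's standard `zlib` module, not the ZipAccel-C core or its software model, to produce all three. The helper name `compress_as` and the sample payload are illustrative assumptions.

```python
import zlib

# zlib's wbits convention selects the wrapper around the Deflate payload:
#   -15 -> raw Deflate (RFC 1951), 15 -> ZLIB (RFC 1950), 31 -> GZIP (RFC 1952)
WBITS = {"deflate": -15, "zlib": 15, "gzip": 31}

def compress_as(data: bytes, fmt: str, level: int = 6) -> bytes:
    """Compress `data` into one complete stream of the requested format."""
    comp = zlib.compressobj(level, zlib.DEFLATED, WBITS[fmt])
    return comp.compress(data) + comp.flush()

payload = b"hello, deflate " * 64
raw = compress_as(payload, "deflate")
zl = compress_as(payload, "zlib")
gz = compress_as(payload, "gzip")

for name, stream in (("deflate", raw), ("zlib", zl), ("gzip", gz)):
    # GZIP members begin with the magic bytes 1f 8b (RFC 1952).
    print(f"{name:8s} {len(stream):5d} bytes, starts with {stream[:2].hex()}")
```

Each stream can be decoded with `zlib.decompress(stream, wbits)` using the same `wbits` value, which is one way to confirm that no post-processing is needed.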
The core’s flexible architecture enables fine-tuning of its compression efficiency, throughput, and latency to match the requirements of the end application. Throughputs in excess of 400 Gbps are feasible even at clock rates as low as 500MHz, and latency can be as small as a few tens of clock cycles.
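These headline figures imply a wide datapath, which a quick back-of-the-envelope calculation shows. The sketch assumes 1 Gbps = 10^9 bits per second and that the quoted throughput is sustained on every clock cycle. Neither assumption is stated in the source, and `bytes_per_cycle` is a hypothetical helper name.

```python
def bytes_per_cycle(throughput_gbps: float, clock_mhz: float) -> float:
    """Average bytes the datapath must process per clock cycle."""
    bits_per_second = throughput_gbps * 1e9
    cycles_per_second = clock_mhz * 1e6
    return bits_per_second / cycles_per_second / 8

# 400 Gbps at 500 MHz works out to 800 bits, i.e. 100 bytes, per cycle.
print(bytes_per_cycle(400, 500))  # 100.0
```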
ZipAccel-C offers compression efficiency practically equivalent to today’s popular deflate-based software applications. Analyzing processing speed versus compression efficiency to achieve the best trade-off for a specific system is facilitated by the included software model, and by support from our team of data compression experts.
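The same kind of speed-versus-efficiency analysis can be illustrated in software. The sketch below uses Python's standard `zlib` compression levels as a stand-in. It is not the software model shipped with the core, and the function name `sweep_levels` and the synthetic sample data are assumptions made for illustration.

```python
import time
import zlib

def sweep_levels(data: bytes, levels=(1, 6, 9)):
    """Return (level, compression ratio, seconds) for each zlib level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        out = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(data) / len(out), elapsed))
    return results

# Synthetic, log-like sample; real analysis should use representative data.
sample = "".join(
    f"record {i}: status={'ok' if i % 7 else 'retry'}\n" for i in range(5000)
).encode()

results = sweep_levels(sample)
for level, ratio, seconds in results:
    print(f"level {level}: ratio {ratio:.2f}, {seconds * 1e3:.2f} ms")
```

Results depend heavily on the input, so a sweep like this is only meaningful on data that resembles the target application's traffic.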
ZipAccel-C has been designed for ease of use and integration. It operates on a standalone basis, off-loading the host CPU from the demanding task of data compression, and optionally from the task of encrypting the compressed stream. Streaming AXI-Stream or native FIFO-like data interfaces ease SoC integration.
Technology mapping is straightforward, as the design is LINT-clean, scan-ready, microcode-free, and uses easily replaceable, generic memory models. Memory blocks can optionally support Error Correction Codes (ECC) to simplify achieving Enterprise-Class reliability requirements. Furthermore, input file segmentation can limit inter-file latency and help users achieve Quality of Service (QoS) objectives.
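Segment interleaving can be pictured in software as keeping one compression context per file and feeding each arriving segment to the matching context. The sketch below is a software analogy using Python's `zlib`. The tuple layout `(file_id, chunk, is_last)` and the function name `compress_interleaved` are hypothetical, and the core's actual segment signalling is not described here.

```python
import zlib

def compress_interleaved(segments):
    """Compress interleaved segments; one GZIP stream per file_id.

    `segments` yields (file_id, chunk, is_last) tuples in arrival order.
    """
    streams = {}   # file_id -> open compression context
    outputs = {}   # file_id -> compressed bytes collected so far
    for file_id, chunk, is_last in segments:
        if file_id not in streams:
            streams[file_id] = zlib.compressobj(wbits=31)  # 31 -> GZIP wrapper
            outputs[file_id] = bytearray()
        outputs[file_id] += streams[file_id].compress(chunk)
        if is_last:
            # Final segment: emit remaining data plus the GZIP footer.
            outputs[file_id] += streams.pop(file_id).flush()
    return {file_id: bytes(buf) for file_id, buf in outputs.items()}

file_a = b"alpha " * 1000
file_b = b"beta " * 1000
arrivals = [
    ("A", file_a[:3000], False),
    ("B", file_b[:2000], False),
    ("A", file_a[3000:], True),
    ("B", file_b[2000:], True),
]
compressed = compress_interleaved(arrivals)
```

Because a short file's segments no longer wait behind an entire long file, interleaving of this kind bounds how long any one file can delay another.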
Support
The core as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.
Deliverables
The core is available in synthesizable HDL (SystemVerilog) and targeted FPGA netlist forms, and includes everything required for successful implementation. Its deliverables include:
- Sophisticated Test Environment
- Simulation scripts, test vectors, and expected results
- Synthesis script
- Comprehensive user documentation
ZipAccel-C reference designs have been evaluated in a variety of technologies. ZipAccel-C performance can scale by instantiating more search engines and/or Huffman encoders. Furthermore, other design options, such as the search area window, affect the silicon resources utilization.
Throughputs over 400 Gbps are feasible at clock rates as low as 500MHz, and the silicon footprint can be less than 100k gates. Contact CAST Sales for help defining likely configuration options and estimating implementation results for your specific system.
ZipAccel-C can be mapped to any Altera FPGA device (provided sufficient silicon resources are available). ZipAccel-C performance can scale by instantiating more search engines and/or Huffman encoders. Furthermore, other design options, such as the search area window, affect silicon resource utilization. The following table provides sample Altera results for a subset of the possible configuration options. They do not represent the smallest possible area requirements nor the highest possible clock frequency. Please contact CAST to get characterization data for your target configuration and technology.
Family | Configuration | ALMs | RAM Bits |
---|---|---|---|
Agilex (-3) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 512B History Window, 450MHz | 7,021 | 2,040 |
Arria 10 GX (-3) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 2kB History Window, 320MHz | 13,641 | 5,656 |
Arria 10 GX (-3) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 8kB History Window, 210MHz | 40,668 | 683,581 |
Agilex (-3) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 32kB History Window, 250MHz | 17,091 | 1,390,662 |
Arria 10 GX (-3) | 4 Hash Search Engines, 1 Dynamic Huffman Encoder, 4kB History Window, 110MHz | 64,044 | 1,513,623 |
Agilex (-3) | 26 Systolic Search Engines, 26 Static Huffman Encoders, 2kB History Window, 450MHz | 438,617 | 2,055,946 |
Agilex (-1) | 30 Search Engines, 10 Static Huffman Encoders, 256B History Window, 500MHz | 319,724 | 7,471,974 |
The ZipAccel-C can be mapped to any AMD FPGA device (provided sufficient silicon resources are available). ZipAccel-C performance can scale by instantiating more search engines and/or Huffman encoders. Furthermore, other design options, such as the search area window, affect the silicon resources utilization. The following table provides sample AMD results for a subset of the possible configuration options. They do not represent the smallest possible area requirements nor the highest possible clock frequency. Please contact CAST to get characterization data for your target configuration and technology.
Family / Device | Configuration | LUTs | RAM Blocks |
---|---|---|---|
Kintex UltraScale+ ku9p-1-e | 1 Systolic Search Engine, 1 Static Huffman Encoder, 512B History Window, 450MHz | 5,715 | 1 |
Artix UltraScale+ au25p-1-e | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 4kB History Window, 180MHz | 18,073 | 23 |
Spartan-7 7s100-1 | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 32kB History Window, 103MHz | 19,190 | 103 |
Versal Premium vp1202-2MP-e-L | 10 Systolic Search Engines, 10 Static Huffman Encoders, 256B History Window, 300MHz | 47,383 | 15 |
Kintex UltraScale ku085-1-c | 10 Hash Search Engines, 2 Dynamic Huffman Encoders, 4kB History Window, 100MHz | 144,724 | 218 |
Virtex UltraScale+ vu9p-1 | 36 Hash Search Engines, 6 Dynamic Huffman Encoders, 16kB History Window, 250MHz | 578,403 | 692 |
The core can be mapped on any Lattice FPGA provided sufficient silicon resources are available. The following are sample implementation results for a small subset of the possible configuration options of the core on a CertusPro-NX and an Avant-E device, and do not represent the smallest possible area requirements nor the highest possible clock frequency.
Family / Device | Configuration | Freq. (MHz) | Logic Resources | Memory Resources |
---|---|---|---|---|
CertusPro-NX LFCPNX-100(-9) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 512B History Window | 155 | 16,694 Slices | 15 EBR |
CertusPro-NX LFCPNX-100(-9) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 2kB History Window | 140 | 35,941 Slices | 16 EBR |
CertusPro-NX LFCPNX-100(-9) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 8kB History Window | 110 | 27,943 Slices | 65 EBR |
CertusPro-NX LFCPNX-100(-9) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 32kB History Window | 100 | 28,022 Slices | 99 EBR |
Avant-E LAV-AT-500E(-1) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 512B History Window | 180 | 7,734 Slices | 8 EBR |
Avant-E LAV-AT-500E(-1) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 2kB History Window | 175 | 23,338 Slices | 8 EBR |
Avant-E LAV-AT-500E(-1) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 8kB History Window | 145 | 19,271 Slices | 48 EBR |
Avant-E LAV-AT-500E(-1) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 32kB History Window | 140 | 19,253 Slices | 50 EBR |
Engineered by Sandgate Technologies.
Features List
Compression Standards
- Deflate (RFC-1951)
- ZLIB (RFC-1950)
- GZIP (RFC-1952)
Deflate Features
- LZ77 with configurable block and search window size
- Static and dynamic Huffman
- Optional stored deflate blocks
- Dynamic mode selection
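The difference between stored, static Huffman, and dynamic Huffman blocks is visible in the first byte of a raw Deflate stream: RFC 1951 places BFINAL in bit 0 and the 2-bit BTYPE field in bits 1 and 2. The sketch below forces each mode with Python's `zlib` as a software illustration. It does not show how the core selects modes, and the helper names `deflate_raw` and `first_block_type` are assumptions.

```python
import zlib

def deflate_raw(data: bytes, level: int = 6,
                strategy: int = zlib.Z_DEFAULT_STRATEGY) -> bytes:
    """Produce a raw Deflate stream (no ZLIB/GZIP wrapper)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15,
                            zlib.DEF_MEM_LEVEL, strategy)
    return comp.compress(data) + comp.flush()

def first_block_type(raw: bytes) -> str:
    """Decode BTYPE of the first block (RFC 1951, section 3.2.3)."""
    btype = (raw[0] >> 1) & 0b11
    return {0: "stored", 1: "static", 2: "dynamic", 3: "reserved"}[btype]

text = b"status=ok; retries=0; latency=low\n" * 500

stored = deflate_raw(text, level=0)                 # level 0 emits stored blocks
static = deflate_raw(text, strategy=zlib.Z_FIXED)   # Z_FIXED disables dynamic trees
default = deflate_raw(text)                         # zlib picks per block

for name, stream in (("stored", stored), ("static", static), ("default", default)):
    print(f"{name:8s} {len(stream):6d} bytes, first block: {first_block_type(stream)}")
```

Static Huffman avoids building and transmitting code tables, which suits low-latency use, while dynamic Huffman usually compresses better on larger blocks.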
Flexible Architecture
- Fine-tune Throughput, Compression Efficiency, and Latency to match application requirements
- More than 400Gbps with one core instance, scalable to meet any throughput requirement
- Compression efficiency can be on par with Unix/Linux max compression option (gzip -9)
- Silicon requirements start from less than 100k gates
- Latency under 40 clock cycles for Static Huffman
- Configuration options (partial list):
  - Search engine and Huffman encoder architecture
  - History search window size (up to 32KB)
  - Deflate block size
  - Stored blocks support
  - Parallel processing level
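The effect of the history window size can be explored in software before committing to a hardware configuration. The sketch below uses Python's `zlib`, where a `wbits` value of w selects a nominal 2^w-byte window. It is an illustration with standard zlib rather than the core's software model, and the function name `ratio_for_window` and the synthetic test data are assumptions.

```python
import random
import zlib

def ratio_for_window(data: bytes, wbits: int) -> float:
    """Compression ratio of raw Deflate with a nominal 2**wbits-byte window."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -wbits)  # negative -> raw Deflate
    out = comp.compress(data) + comp.flush()
    return len(data) / len(out)

# Synthetic data: a 4096-byte random block repeated 8 times, so the only
# redundancy lies at a distance of 4096 bytes.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(4096))
data = block * 8

ratios = {w: ratio_for_window(data, w) for w in (9, 11, 13, 15)}
for w, ratio in ratios.items():
    print(f"window {2 ** w:6d} B: ratio {ratio:.2f}")
```

In this contrived case only windows larger than the repeat distance can find the matches. Real data usually shows a more gradual gain, which is why the window size is worth sweeping against memory cost.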
Easy to Use and Integrate
- Processor-free, standalone operation
- AXI-Stream or native FIFO-like data interfaces
- Large file segmentation enables meeting QoS objectives
- Microcode-free, scan-ready design
- Optional ECC memories
- Optionally integrated with DMA, encryption or other cores from CAST
- Complete, turn-key Accelerator Designs available on FPGA boards from different vendors
Resources
Applicable Standards
• RFC 1952 – GZIP file format
• RFC 1950 – ZLIB Compressed Data Format
• RFC 1951 – DEFLATE Compressed Data Format
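The footers defined by these standards can be checked independently of any decompressor. RFC 1952 ends a GZIP member with a little-endian CRC-32 and the input size modulo 2^32, and RFC 1950 ends a ZLIB stream with a big-endian Adler-32. The sketch below assumes a single GZIP member or ZLIB stream with no trailing bytes, and the function names are hypothetical.

```python
import struct
import zlib

def check_gzip_trailer(member: bytes, original: bytes) -> bool:
    """Verify the CRC-32 and ISIZE fields at the end of a GZIP member."""
    crc, isize = struct.unpack("<II", member[-8:])
    return crc == zlib.crc32(original) and isize == len(original) % 2**32

def check_zlib_trailer(stream: bytes, original: bytes) -> bool:
    """Verify the Adler-32 checksum at the end of a ZLIB stream."""
    (adler,) = struct.unpack(">I", stream[-4:])
    return adler == zlib.adler32(original)

original = b"footer check " * 100

gzip_comp = zlib.compressobj(wbits=31)   # 31 -> GZIP wrapper
gzip_member = gzip_comp.compress(original) + gzip_comp.flush()
zlib_stream = zlib.compress(original)    # default ZLIB wrapper

print(check_gzip_trailer(gzip_member, original))
print(check_zlib_trailer(zlib_stream, original))
```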
Background & More Info
• Data Compression in Solid State Storage, presentation at Flash Memory Summit 2013 (PDF)
• Wikipedia entries on GZIP, ZLIB, and Deflate
• An explanation of the Deflate algorithm by Antaeus Feldspar
• GZIP Project website
• ZLIB Project website