Silicon IP Cores
LZ4SNP-D
LZ4/Snappy Data Decompressor
LZ4SNP-D is a custom hardware implementation of a lossless data decompression engine for the LZ4 and Snappy compression algorithms. The core receives compressed files, automatically detects the LZ4 or Snappy format, and outputs the decompressed data.
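The datasheet does not describe how the core distinguishes the two formats internally. As a software illustration only, the sketch below classifies a stream by its leading bytes, using the magic number of the public LZ4 frame format and the stream identifier chunk of the Snappy framing format. The function name `detect_format` and its return strings are hypothetical, and raw Snappy blocks (which carry no identifier) are not covered.

```python
# Software illustration only; not the core's actual detection logic.
LZ4_FRAME_MAGIC = bytes.fromhex("04224d18")               # 0x184D2204, stored little-endian
SNAPPY_STREAM_ID = bytes.fromhex("ff060000") + b"sNaPpY"  # stream identifier chunk


def detect_format(head: bytes) -> str:
    """Classify a compressed stream as 'lz4', 'snappy', or 'unknown' (hypothetical helper)."""
    if head.startswith(LZ4_FRAME_MAGIC):
        return "lz4"
    if head.startswith(SNAPPY_STREAM_ID):
        return "snappy"
    return "unknown"
```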
The core features fast processing with low latency and high throughput. In its default configuration, LZ4SNP-D outputs 7.8 bytes of decompressed data per clock cycle and can be clocked at frequencies exceeding 1 GHz in modern ASIC technologies. Designers can scale the throughput by instantiating the core multiple times to achieve throughput rates exceeding 100 Gbps. The processing latency is approximately 30 clock cycles.
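The figures above can be turned into a quick back-of-the-envelope estimate. The sketch below uses only the quoted 7.8 bytes per cycle and 30-cycle latency. It assumes throughput scales linearly with the number of instances and that the workload can be split across them, which the datasheet does not quantify. The function names are illustrative.

```python
import math

BYTES_PER_CYCLE = 7.8   # datasheet figure, default configuration
LATENCY_CYCLES = 30     # datasheet figure, approximate


def throughput_gbps(clock_hz: float, instances: int = 1) -> float:
    """Decompressed output rate in Gbps, assuming ideal linear scaling (assumption)."""
    return BYTES_PER_CYCLE * 8 * clock_hz * instances / 1e9


def instances_for(target_gbps: float, clock_hz: float) -> int:
    """Smallest instance count whose combined rate reaches the target."""
    return math.ceil(target_gbps / throughput_gbps(clock_hz))


def latency_ns(clock_hz: float) -> float:
    """Processing latency in nanoseconds at the given clock."""
    return LATENCY_CYCLES / clock_hz * 1e9
```

At a 1 GHz clock this gives roughly 62.4 Gbps per instance, so two instances would be needed to pass 100 Gbps, with a latency of about 30 ns.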
The decompression core operates on a standalone basis—offloading the host CPU from the demanding task of data decompression—and has been designed for easy integration and use. No preprocessing of the incoming compressed files is required, as the core parses the file headers, checks the input files for errors, and outputs the decompressed data payload.
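To give a sense of the header parsing the core takes over from software, the sketch below decodes an LZ4 frame descriptor as laid out in the public LZ4 frame format. It is a software model under stated assumptions, not the core's implementation: the function name and the returned dictionary keys are hypothetical, and the header checksum byte is returned without being verified, since that would require an xxHash-32 implementation.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204
# Block maximum size codes from the BD byte
BLOCK_MAX_BYTES = {4: 64 << 10, 5: 256 << 10, 6: 1 << 20, 7: 4 << 20}


def parse_lz4_frame_header(buf: bytes) -> dict:
    """Decode an LZ4 frame descriptor (hypothetical helper; checksum not verified)."""
    if len(buf) < 7:  # magic (4) + FLG + BD + header checksum
        raise ValueError("truncated frame header")
    magic, flg, bd = struct.unpack_from("<IBB", buf, 0)
    if magic != LZ4_FRAME_MAGIC:
        raise ValueError("not an LZ4 frame")
    if (flg >> 6) != 1:
        raise ValueError("unsupported frame version")
    if flg & 0x02 or bd & 0x8F:
        raise ValueError("reserved bits set")
    block_max = BLOCK_MAX_BYTES.get((bd >> 4) & 0x07)
    if block_max is None:
        raise ValueError("invalid block maximum size")
    has_size, has_dict = bool(flg & 0x08), bool(flg & 0x01)
    if len(buf) < 7 + (8 if has_size else 0) + (4 if has_dict else 0):
        raise ValueError("truncated frame header")
    pos = 6
    content_size = dict_id = None
    if has_size:
        (content_size,) = struct.unpack_from("<Q", buf, pos)
        pos += 8
    if has_dict:
        (dict_id,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    return {
        "block_independence": bool(flg & 0x20),
        "block_checksum": bool(flg & 0x10),
        "content_checksum": bool(flg & 0x04),
        "block_max_bytes": block_max,
        "content_size": content_size,
        "dict_id": dict_id,
        "header_checksum": buf[pos],  # returned as-is, not verified here
        "header_bytes": pos + 1,
    }
```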
Extensive error tracking and reporting enable the core to ensure smooth system operation and error recovery, even in the presence of errors in the compressed input files. Furthermore, internal memories can optionally support Error Correction Codes (ECC) to simplify achieving enterprise-class reliability or functional safety requirements.
The LZ4SNP-D core is a microcode-free design developed for reuse in ASIC and FPGA implementations. Its streaming data interface, optionally bridged to AMBA® AXI4-Stream, eases SoC integration. Technology mapping is straightforward, as the design is scan-ready and LINT-clean, and uses easily replaceable, generic memory models.
The LZ4SNP-D core is ideal for increasing the bandwidth of optical, wired, or wireless data communication links. It also effectively increases data storage capacity in a wide range of devices, such as networking interface/routing/storage equipment, data servers, or SSD drives. The core can also help reduce the power consumption and bandwidth of centralized memories (e.g., DDR) or interfaces (e.g., Ethernet or Wi-Fi) in a wide range of SoCs.
LZ4SNP-D can be mapped on any Altera FPGA, and its resource requirements and throughput depend on its configuration. Its performance can also scale via multiple core instances. The following table provides sample FPGA resource utilization data for the core mapped on an Agilex 7 (speed grade 3) device with its clock set to 280 MHz.
Datapath Width (bits) | LZ4 Support | Snappy Support | Max. History | Logic Resources (ALMs) | Memory Resources (M20Ks) |
---|---|---|---|---|---|
32 | Yes | No | 64kB | 2,396 | 38 |
32 | No | Yes | 64kB | 2,252 | 38 |
32 | Yes | Yes | 64kB | 3,219 | 38 |
64 | Yes | No | 64kB | 3,072 | 39 |
64 | No | Yes | 64kB | 2,946 | 39 |
64 | Yes | Yes | 64kB | 3,835 | 39 |
Please contact CAST Sales to get accurate characterization data for your target technology and configuration.
LZ4SNP-D can be mapped on any AMD FPGA, and its resource requirements and throughput depend on its configuration. Its performance can also scale via multiple core instances. The following table provides sample FPGA resource utilization data for the core mapped on an Artix UltraScale+ device with its clock set to 300 MHz.
Datapath Width (bits) | LZ4 Support | Snappy Support | Max. History | Logic Resources (LUTs) | Memory Resources (RAMB) |
---|---|---|---|---|---|
32 | Yes | No | 64kB | 3,377 | 19 |
32 | No | Yes | 64kB | 3,010 | 19 |
32 | Yes | Yes | 64kB | 4,114 | 19 |
64 | Yes | No | 64kB | 4,307 | 20 |
64 | No | Yes | 64kB | 3,929 | 20 |
64 | Yes | Yes | 64kB | 5,105 | 20 |
Please contact CAST Sales to get accurate characterization data for your target technology and configuration.
LZ4SNP-D is process-independent, and its silicon resource requirements and throughput depend on its configuration. Its performance can also scale via multiple core instances. The following table provides sample silicon resource utilization data for the core mapped on a 7 nm ASIC technology with its clock set to 500 MHz.
Datapath Width (bits) | LZ4 Support | Snappy Support | Max. History | Logic Resources (eq. Gates) | Memory Resources (bits) |
---|---|---|---|---|---|
32 | Yes | No | 64kB | 33,343 | 540k |
32 | No | Yes | 64kB | 31,909 | 540k |
32 | Yes | Yes | 64kB | 42,616 | 540k |
64 | Yes | No | 64kB | 42,632 | 549k |
64 | No | Yes | 64kB | 41,094 | 549k |
64 | Yes | Yes | 64kB | 51,795 | 549k |
Please contact CAST Sales to get accurate characterization data for your target technology and configuration.
The core as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.
The core is available in synthesizable HDL (SystemVerilog) or targeted FPGA netlist forms and includes everything required for successful implementation. Its deliverables include:
- Sophisticated Test Environment
- Simulation scripts, test vectors, and expected results
- Synthesis script
- Comprehensive user documentation
Features List
Compression Algorithms
- LZ4
  - 64KB history window size
  - All frame and block formats
  - CRC checking (optional, on request)
  - Dictionaries not supported
- Snappy
  - 64KB history window size
  - All frame and stream formats
  - CRC checking (optional, on request)
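For readers unfamiliar with what the history window bounds, the following is a minimal software reference model of LZ4 block decoding, written from the public LZ4 block format. It is not the core's architecture, and the function name and `max_history` parameter are hypothetical. Each match copies bytes from earlier output, and the 16-bit offset can reach back at most 65,535 bytes, which is why a 64KB history buffer is sufficient.

```python
def lz4_block_decompress(src: bytes, max_history: int = 64 * 1024) -> bytes:
    """Decode one raw LZ4 block (software reference sketch, not the core's design)."""
    out = bytearray()
    i, n = 0, len(src)

    def read_extended(length: int) -> int:
        # A nibble of 15 means more length bytes follow, ending at the first byte != 255.
        nonlocal i
        if length == 15:
            while True:
                if i >= n:
                    raise ValueError("truncated length field")
                b = src[i]
                i += 1
                length += b
                if b != 255:
                    break
        return length

    while i < n:
        token = src[i]
        i += 1
        lit_len = read_extended(token >> 4)
        if i + lit_len > n:
            raise ValueError("truncated literals")
        out += src[i:i + lit_len]
        i += lit_len
        if i == n:
            break  # the final sequence carries literals only
        if i + 2 > n:
            raise ValueError("truncated match offset")
        offset = src[i] | (src[i + 1] << 8)  # little-endian distance into history
        i += 2
        if offset == 0 or offset > len(out) or offset > max_history:
            raise ValueError("invalid match offset")
        match_len = read_extended(token & 0x0F) + 4  # minimum match is 4 bytes
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])  # byte-wise copy so overlapping matches work
    return bytes(out)
```

The byte-wise copy loop is the simplest way to handle matches that overlap their own output, such as a run of a single repeated byte.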
High Performance & Low Latency
- Average processing rate of 7.8 decompressed bytes per clock cycle
- Clock frequency in excess of 1 GHz on modern ASIC processes, and more than 300 MHz on high-end FPGAs
- Latency of approximately 30 clock cycles
Easy to Use and Integrate
- Processor-free, standalone operation
- Automatic detection of input frame format (LZ4 or Snappy)
- Extensive error catching & reporting for smooth operation and recovery in the presence of errors
  - CRC 32 errors
  - File size errors
  - Coding errors
  - Non-correctable ECC memory errors
- Optional ECC memories
- AXI-Stream or native FIFO-like data interfaces
- Single clock domain design
- Interface bridges and DMAs available separately
- Microcode-free, LINT-clean, scan-ready design
Configuration Options
- Synthesis-time configuration options allow fine-tuning the core’s size and performance:
  - Input and output bus widths
  - FIFO sizes
  - Maximum history window
  - Datapath width
  - More