Silicon IP Cores
H264-LD-BP
Low-Power AVC/H.264 Baseline Profile Decoder
The H264-LD-BP IP core implements a silicon and energy efficient hardware video decoder able to process H.264 streams produced by the H264-E-BPS, H264-E-BPF and H264-E-BIS video encoder cores available from CAST.
The H264-LD-BP is extremely small, requiring less than 70k gates and about 60k bits of internal memory. Its small silicon footprint, low bandwidth requirements, and zero software overhead enable extremely cost-effective and low-power ASIC and FPGA implementations.
The H264-LD-BP is designed for straightforward, trouble-free SoC integration. It operates on a stand-alone basis such that decoding proceeds without any assistance or input from the host processor. The decoder’s memory interface—used to store reconstructed video data—is extremely flexible: it operates on a separate clock domain, is independent from the external memory type and memory controller, and is tolerant to relatively large latencies. The decoder reports decompressed video parameters, detects and reports bit stream errors to the system, and simplifies video cropping at its output. The core is optionally delivered with a raster-to-block converter, and wrappers for AMBA® AHB, AXI, or AXI-Streaming buses are available.
Customers can further decrease their time to market by using CAST’s integration services to receive complete video encoding/decoding subsystems. These integrate the decoder core with video encoders, video and networking interface controllers, networking stacks, or other CAST or third-party IP cores.
The H264-LD-BP IP core is designed using industry best practices and has been production proven multiple times. Its deliverables include a complete verification environment and a bit-accurate software model.
Potential customers can readily evaluate the video decoder’s low latency characteristics by using the Video over IP reference design with the compressed stream captured over Ethernet, and the decoded video driving an HDMI interface.
The core deliverables include everything required for successful implementation:
- HDL source code (Verilog or VHDL) for ASICs, or a targeted netlist for FPGAs
- Sophisticated self-checking testbench
- Synthesis scripts
- Simulation scripts, vectors, and expected results
- Comprehensive user documentation
The H264-LD-BP synthesizes to less than 70k gates and also requires about 60 Kbits of internal memory.
The H264-LD-BP can be mapped to any ASIC technology and optimized to suit the particular project’s requirements. The following table provides sample implementation data for a single H264-LD-BP.
Target Technology | Logic (Gates) | Memory Bits | Freq. (MHz) | Throughput (Mpixels/sec) | Video Format |
---|---|---|---|---|---|
TSMC 16nm | 55k | 58k | 1,000 | 250 | UHD/4k@30fps |
TSMC 40nm | 60k | 58k | 500 | 200 | 1080p60 |
These sample implementation figures do not represent the highest speed or smallest area possible for the core. Note that under certain conditions two or more H264-LD-BP cores can be combined to decode streams produced by the H264-E-BPF core. Please contact CAST to get characterization data for your target configuration and technology.
The H264-LD-BP can be mapped to any Altera family (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample resource utilization data for different Intel Device Families. Please contact CAST to get characterization data for your target configuration and technology.
Device Family | Logic | Memory Bits | DSPs/MULs |
---|---|---|---|
StratixV | 4k ALMs | 59,278 | 2 |
Arria10 | 4k ALMs | 59,278 | 2 |
CycloneV | 4k ALMs | 59,278 | 2 |
Max10 | 10K LEs | 59,364 | 2 |
Multiple H264-LD-BP cores can be combined to decode streams produced by the H264-E-BPF core. The following table indicates the number of H264-LD-BP cores that would be required for different video formats in different Intel families.
Device Family | 480p30 | 720p30 | 720p60 | 1080p30 | 1080p60 |
---|---|---|---|---|---|
StratixV | ✓(1) | ✓(1) | ✓(2) | ✓(2) | ✗(3) |
Arria10 | ✓(1) | ✓(1) | ✓(2) | ✓(2) | ✓(3) |
CycloneV | ✓(1) | ✓(2) | ✓(3) | ✓(3) | ✗ |
Max10 | ✓(1) | ✗ | ✗ | ✗ | ✗ |
Note: List of video formats is not exhaustive.
The H264-LD-BP can be mapped to any AMD Family (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample resource utilization data for different Xilinx Device Families. Please contact CAST to get characterization data for your target configuration and technology.
Device Family | Logic Resources | Memory Resources | Freq. (MHz) |
---|---|---|---|
ARTIX-7 (-3) | 39,849 LUTs | 6.5 BRAM | 117 |
KINTEX UltraScale+ (-1) | 41,003 LUTs | 4.5 BRAM | 250 |
Artix UltraScale+ (-1) | 42,290 LUTs | 4.5 BRAM | 250 |
Versal Prime (-1) | 52,347 LUTs | 5 BRAM | 188 |
Multiple H264-LD-BP cores can be combined to decode streams produced by the H264-E-BPF core. The following table indicates the number of H264-LD-BP cores that would be required for different video formats in different AMD families.
Device Family | 720p30 | 720p60 | 1080p30 | 1080p60 |
---|---|---|---|---|
ARTIX-7 | ✓(1) | ✓(2) | ✓(2) | ✓(4) |
KINTEX-7 | ✓(1) | ✓(2) | ✓(2) | ✓(3) |
ARTIX UltraScale+ | ✓(1) | ✓(1) | ✓(1) | ✓(2) |
KINTEX UltraScale+ | ✓(1) | ✓(1) | ✓(1) | ✓(2) |
Note: List of video formats is not exhaustive.
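For rough planning, the number of cores can be estimated to first order as the required pixel rate divided by the throughput of one core, rounded up. The sketch below illustrates that arithmetic only. The per-core throughput is a hypothetical placeholder, not a CAST figure: the datasheet gives no per-core FPGA throughput, and 35 Mpixels/sec is used here solely because it happens to reproduce the ARTIX-7 row of the table. Actual core counts depend on the device, speed grade, and configuration, so the published table and CAST's characterization data take precedence.

```python
import math

# Standard resolutions ASSUMED for each format name: (width, height, fps).
FORMATS = {
    "720p30": (1280, 720, 30),
    "720p60": (1280, 720, 60),
    "1080p30": (1920, 1080, 30),
    "1080p60": (1920, 1080, 60),
}

def cores_needed(fmt: str, per_core_mpixels: float) -> int:
    """First-order estimate: ceil(required pixel rate / per-core throughput)."""
    width, height, fps = FORMATS[fmt]
    required = width * height * fps / 1e6  # Mpixels/sec
    return math.ceil(required / per_core_mpixels)

if __name__ == "__main__":
    # HYPOTHETICAL per-core throughput, chosen only for illustration.
    assumed_throughput = 35.0
    for fmt in FORMATS:
        print(fmt, cores_needed(fmt, assumed_throughput))
```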
Engineered by Ocean Logic.
Features List
Low-power AVC/H.264 decoder, with small silicon footprint; optimized for low-latency, low-bit-rate video streaming
- Decodes streams produced by the H264-E-BPS, H264-E-BPF, and H264-E-BIS cores
Video Formats
- Progressive or Interlaced, 4:2:0 YCbCr with 8 bits per color sample
- Single-channel SD, ED, and Full-HD capable even in low-cost FPGAs
- Optional multichannel decoding
Small and Low-Power
- Less than 70 KGates and 60k bits of RAM
- Less than half the typical silicon footprint and small external memory bandwidth mean it uses less power than competitive hardware H.264 decoders
- Consumes much less power than any equivalent software or software-hardware decoder
Ease of Integration
- Zero CPU overhead, stand-alone operation
- Flexible external memory interface: uses a separate clock, is independent of memory type, and tolerates relatively large latencies
- AMBA® Interface Options: DMA-capable AMBA® AHB, AXI or AXI-Streaming
Supported Coding Tools
- I and P Slices
- Single Reference Frame
- Motion vectors from –32.00 to +31.75 pixels, with ¼-pel accuracy
- All Intra 16x16 and most Intra 4x4 modes
- Multiple slices per frame
- Block skipping
- Deblocking filter
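The motion vector limits listed above line up with the range of a signed 8-bit integer counted in quarter-pel units: -128/4 = -32.00 and 127/4 = +31.75. The sketch below shows that arithmetic. The 8-bit representation is an inference from the numbers, not something the datasheet states, and the helper names are illustrative.

```python
QUARTER_PEL = 4  # quarter-pel units per whole pixel

def mv_range_pixels(bits: int = 8):
    """Range, in pixels, of a signed `bits`-wide quarter-pel vector component.

    The default width of 8 bits is an ASSUMPTION inferred from the listed limits.
    """
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest / QUARTER_PEL, highest / QUARTER_PEL

def to_quarter_pel(pixels: float) -> int:
    """Convert a displacement in pixels to integer quarter-pel units."""
    units = pixels * QUARTER_PEL
    if units != int(units):
        raise ValueError("displacement is not a multiple of 1/4 pixel")
    return int(units)

if __name__ == "__main__":
    print(mv_range_pixels())      # range for an 8-bit component
    print(to_quarter_pel(31.75))  # largest listed displacement
    print(to_quarter_pel(-32.0))  # smallest listed displacement
```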
Resources
- See MPEG-LA's page on AVC/H.264
- See the H.264 entry at Wikipedia.
- Download the ITU standard