H264-LD-BP
Low-Power AVC/H.264 Baseline Profile Decoder

The H264-LD-BP IP core implements a silicon- and energy-efficient hardware video decoder that processes H.264 streams produced by the H264-E-BPS, H264-E-BPF, and H264-E-BIS video encoder cores available from CAST.

The H264-LD-BP is extremely small, requiring less than 70k gates and about 60k bits of internal memory. Its small silicon footprint, low bandwidth requirements, and zero software overhead enable extremely cost-effective and low-power ASIC and FPGA implementations.

The H264-LD-BP is designed for straightforward, trouble-free SoC integration. It operates on a stand-alone basis such that decoding proceeds without any assistance or input from the host processor. The decoder’s memory interface—used to store reconstructed video data—is extremely flexible: it operates on a separate clock domain, is independent from the external memory type and memory controller, and is tolerant to relatively large latencies. The decoder reports decompressed video parameters, detects and reports bit stream errors to the system, and simplifies video cropping at its output. The core is optionally delivered with a raster-to-block converter, and wrappers for AMBA® AHB, AXI, or AXI-Streaming buses are available. 
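Cropping matters at the decoder output because H.264 codes pictures in whole 16x16 macroblocks, so a format such as 1080p is carried as a slightly taller coded picture. The sketch below shows that arithmetic for progressive frames only. The function names are illustrative and are not part of the core's interface.

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def coded_size(width, height):
    """Round display dimensions up to whole macroblocks (progressive frames)."""
    def pad(n):
        return -(-n // MB_SIZE) * MB_SIZE  # ceiling to a multiple of MB_SIZE
    return pad(width), pad(height)


def crop_amount(width, height):
    """Columns and lines to discard from the coded picture to get the display size."""
    coded_w, coded_h = coded_size(width, height)
    return coded_w - width, coded_h - height


print(coded_size(1920, 1080))   # (1920, 1088)
print(crop_amount(1920, 1080))  # (0, 8)
```

For 1080p, eight extra luma lines are decoded and then dropped. 720p (1280x720) is already macroblock-aligned and needs no cropping.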

Customers can further decrease their time to market by using CAST’s integration services to receive complete video encoding/decoding subsystems. These integrate the decoder core with video encoders, video and networking interface controllers, networking stacks, or other CAST or third-party IP cores.  

The H264-LD-BP IP core is designed using industry best practices and has been proven in production multiple times. Its deliverables include a complete verification environment and a bit-accurate software model.

Potential customers can readily evaluate the video decoder’s low-latency characteristics by using the Video over IP reference design, in which the compressed stream is captured over Ethernet and the decoded video drives an HDMI interface.

The core deliverables include everything required for successful implementation: 

  •     HDL source code (Verilog or VHDL) for ASICs, or a targeted netlist for FPGAs
  •     Sophisticated self-checking testbench
  •     Synthesis scripts
  •     Simulation script, vectors, and expected results
  •     Comprehensive user documentation

The H264-LD-BP synthesizes to less than 70k gates and also requires about 60 Kbits of internal memory. 

The H264-LD-BP can be mapped to any ASIC technology and optimized to suit the particular project’s requirements. The following table provides sample implementation data for a single H264-LD-BP.

Target Technology   Logic (Gates)   Memory Bits   Freq. (MHz)   Throughput (Mpixels/sec)   Video Format
TSMC 16nm           55k             58k           1,000         250                        UHD/4k@30fps
TSMC 40nm           60k             58k           500           200                        1080p60

These sample implementation figures do not represent the highest speed or smallest area possible for the core. Note that under certain conditions two or more H264-LD-BP cores can be combined to decode streams produced by the H264-E-BPF core. Please contact CAST to get characterization data for your target configuration and technology.
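As a rough cross-check of the sample figures, the pixel rate of a format is width x height x frame rate, and a first-order core count is that rate divided by the per-core throughput, rounded up. The sketch below is only an estimate under those assumptions. `cores_needed` and the 50 Mpixels/sec figure in the usage lines are hypothetical, and actual multi-core configurations should come from CAST's characterization data.

```python
import math


def pixel_rate_mpix(width, height, fps):
    """Active luma pixel rate of a video format, in Mpixels/sec."""
    return width * height * fps / 1e6


def cores_needed(width, height, fps, core_mpix_per_sec):
    """First-order estimate: smallest core count whose summed throughput covers the format."""
    return math.ceil(pixel_rate_mpix(width, height, fps) / core_mpix_per_sec)


# Formats from the ASIC table, compared with the listed throughput.
uhd30 = pixel_rate_mpix(3840, 2160, 30)  # about 248.8, within the 250 listed for TSMC 16nm
fhd60 = pixel_rate_mpix(1920, 1080, 60)  # about 124.4, within the 200 listed for TSMC 40nm

# Hypothetical per-core throughput of 50 Mpixels/sec, for illustration only.
print(cores_needed(1920, 1080, 60, 50.0))  # 3
print(cores_needed(1280, 720, 30, 50.0))   # 1
```

The estimate counts active pixels only and ignores macroblock padding and any overhead of splitting a stream across cores.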

The H264-LD-BP can be mapped to any Altera Family (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample resource utilization data for different Intel Device Families. Please contact CAST to get characterization data for your target configuration and technology.

Device Family   Logic      Memory Bits   DSPs/MULs
StratixV        4k ALMs    59,278        2
Arria10         4k ALMs    59,278        2
CycloneV        4k ALMs    59,278        2
Max10           10K LEs    59,364        2

Multiple H264-LD-BP cores can be combined to decode streams produced by the H264-E-BPF core. The following table indicates the number of H264-LD-BP cores that would be required for different video formats in different Intel families.

  480p30 720p30 720p60 1080p30 1080p60
StratixV (1) (1) (2) (2) (3)
Arria10 (1) (1) (2) (2) (3)
CycloneV (1) (2) (3) (3)
Max10 (1)

Note: List of video formats is not exhaustive.

The H264-LD-BP can be mapped to any AMD Family (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample resource utilization data for different Xilinx Device Families. Please contact CAST to get characterization data for your target configuration and technology.

Device Family             Logic Resources   Memory Resources   Freq. (MHz)
ARTIX-7 (-3)              39,849 LUTs       6.5 BRAM           117
KINTEX UltraScale+ (-1)   41,003 LUTs       4.5 BRAM           250
Artix UltraScale+ (-1)    42,290 LUTs       4.5 BRAM           250
Versal Prime (-1)         52,347 LUTs       5 BRAM             188

Multiple H264-LD-BP cores can be combined to decode streams produced by the H264-E-BPF core. The following table indicates the number of H264-LD-BP cores that would be required for different video formats in different AMD families.

Device Family        720p30   720p60   1080p30   1080p60
ARTIX-7              (1)      (2)      (2)       (4)
KINTEX-7             (1)      (2)      (2)       (3)
ARTIX UltraScale+    (1)      (1)      (1)       (2)
KINTEX UltraScale+   (1)      (1)      (1)       (2)

Note: List of video formats is not exhaustive.

Features List

Low-power AVC/H.264 decoder, with small silicon footprint; optimized for low-latency, low-bit-rate video streaming

  • Decodes streams produced by the H264-E-BPS, H264-E-BPF, and H264-E-BIS cores

Video Formats

  • Progressive or Interlaced, 4:2:0 YCbCr with 8 bits per color sample
  • Single-channel SD, ED, and Full-HD capable even in low-cost FPGAs
  • Optional multichannel decoding
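The 4:2:0, 8-bit format fixes how much external memory one reconstructed frame occupies: a full-resolution luma plane plus two chroma planes subsampled by two in each direction, or 1.5 bytes per pixel. The sketch below works through that arithmetic. It assumes a simple planar layout with no padding or alignment, which may differ from how the core actually arranges its frame store.

```python
def frame_bytes_420_8bit(width, height):
    """Bytes for one 4:2:0 frame with 8 bits per sample (planar layout, no padding)."""
    luma = width * height                      # one Y sample per pixel
    chroma = 2 * (width // 2) * (height // 2)  # Cb and Cr, each subsampled 2x2
    return luma + chroma


print(frame_bytes_420_8bit(1280, 720))   # 1382400
print(frame_bytes_420_8bit(1920, 1088))  # 3133440
```

The second call uses 1088 lines because a 1080p picture is coded in whole 16-line macroblock rows.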

Small and Low-Power

  • Less than 70k gates and about 60k bits of RAM
  • Less than half the typical silicon footprint and low external memory bandwidth mean it uses less power than competing hardware H.264 decoders
  • Consumes much less power than any equivalent software or software-hardware decoder

Ease of Integration

  • Zero CPU overhead, stand-alone operation
  • Flexible external memory interface: uses a separate clock, is independent of memory type, and tolerates relatively large latencies
  • AMBA® Interface Options: DMA-capable AMBA® AHB, AXI or AXI-Streaming

Supported Coding Tools

  • I and P Slices
  • Single Reference Frame
  • Motion vectors from –32.00 to +31.75 pixels, with ¼-pel accuracy
  • All Intra 16x16 and most Intra 4x4 modes
  • Multiple slices per frame
  • Block skipping
  • Deblocking filter
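The motion vector limits listed above follow from quarter-pel units: –32.00 and +31.75 pixels correspond to –128 and +127 quarter-pel steps, which is exactly the range of a signed 8-bit integer. The sketch below shows the conversion. The names are illustrative, and the 8-bit reading is an observation about the arithmetic, not a statement about the core's internal data path.

```python
QPEL_PER_PIXEL = 4                # quarter-pel accuracy: 4 steps per pixel
MV_QPEL_MIN, MV_QPEL_MAX = -128, 127


def mv_to_pixels(mv_qpel):
    """Convert a motion vector component from quarter-pel units to pixels."""
    return mv_qpel / QPEL_PER_PIXEL


def mv_in_range(mv_pixels):
    """True if the displacement is a multiple of 1/4 pixel within the listed limits."""
    q = mv_pixels * QPEL_PER_PIXEL
    return q == int(q) and MV_QPEL_MIN <= q <= MV_QPEL_MAX


print(mv_to_pixels(MV_QPEL_MIN), mv_to_pixels(MV_QPEL_MAX))  # -32.0 31.75
print(mv_in_range(31.75), mv_in_range(32.0))                 # True False
```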
