Silicon IP Cores
H264-D-BP
Low-Latency AVC/H.264 Baseline Profile Decoder
The H264-D-BP IP core is a video decoder complying with the Constrained Baseline Profile of the ISO/IEC 14496-10/ITU-T H.264 standard. It implements a hardware decoder with very low latency and high throughput that is suitable for live streaming and other delay-sensitive applications up to full HD resolution.
The decoder adds just one macroblock line of latency, which translates to a real-world latency of less than 1 ms for the most widely used video formats, including HD/720p and Full-HD/1080p video.
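As a rough plausibility check of the sub-millisecond figure, the Python sketch below estimates how long one macroblock line lasts for a few common formats. It assumes, which the datasheet does not state, that macroblock rows arrive evenly spread over the frame period and that the added latency equals the duration of one such row. The function name `mb_line_latency_ms` and the format list are illustrative only.

```python
import math

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def mb_line_latency_ms(height: int, fps: float) -> float:
    """Duration of one macroblock row in milliseconds.

    Assumption: rows are spread evenly over the frame period.
    """
    mb_rows = math.ceil(height / MB_SIZE)  # 1080 lines need 68 rows
    frame_period_ms = 1000.0 / fps
    return frame_period_ms / mb_rows


# Illustrative formats: name -> (active height in lines, frames per second)
FORMATS = {"720p30": (720, 30), "720p60": (720, 60), "1080p30": (1080, 30)}

for name, (height, fps) in FORMATS.items():
    print(f"{name}: {mb_line_latency_ms(height, fps):.2f} ms")
# prints:
# 720p30: 0.74 ms
# 720p60: 0.37 ms
# 1080p30: 0.49 ms
```

Under these assumptions every listed format stays below 1 ms. Lower frame rates or much smaller frame heights would give longer row durations.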
The H264-D-BP is designed for straightforward, trouble-free SoC integration. It operates stand-alone, so decoding proceeds with no assistance or input from the host processor. The decoder’s memory interface, which is used to store reconstructed video data, is independent of the external memory type and memory controller, and tolerates large latencies. Optionally, the core can be reduced to support only Intra-coded streams, in which case the required external memory is just 128 kB and can be implemented on-chip. The decoder reports decompressed video parameters, detects and reports bit stream errors to the system, and simplifies video cropping at its output. The core is optionally delivered with a block-to-raster converter, and wrappers for AMBA® AHB, AXI, or AXI-Streaming buses are available.
Customers can further decrease their time to market by using CAST’s integration services to receive complete video encoding/decoding subsystems. These integrate the decoder core with video encoders, video and networking interface controllers, networking stacks, or other CAST or third-party IP cores.
The H264-D-BP IP core has been verified with Fraunhofer’s compliance test stream suite and is silicon and production proven. Its deliverables include a complete verification environment and a bit-accurate software model.
The H264-D-BP synthesizes to less than 500k gates and requires 532 kbits of internal memory. When configured to support only Intra-coded streams, the core synthesizes to about 400k gates, and requires only 128kB of external memory.
See sample ASIC and FPGA implementation results below.
Potential customers can readily evaluate the video decoder’s low latency characteristics by using the Video over IP reference design with the compressed stream captured over Ethernet, and the decoded video driving an HDMI interface.
The core is available in source-code VHDL or as a targeted netlist, and its deliverables include everything required for successful implementation:
- Sophisticated self-checking testbench
- Synthesis scripts
- Simulation script, vectors, and expected results
- Bit-accurate software model
- Comprehensive user documentation
The H264-D-BP can be mapped to any Altera FPGA (provided sufficient silicon resources are available). The following table provides sample performance and resource utilization data for different Altera Device Families. Please contact CAST to get characterization data for your target configuration and technology.
Device family | 720p30 | 720p50 | 720p60 | 1080p30 |
---|---|---|---|---|
StratixV | ✓ | ✓ | ✓ | ✓ |
Arria10 | ✓ | ✓ | ✓ | ✓ |
CycloneV | ✓ | ✗ | ✗ | ✗ |

Resource | Utilization |
---|---|
ALMs | 32K |
Memory bits | 532k |
DSPs | 19 |
1: The list of video formats is not exhaustive. The indicated video formats may not be supported on devices of all speed grades.
An Intra-only version of the core (i.e. a decoder limited to decoding I-frames) occupies about 20% fewer silicon resources and requires just 128 kB of external memory.
The H264-D-BP can be mapped to any AMD FPGA, provided sufficient silicon resources are available. The following table provides sample performance and resource utilization data for different AMD device families. Please contact CAST to get characterization data for your target configuration and technology.
Device family | 720p30 | 720p50 | 720p60 | 1080p30 |
---|---|---|---|---|
ARTIX ULTRASCALE+ | ✓ | ✓ | ✓ | ✓ |
KINTEX ULTRASCALE | ✓ | ✓ | ✓ | ✓ |
KINTEX-7 | ✓ | ✓ | ✓ | ✓ |
ARTIX-7 | ✓ | ✓ | ✗ | ✗ |

Resource | Utilization |
---|---|
LUTs (1) | 52k |
BRAMs | 23 RAMB36 / 95 RAMB18 |
DSPs | 19 |
1: Exact resource requirements and maximum performance depend on the target device.
2: The list of video formats is not exhaustive. The indicated video formats may not be supported on devices of all speed grades.
An Intra-only version of the core (i.e. a decoder limited to decoding I-frames) occupies about 20% fewer silicon resources and requires just 128 kB of external memory.
The H264-D-BP can be mapped to any ASIC technology or FPGA device (provided sufficient silicon resources are available). The following table provides sample implementation data. Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core. Please contact CAST to get characterization data for your target configuration and technology.
Technology | Logic Area (μm²) | Logic Area (Gates) | Memory (kbits) | Freq. (MHz) | Throughput (Mpixels/sec) |
---|---|---|---|---|---|
tsmc16-sc7-svt | 85,390 | 494,155 | 532 | 1,000 | 400 |
tsmc28hpm-sc9-c35-svt-ss | 257,570 | 529,979 | 532 | 1,000 | 400 |
tsmc40g-sc9-rvt | 257,570 | 645,329 | 532 | 800 | 320 |
When configured to support only Intra-coded streams, the core synthesizes to about 400k gates and requires just 128kB of external memory.
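The frequency and throughput columns in the ASIC table are consistent with the 2.5 cycles per pixel quoted in the feature list. The Python sketch below reproduces that arithmetic and estimates the minimum clock needed for a given format. It counts active pixels only and ignores any blanking, padding, or memory-stall overhead, so the clock figures are lower bounds under that assumption rather than characterized numbers. The function names are illustrative.

```python
CYCLES_PER_PIXEL = 2.5  # performance figure quoted in the feature list


def throughput_mpixels(freq_mhz: float) -> float:
    """Peak pixel throughput in Mpixels/sec at a given clock frequency."""
    return freq_mhz / CYCLES_PER_PIXEL


def min_clock_mhz(width: int, height: int, fps: float) -> float:
    """Lowest clock (MHz) that sustains the format, counting active pixels only."""
    pixels_per_second = width * height * fps
    return pixels_per_second * CYCLES_PER_PIXEL / 1e6


# Reproduce the throughput column of the ASIC table
for freq in (1000, 800):
    print(f"{freq} MHz -> {throughput_mpixels(freq):.0f} Mpixels/sec")

# Estimate the clock needed for two common formats
for name, (w, h, fps) in {"720p60": (1280, 720, 60),
                          "1080p30": (1920, 1080, 30)}.items():
    print(f"{name}: at least {min_clock_mhz(w, h, fps):.2f} MHz")
```

For example, 1080p30 carries 1920 × 1080 × 30 ≈ 62.2 Mpixels/sec, which at 2.5 cycles per pixel needs roughly 155.5 MHz.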
The H264-D-BP can be mapped to any Microchip FPGA (provided sufficient silicon resources are available). The following table provides sample performance and resource utilization data on a PolarFire device. Please contact CAST to get characterization data for your target configuration and device.
Device | 480p60 | 576p60 | 720p30 | 720p60 / 1080p30 |
---|---|---|---|---|
PolarFire | ✓ | ✓ | ✓ | ✗ |

Resource | Utilization |
---|---|
4LUTs | 85,137 |
RAM Blocks | 255 uSRAM, 53 LSRAM |
Math Blocks | 37 |
1: The list of video formats is not exhaustive. The indicated video formats may not be supported on devices of all speed grades.
An Intra-only version of the core (i.e. a decoder limited to decoding I-frames) occupies about 20% fewer silicon resources and requires just 128 kB of external memory.
Engineered by Fraunhofer HHI.
Features List
Constrained Baseline Profile AVC/H.264 decoder
- Ultra-Low-Latency: less than one msec latency for most widely used formats
- High performance: 2.5 cycles per pixel; Full-HD capable
Standard Support
- ISO/IEC 14496-10/ITU-T H.264, Constrained Baseline Profile specification
- I and P slices (Intra-only version also available)
- Multiple slices per frame
- Multiple reference frames
- Multiple sequence parameter sets (SPS)
- Multiple picture parameter sets (PPS)
- In-loop deblocking filter
- CAVLC entropy decoding
- Real-time performance up to Level 4.1
Video Formats
- Progressive, 4:2:0 YCbCr with 8 bits per color sample
- Resolutions from QCIF (176x144) to 2048x2048
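To give a sense of the data volumes these formats imply, the Python sketch below computes the size of one decoded frame in 4:2:0 YCbCr with 8 bits per sample. It only illustrates the sampling arithmetic (each chroma plane is subsampled by two in both directions). It makes no claim about how many frames, or which layout, the core keeps in external memory. The function name is illustrative.

```python
def frame_bytes_420_8bit(width: int, height: int) -> int:
    """Bytes in one 4:2:0 frame at 8 bits per sample (even dimensions assumed)."""
    luma = width * height                      # one Y sample per pixel
    chroma = 2 * (width // 2) * (height // 2)  # Cb and Cr, each quarter size
    return luma + chroma


for name, (w, h) in {"QCIF": (176, 144),
                     "1080p": (1920, 1080),
                     "2048x2048": (2048, 2048)}.items():
    print(f"{name}: {frame_bytes_420_8bit(w, h):,} bytes")
# prints:
# QCIF: 38,016 bytes
# 1080p: 3,110,400 bytes
# 2048x2048: 6,291,456 bytes
```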
Low Latency
- No decoded frame buffering
- Decoded pixels are streamed out with less than one macroblock line of latency
- Less than 1 msec for almost all widely used video formats
Ease of Integration
- Zero CPU overhead, stand-alone operation
- AMBA® AXI external memory interface is independent of memory type and tolerant to latencies
- Streaming interfaces for bit-stream and pixel data with flow control; easily bridged to AMBA® AXI Streaming
- Error catching and reporting capability
- Reports video format and enables cropping
- Optional Block to Raster Conversion
Maturity
- Silicon proven
- Verified with Fraunhofer H.264 Compliance Test Streams Suite