H264-E-BPS
Low-Power AVC/H.264 Baseline Profile Encoder
The H264-E-BPS IP core is a video encoder supporting the Constrained Baseline Profile of the ISO/IEC 14496-10/ITU-T H.264 standard. It implements an energy-efficient hardware architecture that is optimized for ultra-low-latency video streaming at low bit rates.
The H264-E-BPS encoder requires less than half the silicon area of most competing hardware encoders—approximately 125K gates—allowing for very cost-effective ASIC or FPGA implementations. Its small silicon footprint, low external memory bandwidth requirements, and zero software overhead enable H.264 coding at an extremely low energy cost. The encoder is able to process UHD/4K video when mapped on modern ASIC technologies, and Full-HD when mapped on FPGAs.
Despite its small size, the H264-E-BPS produces high-quality video, especially at low bit rates, and is suitable for systems with low-latency requirements. It uses constant quantization to output Variable Bit-Rate (VBR) streams, or automatically regulates quantization multiple times within a frame to output Constant Bit-Rate (CBR) streams. In CBR mode it responds rapidly to temporal or spatial changes in the video content. This can be combined with an artifact-free Intra-Refresh coding implementation to effectively eliminate bit-rate peaks while preserving the periodic intra-coded references. As a result, the stream buffers can be smaller than those typically required, and the end-to-end latency can be brought down to frame or sub-frame levels. Video quality at low bit rates is preserved, as the encoder intelligently uses block skipping and quantized-coefficient thresholding to reduce the bit rate at minimal quality loss, and uses the in-loop deblocking filter to eliminate blocking artifacts.
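The link between bit-rate peaks, buffer size, and latency can be illustrated with a small leaky-bucket calculation. The sketch below is not the core's rate-control algorithm; the channel rate and the per-frame sizes are invented numbers, chosen only so that both streams have the same average bit rate.

```python
def peak_buffer_bits(frame_bits, channel_bits_per_frame):
    """Peak occupancy of a transmit buffer drained at a constant rate."""
    occupancy = 0
    peak = 0
    for bits in frame_bits:
        occupancy += bits                 # encoder writes one coded frame
        peak = max(peak, occupancy)
        # channel removes a fixed number of bits per frame interval
        occupancy = max(0, occupancy - channel_bits_per_frame)
    return peak


# Hypothetical link: 2 Mbit/s at 25 fps -> 80,000 bits per frame interval.
CHANNEL_BITS_PER_FRAME = 2_000_000 // 25

# Hypothetical stream A: one large I frame followed by seven small P frames.
periodic_i_frames = ([360_000] + [40_000] * 7) * 3
# Hypothetical stream B: intra refresh spreads the intra cost over all frames.
intra_refresh = [80_000] * 24

for name, stream in (("periodic I", periodic_i_frames),
                     ("intra refresh", intra_refresh)):
    peak = peak_buffer_bits(stream, CHANNEL_BITS_PER_FRAME)
    frames_of_delay = peak / CHANNEL_BITS_PER_FRAME
    print(f"{name}: peak buffer {peak} bits, about {frames_of_delay} frame(s) of delay")
```

With these made-up numbers, the stream with periodic I frames needs a buffer several frames deep, while the evenly loaded stream needs about one frame of buffering. That is the effect the paragraph above attributes to combining CBR with Intra-Refresh.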
The core was designed for ease of use and integration. Once initially programmed, it operates without any assistance from the host processor. The encoder’s memory interface is extremely flexible: it operates on a separate clock domain, is independent from the external memory type and memory controller, and is tolerant to large latencies. The core is optionally delivered with a raster-to-block converter, and wrappers for AMBA® AHB, AXI, or AXI-Streaming buses are available.
Customers can further decrease their time to market by using CAST’s integration services to receive complete video encoding subsystems. These integrate the encoder core with video and networking interface controllers, networking stacks, or other CAST or third-party IP cores.
The H264-E-BPS IP core is designed using industry best practices and has been production proven multiple times. Its deliverables include a complete verification environment and a bit-accurate software model.
- Variable Bit-Rate with Constant Qp (VBR-CQP) and Constant Bit-Rate (CBR) output with CAVLC encoding
- Efficient inter and intra prediction
  - Motion vectors from –16.00 to +15.75 pixels, with ¼-pel accuracy
  - All intra 16x16 and most intra 4x4 modes
- Options for improved error resilience
  - Multiple slices per frame
  - Intra-only coding
- Options for better quality at low bit rates
  - Block skipping
  - Deblocking filter
  - Separate quantization values for luma and chroma
  - Thresholding of quantized transform coefficients
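As a worked illustration of the motion vector range listed above, the sketch below converts pixel displacements to quarter-pel units. The function name is hypothetical and the core's internal vector representation is not described here; the code only shows the arithmetic implied by a –16.00 to +15.75 pixel range at ¼-pel accuracy.

```python
QUARTER_PEL = 4                       # quarter-pel steps per pixel
MV_MIN_PIXELS, MV_MAX_PIXELS = -16.0, 15.75


def to_quarter_pel(pixels):
    """Convert a displacement in pixels to integer quarter-pel units."""
    if not MV_MIN_PIXELS <= pixels <= MV_MAX_PIXELS:
        raise ValueError(f"{pixels} is outside the supported search range")
    units = pixels * QUARTER_PEL
    if units != int(units):
        raise ValueError(f"{pixels} is not a multiple of 1/4 pixel")
    return int(units)


lowest = to_quarter_pel(MV_MIN_PIXELS)    # -64
highest = to_quarter_pel(MV_MAX_PIXELS)   # 63
positions_per_axis = highest - lowest + 1
print(lowest, highest, positions_per_axis)
```

The range works out to 128 quarter-pel positions per axis, from –64 to +63, which is exactly the range of a 7-bit two's-complement integer.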
The encoder core can be limited to operate in Intra-Only mode (H264-E-BIS version). Under this configuration, no external memory is required, and the core’s size is further reduced.
The compression efficiency of H.264 Intra-only coding is superior to that of JPEG and comparable to that of JPEG2000. With Intra-only coding, each frame is compressed independently, which simplifies video editing and enhances error resilience.
Potential customers can readily evaluate the video encoder’s compression efficiency by using:
- Available sample compressed video streams
- The available Bit-Accurate Model with your choice of input videos
- The Video over IP reference design with video captured over an HDMI interface
Please contact CAST to arrange for your evaluation preference.
The core is available in source-code HDL (Verilog or VHDL) or as a targeted netlist, and its deliverables include everything required for successful implementation:
- Sophisticated self-checking testbench
- Synthesis scripts
- Simulation script, vectors, and expected results
- Software (Bit-Accurate Model and test vector generator)
- Comprehensive user documentation
H264-E-BPS and H264-E-BIS Core — ASIC Implementation Results
The H264-E-BPS can be mapped to any ASIC technology (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following tables provide sample implementation data for the core in its default configuration (H264-E-BPS) and for the intra-only version of the core (H264-E-BIS). Please contact CAST to get characterization data for your target configuration and technology.
H264-E-BPS Core
Technology | Logic | Memory | Freq. | Throughput | Video Format |
---|---|---|---|---|---|
TSMC 16nm | 22k µm², 130k gates | 133K bits | 1 GHz | 250 Mpixels/sec | UHD/4K@30fps |
TSMC 28nm | 60k µm², 125k gates | 133K bits | 850 MHz | 212 Mpixels/sec | UHD/4K@25fps |
TSMC 16nm | 20k µm², 117k gates | 133K bits | 500 MHz | 125 Mpixels/sec | 1080p60 |
TSMC 28nm | 55k µm², 113k gates | 133K bits | 500 MHz | 125 Mpixels/sec | 1080p60 |
H264-E-BIS Core
Technology | Logic | Memory | Freq. | Throughput | Video Format |
---|---|---|---|---|---|
TSMC 16nm | 12k µm², 70k gates | 60K bits | 1 GHz | 250 Mpixels/sec | UHD/4K@30fps |
TSMC 28nm | 32k µm², 67k gates | 60K bits | 850 MHz | 212 Mpixels/sec | UHD/4K@25fps |
TSMC 16nm | 11k µm², 65k gates | 60K bits | 500 MHz | 125 Mpixels/sec | 1080p60 |
TSMC 28nm | 30k µm², 61k gates | 60K bits | 500 MHz | 125 Mpixels/sec | 1080p60 |
Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.
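The frequency and throughput columns in the ASIC tables imply a fixed ratio of about four clock cycles per pixel (for example, 1 GHz for 250 Mpixels/sec). Assuming that ratio holds for other formats, which is an inference from the sample figures and not a stated specification, a rough clock estimate for a given video format can be sketched as follows. The function names are illustrative, and the calculation counts active pixels only.

```python
# Assumption inferred from the sample tables: 1 GHz <-> 250 Mpixels/sec.
CYCLES_PER_PIXEL = 4


def pixel_rate(width, height, fps):
    """Active pixels per second for a progressive video format."""
    return width * height * fps


def min_clock_hz(width, height, fps, cycles_per_pixel=CYCLES_PER_PIXEL):
    """Rough minimum core clock under the cycles-per-pixel assumption."""
    return pixel_rate(width, height, fps) * cycles_per_pixel


for label, (w, h, fps) in {
    "UHD/4K@30fps": (3840, 2160, 30),
    "UHD/4K@25fps": (3840, 2160, 25),
    "1080p60": (1920, 1080, 60),
}.items():
    print(f"{label}: {pixel_rate(w, h, fps) / 1e6:.1f} Mpixels/sec, "
          f"about {min_clock_hz(w, h, fps) / 1e6:.0f} MHz")
```

Under this assumption the three formats need roughly 995 MHz, 829 MHz, and 498 MHz respectively, which is consistent with the 1 GHz, 850 MHz, and 500 MHz sample implementations listed in the tables.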
The H264-E-BPS can be mapped to any Altera FPGA (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following tables provide sample implementation data for the core in its default configuration (H264-E-BPS) and for the intra-only version of the core (H264-E-BIS). Please contact CAST to get characterization data for your target configuration and technology.
H264-E-BPS | Area | Memory Bits | DSPs / MULs | Video Formats* |
---|---|---|---|---|
StratixV | 8.6K ALMs | 114k | 9 | 1080p30/25 720p60/50/30 |
Arria10 | 8.6K ALMs | 114k | 9 | |
Max10 | 21k LEs | 112k | 13 | 720p25 480p60 |

* The list of video formats is not exhaustive. Indicated video formats may not be supported on devices of all speed grades.
H264-E-BIS | Area | Memory Bits | DSPs | Video Formats* |
---|---|---|---|---|
StratixV | 5k ALMs | 58K | 4 | 1080p30/25 720p60/50/30 |
Arria10 | 5k ALMs | 58K | 4 | |
Max10 | 11K LEs | 58K | 8 | 480p60 |

* The list of video formats is not exhaustive. Indicated video formats may not be supported on devices of all speed grades.
H264-E-BPS and H264-E-BIS Core — AMD FPGA Results
The H264-E-BPS can be mapped to any AMD FPGA (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample implementation data for the core configured under its default configuration (H264-E-BPS) and for the intra-only version of the core (H264-E-BIS). Please contact CAST to get characterization data for your target configuration and technology.
Device | H264-E-BPS 720p@30 | H264-E-BPS 1080p@30 | H264-E-BIS 720p@30 | H264-E-BIS 1080p@30 |
---|---|---|---|---|
ARTIX-7 | ✓ | ✗ | ✓ | ✗ |
KINTEX-7 | ✓ | ✗ | ✓ | ✗ |
KINTEX ULTRASCALE | ✓ | ✓ | ✓ | ✓ |

Resource | H264-E-BPS | H264-E-BIS |
---|---|---|
LUTs¹ | 16K | 8k |
Slices¹ | 4.8k | 2.6k |
BRAMs | 18 | 3 |
DSPs | 4 | 4 |

1: Exact resource usage and maximum performance depend on the target device.
2: Indicated performance may not be supported on devices of all speed grades.
The H264-E-BPS can be mapped to any Efinix FPGA family (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample implementation data for the intra-only version of the core (H264-E-BIS).
H264-E-BIS
Target Technology | Logic Resources | Memory Resources | Freq. | Video Formats* |
---|---|---|---|---|
Titanium Ti60-C4 | 12.9K XLRs, 1 DSP | 25 BRAM | 272 MHz | 1080p30/25 |
Titanium Ti60-C3 | 12.9K XLRs, 1 DSP | 25 BRAM | 231 MHz | |
Titanium T120-C4 | 13.5K LEs, 1 Mult | 31 Mem. Blocks | 85 MHz | 720p20 |

* The list of video formats is not exhaustive. Indicated video formats may not be supported on devices of all speed grades.
The H264-E-BPS can be mapped to any Lattice FPGA (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The core is capable of processing 720p30 video while occupying approximately 12k LUT4s and 12 BRAM when mapped on an Avant-E device. Please contact CAST to get characterization data for your target configuration and technology.
Engineered by Ocean Logic.
Features List
Low-power AVC/H.264 encoder with a small silicon footprint, optimized for low-latency, low-bit-rate video streaming; production proven multiple times
Standard Support
- ISO/IEC 14496-10/ITU-T H.264 Constrained Baseline Profile specification
- Interlaced Video using Main Profile syntax
- Output Annex B NAL byte stream decodable by Baseline, Main and High Profile decoders
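When inspecting an Annex B byte stream such as the encoder's output, the NAL units can be located by their start codes (`00 00 01`, optionally preceded by an extra zero byte). The sketch below is a minimal splitter for illustration. The function names and the sample bytes are made up, and the code is not part of the core's deliverables. The NAL unit type is the low five bits of the first byte after the start code.

```python
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
    9: "access unit delimiter",
}

START_CODE = b"\x00\x00\x01"


def split_annexb(stream):
    """Yield NAL units (without start codes) from an Annex B byte stream."""
    starts = []
    pos = stream.find(START_CODE)
    while pos >= 0:
        starts.append(pos + len(START_CODE))   # first byte of the NAL unit
        pos = stream.find(START_CODE, pos + len(START_CODE))
    for n, begin in enumerate(starts):
        if n + 1 < len(starts):
            end = starts[n + 1] - len(START_CODE)
        else:
            end = len(stream)
        # A NAL unit never ends in 0x00, so trailing zeros belong to the
        # next (4-byte) start code or to stream padding.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            yield nal


def nal_types(stream):
    """Return the nal_unit_type of every NAL unit in the stream."""
    return [nal[0] & 0x1F for nal in split_annexb(stream)]


# Made-up, truncated payloads: only the NAL header bytes are meaningful.
sample = (b"\x00\x00\x00\x01\x67\x42"
          b"\x00\x00\x00\x01\x68\xce"
          b"\x00\x00\x01\x65\x88")
for t in nal_types(sample):
    print(t, NAL_TYPE_NAMES.get(t, "other"))
```

On the sample bytes this reports types 7, 8, and 5: a sequence parameter set, a picture parameter set, and an IDR slice.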
Input Video Formats
- Progressive or Interlaced, 4:2:0 YCbCr input with 8 bits per color sample
- Up to UHD/4K in ASICs; up to Full-HD in FPGAs
- Optional multichannel encoding
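For sizing input buffers, the raw size of a 4:2:0 frame with 8 bits per sample follows directly from the format: one full-resolution luma plane plus two chroma planes at half resolution in each dimension. The sketch below shows only that arithmetic. The function names are illustrative, and the core's actual external memory traffic is not specified here.

```python
def yuv420_frame_bytes(width, height, bits_per_sample=8):
    """Raw size in bytes of one 4:2:0 YCbCr frame."""
    luma_samples = width * height
    chroma_samples = 2 * (width // 2) * (height // 2)   # Cb plane + Cr plane
    return (luma_samples + chroma_samples) * bits_per_sample // 8


def raw_bit_rate(width, height, fps):
    """Uncompressed input bit rate in bits per second."""
    return yuv420_frame_bytes(width, height) * 8 * fps


full_hd = yuv420_frame_bytes(1920, 1080)    # 3,110,400 bytes
uhd = yuv420_frame_bytes(3840, 2160)        # 12,441,600 bytes
print(full_hd, uhd, raw_bit_rate(1920, 1080, 30))
```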
Small and Low Power
- Approximately 125K gates and 133 kbits of RAM
- Uses less power than competing hardware H.264 encoders, thanks to a silicon footprint under half their size and low external memory bandwidth
- Consumes much less power than any equivalent software or software-hardware encoder
Low Latency and Low Bit Rates with Fewer Artifacts
- Constant Bit-Rate (CBR) output for smaller stream buffers and lower end-to-end latency
- Advanced rate control regulates Qp multiple times within a frame, and rapidly responds to temporal or spatial video variations
- Enables artifact-free Intra-Refresh to eliminate the bit-rate peaks of I frames
- Block skipping, quantized-coefficient thresholding, and the in-loop deblocking filter improve quality at low bit rates
Ease of Integration
- Zero CPU overhead, stand-alone operation
- Flexible external memory interface uses separate clock, is independent of memory type and tolerant to latencies
- AMBA® Interface Options: DMA-capable AMBA® AHB, AXI or AXI-Streaming