Detailed explanation of H.264 scalable video codec (SVC) application

Scalable video codecs have been in development for many years. The broadcast industry, tightly governed by established standards, has been slow to adopt the technology. Advances in processors, sensors, and display technology are now fueling a wide variety of video applications, and the Internet and IP technologies are seamlessly delivering video to an increasingly diverse and remote population of display devices. Scalable video codecs such as H.264 SVC meet the needs of many of these systems, and they may be the catalyst that drives the widespread adoption of video as a communication medium.

Codecs compress video to reduce the bandwidth needed to transport video streams or the storage needed to archive video files. The price of this compression is increased computation: the higher the compression ratio, the more computational power is required.

This trade-off between bandwidth and computation determines both the minimum channel bandwidth needed to carry an encoded video stream and the minimum specification of the decoding device. In conventional video systems such as broadcast television, the minimum specification of the decoder (a set-top box, for example) is easily defined.

However, video is now used in an increasingly wide range of applications, with a correspondingly wide range of client devices, from computers displaying Internet video to personal digital assistants (PDAs) and compact cellular phones. The video streams for these devices must necessarily differ.

To match a particular viewing device and channel bandwidth, the video stream must be encoded multiple times with different settings. Each combination of settings must yield a stream that fits the bandwidth available for delivery and the decoding capability of the viewing device. If the original uncompressed video is not available, the encoded stream must first be decoded and then re-encoded with the new settings. This practice is very expensive.

In the ideal case, the video is encoded only once with a highly efficient codec. When decoded, the resulting stream yields full-resolution video. Ideally, too, if a lower-resolution or lower-bandwidth stream is needed further along the network to reach lower-performance devices, a small portion of the encoded video can be sent on without additional processing. This smaller stream is easier to decode and yields lower-resolution video. In this way, the encoded stream adapts itself to the bandwidth of the channel it must travel through and to the capabilities of the target device. These are exactly the qualities of a scalable video codec.

H.264 scalable video codec

H.264 SVC, the scalable video codec extension to H.264, is designed to deliver the benefits of the ideal scenario described above. It is based on the H.264 Advanced Video Codec standard (H.264 AVC) and makes heavy use of the tools and concepts of the original codec. The encoded stream it produces, however, is scalable temporally, spatially, and in terms of video quality. That is, it can yield decoded video at different frame rates, resolutions, or quality levels.

The SVC extension introduces a concept that does not exist in the original H.264 AVC codec: layers within the encoded stream. The base layer encodes the lowest temporal, spatial, and quality representation of the video stream. The enhancement layers use the base layer as a starting point and encode additional information that the decoder uses to reconstruct versions of the video with higher quality, higher resolution, or a higher frame rate.

By decoding the base layer and only as many of the subsequent enhancement layers as it needs, a decoder can produce a video stream with the desired characteristics. Figure 1 shows the layered structure of an H.264 SVC stream. During encoding, each layer is carefully encoded using references only to lower layers. In this way, the encoded stream can be truncated at any layer boundary and still remain a valid, decodable video stream.

Figure 1: H.264 SVC hierarchy.

This layered approach allows a single encoded stream to be truncated to limit the bandwidth it consumes or to reduce the computation needed to decode it. Truncation consists entirely of extracting the required layers from the encoded video stream, with no decoding or re-encoding. The process can even be performed within the network.
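The extraction step described above can be pictured as a simple filter. The following is a minimal sketch, not a bitstream parser: the `Packet` type, its fields, and the `extract` function are hypothetical names introduced for illustration, and each packet is assumed to carry the index of the temporal and spatial layer it belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for one unit of an encoded SVC stream."""
    temporal: int  # temporal layer index (0 = base layer)
    spatial: int   # spatial layer index (0 = base layer)
    size: int      # payload size in bytes


def extract(stream, max_temporal, max_spatial):
    """Keep only packets at or below the requested layers.

    Nothing is decoded or re-encoded: packets are either forwarded or dropped.
    """
    return [p for p in stream
            if p.temporal <= max_temporal and p.spatial <= max_spatial]


def total_bytes(stream):
    """Sum of payload sizes, as a rough measure of bandwidth."""
    return sum(p.size for p in stream)


# Illustrative stream: 3 temporal layers x 2 spatial layers, made-up sizes.
stream = [Packet(t, s, 100 * (s + 1)) for t in (0, 1, 2) for s in (0, 1)]

base_only = extract(stream, max_temporal=0, max_spatial=0)
low_res_two_rates = extract(stream, max_temporal=1, max_spatial=0)
```

Because each layer refers only to lower layers, any such subset that includes everything below its highest layer is still decodable, which is why a network node could apply this kind of filter without a codec.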

Figure 2: Adjusting the level to reduce bandwidth and resolution.

That is, as the video stream passes from a high-bandwidth network to a lower-bandwidth one (for example, from Ethernet to a handheld device over a WiFi link), the stream is resized to fit the available bandwidth. In this example, the stream is sized for both the bandwidth of the wireless link and the decoding capability of the handheld decoder. Figure 2 shows an example in which a PC forwards a lower-bandwidth version of the stream to a mobile device.

How H.264 SVC works

To achieve temporal scalability, H.264 SVC links its reference frames and predicted frames somewhat differently from a conventional H.264 AVC encoder. Instead of the traditional relationship between intra frames (I frames), bidirectional frames (B frames), and predicted frames (P frames) shown in Figure 3, SVC uses a hierarchical prediction structure.

Figure 3: The relationship between traditional I, P, and B frames.

The hierarchy defines the temporal layering of the final video stream. Figure 4 depicts one possible hierarchy. In this particular example, each frame is predicted only from frames that occurred earlier in time. This ensures that the structure provides not only temporal scalability but also low latency.

Figure 4: Hierarchical prediction frames in SVC.

This scheme has four nested temporal layers: T0 (the base layer), T1, T2, and T3. Frames in layers T1 and T2 are predicted only from frames in layer T0. Frames in layer T3 are predicted only from frames in layers T1 or T2.

To play the encoded stream at 3.75 frames per second (fps), only the frames in T0 need to be decoded; all other frames can be discarded. To play at 7.5 fps, the frames in T0 and T1 are decoded, and the frames in T2 and T3 are discarded. Similarly, if the frames in T0, T1, and T2 are decoded, the resulting stream plays at 15 fps. If all frames are decoded, the full 30 fps stream is recovered.
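The arithmetic behind these frame rates can be sketched in a few lines. This sketch assumes a dyadic layout in which the pattern of layers repeats every 8 frames, an assumption inferred from the 3.75, 7.5, 15, and 30 fps figures rather than taken from the figure itself; the function names are hypothetical.

```python
def temporal_layer(frame_index, num_layers=4):
    """Return the temporal layer (0 = T0) of a frame in a dyadic hierarchy.

    Assumed layout for 4 layers, repeating every 8 frames:
    T0 T3 T2 T3 T1 T3 T2 T3
    """
    period = 1 << (num_layers - 1)          # 8 frames for 4 layers
    pos = frame_index % period
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return (num_layers - 1) - trailing_zeros


def playback_rate(full_rate, max_layer, num_layers=4):
    """Frame rate obtained by decoding layers T0..T(max_layer) only."""
    return full_rate / (1 << (num_layers - 1 - max_layer))


def frames_to_decode(num_frames, max_layer, num_layers=4):
    """Indices of the frames kept when higher layers are discarded."""
    return [i for i in range(num_frames)
            if temporal_layer(i, num_layers) <= max_layer]


layers = [temporal_layer(i) for i in range(8)]
rates = [playback_rate(30, layer) for layer in range(4)]
```

Under these assumptions, each additional temporal layer doubles the number of frames decoded, which is why the playback rate doubles at each step.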

By contrast, in H.264 AVC (for the Baseline Profile, where only unidirectionally predicted frames are used), all frames must be decoded regardless of the required display rate. To move to a lower-bandwidth network, the entire stream must be decoded, the unwanted frames discarded, and the remainder re-encoded.

Spatial scalability in H.264 SVC follows a similar principle. In this case, a lower-resolution version of each frame is encoded as the base layer. The decoded and upsampled base-layer frames are then used to predict the higher layers. The additional information needed to reconstruct the detail of the original scene is encoded as a separate enhancement layer. In some cases, reusing motion information can further increase coding efficiency.
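The base-plus-enhancement idea can be illustrated with a toy example. This is only a sketch under strong simplifying assumptions: frames are small grids of integer pixel values, downsampling is a 2x2 average, upsampling is nearest-neighbour, and the enhancement layer is a raw residual with no transform or quantization. A real codec uses more sophisticated filters and coding tools, and all function names here are hypothetical.

```python
def downsample(frame):
    """Halve width and height by averaging each 2x2 block."""
    return [[(frame[y][x] + frame[y][x + 1] +
              frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, len(frame[0]), 2)]
            for y in range(0, len(frame), 2)]


def upsample(frame):
    """Double width and height by repeating each pixel (nearest neighbour)."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode(frame):
    """Return (base layer, enhancement residual) for one frame."""
    base = downsample(frame)
    prediction = upsample(base)
    residual = [[f - p for f, p in zip(f_row, p_row)]
                for f_row, p_row in zip(frame, prediction)]
    return base, residual


def decode(base, residual=None):
    """Decode the base layer alone, or base plus enhancement if available."""
    if residual is None:
        return base
    prediction = upsample(base)
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


frame = [[10, 12, 20, 22],
         [14, 16, 24, 26],
         [30, 32, 40, 42],
         [34, 36, 44, 46]]
base, residual = encode(frame)
low_res = decode(base)              # 2x2 picture from the base layer only
full_res = decode(base, residual)   # 4x4 picture using both layers
```

A decoder that receives only the base layer still gets a valid low-resolution picture, while one that also receives the enhancement layer recovers the full-resolution frame.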

SVC versus simulcasting

Scalability carries an overhead in H.264 SVC. As Figure 4 shows, the temporal distance between a reference frame and a predicted frame (from T0 to T1, for example) can be longer than in conventional frame structures. In scenes with a lot of motion, this can lead to less efficient compression. Managing the layered structure of the video stream also carries an associated overhead.

Overall, an SVC stream with three layers of temporal scalability and three layers of spatial scalability may be larger than an equivalent H.264 AVC stream carrying full-resolution, full-frame-rate video without scalability. However, if an H.264 AVC codec is used to emulate scalability, multiple encoded streams are required, resulting in either higher bandwidth requirements or expensive decoding and re-encoding throughout the network.

Additional benefits of SVC:

Error recovery

A traditional approach to error recovery is to add extra information to the video stream so that errors can be detected and corrected. SVC's layered approach means that a high level of error detection and correction can be applied to the smaller base layer without adding significant overhead. To apply the same level of error detection and correction to an AVC stream, the entire stream would have to be protected, resulting in a larger stream. If errors are detected in an SVC stream, resolution and frame rate can be degraded progressively until, if necessary, only the highly protected base layer is used. In this way, degradation under noisy conditions is more graceful than with H.264 AVC.
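The progressive degradation described above follows from the rule that each layer depends only on the layers beneath it. As a minimal sketch (the function name and the list-of-flags representation are hypothetical), a receiver could decide how far up the layer stack it can decode like this:

```python
def highest_decodable(intact):
    """Return the index of the highest layer that can be decoded.

    intact[i] is True if layer i arrived without uncorrectable errors;
    layer 0 is the base layer. A layer is usable only if every layer
    below it is also intact. Returns -1 if even the base layer is lost.
    """
    level = -1
    for ok in intact:
        if not ok:
            break
        level += 1
    return level


# Layer 2 is damaged, so decoding falls back to layers 0 and 1
# even though layer 3 arrived intact.
fallback = highest_decodable([True, True, False, True])
```

Since everything hinges on the base layer arriving intact, concentrating the strongest protection there is what keeps the overhead small.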

Storage management

Because an SVC stream or file remains decodable even when truncated, SVC can be exploited both during transmission and after a file has been stored. By parsing a file stored on disk and removing its enhancement layers, the file size can be reduced without any further processing of the video it contains. This is not possible with AVC files, for which disk management is all or nothing.

Content management

An SVC stream or file inherently contains lower-resolution and lower-frame-rate versions of the video. These versions can be used to speed up video analytics or classification algorithms. Temporal scalability also makes it easy to search the video in fast-forward or rewind.

Applications

A typical application for H.264 SVC is a surveillance system (Stretch offers market-leading solutions in this area; visit its website for more details). Consider a case in which an IP camera feeds video to a control room, where the content is stored and basic motion-detection analytics are run on the stream. The feed is viewed on the control room display at the camera's maximum resolution (1280 x 720) and, to save disk space, stored at D1 resolution (720 x 480). A first-response team also accesses the stream on a mobile terminal in a response vehicle in the field. Those displays have CIF resolution (352 x 240), and the stream is served to them at 7 fps.

In an implementation using H.264 AVC, the first requirement would probably be that the camera serve multiple video streams, in this example one at 1280 x 720 and another at 720 x 480. This adds cost to the camera, but it allows one stream to be recorded directly in the control room while the other is decoded and displayed.

Without this feature, expensive decode, resize, and re-encode steps are required. The D1 stream is also decoded and resized to CIF resolution to feed the video analytics running on the stream. The CIF video is then temporally decimated to 7 frames per second and re-encoded so that it can be delivered to the first-response vehicle over the wireless link. Figure 5 shows a system that might be implemented using H.264 AVC.

Figure 5: Video surveillance application for H.264 AVC.

With an H.264 SVC codec, the requirement for multiple streams from the camera can be relaxed, which reduces system complexity and the network bandwidth between the camera and the control room. The full 1280 x 720 stream can now be stored on a network video recorder (NVR), where it can easily be parsed down to a D1 (or CIF) stream to free disk space after a given period of time. The CIF stream can be served directly from the NVR for analytics, while a second stream at a reduced frame rate is made available to the first-response vehicle. Figure 6 shows one possible implementation using H.264 SVC.
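One way to picture this arrangement is as a lookup from each consumer of the video to the layers it needs from the single stored stream. The sketch below assumes the stream carries three stacked spatial layers (CIF, D1, and 720p); the consumer names, the dictionary layout, and the `layers_to_keep` function are hypothetical and only illustrate the idea.

```python
# Assumed spatial layers, lowest first (352 x 240, 720 x 480, 1280 x 720).
SPATIAL_LAYERS = ["CIF", "D1", "720p"]

# Hypothetical consumers from the surveillance example and what each needs.
CONSUMERS = {
    "control_room_display": {"resolution": "720p", "reduced_frame_rate": False},
    "aged_archive":         {"resolution": "D1",   "reduced_frame_rate": False},
    "video_analytics":      {"resolution": "CIF",  "reduced_frame_rate": False},
    "response_vehicle":     {"resolution": "CIF",  "reduced_frame_rate": True},
}


def layers_to_keep(consumer):
    """Spatial layers that must be extracted from the stored stream."""
    top = SPATIAL_LAYERS.index(CONSUMERS[consumer]["resolution"])
    return SPATIAL_LAYERS[:top + 1]


def needs_temporal_thinning(consumer):
    """True if the upper temporal layers should also be dropped."""
    return CONSUMERS[consumer]["reduced_frame_rate"]


plan = {name: layers_to_keep(name) for name in CONSUMERS}
```

In this picture every consumer is served by dropping layers from the same stored file, so no consumer requires a separate encode.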

Figure 6: H.264 SVC video surveillance application.

At no point does the video stream itself need to be processed; it is sufficient to operate on the stored files. The advantages are clear:

Reduced network bandwidth;

Flexible storage management;

Elimination of expensive decoding and re-encoding steps;

High-definition video remains available on the NVR for archiving if needed.

Summary of this article

Scalable video codecs have been in development for many years. The broadcast industry, tightly governed by established standards, has been slow to adopt the technology. Advances in processors, sensors, and display technology are now fueling a wide variety of video applications, and the Internet and IP technologies are seamlessly delivering video to an increasingly diverse and remote population of display devices. Scalable video codecs such as H.264 SVC meet the needs of many of these systems, and they may be the catalyst that drives the widespread adoption of video as a communication medium.

About the author

Mark Oliver is Director of Product Marketing at Stretch. A native of the United Kingdom, Oliver earned a degree in electrical and electronic engineering from the University of Leeds. During his ten years at Hewlett-Packard, he managed engineering and manufacturing functions for HP divisions in Europe and the US, and he then led product marketing and applications activities at video-related startups. Before joining Stretch, he was a marketing manager for internal video and imaging products in the Xilinx DSP division.
