Video Segmentation Using Partially Decoded Bitstream

Isil Kayaalp, G. Bozdagi Akar

The use of video in multimedia and on the Web is increasing rapidly with recent developments in video compression and transmission. However, video is not as easy to manage as text or images because of the complex information it conveys. Video data have to be analyzed semantically in order to produce a scene description. Recently, video partitioning (scene cut detection) has received increased attention as a way to ease the management of video streams. Most of the work in this area operates on uncompressed video, so if a video source is provided in compressed form, these operations cannot be performed until the stream has been decompressed. Some work, however, has addressed scene cut detection directly on compressed video. The information most commonly used for video partitioning in the compressed domain consists of the DC and AC coefficients of the DCT and the motion vectors. An example of scene cut detection in an MPEG stream that uses DC information is the method of Meng, Juan, and Chang [1]. In this method, DC coefficients are used to detect cuts on I-frames, and the ratio of motion vectors is used to detect cuts on P- and B-frames. Another method that uses DC values was proposed by Yeo and Liu [2], who used the difference and histogram of DC images extracted from MJPEG and MPEG video for scene analysis. In another work [3], DC images and coding information such as motion vectors are again used for scene cut detection. Feng proposed an algorithm that utilizes AC information in addition to DC information: a significant change in DC and AC information is detected by comparing the number of bits for each macroblock [4]. Zhang proposed a hierarchical method in which I-frames are used as an initial step for cut detection [5]; if a higher resolution of the shot boundary is required, motion vectors are used among the intervening B-frames.

In all of these methods, the bitstream of the compressed video has to be partially decoded up to the level at which the required information can be extracted. The more information is used, the longer the decoding process takes. However, if an accurate pre-segmentation can be obtained with minimal decoding, further refinement can be carried out according to the resolution required by the user. In [6], we proposed a simple algorithm for this purpose. That algorithm depends on the number of bits spent for each frame in MPEG (or MJPEG) compressed data. Such data can be extracted without decompressing the stream in any form. However, the algorithm has limited capabilities because of the limited amount of information used in the decision. In this work, we extend it so that more robust segmentation can be achieved. The extended algorithm depends on the number of bits spent for each frame in a compressed video stream together with the frame type information.
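
To make the idea of segmenting on frame sizes and frame types concrete, the following is a minimal sketch, not the decision rule of this work. It assumes that the per-frame bit counts and frame types (I, P, B) have already been read from the stream. The function name `detect_cut_candidates`, the `window` and `ratio` parameters, and the median-based test are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque
from statistics import median


def detect_cut_candidates(frames, window=5, ratio=2.0):
    """Flag frames whose size deviates from recent frames of the same type.

    frames: iterable of (frame_type, bit_count) pairs in stream order,
            with frame_type one of "I", "P", "B".
    window and ratio are illustrative parameters, not values from the paper.
    Returns the indices of candidate scene cuts.
    """
    # One sliding history of recent bit counts per frame type, so that
    # I-, P-, and B-frames are only ever compared with their own kind.
    history = defaultdict(lambda: deque(maxlen=window))
    candidates = []
    for index, (frame_type, bits) in enumerate(frames):
        past = history[frame_type]
        if len(past) >= 2:
            baseline = median(past)
            jumped = bits > ratio * baseline
            dropped = bits * ratio < baseline
            if baseline > 0 and (jumped or dropped):
                candidates.append(index)
                # Assume a new shot starts here: discard all statistics
                # and do not let the outlier seed the new baseline.
                history.clear()
                continue
        past.append(bits)
    return candidates


# Synthetic stream: four 12-frame groups with constant sizes per type,
# and one oversized P-frame at index 27 standing in for a scene cut.
sizes = {"I": 100_000, "P": 40_000, "B": 15_000}
stream = [(t, sizes[t]) for t in "IBBPBBPBBPBB" * 4]
stream[27] = ("P", 120_000)
print(detect_cut_candidates(stream))  # prints [27]
```

Keeping a separate history per frame type is what lets the frame type information contribute: an I-frame is normally several times larger than a B-frame, so comparing raw sizes across types would flag every I-frame.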
 
 
 

 REFERENCES