According to :

There are six main modules in the VVC pipeline and, as in H.265, most of the compression gain comes from motion prediction, transform/quantization, and optimized entropy coding. The loop filters also play an important role here; according to , the effect of the Adaptive Loop Filter (ALF) is quite noticeable.
VTM provides four basic configurations: all intra (AI), random access (RA), low-delay B (LD), and low-delay P (LDP). The all-intra mode uses only intra predictors and no motion prediction. According to , AI, RA, and LD are required for tool evaluation, and in the attached table of , the RA mode is used for the summary. The experiments below will therefore run in RA first.
Running with the default configuration lets us check that VTM works, see how long it takes, and observe the differences between these modes. In principle the intra-only mode should be the fastest, because it never generates motion predictors; the speed of the other modes depends on the specific GOP structure. ()
Frames to encode: 30

Config file: default.
| Total Frames | Bitrate | Y-PSNR | U-PSNR | V-PSNR | YUV-PSNR |
| 30 a | 299.6933 | 30.1197 | 29.9015 | 29.7325 | 29.8497 |

Total Time: 16428.927 sec. [user] 16428.927 sec. [elapsed]

| Total Frames | Bitrate | Y-PSNR | U-PSNR | V-PSNR | YUV-PSNR |
| 30 a | 28.2827 | 36.0232 | 42.3287 | 37.5540 | 36.6602 |

Total Time: 4385.215 sec. [user] 4385.214 sec. [elapsed]
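For reference, one of these default runs can be launched with a command built roughly like this. This is only a sketch: the binary name and file paths are assumptions, while `-c`, `-i`, `-b`, `-o`, `-f`, and `-q` are the usual VTM EncoderApp options.

```python
import subprocess

def vtm_encode_cmd(cfg, yuv, frames, qp,
                   binary="./EncoderAppStatic",   # assumed binary path
                   bitstream="out.bin", recon="rec.yuv"):
    """Build the VTM EncoderApp command line for one run."""
    return [binary,
            "-c", cfg,          # configuration file, e.g. encoder_randomaccess_vtm.cfg
            "-i", yuv,          # input raw YUV sequence
            "-b", bitstream,    # output bitstream
            "-o", recon,        # reconstructed YUV (used for PSNR reporting)
            "-f", str(frames),  # FramesToBeEncoded
            "-q", str(qp)]      # base QP

cmd = vtm_encode_cmd("encoder_randomaccess_vtm.cfg", "Campfire.yuv", 30, 32)
# subprocess.run(cmd, check=True)  # uncomment once the binary and input exist
```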
According to , the evaluation procedure asks for complete encode and decode results, with timing and PSNR information. Since the evaluation table of the term project (VVC) does not mark the LD mode as M and time is limited, I will only test the AI and RA modes below. I will also adjust the QP to reduce running time; according to JVET-N1010, we should use QP 37 for testing.
Notice that improved methods can be designed for any of the six main modules of the VVC pipeline (see figure 1), and basically all enhancement-tool proposals build on these modules. I picked ISP as the experiment to run, since it has already been adopted into VTM 7 , so there is an easy way to enable it by adding parameters. I also took a look at CST , which has not been adopted yet; the authors of CST provide their algorithm with corresponding code at .
ISP is the latest version of LIP, whose main idea was originally designed for HEVC. ISP is an improvement tool in the intra-prediction module of figure 1.
ISP is based on LIP, which was proposed at meeting K . The main idea is to use intra predictors to decrease the size of the encoded file.
LIP resembles intra prediction in HEVC (): “At the same time, the residue compensation is introduced to calibrate the prediction of boundary regions in a block when we utilize further reference lines.” And in , we can see that the predictor consists of a block with a vector; the approach is to find a reference line, and further reference lines are also utilized.
In , the authors provide an approach that uses a horizontal or vertical split to partition these blocks, as in figure 3 below.
Since the approach in  does not significantly limit the number of partitions, an entropy decoding procedure is used to make a prediction for each line:

Because this procedure has to run for every line in every partition, the complexity grows with the number of partitions.
Due to the high complexity, meeting L held many discussions about the trade-off between complexity and performance. There are also several evaluations (, ) showing that although LIP has a good effect on coding efficiency, its memory-access cost is high, especially since the number of pixels in a single partition may be less than 16. The strategy adopted after meeting L is:
| Block Size | Number of Sub-Partitions |
| 4×8 and 8×4 | 2 |
| All other cases | 4 |
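The sub-partition rule above can be written as a small helper. This is a sketch of the table only; the convention of returning 1 for a 4×4 block (meaning no ISP split is applied there) is my own assumption.

```python
def isp_num_subpartitions(w, h):
    """Number of ISP sub-partitions for a WxH luma block, per the table above.

    4x8 and 8x4 blocks are split into 2 sub-partitions, every other case
    into 4. Returning 1 for 4x4 (no split) is an assumed convention.
    """
    if (w, h) == (4, 4):
        return 1            # assumed: ISP split not applied
    if (w, h) in ((4, 8), (8, 4)):
        return 2
    return 4
```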
LIP was renamed ISP after meeting L.

The strategy after meeting M is:
| Block size | Coefficient group size |
| All other possible M×N cases | 4×4 |
Some proposals () note that the cost is still high; the proposal in () contains two approaches to reduce memory bandwidth by constraining motion vectors within blocks.
Notice that  sets a limitation: ISP coding allows CUs of at most 64×64. The specific CU sizes are:
According to , the number of partitions is fixed for 8×N blocks, which is the latest update from the meeting in July 2019:

“For 4xN coding blocks and 8xN coding blocks (N > 4) that are coded using ISP with vertical split, the prediction region is specified to be of size 4xN.”
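The quoted rule can be written down directly. This sketch encodes only that one sentence; cases outside the rule return `None` to mean "follow the normal per-sub-partition behavior", which is my own placeholder convention.

```python
def isp_pred_region_size(w, h, vertical_split):
    """Prediction-region size for an ISP-coded WxH block, per the quoted rule.

    4xN and 8xN blocks (N > 4) coded with a vertical split use a 4xN
    prediction region. For every other case this sketch returns None,
    i.e. the usual per-sub-partition prediction applies (assumed).
    """
    if vertical_split and w in (4, 8) and h > 4:
        return (4, h)
    return None
```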
Besides,  provides a filtering approach that optionally uses a Cubic or Gaussian filter, depending on the block size. According to the VTM results in , the quality loss is kept within 0.05% on average.
 also provides some adjustments: it removes the MPM-only and PDPC restrictions and always applies the cubic filter on CUs using ISP.
In my opinion, ISP is a series of methods deployed inside intra prediction. First it checks all possible reference lines, even some far reference lines, to generate candidate predictors within intra-prediction blocks. Then some advanced approaches may be involved, like WAIP and PDPC (), which help construct more precise blocks for objects. Next, the developers of ISP try to find the best partitioning strategy and to limit the maximum CU size for the entropy coding that follows. In terms of information theory, we are effectively looking for the minimum quantization $q$ that achieves the best coding rate. Besides, recent proposals try different partition shapes, which require a lot of testing and evaluation.
Modern features of video, such as the roughly rectangular shape of the objects inside, play an important role in the design of ISP. I noticed that the partitions are either horizontal or vertical for this reason, and such ideas are already used in current video codecs. Given the main target of VVC (high and extremely high resolution), we do need more partitioning in intra coding, though I think the main gain for these high-resolution videos comes from inter-frame prediction: it is easier to construct precise vectors between frames, since a lot of information is available there, which helps to find a suitable codebook.
Finally, I think we could use different partition strategies depending on the resolution. Current approaches already have this feature implicitly: if the resolution is too low, the computed partitions become too small and, according to the strategy tables (tables 1 and 2 above), those partitions will not be made. Maybe we could directly set the strategy for each resolution, but this is just a guess.
According to , since ISP has been adopted into VTM, we can toggle it by adding parameters.
Stage 1 is to use both setup A (甲) and setup B (乙) on the A1 videos, and to run RA32 and AI37 (the numbers are QP values) tentatively.
Frames to encode: 30

Config files: Reference (ISP enabled), Test (ISP disabled)

Sequences: Campfire, FoodMarket4, Tango
According to JVET-N1010, we should run a full test on class A1 with four QP values.
The results are here:

(Because I had no more time, RA22 is not finished.)

All the data can be found in the evaluation Excel file. Here are some screenshots.
The reference for the evaluation procedure is:
The results are similar to the standard evaluation, though we see longer encode/decode times on average, and the PSNR is unsteady, while the magnitude of the changes is close to the standard result.
The main difference is the encoding time. I speculate the reason is that the resolution in my test is much lower than 4K, which may make ISP inefficient: there are fewer intra-prediction candidates than in the standard test, which reduces ISP's benefit. In that case the time cost of generating ISP partitions per frame is relatively high, while the partitioning itself is hard to exploit at this resolution. Besides, the ISP tool still has to decide on and signal the partitions during encoding. Additionally, since some other tools are enabled by default, there may be interactions that influence encoding efficiency.
As for the unsteady PSNR, I speculate the reason is that our cropping method acts like a “zoom-in” on the 4K source. This results in blur and much less internal detail, and, more importantly, more objects “jump into” the low-resolution frame, which is why the results vary between videos.
Technically, since ISP partitions each block under size limitations, it may not work well at this resolution and with this type of video. We can see in fig. 10 that there is a small negative gap, meaning ISP gives slightly worse quality at the same bitrate; equivalently, it needs a higher bitrate for the same quality, so here ISP does not help compression. (I do not know what the weird re-entrant curve at the top is.)
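The gap between two rate–distortion curves like those in fig. 10 is usually summarized with the Bjøntegaard delta rate (BD-rate). A minimal sketch follows; the numbers at the bottom are placeholders, not measurements, and a positive BD-rate means the test needs more bits for the same quality.

```python
import math
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate difference (%) of test vs. reference at equal PSNR.

    Fits a cubic polynomial of log-rate over PSNR for each curve and
    compares their integrals over the overlapping PSNR range.
    """
    p_ref = np.polyfit(psnr_ref, np.log(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (math.exp(avg_log_diff) - 1) * 100

# Placeholder rate (kbps) / PSNR (dB) points, one per QP in {22, 27, 32, 37}:
ref_rate, ref_psnr = [5600.0, 2900.0, 1500.0, 800.0], [39.8, 37.6, 35.4, 33.1]
test_rate, test_psnr = [5700.0, 2950.0, 1530.0, 820.0], [39.8, 37.6, 35.4, 33.1]
print(bd_rate(ref_rate, ref_psnr, test_rate, test_psnr))
```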
Finally, a general phenomenon: the lower the QP, the more “complex” the bitstream. “Complex” here means longer encode/decode times, higher bitrate, and higher PSNR. A few encode/decode time records do not follow this rule; I think they come from stage 1 of the experiment, when my computer's load differed from stage 2 because I was running other programs, and CPU scheduling is not always the same.
A batch (.bat) script could run many of the encode/decode jobs automatically, which would increase efficiency and produce more data for this experiment, helping the analysis and conclusions.
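The batch idea can also be sketched in Python instead of a .bat file. The binary path and the `--ISP` switch for toggling the tool are assumptions; the loop just enumerates the class-A1 sequences and the four CTC QPs, with ISP on and off.

```python
import itertools
import subprocess

SEQUENCES = ["Campfire", "FoodMarket4", "Tango"]   # class A1
QPS = [22, 27, 32, 37]                             # CTC QP set (JVET-N1010)

def build_jobs(binary="./EncoderAppStatic",        # assumed binary path
               cfg="encoder_randomaccess_vtm.cfg"):
    """Enumerate one encoder command per (sequence, QP, ISP on/off)."""
    jobs = []
    for seq, qp, isp in itertools.product(SEQUENCES, QPS, (1, 0)):
        tag = f"{seq}_qp{qp}_isp{isp}"
        jobs.append([binary, "-c", cfg,
                     "-i", f"{seq}.yuv",
                     "-b", f"{tag}.bin",
                     "-o", f"{tag}_rec.yuv",
                     "-q", str(qp),
                     f"--ISP={isp}"])   # assumed switch for toggling ISP
    return jobs

jobs = build_jobs()
# for cmd in jobs: subprocess.run(cmd, check=True)  # run when inputs exist
```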
VVC and VTM documents:

 Software manual of VTM 7.0

Tool documents of ISP (assigned by P0013):

 JVET-N0308-V2: Restriction of the maximum CU size for ISP to 64×64

Tool documents of CST:

 JVET-N0137-R1: Introduction of CST

 JVET-N0137-WD-CE3: Algorithm description of CST

 Li, Jiahao, et al. “Efficient Multiple-Line-Based Intra Prediction for HEVC.” IEEE Transactions on Circuits and Systems for Video Technology 28.4 (2018): 947–957.

 JVET-L0023: CE3: Summary Report on Intra Prediction and Mode Coding

 JVET-L1023: Description of Core Experiment 3 (CE3): Intra Prediction and Mode Coding

 JVET-L0319: CE4-related: Sub-block MV clipping in planar motion vector prediction

 JVET-L0131: Harmonization of Linear interpolation intra prediction (LIP) with Simplified position dependent intra prediction combination (PDPC) and wide-angle intra prediction (WAIP)

 JVET-M0485: Sub-block MV clip in planar motion vector prediction