x264 Study Notes

Changelog

2022-04-28 First edit

Main text

H.264

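Introduction to H.264

H.264 is currently the most widely used video coding standard. It is also MPEG-4 Part 10, and is usually referred to as H.264/AVC. Compared with the codecs that were in common use before it, H.264 delivers higher video quality at a lower bitrate, together with better error resilience and good adaptability to IP networks. After its release, H.264 was widely adopted for video on demand and real-time streaming. When the standard was first established, it mainly defined three profiles: Baseline, Extended, and Main. The Baseline profile is widely used for real-time streaming such as video conferencing and telemedicine, where it provides low-bitrate, low-latency, high-quality video streams.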

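Image quality: PSNR

PSNR (Peak Signal-to-Noise Ratio) is an objective image quality metric that is widely used in image processing work. It is used to judge whether an image that has gone through compression is of acceptable quality compared with the original. It is computed as follows, where MSE is the mean squared error between the original and the processed image and n is the number of bits per sample (8 for ordinary 8-bit video):
$$
PSNR = 10 \cdot \log_{10}\left(\frac{(2^n-1)^2}{MSE}\right)
$$
As an objective metric, PSNR still deviates from real subjective quality assessment. An image with a higher PSNR does not always look better than one with a lower PSNR, because the human eye is not equally sensitive to all errors and is more sensitive in certain kinds of scenes. H.264 was tuned, based on extensive experimental data, to exploit these characteristics of human vision.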

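Frame types and reference frames

In H.264, pictures are coded as different frame types, generally IDR, I, P, and B frames, plus types such as SP/SI. H.264 also produces NAL units such as SPS, PPS, and SEI. The main types are introduced briefly below (some extension types are left out):

  1. IDR frame (Instantaneous Decoding Refresh): always an I-frame. When the decoder receives an IDR frame it clears the reference frame list and resets some parameters. Frames after an IDR frame never reference frames before it.
  2. I-frame (Intra-coded picture): often called a key frame. It uses intra prediction only, so it can be decoded from the current picture alone.
  3. P-frame (Predictive-coded picture): forward-predicted frame. It uses earlier reference frames for inter prediction.
  4. B-frame (Bidirectionally predicted picture): uses both forward and backward reference frames for inter prediction.
  5. SPS (Sequence Parameter Set): contains important information such as profile, level, and resolution, and represents the global parameters of a coded sequence.
  6. PPS (Picture Parameter Set): stores the parameters used when coding pictures.
  7. SEI (Supplemental Enhancement Information): carries user-defined data and information that may help decoding.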
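
To make the NAL unit types in the list above concrete, here is a small sketch of how they can be told apart from the first byte of a NAL unit. The type numbers (1, 5, 6, 7, 8) and the bit layout come from the H.264 specification; the function and enum names are invented for this example and are not x264 identifiers.

```c
#include <stdint.h>

/* A subset of nal_unit_type values defined by the H.264 specification. */
enum {
    DEMO_NAL_SLICE_NON_IDR = 1, /* coded slice of a non-IDR picture (I/P/B) */
    DEMO_NAL_SLICE_IDR     = 5, /* coded slice of an IDR picture */
    DEMO_NAL_SEI           = 6,
    DEMO_NAL_SPS           = 7,
    DEMO_NAL_PPS           = 8
};

/* The low 5 bits of the NAL header byte hold nal_unit_type. */
static int demo_nal_unit_type(uint8_t header_byte)
{
    return header_byte & 0x1F;
}

/* Bits 5-6 hold nal_ref_idc; 0 means the unit is not used for reference. */
static int demo_nal_ref_idc(uint8_t header_byte)
{
    return (header_byte >> 5) & 0x3;
}

/* Human-readable name for the types listed above. */
static const char *demo_nal_type_name(int type)
{
    switch (type) {
    case DEMO_NAL_SLICE_NON_IDR: return "non-IDR slice";
    case DEMO_NAL_SLICE_IDR:     return "IDR slice";
    case DEMO_NAL_SEI:           return "SEI";
    case DEMO_NAL_SPS:           return "SPS";
    case DEMO_NAL_PPS:           return "PPS";
    default:                     return "other";
    }
}
```

In an Annex B stream the header byte is the first byte after the start code; the common values 0x67, 0x68, and 0x65 decode to SPS, PPS, and IDR slice respectively.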

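CABAC and CAVLC

These are the two entropy coding methods; entropy coding is a form of lossless compression. After H.264 has processed the image data, entropy coding raises the compression ratio further. CABAC is context-based adaptive binary arithmetic coding, and CAVLC is context-based adaptive variable-length coding. Real-time streaming generally uses the Baseline profile, which only supports CAVLC.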

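Video rate control

An image is made up of individual pixels, and a video is made up of consecutive images, so transmitting the raw, uncompressed pictures would take an enormous amount of bandwidth. The main goal of video coding is to use lossy compression to reduce the bandwidth the video occupies in transit, that is, the video bitrate, as far as possible while preserving image quality. Clearly, the higher the quality, the higher the bitrate has to be; rate control asks the encoder to strike a balance between quality and bitrate.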

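CBR vs. VBR

CBR (Constant Bitrate) specifies a bitrate and encodes every frame with that same number of bits. This approach is very common in audio coding. In video coding, however, different H.264 frame types need different numbers of bits to begin with, and different scenes have different needs too: a complex scene clearly needs more bits, while a simple, static scene does not need many. A fixed bitrate therefore means that complex scenes may look very poor while simple, static scenes waste a lot of bits.

To fix these problems with CBR, video coding introduced VBR (Variable Bitrate) rate control, which aims for the highest image quality under a given bit limit. Put simply, the core idea of VBR is to give more bits to complex, hard-to-encode scenes and fewer bits to simple, easy-to-encode scenes, so that a given image quality is maintained with fewer bits overall.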
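
The difference between the two ideas can be illustrated with a toy bit allocator. This is only a sketch of the principle described above, not how x264 allocates bits: the per-frame "complexity" score and both function names are assumptions made for this example.

```c
#include <stddef.h>

/* CBR-style allocation: every frame gets the same share of the budget. */
static void demo_allocate_cbr(size_t n, double total_bits, double *bits_out)
{
    for (size_t i = 0; i < n; i++)
        bits_out[i] = total_bits / (double)n;
}

/* VBR-style allocation: each frame gets a share proportional to its
 * (hypothetical) complexity score, so harder frames receive more bits. */
static void demo_allocate_vbr(const double *complexity, size_t n,
                              double total_bits, double *bits_out)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += complexity[i];

    for (size_t i = 0; i < n; i++) {
        if (sum > 0.0)
            bits_out[i] = total_bits * complexity[i] / sum;
        else
            bits_out[i] = total_bits / (double)n; /* no information: fall back to equal shares */
    }
}
```

With complexities {1, 3} and a 400-bit budget, the VBR-style split is 100 and 300 bits, while the CBR-style split is 200 and 200.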

Variable-bitrate methods are the mainstream approach to video rate control.

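Encoding use cases

Variable-bitrate control gives different scenes different numbers of bits; in addition, the way the video is actually used also affects its bitrate. Profiles differ in which compression tools they enable, so they also differ in bitrate and encoding speed. Video applications generally fall into the following use cases:

  1. Archiving: simply storing the video. Encoding speed does not matter; the key metrics are quality and file size. The content is known in advance.
  2. Streaming on demand: delivering video over a network. Encoding speed does not matter; the bitrate must not exceed the bandwidth while quality is maintained, and bandwidth use should be kept low. The content is known in advance.
  3. Live streaming: delivering real-time video over a network. Encoding must be fast, the bitrate should be as low as possible without exceeding the bandwidth, and quality must still be maintained. The content is not known in advance.
  4. Targeting a specific storage medium: burning media to a DVD or similar. The encoded video should just fill the capacity of the medium, with high quality. The content is known in advance.

Different use cases care about different metrics. Live streaming, that is, real-time streaming, places the highest demands on the encoder. The most important thing in real-time streaming is to stay real-time, so usually only the Baseline profile is used. Real-time streaming also cares about bitrate and quality: the bitrate should be small enough while the quality is good enough. Moreover, because the content cannot be known in advance, it is hard to apply parameters tuned for a specific kind of scene.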

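QP

The quantization parameter (QP) controls how strongly the video is compressed. A larger QP gives a higher compression ratio and lower quality; a smaller QP gives a lower compression ratio and higher quality. In 8-bit H.264, QP is an integer in the range 0-51. Generally speaking, controlling the bitrate means controlling the number of bits spent on encoding, which in turn means controlling the QP used for encoding.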
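
As a rough illustration of what QP means numerically: in the H.264 design the quantizer step size doubles every time QP increases by 6. The sketch below encodes that relationship using the commonly cited step sizes for QP 0-5; the function name is made up for this example, and it models only the step size, not the actual integer quantization x264 performs.

```c
/* Quantizer step sizes for QP 0..5 as commonly tabulated for H.264;
 * higher QPs follow by doubling the step size every 6 QP. */
static const double demo_qstep_base[6] = {
    0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125
};

/* Step size for an 8-bit QP in [0, 51]; out-of-range values are clamped. */
static double demo_qp_to_qstep(int qp)
{
    if (qp < 0)
        qp = 0;
    if (qp > 51)
        qp = 51;
    return demo_qstep_base[qp % 6] * (double)(1 << (qp / 6));
}
```

For example, QP 4 maps to a step of 1.0, QP 10 to 2.0, and QP 51 to 224, which shows why large QPs discard so much detail.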

TODO: how QP works

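Rate control methods

CQP

Constant QP (CQP) performs rate control by keeping the QP of every frame fixed. It follows that a constant QP gives no control over the bitrate at all: the size of each encoded frame is determined entirely by the complexity of the picture. For this reason the x264 documentation recommends CQP only for research and does not recommend it for controlling bitrate.

ABR

Average Bitrate (ABR) controls the bitrate by controlling the average bitrate of the whole stream. ABR does not require the bitrate to be constant throughout the video, but when a burst of high bitrate occurs, ABR lowers the bitrate of the following pictures to keep the average. ABR is simple, but in most situations the encoder cannot know how many bits the upcoming frames will need, so quality fluctuates sharply when the picture changes and ends up lower than expected. ABR should therefore be avoided in most cases.

2-PASS ABR

Because ABR has difficulty predicting the cost of the next frame, its rate control is often poor. If the encoder is allowed to encode more than once, however, then in the second and later passes it already knows the cost of every frame and can distribute bits much better. This is 2-pass ABR. In the first pass the encoder collects the cost of encoding the video, and in the second pass it uses that to allocate bits more sensibly. Typically the second pass uses ABR rate control, while the first pass can simply use CQP.

2-pass encoding can reach good quality at a given bitrate, but some problems remain. First, if the given bitrate is not enough to guarantee image quality, the result will still be below expectations, so getting the best quality requires an empirically chosen bitrate (found through repeated experiments). Second, 2-pass rate control encodes twice, which is basically unacceptable for real-time streaming. Finally, bitrate fluctuation still cannot be avoided; if the instantaneous bitrate is too high, the receiving capacity of the streaming client has to be taken into account.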

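CRF

Constant Rate Factor (CRF) aims at constant quality: given a crf value, the encoder tries to keep the output image quality constant. The crf value is a concept similar to QP. For 8-bit H.264 its theoretical range is [0, 51], and the default crf in x264 is 23. As a rule of thumb, each increase of 6 in crf roughly halves the output bitrate, so crf and bitrate are related logarithmically. In theory, crf = 0 is fully lossless compression and crf = 51 is the strongest compression. In practice, crf = 18 is generally regarded as visually lossless for H.264, so subjective quality evaluations usually use the range [18, 28], occasionally extending to boundary values such as 17 or 29 for extreme cases.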

Rate control with CRF can achieve fairly good image quality, but, like CQP, CRF cannot give a strong guarantee on bitrate.

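VBV

The Video Buffering Verifier (VBV) performs rate control by setting an upper and a lower bound on the bitrate of the stream, ensuring that the stream neither exceeds a given bitrate nor falls below one. Using VBV requires setting the encoder's buffer size, which is typically set to twice the maximum bitrate.

These properties ensure that, in real-time streaming, the encoded bitrate always stays within what the receiver can handle, so VBV is widely used there. VBV is also compatible with other methods such as 2-pass and CRF, so they can be combined.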
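
The buffer idea behind VBV can be sketched as a simple leaky-bucket model: bits arrive in a buffer at the maximum bitrate, and each coded frame removes its own size from the buffer. This is a simplified teaching model under stated assumptions, not x264's VBV implementation; the struct, field, and function names are all hypothetical.

```c
/* Simplified model of a VBV-style buffer (all sizes in bits). */
typedef struct {
    double buffer_size;    /* capacity of the buffer */
    double fill;           /* bits currently available */
    double bits_per_frame; /* bits arriving per frame interval = max_bitrate / fps */
} demo_vbv_t;

/* init_fraction is the initial fill level as a fraction of buffer_size. */
static void demo_vbv_init(demo_vbv_t *v, double max_bitrate, double buffer_size,
                          double fps, double init_fraction)
{
    v->buffer_size = buffer_size;
    v->fill = buffer_size * init_fraction;
    v->bits_per_frame = max_bitrate / fps;
}

/* Account for one coded frame.
 * Returns 0 if the frame fits, or -1 (state unchanged) if the frame needs
 * more bits than the buffer currently holds, i.e. it would underflow. */
static int demo_vbv_add_frame(demo_vbv_t *v, double frame_bits)
{
    if (frame_bits > v->fill)
        return -1;

    v->fill -= frame_bits;         /* the frame is taken out of the buffer */
    v->fill += v->bits_per_frame;  /* new bits arrive during the frame interval */
    if (v->fill > v->buffer_size)
        v->fill = v->buffer_size;  /* the buffer cannot hold more than its capacity */
    return 0;
}
```

An encoder using such a model would raise the QP of a frame whose estimated size makes `demo_vbv_add_frame` fail, which is the intuition behind keeping the stream within the receiver's capacity.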

CRF + VBV

Because CRF and VBV are fully compatible, both can be enabled at the same time to maintain picture quality within a given bitrate.

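Comparison of rate control methods

| Method | Suitable for | Not suitable for |
| --- | --- | --- |
| CQP | Research | Almost all real-world deployments |
| ABR | Cases that call for something simple and fast | Almost everything; should not be used unless extremely low latency is required |
| 2-PASS ABR | Specific, non-real-time processing | Low-latency and fast-encoding scenarios |
| CRF | Scenarios with image quality requirements | Scenarios that are very sensitive to bitrate fluctuation |
| VBV | Real-time streaming, bandwidth-limited streaming | Archival encoding |
| CRF + VBV | Real-time streaming, bandwidth-limited streaming | Archival encoding |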

Reference: [x264-devel] Making sense out of x264 rate control methods

x264

x264 code layout

The files are organized as follows:

 > tree -L 1
.
├── AUTHORS
├── autocomplete.c
├── common // common code
├── config.guess
├── config.h
├── config.log
├── config.mak
├── config.sub
├── configure
├── COPYING
├── doc // documentation
├── encoder // encoder code
├── example.c // example
├── extras
├── filters
├── input
├── Makefile
├── output // output formats
├── tools
├── version.sh
├── x264.c
├── x264cli.h
├── x264_config.h
├── x264dll.c
├── x264.h
├── x264.pc
├── x264res.manifest
└── x264res.rc

The full tree is shown below (git-related files removed):

> tree -a
.
├── AUTHORS
├── autocomplete.c
├── common
│   ├── aarch64
│   │   ├── asm-offsets.c
│   │   ├── asm-offsets.h
│   │   ├── asm.S
│   │   ├── bitstream-a.S
│   │   ├── bitstream.h
│   │   ├── cabac-a.S
│   │   ├── dct-a.S
│   │   ├── dct.h
│   │   ├── deblock-a.S
│   │   ├── deblock.h
│   │   ├── mc-a.S
│   │   ├── mc-c.c
│   │   ├── mc.h
│   │   ├── pixel-a.S
│   │   ├── pixel.h
│   │   ├── predict-a.S
│   │   ├── predict-c.c
│   │   ├── predict.h
│   │   ├── quant-a.S
│   │   └── quant.h
│   ├── arm
│   │   ├── asm.S
│   │   ├── bitstream-a.S
│   │   ├── bitstream.h
│   │   ├── cpu-a.S
│   │   ├── dct-a.S
│   │   ├── dct.h
│   │   ├── deblock-a.S
│   │   ├── deblock.h
│   │   ├── mc-a.S
│   │   ├── mc-c.c
│   │   ├── mc.h
│   │   ├── pixel-a.S
│   │   ├── pixel.h
│   │   ├── predict-a.S
│   │   ├── predict-c.c
│   │   ├── predict.h
│   │   ├── quant-a.S
│   │   └── quant.h
│   ├── base.c
│   ├── base.h
│   ├── bitstream.c
│   ├── bitstream.h
│   ├── cabac.c
│   ├── cabac.h
│   ├── common.c
│   ├── common.h
│   ├── cpu.c
│   ├── cpu.h
│   ├── dct.c
│   ├── dct.h
│   ├── deblock.c
│   ├── frame.c
│   ├── frame.h
│   ├── macroblock.c
│   ├── macroblock.h
│   ├── mc.c
│   ├── mc.h
│   ├── mips
│   │   ├── dct-c.c
│   │   ├── dct.h
│   │   ├── deblock-c.c
│   │   ├── deblock.h
│   │   ├── macros.h
│   │   ├── mc-c.c
│   │   ├── mc.h
│   │   ├── pixel-c.c
│   │   ├── pixel.h
│   │   ├── predict-c.c
│   │   ├── predict.h
│   │   ├── quant-c.c
│   │   └── quant.h
│   ├── mvpred.c
│   ├── opencl
│   │   ├── bidir.cl
│   │   ├── downscale.cl
│   │   ├── intra.cl
│   │   ├── motionsearch.cl
│   │   ├── subpel.cl
│   │   ├── weightp.cl
│   │   └── x264-cl.h
│   ├── opencl.c
│   ├── opencl.h
│   ├── osdep.c
│   ├── osdep.h
│   ├── pixel.c
│   ├── pixel.h
│   ├── ppc
│   │   ├── dct.c
│   │   ├── dct.h
│   │   ├── deblock.c
│   │   ├── deblock.h
│   │   ├── mc.c
│   │   ├── mc.h
│   │   ├── pixel.c
│   │   ├── pixel.h
│   │   ├── ppccommon.h
│   │   ├── predict.c
│   │   ├── predict.h
│   │   ├── quant.c
│   │   └── quant.h
│   ├── predict.c
│   ├── predict.h
│   ├── quant.c
│   ├── quant.h
│   ├── rectangle.c
│   ├── rectangle.h
│   ├── set.c
│   ├── set.h
│   ├── tables.c
│   ├── tables.h
│   ├── threadpool.c
│   ├── threadpool.h
│   ├── vlc.c
│   ├── win32thread.c
│   ├── win32thread.h
│   └── x86
│       ├── bitstream-a.asm
│       ├── bitstream.h
│       ├── cabac-a.asm
│       ├── const-a.asm
│       ├── cpu-a.asm
│       ├── dct-32.asm
│       ├── dct-64.asm
│       ├── dct-a.asm
│       ├── dct.h
│       ├── deblock-a.asm
│       ├── deblock.h
│       ├── mc-a2.asm
│       ├── mc-a.asm
│       ├── mc-c.c
│       ├── mc.h
│       ├── pixel-32.asm
│       ├── pixel-a.asm
│       ├── pixel.h
│       ├── predict-a.asm
│       ├── predict-c.c
│       ├── predict.h
│       ├── quant-a.asm
│       ├── quant.h
│       ├── sad16-a.asm
│       ├── sad-a.asm
│       ├── trellis-64.asm
│       ├── util.h
│       ├── x86inc.asm
│       └── x86util.asm
├── config.guess
├── config.h
├── config.log
├── config.mak
├── config.sub
├── configure
├── COPYING
├── doc
│   ├── ratecontrol.txt
│   ├── regression_test.txt
│   ├── standards.txt
│   ├── threads.txt
│   └── vui.txt
├── encoder
│   ├── analyse.c
│   ├── analyse.h
│   ├── api.c
│   ├── cabac.c
│   ├── cavlc.c
│   ├── encoder.c
│   ├── lookahead.c
│   ├── macroblock.c
│   ├── macroblock.h
│   ├── me.c
│   ├── me.h
│   ├── ratecontrol.c
│   ├── ratecontrol.h
│   ├── rdo.c
│   ├── set.c
│   ├── set.h
│   ├── slicetype.c
│   ├── slicetype-cl.c
│   └── slicetype-cl.h
├── example.c
├── extras
│   ├── avisynth_c.h
│   ├── avxsynth_c.h
│   ├── cl.h
│   ├── cl_platform.h
│   ├── getopt.c
│   ├── getopt.h
│   ├── intel_dispatcher.h
│   ├── inttypes.h
│   └── stdint.h
├── filters
│   ├── filters.c
│   ├── filters.h
│   └── video
│       ├── cache.c
│       ├── crop.c
│       ├── depth.c
│       ├── fix_vfr_pts.c
│       ├── internal.c
│       ├── internal.h
│       ├── resize.c
│       ├── select_every.c
│       ├── source.c
│       ├── video.c
│       └── video.h
├── input
│   ├── avs.c
│   ├── ffms.c
│   ├── input.c
│   ├── input.h
│   ├── lavf.c
│   ├── raw.c
│   ├── thread.c
│   ├── timecode.c
│   └── y4m.c
├── Makefile
├── output
│   ├── flv_bytestream.c
│   ├── flv_bytestream.h
│   ├── flv.c
│   ├── matroska.c
│   ├── matroska_ebml.c
│   ├── matroska_ebml.h
│   ├── mp4.c
│   ├── mp4_lsmash.c
│   ├── output.h
│   └── raw.c
├── tools
│   ├── bash-autocomplete.sh
│   ├── checkasm-aarch64.S
│   ├── checkasm-a.asm
│   ├── checkasm-arm.S
│   ├── checkasm.c
│   ├── cltostr.sh
│   ├── countquant_x264.pl
│   ├── digress
│   │   ├── cli.py
│   │   ├── comparers.py
│   │   ├── constants.py
│   │   ├── errors.py
│   │   ├── __init__.py
│   │   ├── scm
│   │   │   ├── dummy.py
│   │   │   ├── git.py
│   │   │   └── __init__.py
│   │   └── testing.py
│   ├── gas-preprocessor.pl
│   ├── msvsdepend.sh
│   ├── q_matrix_jvt.cfg
│   └── test_x264.py
├── version.sh
├── x264.c
├── x264cli.h
├── x264_config.h
├── x264dll.c
├── x264.h
├── x264.pc
├── x264res.manifest
└── x264res.rc

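Code layout analysis

TODO

x264 code analysis

After building, x264 consists of the executable x264 and the library libx264. We start with the x264 executable to get an overview of the flow.

Overall call analysis

x264 call flow

Note: in the figure, orange functions are the x264 APIs that an actual caller needs, red functions are the ones currently believed to be related to rate control, and green marks annotations.

The main function

x264.c defines x264's main function. It processes the command-line arguments and sets up signal handling, then performs the encode and the cleanup afterwards. The core functions here are parse() and encode(), and the core structures are x264_param_t and cli_opt_t, as shown below: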

REALIGN_STACK int main( int argc, char **argv )
{
    if( argc == 4 && !strcmp( argv[1], "--autocomplete" ) )
        return x264_cli_autocomplete( argv[2], argv[3] );

    x264_param_t param; // parameters exposed by x264, defined in x264.h
    cli_opt_t opt = {0}; // CLI options, defined in x264.c
    int ret = 0;

    FAIL_IF_ERROR( x264_threading_init(), "unable to initialize threading\n" );

#ifdef _WIN32
    FAIL_IF_ERROR( !get_argv_utf8( &argc, &argv ), "unable to convert command line to UTF-8\n" );

    GetConsoleTitleW( org_console_title, CONSOLE_TITLE_SIZE );
    _setmode( _fileno( stdin ), _O_BINARY );
    _setmode( _fileno( stdout ), _O_BINARY );
    _setmode( _fileno( stderr ), _O_BINARY );
#endif

    // Parse the command-line arguments here and initialize the x264 parameters
    x264_param_default( &param );
    /* Parse command line */
    if( parse( argc, argv, &param, &opt ) < 0 )
        ret = -1;

#ifdef _WIN32
    /* Restore title; it can be changed by input modules */
    SetConsoleTitleW( org_console_title );
#endif

    /* Control-C handler */
    signal( SIGINT, sigint_handler );

    // Do the encode
    if( !ret )
        ret = encode( &param, &opt );

    /* clean up handles */
    if( filter.free )
        filter.free( opt.hin );
    else if( opt.hin )
        cli_input.close_file( opt.hin );
    if( opt.hout )
        cli_output.close_file( opt.hout, 0, 0 );
    if( opt.tcfile_out )
        fclose( opt.tcfile_out );
    if( opt.qpfile )
        fclose( opt.qpfile );
    x264_param_cleanup( &param );

#ifdef _WIN32
    SetConsoleTitleW( org_console_title );
    free( argv );
#endif

    return ret;
}

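Next, the two key functions are analyzed.

parse( argc, argv, &param, &opt )

The parse function parses the command-line arguments and initializes x264_param_t. It mainly calls the following functions (in common/base.c) to do the setup:

  1. x264_param_default_preset - sets parameters according to preset and tune
  2. x264_param_default - sets the default parameters
  3. x264_param_parse - parses the parameters supplied by the user
  4. x264_param_apply_profile - applies the profile

x264_param_default (common/base.c)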
/****************************************************************************
* x264_param_default: initializes the default parameters, so it sets a default value for every parameter
****************************************************************************/
REALIGN_STACK void x264_param_default( x264_param_t *param )
{
memset( param, 0, sizeof( x264_param_t ) );

/* CPU autodetect */
param->cpu = x264_cpu_detect();
param->i_threads = X264_THREADS_AUTO;
param->i_lookahead_threads = X264_THREADS_AUTO;
param->b_deterministic = 1;
param->i_sync_lookahead = X264_SYNC_LOOKAHEAD_AUTO;

/* Video properties */
param->i_csp = X264_CHROMA_FORMAT ? X264_CHROMA_FORMAT : X264_CSP_I420;
param->i_width = 0;
param->i_height = 0;
param->vui.i_sar_width = 0;
param->vui.i_sar_height= 0;
param->vui.i_overscan = 0; /* undef */
param->vui.i_vidformat = 5; /* undef */
param->vui.b_fullrange = -1; /* default depends on input */
param->vui.i_colorprim = 2; /* undef */
param->vui.i_transfer = 2; /* undef */
param->vui.i_colmatrix = -1; /* default depends on input */
param->vui.i_chroma_loc= 0; /* left center */
param->i_fps_num = 25;
param->i_fps_den = 1;
param->i_level_idc = -1;
param->i_slice_max_size = 0;
param->i_slice_max_mbs = 0;
param->i_slice_count = 0;
#if HAVE_BITDEPTH8
param->i_bitdepth = 8;
#elif HAVE_BITDEPTH10
param->i_bitdepth = 10;
#else
param->i_bitdepth = 8;
#endif

/* Encoder parameters */
param->i_frame_reference = 3;
param->i_keyint_max = 250;
param->i_keyint_min = X264_KEYINT_MIN_AUTO;
param->i_bframe = 3;
param->i_scenecut_threshold = 40;
param->i_bframe_adaptive = X264_B_ADAPT_FAST;
param->i_bframe_bias = 0;
param->i_bframe_pyramid = X264_B_PYRAMID_NORMAL;
param->b_interlaced = 0;
param->b_constrained_intra = 0;

param->b_deblocking_filter = 1;
param->i_deblocking_filter_alphac0 = 0;
param->i_deblocking_filter_beta = 0;

param->b_cabac = 1;
param->i_cabac_init_idc = 0;
// Rate control parameters follow
param->rc.i_rc_method = X264_RC_CRF; // CRF by default
param->rc.i_bitrate = 0;
param->rc.f_rate_tolerance = 1.0;
param->rc.i_vbv_max_bitrate = 0;
param->rc.i_vbv_buffer_size = 0;
param->rc.f_vbv_buffer_init = 0.9;
param->rc.i_qp_constant = -1;
param->rc.f_rf_constant = 23;
param->rc.i_qp_min = 0;
param->rc.i_qp_max = INT_MAX;
param->rc.i_qp_step = 4;
param->rc.f_ip_factor = 1.4;
param->rc.f_pb_factor = 1.3;
param->rc.i_aq_mode = X264_AQ_VARIANCE;
param->rc.f_aq_strength = 1.0;
param->rc.i_lookahead = 40;

param->rc.b_stat_write = 0;
param->rc.psz_stat_out = "x264_2pass.log";
param->rc.b_stat_read = 0;
param->rc.psz_stat_in = "x264_2pass.log";
param->rc.f_qcompress = 0.6;
param->rc.f_qblur = 0.5;
param->rc.f_complexity_blur = 20;
param->rc.i_zones = 0;
param->rc.b_mb_tree = 1;

/* Log */
param->pf_log = x264_log_default;
param->p_log_private = NULL;
param->i_log_level = X264_LOG_INFO;

/* Analysis-related */
param->analyse.intra = X264_ANALYSE_I4x4 | X264_ANALYSE_I8x8;
param->analyse.inter = X264_ANALYSE_I4x4 | X264_ANALYSE_I8x8
| X264_ANALYSE_PSUB16x16 | X264_ANALYSE_BSUB16x16;
param->analyse.i_direct_mv_pred = X264_DIRECT_PRED_SPATIAL;
param->analyse.i_me_method = X264_ME_HEX; // hexagon search
param->analyse.f_psy_rd = 1.0;
param->analyse.b_psy = 1;
param->analyse.f_psy_trellis = 0;
param->analyse.i_me_range = 16;
param->analyse.i_subpel_refine = 7;
param->analyse.b_mixed_references = 1;
param->analyse.b_chroma_me = 1;
param->analyse.i_mv_range_thread = -1;
param->analyse.i_mv_range = -1; // set from level_idc
param->analyse.i_chroma_qp_offset = 0;
param->analyse.b_fast_pskip = 1;
param->analyse.b_weighted_bipred = 1;
param->analyse.i_weighted_pred = X264_WEIGHTP_SMART;
param->analyse.b_dct_decimate = 1;
param->analyse.b_transform_8x8 = 1;
param->analyse.i_trellis = 1;
param->analyse.i_luma_deadzone[0] = 21;
param->analyse.i_luma_deadzone[1] = 11;
param->analyse.b_psnr = 0;
param->analyse.b_ssim = 0;

param->i_cqm_preset = X264_CQM_FLAT;
memset( param->cqm_4iy, 16, sizeof( param->cqm_4iy ) );
memset( param->cqm_4py, 16, sizeof( param->cqm_4py ) );
memset( param->cqm_4ic, 16, sizeof( param->cqm_4ic ) );
memset( param->cqm_4pc, 16, sizeof( param->cqm_4pc ) );
memset( param->cqm_8iy, 16, sizeof( param->cqm_8iy ) );
memset( param->cqm_8py, 16, sizeof( param->cqm_8py ) );
memset( param->cqm_8ic, 16, sizeof( param->cqm_8ic ) );
memset( param->cqm_8pc, 16, sizeof( param->cqm_8pc ) );

param->b_repeat_headers = 1;
param->b_annexb = 1;
param->b_aud = 0;
param->b_vfr_input = 1;
param->i_nal_hrd = X264_NAL_HRD_NONE;
param->b_tff = 1;
param->b_pic_struct = 0;
param->b_fake_interlaced = 0;
param->i_frame_packing = -1;
param->i_alternative_transfer = 2; /* undef */
param->b_opencl = 0;
param->i_opencl_device = 0;
param->opencl_device_id = NULL;
param->psz_clbin_file = NULL;
param->i_avcintra_class = 0;
param->i_avcintra_flavor = X264_AVCINTRA_FLAVOR_PANASONIC;
}
x264_param_default_preset(common/base.c)
// Apply x264's default preset and tune
REALIGN_STACK int x264_param_default_preset( x264_param_t *param, const char *preset, const char *tune )
{
    // The parameters are reset to their defaults once more here
    x264_param_default( param );

    if( preset && param_apply_preset( param, preset ) < 0 )
        return -1;
    if( tune && param_apply_tune( param, tune ) < 0 )
        return -1;
    return 0;
}
// Set some parameters according to the given preset
static int param_apply_preset( x264_param_t *param, const char *preset )
{
char *end;
int i = strtol( preset, &end, 10 );
if( *end == 0 && i >= 0 && i < ARRAY_ELEMS(x264_preset_names)-1 )
preset = x264_preset_names[i];

if( !strcasecmp( preset, "ultrafast" ) )
{
param->i_frame_reference = 1;
param->i_scenecut_threshold = 0;
param->b_deblocking_filter = 0; // deblocking filter turned off
param->b_cabac = 0;
param->i_bframe = 0; // no B-frames
param->analyse.intra = 0;
param->analyse.inter = 0;
param->analyse.b_transform_8x8 = 0; // no 8x8 DCT
param->analyse.i_me_method = X264_ME_DIA; // diamond search
param->analyse.i_subpel_refine = 0;
param->rc.i_aq_mode = 0;
param->analyse.b_mixed_references = 0;
param->analyse.i_trellis = 0;
param->i_bframe_adaptive = X264_B_ADAPT_NONE;
param->rc.b_mb_tree = 0;
param->analyse.i_weighted_pred = X264_WEIGHTP_NONE;
param->analyse.b_weighted_bipred = 0;
param->rc.i_lookahead = 0;
}
else if( !strcasecmp( preset, "superfast" ) )
{
param->analyse.inter = X264_ANALYSE_I8x8|X264_ANALYSE_I4x4;
param->analyse.i_me_method = X264_ME_DIA;
param->analyse.i_subpel_refine = 1;
param->i_frame_reference = 1;
param->analyse.b_mixed_references = 0;
param->analyse.i_trellis = 0;
param->rc.b_mb_tree = 0;
param->analyse.i_weighted_pred = X264_WEIGHTP_SIMPLE;
param->rc.i_lookahead = 0;
}
else if( !strcasecmp( preset, "veryfast" ) )
{
param->analyse.i_subpel_refine = 2;
param->i_frame_reference = 1;
param->analyse.b_mixed_references = 0;
param->analyse.i_trellis = 0;
param->analyse.i_weighted_pred = X264_WEIGHTP_SIMPLE;
param->rc.i_lookahead = 10;
}
else if( !strcasecmp( preset, "faster" ) )
{
param->analyse.b_mixed_references = 0;
param->i_frame_reference = 2;
param->analyse.i_subpel_refine = 4;
param->analyse.i_weighted_pred = X264_WEIGHTP_SIMPLE;
param->rc.i_lookahead = 20;
}
else if( !strcasecmp( preset, "fast" ) )
{
param->i_frame_reference = 2;
param->analyse.i_subpel_refine = 6;
param->analyse.i_weighted_pred = X264_WEIGHTP_SIMPLE;
param->rc.i_lookahead = 30;
}
else if( !strcasecmp( preset, "medium" ) )
{
/* Default is medium */
}
else if( !strcasecmp( preset, "slow" ) )
{
param->analyse.i_subpel_refine = 8;
param->i_frame_reference = 5;
param->analyse.i_direct_mv_pred = X264_DIRECT_PRED_AUTO;
param->analyse.i_trellis = 2;
param->rc.i_lookahead = 50;
}
else if( !strcasecmp( preset, "slower" ) )
{
param->analyse.i_me_method = X264_ME_UMH; // uneven multi-hexagon search
param->analyse.i_subpel_refine = 9;
param->i_frame_reference = 8;
param->i_bframe_adaptive = X264_B_ADAPT_TRELLIS;
param->analyse.i_direct_mv_pred = X264_DIRECT_PRED_AUTO;
param->analyse.inter |= X264_ANALYSE_PSUB8x8;
param->analyse.i_trellis = 2;
param->rc.i_lookahead = 60;
}
else if( !strcasecmp( preset, "veryslow" ) )
{
param->analyse.i_me_method = X264_ME_UMH;
param->analyse.i_subpel_refine = 10;
param->analyse.i_me_range = 24;
param->i_frame_reference = 16;
param->i_bframe_adaptive = X264_B_ADAPT_TRELLIS;
param->analyse.i_direct_mv_pred = X264_DIRECT_PRED_AUTO;
param->analyse.inter |= X264_ANALYSE_PSUB8x8;
param->analyse.i_trellis = 2;
param->i_bframe = 8;
param->rc.i_lookahead = 60;
}
else if( !strcasecmp( preset, "placebo" ) )
{
param->analyse.i_me_method = X264_ME_TESA;
param->analyse.i_subpel_refine = 11;
param->analyse.i_me_range = 24;
param->i_frame_reference = 16;
param->i_bframe_adaptive = X264_B_ADAPT_TRELLIS;
param->analyse.i_direct_mv_pred = X264_DIRECT_PRED_AUTO;
param->analyse.inter |= X264_ANALYSE_PSUB8x8;
param->analyse.b_fast_pskip = 0;
param->analyse.i_trellis = 2;
param->i_bframe = 16;
param->rc.i_lookahead = 60;
}
else
{
x264_log_internal( X264_LOG_ERROR, "invalid preset '%s'\n", preset );
return -1;
}
return 0;
}
// Apply the tune settings
static int param_apply_tune( x264_param_t *param, const char *tune )
{
int psy_tuning_used = 0;
// This is a loop, so several tunes can be applied
for( int len; tune += strspn( tune, ",./-+" ), (len = strcspn( tune, ",./-+" )); tune += len )
{
if( len == 4 && !strncasecmp( tune, "film", 4 ) )
{
if( psy_tuning_used++ ) goto psy_failure;
param->i_deblocking_filter_alphac0 = -1;
param->i_deblocking_filter_beta = -1;
param->analyse.f_psy_trellis = 0.15;
}
else if( len == 9 && !strncasecmp( tune, "animation", 9 ) )
{
if( psy_tuning_used++ ) goto psy_failure;
param->i_frame_reference = param->i_frame_reference > 1 ? param->i_frame_reference*2 : 1;
param->i_deblocking_filter_alphac0 = 1;
param->i_deblocking_filter_beta = 1;
param->analyse.f_psy_rd = 0.4;
param->rc.f_aq_strength = 0.6;
param->i_bframe += 2;
}
else if( len == 5 && !strncasecmp( tune, "grain", 5 ) )
{
if( psy_tuning_used++ ) goto psy_failure;
param->i_deblocking_filter_alphac0 = -2;
param->i_deblocking_filter_beta = -2;
param->analyse.f_psy_trellis = 0.25;
param->analyse.b_dct_decimate = 0;
param->rc.f_pb_factor = 1.1;
param->rc.f_ip_factor = 1.1;
param->rc.f_aq_strength = 0.5;
param->analyse.i_luma_deadzone[0] = 6;
param->analyse.i_luma_deadzone[1] = 6;
param->rc.f_qcompress = 0.8;
}
else if( len == 10 && !strncasecmp( tune, "stillimage", 10 ) )
{
if( psy_tuning_used++ ) goto psy_failure;
param->i_deblocking_filter_alphac0 = -3;
param->i_deblocking_filter_beta = -3;
param->analyse.f_psy_rd = 2.0;
param->analyse.f_psy_trellis = 0.7;
param->rc.f_aq_strength = 1.2;
}
else if( len == 4 && !strncasecmp( tune, "psnr", 4 ) )
{
if( psy_tuning_used++ ) goto psy_failure;
param->rc.i_aq_mode = X264_AQ_NONE;
param->analyse.b_psy = 0;
}
else if( len == 4 && !strncasecmp( tune, "ssim", 4 ) )
{
if( psy_tuning_used++ ) goto psy_failure;
param->rc.i_aq_mode = X264_AQ_AUTOVARIANCE;
param->analyse.b_psy = 0;
}
else if( len == 10 && !strncasecmp( tune, "fastdecode", 10 ) )
{
param->b_deblocking_filter = 0;
param->b_cabac = 0;
param->analyse.b_weighted_bipred = 0;
param->analyse.i_weighted_pred = X264_WEIGHTP_NONE;
}
else if( len == 11 && !strncasecmp( tune, "zerolatency", 11 ) )
{
param->rc.i_lookahead = 0;
param->i_sync_lookahead = 0;
param->i_bframe = 0; // disable B-frames
param->b_sliced_threads = 1;
param->b_vfr_input = 0;
param->rc.b_mb_tree = 0;
}
else if( len == 6 && !strncasecmp( tune, "touhou", 6 ) )
{
if( psy_tuning_used++ ) goto psy_failure;
param->i_frame_reference = param->i_frame_reference > 1 ? param->i_frame_reference*2 : 1;
param->i_deblocking_filter_alphac0 = -1;
param->i_deblocking_filter_beta = -1;
param->analyse.f_psy_trellis = 0.2;
param->rc.f_aq_strength = 1.3;
if( param->analyse.inter & X264_ANALYSE_PSUB16x16 )
param->analyse.inter |= X264_ANALYSE_PSUB8x8;
}
else
{
x264_log_internal( X264_LOG_ERROR, "invalid tune '%.*s'\n", len, tune );
return -1;
psy_failure:
x264_log_internal( X264_LOG_WARNING, "only 1 psy tuning can be used: ignoring tune %.*s\n", len, tune );
}
}
return 0;
}
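
The `for` loop at the top of param_apply_tune above is what allows several tunes to be given in one string. The standalone sketch below isolates that tokenizing pattern (skip separators with `strspn`, measure the next token with `strcspn`); the function name `demo_split_tunes` and the fixed 32-byte name buffers are assumptions made for this example, not part of x264.

```c
#include <stddef.h>
#include <string.h>

/* Split a tune list such as "film,zerolatency" into individual names.
 * Uses the same separator set as the loop above: ",./-+".
 * Returns the number of names stored, or -1 if there are too many names
 * or a name does not fit into 31 characters. */
static int demo_split_tunes(const char *tune, char names[][32], int max_names)
{
    static const char seps[] = ",./-+";
    int count = 0;

    /* Each iteration: skip leading separators, then measure the next token. */
    for (size_t len; tune += strspn(tune, seps), (len = strcspn(tune, seps)) != 0; tune += len) {
        if (count == max_names || len >= 32)
            return -1;
        memcpy(names[count], tune, len);
        names[count][len] = '\0';
        count++;
    }
    return count;
}
```

Because the comma operator runs `strspn` before every length check, empty tokens (for example in ",,grain+fastdecode") are skipped without any special handling.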

encode( &param, &opt )

encode does the actual encoding work. The declarations at the top of the function are shown below; the first few are x264 structures:

static int encode( x264_param_t *param, cli_opt_t *opt )
{
    x264_t *h = NULL; // the encoder
    x264_picture_t pic; // picture information
    cli_pic_t cli_pic; // picture information used by the CLI
    const cli_pulldown_t *pulldown = NULL; // shut up gcc

    int i_frame = 0;
    int i_frame_output = 0;
    int64_t i_end, i_previous = 0, i_start = 0;
    int64_t i_file = 0;
    int i_frame_size;
    int64_t last_dts = 0;
    int64_t prev_dts = 0;
    int64_t first_dts = 0;
#   define MAX_PTS_WARNING 3 /* arbitrary */
    int pts_warning_cnt = 0;
    int64_t largest_pts = -1;
    int64_t second_largest_pts = -1;
    int64_t ticks_per_frame;
    double duration;
    double pulldown_pts = 0;
    int retval = 0;

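The main job of encode is to initialize these structures and then call encode_frame to encode, and encode_frame in turn calls x264_encoder_encode to do the actual encoding. That function lives in encoder/api.c and is a thin wrapper that calls through a function pointer: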

REALIGN_STACK int x264_encoder_encode( x264_t *h, x264_nal_t **pp_nal, int *pi_nal, x264_picture_t *pic_in, x264_picture_t *pic_out )
{
    x264_api_t *api = (x264_api_t *)h;

    return api->encoder_encode( api->x264, pp_nal, pi_nal, pic_in, pic_out );
}

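So we go back to where the encoder is opened to find the encode function that was registered at that point. The registration is also implemented in encoder/api.c: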

REALIGN_STACK x264_t *x264_encoder_open( x264_param_t *param )
{
    x264_api_t *api = calloc( 1, sizeof( x264_api_t ) );
    if( !api )
        return NULL;

    if( HAVE_BITDEPTH8 && param->i_bitdepth == 8 )
    {
        api->nal_encode = x264_8_nal_encode;
        api->encoder_reconfig = x264_8_encoder_reconfig;
        api->encoder_parameters = x264_8_encoder_parameters;
        api->encoder_headers = x264_8_encoder_headers;
        api->encoder_encode = x264_8_encoder_encode;
        api->encoder_close = x264_8_encoder_close;
        api->encoder_delayed_frames = x264_8_encoder_delayed_frames;
        api->encoder_maximum_delayed_frames = x264_8_encoder_maximum_delayed_frames;
        api->encoder_intra_refresh = x264_8_encoder_intra_refresh;
        api->encoder_invalidate_reference = x264_8_encoder_invalidate_reference;

        api->x264 = x264_8_encoder_open( param );
    }
    else if( HAVE_BITDEPTH10 && param->i_bitdepth == 10 )
    {
        api->nal_encode = x264_10_nal_encode;
        api->encoder_reconfig = x264_10_encoder_reconfig;
        api->encoder_parameters = x264_10_encoder_parameters;
        api->encoder_headers = x264_10_encoder_headers;
        api->encoder_encode = x264_10_encoder_encode;
        api->encoder_close = x264_10_encoder_close;
        api->encoder_delayed_frames = x264_10_encoder_delayed_frames;
        api->encoder_maximum_delayed_frames = x264_10_encoder_maximum_delayed_frames;
        api->encoder_intra_refresh = x264_10_encoder_intra_refresh;
        api->encoder_invalidate_reference = x264_10_encoder_invalidate_reference;

        api->x264 = x264_10_encoder_open( param );
    }
    else
        x264_log_internal( X264_LOG_ERROR, "not compiled with %d bit depth support\n", param->i_bitdepth );

    if( !api->x264 )
    {
        free( api );
        return NULL;
    }

    /* x264_t is opaque */
    return (x264_t *)api;
}
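
The registration pattern in x264_encoder_open can be reduced to a small standalone sketch: a table of function pointers is filled in once, based on the requested bit depth, and the public entry point simply forwards through it. Everything named `demo_*` below is hypothetical and only mirrors the shape of the code above; the "encoders" just return their bit depth so the dispatch is observable.

```c
#include <stdlib.h>

/* Dispatch table, analogous in shape to x264_api_t. */
typedef struct {
    void *impl;                             /* depth-specific state (opaque to callers) */
    int  (*encode)(void *impl, int frame);  /* selected at open time */
    void (*close)(void *impl);
} demo_api_t;

typedef struct { int frames_seen; } demo_impl_t;

static int demo_8_encode(void *impl, int frame)
{
    (void)frame;
    ((demo_impl_t *)impl)->frames_seen++;
    return 8;   /* stand-in for real 8-bit encoding work */
}

static int demo_10_encode(void *impl, int frame)
{
    (void)frame;
    ((demo_impl_t *)impl)->frames_seen++;
    return 10;  /* stand-in for real 10-bit encoding work */
}

static void demo_impl_close(void *impl) { free(impl); }

/* Choose the implementation once; callers never see which one was picked. */
static demo_api_t *demo_open(int bit_depth)
{
    demo_api_t *api = calloc(1, sizeof(*api));
    if (!api)
        return NULL;

    if (bit_depth == 8)
        api->encode = demo_8_encode;
    else if (bit_depth == 10)
        api->encode = demo_10_encode;
    else {
        free(api);      /* unsupported depth */
        return NULL;
    }
    api->close = demo_impl_close;

    api->impl = calloc(1, sizeof(demo_impl_t));
    if (!api->impl) {
        free(api);
        return NULL;
    }
    return api;
}

/* Public wrapper: forwards through the function pointer. */
static int demo_encode(demo_api_t *api, int frame)
{
    return api->encode(api->impl, frame);
}

static void demo_close(demo_api_t *api)
{
    if (!api)
        return;
    api->close(api->impl);
    free(api);
}
```

The benefit of this design is that one binary can carry both bit-depth builds, and the choice costs a single indirect call per API function.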

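At present x264 registers the calls for two bit depths, 8-bit and 10-bit. A full-text search, however, does not turn up implementations of the functions these pointers refer to, only their declarations, which are also in encoder/api.c: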

x264_t *x264_8_encoder_open( x264_param_t * );
void x264_8_nal_encode( x264_t *h, uint8_t *dst, x264_nal_t *nal );
int x264_8_encoder_reconfig( x264_t *, x264_param_t * );
void x264_8_encoder_parameters( x264_t *, x264_param_t * );
int x264_8_encoder_headers( x264_t *, x264_nal_t **pp_nal, int *pi_nal );
int x264_8_encoder_encode( x264_t *, x264_nal_t **pp_nal, int *pi_nal, x264_picture_t *pic_in, x264_picture_t *pic_out );
void x264_8_encoder_close( x264_t * );
int x264_8_encoder_delayed_frames( x264_t * );
int x264_8_encoder_maximum_delayed_frames( x264_t * );
void x264_8_encoder_intra_refresh( x264_t * );
int x264_8_encoder_invalidate_reference( x264_t *, int64_t pts );

x264_t *x264_10_encoder_open( x264_param_t * );
void x264_10_nal_encode( x264_t *h, uint8_t *dst, x264_nal_t *nal );
int x264_10_encoder_reconfig( x264_t *, x264_param_t * );
void x264_10_encoder_parameters( x264_t *, x264_param_t * );
int x264_10_encoder_headers( x264_t *, x264_nal_t **pp_nal, int *pi_nal );
int x264_10_encoder_encode( x264_t *, x264_nal_t **pp_nal, int *pi_nal, x264_picture_t *pic_in, x264_picture_t *pic_out );
void x264_10_encoder_close( x264_t * );
int x264_10_encoder_delayed_frames( x264_t * );
int x264_10_encoder_maximum_delayed_frames( x264_t * );
void x264_10_encoder_intra_refresh( x264_t * );
int x264_10_encoder_invalidate_reference( x264_t *, int64_t pts );

After building, you can see that x264 generates .o files for both bit depths, for example in encoder/:

 > ls
analyse-10.o cavlc-8.o macroblock.c rdo.c
analyse-8.o cavlc.c macroblock.h set-10.o
analyse.c encoder-10.o me-10.o set-8.o
analyse.h encoder-8.o me-8.o set.c
api.c encoder.c me.c set.h
api.o lookahead-10.o me.h slicetype.c
cabac-10.o lookahead-8.o ratecontrol-10.o slicetype-cl-8.o
cabac-8.o lookahead.c ratecontrol-8.o slicetype-cl.c
cabac.c macroblock-10.o ratecontrol.c slicetype-cl.h
cavlc-10.o macroblock-8.o ratecontrol.h

A global search shows that x264 distinguishes the implementations through macro definitions (common/common.h):

/* Macros for templating function calls according to bit depth */
#define x264_template(w) x264_glue3(x264, BIT_DEPTH, w)

/****************************************************************************
* API Templates
****************************************************************************/
#define x264_nal_encode x264_template(nal_encode)
#define x264_encoder_reconfig x264_template(encoder_reconfig)
#define x264_encoder_parameters x264_template(encoder_parameters)
#define x264_encoder_headers x264_template(encoder_headers)
#define x264_encoder_encode x264_template(encoder_encode)
#define x264_encoder_close x264_template(encoder_close)
#define x264_encoder_delayed_frames x264_template(encoder_delayed_frames)
#define x264_encoder_maximum_delayed_frames x264_template(encoder_maximum_delayed_frames)
#define x264_encoder_intra_refresh x264_template(encoder_intra_refresh)
#define x264_encoder_invalidate_reference x264_template(encoder_invalidate_reference)

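Therefore, the 8-bit and 10-bit paths are identical in terms of API logic: the same source is compiled once per bit depth, and the macros give each build its own function names.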
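
The macro templating shown above relies on token pasting. The sketch below reproduces the idea with hypothetical `DEMO_`/`demo_` names so it can be compiled on its own; the two-level macro is needed so that `BIT_DEPTH` is expanded to its value before the `##` operator pastes the tokens together.

```c
/* In a real build this would come from the compiler command line,
 * e.g. once with 8 and once with 10. */
#define BIT_DEPTH 8

/* Two levels: the outer macro expands BIT_DEPTH, the inner one pastes. */
#define DEMO_GLUE3_EXPAND(x, y, z) x##_##y##_##z
#define DEMO_GLUE3(x, y, z) DEMO_GLUE3_EXPAND(x, y, z)
#define demo_template(w) DEMO_GLUE3(demo, BIT_DEPTH, w)

/* Source code keeps using the generic name... */
#define demo_encoder_encode demo_template(encoder_encode)

/* ...but after preprocessing this defines demo_8_encoder_encode. */
int demo_encoder_encode(int frame)
{
    return frame + BIT_DEPTH;
}
```

Compiling the same file again with `BIT_DEPTH` set to 10 would define `demo_10_encoder_encode` instead, which matches the pairs of `-8.o` and `-10.o` objects seen in the build directory.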

The APIs above are implemented in encoder/encoder.c. Returning to the encode function we were looking for, it is defined as follows:

/****************************************************************************
 * x264_encoder_encode:
 *  XXX: i_poc   : is the poc of the current given picture
 *       i_frame : is the number of the frame being coded
 *  ex:  type frame poc
 *       I      0   2*0
 *       P      1   2*3
 *       B      2   2*1
 *       B      3   2*2
 *       P      4   2*6
 *       B      5   2*4
 *       B      6   2*5
 ****************************************************************************/
int x264_encoder_encode( x264_t *h,
                         x264_nal_t **pp_nal, int *pi_nal,
                         x264_picture_t *pic_in,
                         x264_picture_t *pic_out )

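The function takes the following parameters:

 1. x264_t *h: the encoder handle (encoder state and parameters)
 2. x264_nal_t **pp_nal: the encoding result, an array of output NAL units
 3. int *pi_nal: the number of NAL units returned in pp_nal (not the number of encoded frames)
 4. x264_picture_t *pic_in: the input picture holding the raw data
 5. x264_picture_t *pic_out: the output picture, filled with the properties of the frame that was encoded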
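The type/frame/poc table in the comment above can be reproduced with a few lines of arithmetic. This is only a sketch under a simplifying assumption: a fixed pattern with a constant number of B-frames between anchor frames, whereas x264 decides frame types adaptively. The function names are hypothetical.

```c
#include <assert.h>

/* Display index of the frame at position `coded` in coding order, assuming
 * a fixed pattern of `bframes` B-frames between consecutive anchor frames. */
int demo_display_index(int coded, int bframes)
{
    assert(coded >= 0 && bframes >= 0);
    if (coded == 0)
        return 0;                             /* the leading I-frame */
    int group = (coded - 1) / (bframes + 1);
    int pos   = (coded - 1) % (bframes + 1);
    if (pos == 0)
        return (group + 1) * (bframes + 1);   /* the P-frame is coded first */
    return group * (bframes + 1) + pos;       /* then the B-frames before it */
}

/* POC as in the comment table: twice the display index. */
int demo_poc(int coded, int bframes)
{
    return 2 * demo_display_index(coded, bframes);
}
```

With `bframes = 2`, coding positions 0 to 6 map to POC 0, 6, 2, 4, 12, 8, 10, the same sequence as the I P B B P B B rows in the comment.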

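Main functions called:

  1. x264_frame_pop_unused: get an unused frame (fenc)
  2. x264_frame_copy_picture: copy the raw data into fenc
  3. x264_frame_expand_border_mod16: pad frames whose resolution is not a multiple of 16
  4. x264_macroblock_tree_read: mb-tree related rate control
  5. x264_adaptive_quant_frame: macroblock-level rate control (adaptive quantization)
  6. x264_frame_init_lowres: build the half-resolution (lowres) planes used by lookahead analysis
  7. x264_lookahead_put_frame: add the frame to lookahead analysis
  8. x264_lookahead_get_frames: take frames out of lookahead
  9. x264_frame_shift: take the frame to be encoded from the queue
  10. x264_ratecontrol_zone_init: rate-control zone initialization
  11. reference_reset: clear the reference frame list
  12. reference_hierarchy_reset: called for I-frames, P-frames and similar types to decide whether to reset the reference frames
  13. reference_build_list: build the reference frame lists, list0 and list1
  14. x264_ratecontrol_start: start rate control and choose a suitable QP
  15. x264_ratecontrol_qp: get the computed QP
  16. slice_init: create the header information
  17. slices_write: the real encoding function, which performs the encoding internally
  18. encoder_frame_end: post-encoding processing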

Rate control

The x264_ratecontrol functions are defined in encoder/ratecontrol.c. Among them, x264_ratecontrol_start works at the frame level: it computes the expected QP of the frame, mainly using three control methods, ABR, 2-pass and VBV.

ABR algorithm && 2-pass ABR

In x264_ratecontrol_start, the ABR and 2-pass ABR computation comes down to the following lines:

if( rc->b_abr )
{
    q = qscale2qp( rate_estimate_qscale( h ) );
}
else if( rc->b_2pass )
{
    rce->new_qscale = rate_estimate_qscale( h );
    q = qscale2qp( rce->new_qscale );
}

Here q is the QP computed by the ABR algorithm. The calls are analyzed below (this is the upstream code, not an optimized version).

#define QP_BD_OFFSET (6*(BIT_DEPTH-8))

static inline float qscale2qp( float qscale )
{
    return (12.0f + QP_BD_OFFSET) + 6.0f * log2f( qscale/0.85f );
}

This function is the conversion used by ABR: x264 uses a fixed empirical formula to convert between qscale and QP. The inverse conversion is:

static inline float qp2qscale( float qp )
{
    return 0.85f * powf( 2.0f, ( qp - (12.0f + QP_BD_OFFSET) ) / 6.0f );
}

So the formula is (assuming 8-bit depth)
$$
qscale = 0.85*2.0^{(qp - 12.0)/6.0}
$$
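qscale is also tied to the number of bits per frame by a simple relation: for the same content, the predicted frame size is roughly inversely proportional to qscale (see predict_size, which divides by q). Rate control therefore computes qscale first and then converts it to QP to get a first-order control of the bitrate.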

// update qscale for 1 frame based on actual bits used so far
static float rate_estimate_qscale( x264_t *h )
{
    float q;
    x264_ratecontrol_t *rcc = h->rc;
    ratecontrol_entry_t rce = {0};
    int pict_type = h->sh.i_type;
    // total bits produced so far, summed over all slice types
    int64_t total_bits = 8*(h->stat.i_frame_size[SLICE_TYPE_I]
                          + h->stat.i_frame_size[SLICE_TYPE_P]
                          + h->stat.i_frame_size[SLICE_TYPE_B])
                       - rcc->filler_bits_sum; // filler data is excluded

    // in 2-pass mode, fetch the stats entry for this frame
    if( rcc->b_2pass )
    {
        rce = *rcc->rce;
        if( pict_type != rce.pict_type )
        {
            x264_log( h, X264_LOG_ERROR, "slice=%c but 2pass stats say %c\n",
                      slice_type_to_char[pict_type], slice_type_to_char[rce.pict_type] );
        }
    }

    if( pict_type == SLICE_TYPE_B )
    {
        /* B-frames don't have independent ratecontrol, but rather get the
         * average QP of the two adjacent P-frames + an offset */

        int i0 = IS_X264_TYPE_I(h->fref_nearest[0]->i_type);
        int i1 = IS_X264_TYPE_I(h->fref_nearest[1]->i_type);
        int dt0 = abs(h->fenc->i_poc - h->fref_nearest[0]->i_poc);
        int dt1 = abs(h->fenc->i_poc - h->fref_nearest[1]->i_poc);
        float q0 = h->fref_nearest[0]->f_qp_avg_rc;
        float q1 = h->fref_nearest[1]->f_qp_avg_rc;

        if( h->fref_nearest[0]->i_type == X264_TYPE_BREF )
            q0 -= rcc->pb_offset/2;
        if( h->fref_nearest[1]->i_type == X264_TYPE_BREF )
            q1 -= rcc->pb_offset/2;

        if( i0 && i1 )
            q = (q0 + q1) / 2 + rcc->ip_offset;
        else if( i0 )
            q = q1;
        else if( i1 )
            q = q0;
        else
            q = (q0*dt1 + q1*dt0) / (dt0 + dt1);

        if( h->fenc->b_kept_as_ref )
            q += rcc->pb_offset/2;
        else
            q += rcc->pb_offset;

        rcc->qp_novbv = q;
        q = qp2qscale( q );
        // estimate the size of the current frame
        if( rcc->b_2pass )
            rcc->frame_size_planned = qscale2bits( &rce, q );
        else
            rcc->frame_size_planned = predict_size( rcc->pred_b_from_p, q, h->fref[1][h->i_ref[1]-1]->i_satd );
        /* Limit planned size by MinCR */
        if( rcc->b_vbv )
            rcc->frame_size_planned = X264_MIN( rcc->frame_size_planned, rcc->frame_size_maximum );
        rcc->frame_size_estimated = rcc->frame_size_planned;

        /* For row SATDs */
        if( rcc->b_vbv )
            rcc->last_satd = x264_rc_analyse_slice( h );
        return q;
    }
    else
    {
        double abr_buffer = 2 * rcc->rate_tolerance * rcc->bitrate;
        double predicted_bits = total_bits;
        // with frame threads, add the planned bits of frames still in flight
        if( h->i_thread_frames > 1 )
        {
            int j = rcc - h->thread[0]->rc;
            for( int i = 1; i < h->i_thread_frames; i++ )
            {
                x264_t *t = h->thread[(j+i) % h->i_thread_frames];
                double bits = t->rc->frame_size_planned;
                if( !t->b_thread_active )
                    continue;
                bits = X264_MAX(bits, t->rc->frame_size_estimated);
                predicted_bits += bits;
            }
        }

        if( rcc->b_2pass )
        {
            double lmin = rcc->lmin[pict_type];
            double lmax = rcc->lmax[pict_type];
            double diff;

            /* Adjust ABR buffer based on distance to the end of the video. */
            if( rcc->num_entries > h->i_frame )
            {
                double final_bits = rcc->entry_out[rcc->num_entries-1]->expected_bits;
                double video_pos = rce.expected_bits / final_bits;
                double scale_factor = sqrt( (1 - video_pos) * rcc->num_entries );
                abr_buffer *= 0.5 * X264_MAX( scale_factor, 0.5 );
            }

            diff = predicted_bits - rce.expected_bits;
            q = rce.new_qscale;
            q /= x264_clip3f((abr_buffer - diff) / abr_buffer, .5, 2);
            if( h->i_frame >= rcc->fps && rcc->expected_bits_sum >= 1 )
            {
                /* Adjust quant based on the difference between
                 * achieved and expected bitrate so far */
                double cur_time = (double)h->i_frame / rcc->num_entries;
                double w = x264_clip3f( cur_time*100, 0.0, 1.0 );
                q *= pow( (double)total_bits / rcc->expected_bits_sum, w );
            }
            rcc->qp_novbv = qscale2qp( q );
            if( rcc->b_vbv )
            {
                /* Do not overflow vbv */
                double expected_size = qscale2bits( &rce, q );
                double expected_vbv = rcc->buffer_fill + rcc->buffer_rate - expected_size;
                double expected_fullness = rce.expected_vbv / rcc->buffer_size;
                double qmax = q*(2 - expected_fullness);
                double size_constraint = 1 + expected_fullness;
                qmax = X264_MAX( qmax, rce.new_qscale );
                if( expected_fullness < .05 )
                    qmax = lmax;
                qmax = X264_MIN(qmax, lmax);
                while( ((expected_vbv < rce.expected_vbv/size_constraint) && (q < qmax)) ||
                        ((expected_vbv < 0) && (q < lmax)))
                {
                    q *= 1.05;
                    expected_size = qscale2bits(&rce, q);
                    expected_vbv = rcc->buffer_fill + rcc->buffer_rate - expected_size;
                }
                rcc->last_satd = x264_rc_analyse_slice( h );
            }
            q = x264_clip3f( q, lmin, lmax );
        }
        else /* 1pass ABR */
        {
            /* Calculate the quantizer which would have produced the desired
             * average bitrate if it had been applied to all frames so far.
             * Then modulate that quant based on the current frame's complexity
             * relative to the average complexity so far (using the 2pass RCEQ).
             * Then bias the quant up or down if total size so far was far from
             * the target.
             * Result: Depending on the value of rate_tolerance, there is a
             * tradeoff between quality and bitrate precision. But at large
             * tolerances, the bit distribution approaches that of 2pass. */

            double wanted_bits, overflow = 1;

            // get the SATD of the current frame
            rcc->last_satd = x264_rc_analyse_slice( h );
            // update the short-term complexity sum and count (both decay by half per frame)
            rcc->short_term_cplxsum *= 0.5;
            rcc->short_term_cplxcount *= 0.5;
            rcc->short_term_cplxsum += rcc->last_satd / (CLIP_DURATION(h->fenc->f_duration) / BASE_FRAME_DURATION);
            rcc->short_term_cplxcount ++;

            // initialize rce
            rce.tex_bits = rcc->last_satd;
            rce.blurred_complexity = rcc->short_term_cplxsum / rcc->short_term_cplxcount;
            rce.mv_bits = 0;
            rce.p_count = rcc->nmb;
            rce.i_count = 0;
            rce.s_count = 0;
            rce.qscale = 1;
            rce.pict_type = pict_type;
            rce.i_duration = h->fenc->i_duration;

            if( h->param.rc.i_rc_method == X264_RC_CRF ) // CRF
            {
                q = get_qscale( h, &rce, rcc->rate_factor_constant, h->fenc->i_frame );
            }
            else
            {
                q = get_qscale( h, &rce, rcc->wanted_bits_window / rcc->cplxr_sum, h->fenc->i_frame );

                /* ABR code can potentially be counterproductive in CBR, so just don't bother.
                 * Don't run it if the frame complexity is zero either. */
                if( !rcc->b_vbv_min_rate && rcc->last_satd ) // second adjustment, skipped when b_vbv_min_rate is set or complexity is zero
                {
                    // FIXME is it simpler to keep track of wanted_bits in ratecontrol_end?
                    int i_frame_done = h->i_frame;
                    // playback time covered by the frames done so far
                    double time_done = i_frame_done / rcc->fps;
                    if( h->param.b_vfr_input && i_frame_done > 0 )
                        time_done = ((double)(h->fenc->i_reordered_pts - h->i_reordered_pts_delay)) * h->param.i_timebase_num / h->param.i_timebase_den;
                    wanted_bits = time_done * rcc->bitrate;
                    if( wanted_bits > 0 )
                    {
                        abr_buffer *= X264_MAX( 1, sqrt( time_done ) );
                        overflow = x264_clip3f( 1.0 + (predicted_bits - wanted_bits) / abr_buffer, .5, 2 );
                        q *= overflow;
                    }
                }
            }

            if( pict_type == SLICE_TYPE_I && h->param.i_keyint_max > 1
                /* should test _next_ pict type, but that isn't decided yet */
                && rcc->last_non_b_pict_type != SLICE_TYPE_I )
            {
                q = qp2qscale( rcc->accum_p_qp / rcc->accum_p_norm );
                q /= h->param.rc.f_ip_factor;
            }
            else if( h->i_frame > 0 )
            {
                if( h->param.rc.i_rc_method != X264_RC_CRF )
                {
                    /* Asymmetric clipping, because symmetric would prevent
                     * overflow control in areas of rapidly oscillating complexity */
                    double lmin = rcc->last_qscale_for[pict_type] / rcc->lstep;
                    double lmax = rcc->last_qscale_for[pict_type] * rcc->lstep;
                    if( overflow > 1.1 && h->i_frame > 3 )
                        lmax *= rcc->lstep;
                    else if( overflow < 0.9 )
                        lmin /= rcc->lstep;

                    q = x264_clip3f(q, lmin, lmax);
                }
            }
            else if( h->param.rc.i_rc_method == X264_RC_CRF && rcc->qcompress != 1 )
            {
                q = qp2qscale( ABR_INIT_QP ) / h->param.rc.f_ip_factor;
            }
            rcc->qp_novbv = qscale2qp( q );

            //FIXME use get_diff_limited_q() ?
            q = clip_qscale( h, pict_type, q );
        }

        rcc->last_qscale_for[pict_type] =
        rcc->last_qscale = q;

        if( !(rcc->b_2pass && !rcc->b_vbv) && h->fenc->i_frame == 0 )
            rcc->last_qscale_for[SLICE_TYPE_P] = q * h->param.rc.f_ip_factor;

        if( rcc->b_2pass )
            rcc->frame_size_planned = qscale2bits( &rce, q );
        else
            rcc->frame_size_planned = predict_size( &rcc->pred[h->sh.i_type], q, rcc->last_satd );

        /* Always use up the whole VBV in this case. */
        if( rcc->single_frame_vbv )
            rcc->frame_size_planned = rcc->buffer_rate;
        /* Limit planned size by MinCR */
        if( rcc->b_vbv )
            rcc->frame_size_planned = X264_MIN( rcc->frame_size_planned, rcc->frame_size_maximum );
        rcc->frame_size_estimated = rcc->frame_size_planned;
        return q;
    }
}
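The B-frame branch of rate_estimate_qscale is easier to follow as a standalone function. The sketch below is a simplified restatement, not x264's code: `demo_ref_t` and `demo_bframe_qp` are hypothetical names, the offsets are passed in as plain arguments, and the adjustment for references of type X264_TYPE_BREF is left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields read from fref_nearest[0] and [1]. */
typedef struct {
    float qp_avg;    /* average QP the reference frame was coded with */
    int   is_intra;  /* nonzero if the reference is an I-frame */
    int   poc_dist;  /* |POC(current) - POC(reference)| */
} demo_ref_t;

/* QP of a B-frame derived from its two nearest references. */
float demo_bframe_qp(demo_ref_t r0, demo_ref_t r1,
                     float ip_offset, float pb_offset, int kept_as_ref)
{
    float q;
    if (r0.is_intra && r1.is_intra) {
        q = (r0.qp_avg + r1.qp_avg) / 2 + ip_offset;
    } else if (r0.is_intra) {
        q = r1.qp_avg;              /* use the non-intra neighbour */
    } else if (r1.is_intra) {
        q = r0.qp_avg;
    } else {
        assert(r0.poc_dist + r1.poc_dist > 0);
        /* Distance-weighted average: the nearer reference weighs more. */
        q = (r0.qp_avg * r1.poc_dist + r1.qp_avg * r0.poc_dist)
            / (r0.poc_dist + r1.poc_dist);
    }
    /* B-frames kept as references only get half of the P-to-B offset. */
    return q + (kept_as_ref ? pb_offset / 2 : pb_offset);
}
```

For example, with references at QP 20 (POC distance 2) and QP 26 (POC distance 4), the weighted average is (20*4 + 26*2) / 6 = 22, and a P-to-B offset of 2 gives a B-frame QP of 24.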

Based on this code, the following flowchart can be drawn:

[Figure: rate_estimate_qscale flowchart]

VBV algorithm

The VBV-related code in x264_ratecontrol_start:

if( rc->b_vbv )
{
    memset( h->fdec->i_row_bits, 0, h->mb.i_mb_height * sizeof(int) );
    memset( h->fdec->f_row_qp, 0, h->mb.i_mb_height * sizeof(float) );
    memset( h->fdec->f_row_qscale, 0, h->mb.i_mb_height * sizeof(float) );
    rc->row_pred = rc->row_preds[h->sh.i_type];
    rc->buffer_rate = h->fenc->i_cpb_duration * rc->vbv_max_rate * h->sps->vui.i_num_units_in_tick / h->sps->vui.i_time_scale;
    update_vbv_plan( h, overhead ); // update the VBV plan from the sizes of the frames in progress

    const x264_level_t *l = x264_levels;
    while( l->level_idc != 0 && l->level_idc != h->param.i_level_idc )
        l++;

    // minimum compression ratio (MinCR) of the level
    int mincr = l->mincr;

    if( h->param.b_bluray_compat )
        mincr = 4;

    /* Profiles above High don't require minCR, so just set the maximum to a large value. */
    if( h->sps->i_profile_idc > PROFILE_HIGH )
        rc->frame_size_maximum = 1e9;
    else
    {
        /* The spec has a bizarre special case for the first frame. */
        if( h->i_frame == 0 )
        {
            //384 * ( Max( PicSizeInMbs, fR * MaxMBPS ) + MaxMBPS * ( tr( 0 ) - tr,n( 0 ) ) ) / MinCR
            double fr = 1. / (h->param.i_level_idc >= 60 ? 300 : 172);
            int pic_size_in_mbs = h->mb.i_mb_width * h->mb.i_mb_height;
            rc->frame_size_maximum = 384 * BIT_DEPTH * X264_MAX( pic_size_in_mbs, fr*l->mbps ) / mincr;
        }
        else
        {
            //384 * MaxMBPS * ( tr( n ) - tr( n - 1 ) ) / MinCR
            rc->frame_size_maximum = 384 * BIT_DEPTH * ((double)h->fenc->i_cpb_duration * h->sps->vui.i_num_units_in_tick / h->sps->vui.i_time_scale) * l->mbps / mincr;
        }
    }
}
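To see what magnitude frame_size_maximum takes, the non-first-frame formula above can be evaluated on its own. This is a sketch with a hypothetical function name; the level values used in the example afterwards (MaxMBPS 245760 and MinCR 4, which I believe correspond to level 4) are assumptions for illustration and should be checked against the level table.

```c
#include <assert.h>

/* Upper bound on the size of one frame in bits implied by MinCR, following
 * 384 * BIT_DEPTH * frame duration * MaxMBPS / MinCR from the code above. */
double demo_frame_size_maximum(int bit_depth, double frame_duration_sec,
                               double max_mbps, int mincr)
{
    assert(mincr > 0);
    return 384.0 * bit_depth * frame_duration_sec * max_mbps / mincr;
}
```

For 8-bit video at 30 fps with the assumed level values, `demo_frame_size_maximum(8, 1.0/30.0, 245760.0, 4)` comes to about 6.29 million bits, roughly 768 KiB per frame. The planned frame size is capped by this value whenever VBV is enabled.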

update_vbv_plan is shown below. It mainly updates buffer_fill, which serves as the planned buffer fullness; the actual update does not happen here.

// provisionally update VBV according to the planned size of all frames currently in progress
static void update_vbv_plan( x264_t *h, int overhead )
{
    x264_ratecontrol_t *rcc = h->rc;
    rcc->buffer_fill = h->thread[0]->rc->buffer_fill_final_min / h->sps->vui.i_time_scale;
    // frame-threading case
    if( h->i_thread_frames > 1 )
    {
        int j = rcc - h->thread[0]->rc;
        for( int i = 1; i < h->i_thread_frames; i++ )
        {
            x264_t *t = h->thread[ (j+i)%h->i_thread_frames ];
            double bits = t->rc->frame_size_planned;
            if( !t->b_thread_active )
                continue;
            bits = X264_MAX(bits, t->rc->frame_size_estimated);
            rcc->buffer_fill -= bits;
            rcc->buffer_fill = X264_MAX( rcc->buffer_fill, 0 );
            rcc->buffer_fill += t->rc->buffer_rate;
            rcc->buffer_fill = X264_MIN( rcc->buffer_fill, rcc->buffer_size );
        }
    }
    rcc->buffer_fill = X264_MIN( rcc->buffer_fill, rcc->buffer_size );
    rcc->buffer_fill -= overhead;
}

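The actual update happens after encoding completes, when x264_ratecontrol_end calls update_vbv. This function updates buffer_fill_final, which reflects the VBV buffer state after accounting for the bits the frame actually used. The VBV algorithm also has a lot of macroblock-level handling scattered through the code, which these notes do not analyze for now.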

// update VBV after encoding a frame
static int update_vbv( x264_t *h, int bits )
{
    int filler = 0;
    int bitrate = h->sps->vui.hrd.i_bit_rate_unscaled;
    x264_ratecontrol_t *rcc = h->rc;
    x264_ratecontrol_t *rct = h->thread[0]->rc;
    int64_t buffer_size = (int64_t)h->sps->vui.hrd.i_cpb_size_unscaled * h->sps->vui.i_time_scale;

    if( rcc->last_satd >= h->mb.i_mb_count )
        update_predictor( &rct->pred[h->sh.i_type], qp2qscale( rcc->qpa_rc ), rcc->last_satd, bits );

    if( !rcc->b_vbv )
        return filler;

    uint64_t buffer_diff = (uint64_t)bits * h->sps->vui.i_time_scale;
    rct->buffer_fill_final -= buffer_diff;
    rct->buffer_fill_final_min -= buffer_diff;

    if( rct->buffer_fill_final_min < 0 )
    {
        double underflow = (double)rct->buffer_fill_final_min / h->sps->vui.i_time_scale;
        if( rcc->rate_factor_max_increment && rcc->qpm >= rcc->qp_novbv + rcc->rate_factor_max_increment )
            x264_log( h, X264_LOG_DEBUG, "VBV underflow due to CRF-max (frame %d, %.0f bits)\n", h->i_frame, underflow );
        else
            x264_log( h, X264_LOG_WARNING, "VBV underflow (frame %d, %.0f bits)\n", h->i_frame, underflow );
        rct->buffer_fill_final =
        rct->buffer_fill_final_min = 0;
    }

    if( h->param.i_avcintra_class )
        buffer_diff = buffer_size;
    else
        buffer_diff = (uint64_t)bitrate * h->sps->vui.i_num_units_in_tick * h->fenc->i_cpb_duration;
    rct->buffer_fill_final += buffer_diff;
    rct->buffer_fill_final_min += buffer_diff;

    if( rct->buffer_fill_final > buffer_size )
    {
        if( h->param.rc.b_filler )
        {
            int64_t scale = (int64_t)h->sps->vui.i_time_scale * 8;
            filler = (rct->buffer_fill_final - buffer_size + scale - 1) / scale;
            bits = h->param.i_avcintra_class ? filler * 8 : X264_MAX( (FILLER_OVERHEAD - h->param.b_annexb), filler ) * 8;
            buffer_diff = (uint64_t)bits * h->sps->vui.i_time_scale;
            rct->buffer_fill_final -= buffer_diff;
            rct->buffer_fill_final_min -= buffer_diff;
        }
        else
        {
            rct->buffer_fill_final = X264_MIN( rct->buffer_fill_final, buffer_size );
            rct->buffer_fill_final_min = X264_MIN( rct->buffer_fill_final_min, buffer_size );
        }
    }

    return filler;
}
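Stripped of the time-scale arithmetic and logging, update_vbv behaves like a leaky-bucket update: drain the bits of the encoded frame, clamp on underflow, refill by one frame interval of bitrate, then handle overflow. The sketch below models only that core with hypothetical names; it works in plain bits, and it reports the excess instead of computing filler bytes.

```c
#include <stdint.h>

typedef struct {
    int64_t fill;  /* bits currently available in the buffer */
    int64_t size;  /* buffer capacity in bits */
    int64_t rate;  /* bits arriving during one frame interval */
} demo_vbv_t;

/* Account for one encoded frame of `bits` bits.
 * Returns 1 if the frame underflowed the buffer, else 0.
 * *overflow_bits receives the bits that did not fit after refilling,
 * which is the amount filler data would have to absorb. */
int demo_vbv_update(demo_vbv_t *v, int64_t bits, int64_t *overflow_bits)
{
    int underflow = 0;

    v->fill -= bits;                /* the frame drains the buffer */
    if (v->fill < 0) {
        underflow = 1;              /* frame was larger than the budget */
        v->fill = 0;
    }

    v->fill += v->rate;             /* refill for the next frame interval */

    *overflow_bits = 0;
    if (v->fill > v->size) {
        *overflow_bits = v->fill - v->size;
        v->fill = v->size;          /* clamp to capacity */
    }
    return underflow;
}
```

Starting from a fill of 800 in a 1000-bit buffer that gains 400 bits per frame, a 300-bit frame leaves the fill at 900. A following 100-bit frame would reach 1200, so the fill is clamped to 1000 and 200 bits are reported as excess.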

x264 Study Notes
http://nyamori.icu/2022/04/28/x264学习笔记/
Author: Nyamori
Published: April 28, 2022
License