Pyc 结构简述
浅浅研究了一下 .pyc
的文件结构。
Pyc 简介
.pyc
是 Python 字节码文件,可以由 Python 虚拟机来执行。
在 Python 3 中,Python 会自动在 __pycache__
目录里,缓存每个模块编译后的版本,名称为 module.version.pyc
,这就是 Python 字节码文件。其中 version 一般使用 Python 版本号。
字节码在 Python 虚拟机程序里对应的是 PyCodeObject
对象。.pyc
文件是字节码在磁盘上的表现形式。
PyCodeObject
对象的创建时机是模块加载的时候,即 import
。
如果使用 python test.py
命令会对 test.py
进行编译成字节码并解释执行,但是不会生成 test.pyc
。
如果 test.py
加载了其他模块,如 import util
,Python 会对 util.py
进行编译成字节码,生成 util.pyc
,然后对字节码解释执行。
如果想生成 test.pyc
,我们可以使用Python内置模块py_compile来编译。
加载模块时,如果同时存在 .py
和 .pyc
,Python 会尝试使用 .pyc
;如果 .pyc
的编译时间早于 .py
的修改时间,则重新编译 .py
并更新 .pyc
。
Pyc 结构概述
Python 的原始代码在运行前都会被先编译成字节码(二进制),并把编译的结果保存到:
- 一个四字节
magic number
- 一个四字节的时间戳
- 一个
PyCodeObject
把这三部分在内存中以 marshal 格式保存为文件,即 pyc 文件。
01 magic number
iPlayForSG 大佬博客里整理了下面的 Magic Number 对照表。
Known values:
# Python 1.5: 20121
# Python 1.5.1: 20121
# Python 1.5.2: 20121
# Python 1.6: 50428
# Python 2.0: 50823
# Python 2.0.1: 50823
# Python 2.1: 60202
# Python 2.1.1: 60202
# Python 2.1.2: 60202
# Python 2.2: 60717
# Python 2.3a0: 62011
# Python 2.3a0: 62021
# Python 2.3a0: 62011 (!)
# Python 2.4a0: 62041
# Python 2.4a3: 62051
# Python 2.4b1: 62061
# Python 2.5a0: 62071
# Python 2.5a0: 62081 (ast-branch)
# Python 2.5a0: 62091 (with)
# Python 2.5a0: 62092 (changed WITH_CLEANUP opcode)
# Python 2.5b3: 62101 (fix wrong code: for x, in ...)
# Python 2.5b3: 62111 (fix wrong code: x += yield)
# Python 2.5c1: 62121 (fix wrong lnotab with for loops and
# storing constants that should have been removed)
# Python 2.5c2: 62131 (fix wrong code: for x, in ... in listcomp/genexp)
# Python 2.6a0: 62151 (peephole optimizations and STORE_MAP opcode)
# Python 2.6a1: 62161 (WITH_CLEANUP optimization)
# Python 2.7a0: 62171 (optimize list comprehensions/change LIST_APPEND)
# Python 2.7a0: 62181 (optimize conditional branches:
# introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
# Python 2.7a0 62191 (introduce SETUP_WITH)
# Python 2.7a0 62201 (introduce BUILD_SET)
# Python 2.7a0 62211 (introduce MAP_ADD and SET_ADD)
# Python 3000: 3000
# 3010 (removed UNARY_CONVERT)
# 3020 (added BUILD_SET)
# 3030 (added keyword-only parameters)
# 3040 (added signature annotations)
# 3050 (print becomes a function)
# 3060 (PEP 3115 metaclass syntax)
# 3061 (string literals become unicode)
# 3071 (PEP 3109 raise changes)
# 3081 (PEP 3137 make __file__ and __name__ unicode)
# 3091 (kill str8 interning)
# 3101 (merge from 2.6a0, see 62151)
# 3103 (__file__ points to source file)
# Python 3.0a4: 3111 (WITH_CLEANUP optimization).
# Python 3.0b1: 3131 (lexical exception stacking, including POP_EXCEPT
#3021)
# Python 3.1a1: 3141 (optimize list, set and dict comprehensions:
# change LIST_APPEND and SET_ADD, add MAP_ADD #2183)
# Python 3.1a1: 3151 (optimize conditional branches:
# introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE
#4715)
# Python 3.2a1: 3160 (add SETUP_WITH #6101)
# tag: cpython-32
# Python 3.2a2: 3170 (add DUP_TOP_TWO, remove DUP_TOPX and ROT_FOUR #9225)
# tag: cpython-32
# Python 3.2a3 3180 (add DELETE_DEREF #4617)
# Python 3.3a1 3190 (__class__ super closure changed)
# Python 3.3a1 3200 (PEP 3155 __qualname__ added #13448)
# Python 3.3a1 3210 (added size modulo 2**32 to the pyc header #13645)
# Python 3.3a2 3220 (changed PEP 380 implementation #14230)
# Python 3.3a4 3230 (revert changes to implicit __class__ closure #14857)
# Python 3.4a1 3250 (evaluate positional default arguments before
# keyword-only defaults #16967)
# Python 3.4a1 3260 (add LOAD_CLASSDEREF; allow locals of class to override
# free vars #17853)
# Python 3.4a1 3270 (various tweaks to the __class__ closure #12370)
# Python 3.4a1 3280 (remove implicit class argument)
# Python 3.4a4 3290 (changes to __qualname__ computation #19301)
# Python 3.4a4 3300 (more changes to __qualname__ computation #19301)
# Python 3.4rc2 3310 (alter __qualname__ computation #20625)
# Python 3.5a1 3320 (PEP 465: Matrix multiplication operator #21176)
# Python 3.5b1 3330 (PEP 448: Additional Unpacking Generalizations #2292)
# Python 3.5b2 3340 (fix dictionary display evaluation order #11205)
# Python 3.5b3 3350 (add GET_YIELD_FROM_ITER opcode #24400)
# Python 3.5.2 3351 (fix BUILD_MAP_UNPACK_WITH_CALL opcode #27286)
# Python 3.6a0 3360 (add FORMAT_VALUE opcode #25483)
# Python 3.6a1 3361 (lineno delta of code.co_lnotab becomes signed #26107)
# Python 3.6a2 3370 (16 bit wordcode #26647)
# Python 3.6a2 3371 (add BUILD_CONST_KEY_MAP opcode #27140)
# Python 3.6a2 3372 (MAKE_FUNCTION simplification, remove MAKE_CLOSURE
# #27095)
# Python 3.6b1 3373 (add BUILD_STRING opcode #27078)
# Python 3.6b1 3375 (add SETUP_ANNOTATIONS and STORE_ANNOTATION opcodes
# #27985)
# Python 3.6b1 3376 (simplify CALL_FUNCTIONs & BUILD_MAP_UNPACK_WITH_CALL
#27213)
# Python 3.6b1 3377 (set __class__ cell from type.__new__ #23722)
# Python 3.6b2 3378 (add BUILD_TUPLE_UNPACK_WITH_CALL #28257)
# Python 3.6rc1 3379 (more thorough __class__ validation #23722)
# Python 3.7a1 3390 (add LOAD_METHOD and CALL_METHOD opcodes #26110)
# Python 3.7a2 3391 (update GET_AITER #31709)
# Python 3.7a4 3392 (PEP 552: Deterministic pycs #31650)
# Python 3.7b1 3393 (remove STORE_ANNOTATION opcode #32550)
# Python 3.7b5 3394 (restored docstring as the first stmt in the body;
# this might affected the first line number #32911)
# Python 3.8a1 3400 (move frame block handling to compiler #17611)
# Python 3.8a1 3401 (add END_ASYNC_FOR #33041)
# Python 3.8a1 3410 (PEP570 Python Positional-Only Parameters #36540)
# Python 3.8b2 3411 (Reverse evaluation order of key: value in dict
# comprehensions #35224)
# Python 3.8b2 3412 (Swap the position of positional args and positional
# only args in ast.arguments #37593)
# Python 3.8b4 3413 (Fix "break" and "continue" in "finally" #37830)
02 PyCodeObject
/* Bytecode object */
typedef struct {
PyObject_HEAD
int co_argcount; /* #arguments, except *args 代码块的位置参数的个数 */
int co_nlocals; /* #local variables 代码块中局部变量的个数,包括其位置参数的个数。 */
int co_stacksize; /* #entries needed for evaluation stack 执行该代码块需要的栈空间。 */
int co_flags; /* 用来表示参数中是否有 *args 或者 **kwargs */
PyObject *co_code; /* instruction opcodes 该段代码块所对应的 opcode 指令序列。 */
PyObject *co_consts; /* list (constants used) 保存代码块中所有的常量 */
PyObject *co_names; /* list of strings (names used) 保存代码块中所有的符号 */
PyObject *co_varnames; /* tuple of strings (local variable names) 代码块中所有局部变量名 */
PyObject *co_freevars; /* tuple of strings (free variable names) 闭包所需要的 */
PyObject *co_cellvars; /* tuple of strings (cell variable names) 闭包所需要的,内部嵌套函数所需要的局部变量集合 */
/* The rest doesn't count for hash/cmp */
PyObject *co_filename; /* string (where it was loaded from) */
PyObject *co_name; /* string (name, for reference) */
int co_firstlineno; /* first source line number */
PyObject *co_lnotab; /* string (encoding addr<->lineno mapping) See
Objects/lnotab_notes.txt for details. */
void *co_zombieframe; /* for optimization only (see frameobject.c) */
PyObject *co_weakreflist;/* to support weakrefs to code objects */
} PyCodeObject;
所有的 PyCodeObject
都是通过调用以下的函数得以运行:
PyObject * PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
这个函数是 Python 的一个重量级的函数,他的作用即是执行中间码,Python 的代码都是通过调用这个函数来运行的。
PyFrameObject
这个数据结构:
/* Frame object */
typedef struct _frame {
PyObject_VAR_HEAD
struct _frame *f_back; /* previous frame, or NULL */
PyCodeObject *f_code; /* 【重要】code segment */
PyObject *f_builtins; /* builtin symbol table (PyDictObject) */
PyObject *f_globals; /* 【重要】global symbol table (PyDictObject) */
PyObject *f_locals; /* 【重要】local symbol table (any mapping) */
PyObject **f_valuestack; /* points after the last local */
/* Next free slot in f_valuestack. Frame creation sets to f_valuestack.
Frame evaluation usually NULLs it, but a frame that yields sets it
to the current stack top. */
PyObject **f_stacktop; /* 【重要】 */
PyObject *f_trace; /* Trace function */
/* If an exception is raised in this frame, the next three are used to
* record the exception info (if any) originally in the thread state. See
* comments before set_exc_info() -- it's not obvious.
* Invariant: if _type is NULL, then so are _value and _traceback.
* Desired invariant: all three are NULL, or all three are non-NULL. That
* one isn't currently true, but "should be".
*/
PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;
PyThreadState *f_tstate;
int f_lasti; /* Last instruction if called */
/* Call PyFrame_GetLineNumber() instead of reading this field
directly. As of 2.3 f_lineno is only valid when tracing is
active (i.e. when f_trace is set). At other times we use
PyCode_Addr2Line to calculate the line from the current
bytecode index. */
int f_lineno; /* Current line number */
int f_iblock; /* index in f_blockstack */
PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
PyObject *f_localsplus[1]; /* locals+stack, dynamically sized */
} PyFrameObject;
参考资料:
- Python 字节码 https://www.cnblogs.com/ningjing213/p/16224595.html
- Python 程序的执行原理 http://tech.uc.cn/?p=1932、
- .pyc 文件的结构 https://blog.csdn.net/weixin_45055269/article/details/105682945
- python Magic Number对照表以及pyc修复方法 https://www.cnblogs.com/Here-is-SG/p/15885799.html
评论区