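Study Notes: Mach-O Files

"Structure determines properties, and properties determine use." Without knowing how a Mach-O file is laid out, it is hard to really understand how it is loaded and run.

These notes walk through the structure of a Mach-O file using a sample executable.

Basic structure of Mach-O

  1. Header: file type, target architecture, and so on
  2. Load Commands: describe the file's logical structure and layout in virtual memory
  3. Data: the contents of the segments defined in the Load Commands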

2025-04-17 14.57.00.png

Header

2025-04-17 14.59.03.png

The header structure is defined in loader.h:

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
	// Magic number. A 64-bit Mach-O file has two possible values:
	// #define MH_MAGIC_64 0xfeedfacf -- the file is in the reader's byte order (little-endian on Intel and ARM)
	// #define MH_CIGAM_64 0xcffaedfe -- the file is byte-swapped, e.g. a big-endian file from the PowerPC era of macOS
	uint32_t	magic;		/* mach magic number identifier */

	// CPU type, defined in machine.h.
	// The sample shows CPU_TYPE_ARM64: 0x0000000C | 0x01000000 = 0x0100000C
	// #define CPU_ARCH_ABI64          0x01000000      /* 64 bit ABI */
	// #define CPU_TYPE_ARM            ((cpu_type_t) 12)
	// #define CPU_TYPE_ARM64          (CPU_TYPE_ARM | CPU_ARCH_ABI64)
	int32_t		cputype;	/* cpu specifier */

	// ARM64 subtypes. The sample shows 0, i.e. CPU_SUBTYPE_ARM64_ALL.
	// #define CPU_SUBTYPE_ARM64_ALL           ((cpu_subtype_t) 0)
	// #define CPU_SUBTYPE_ARM64_V8            ((cpu_subtype_t) 1)
	// #define CPU_SUBTYPE_ARM64E              ((cpu_subtype_t) 2)
	int32_t		cpusubtype;	/* machine specifier */

	// File type. The sample shows 2, i.e. MH_EXECUTE, an executable.
	// #define MH_OBJECT	0x1 -- .o file; a .a archive is a collection of .o files
	// #define MH_EXECUTE	0x2 -- executable
	// #define MH_DYLIB	0x6 -- dynamic library
	// #define MH_DYLINKER	0x7 -- the dynamic linker, dyld
	// #define MH_DSYM	0xa -- companion file holding debug symbols
	uint32_t	filetype;	/* type of file */

	// Number of load commands. The sample has 23.
	uint32_t	ncmds;		/* number of load commands */

	// Total size of the Load Commands area. The sample has 2864 bytes.
	uint32_t	sizeofcmds;	/* the size of all the load commands */

	// Mach-O flags, defined as bit masks. The ones set in the sample:
	// #define MH_NOUNDEFS	0x1 -- no undefined references
	// #define MH_DYLDLINK	0x4 -- already statically linked; the file is input for the dynamic linker
	// #define MH_TWOLEVEL	0x80 -- two-level namespace: symbols are bound as library name + symbol name, reducing name clashes (see reference 1)
	// #define MH_PIE	0x200000 -- the main executable is loaded at a random address on each launch, for security
	uint32_t	flags;		/* flags */

	// Reserved
	uint32_t	reserved;	/* reserved */
};
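
To make the flag and CPU-type arithmetic above concrete, here is a minimal sketch in C. The constants are copied from loader.h and machine.h so that it builds on any platform; `is_macho64_magic` and `format_mh_flags` are hypothetical helper names introduced for illustration, and only the four flags discussed above are recognized.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from loader.h and machine.h. */
#define MH_MAGIC_64    0xfeedfacfu
#define MH_CIGAM_64    0xcffaedfeu
#define CPU_ARCH_ABI64 0x01000000
#define CPU_TYPE_ARM   12
#define CPU_TYPE_ARM64 (CPU_TYPE_ARM | CPU_ARCH_ABI64)

#define MH_NOUNDEFS 0x1u
#define MH_DYLDLINK 0x4u
#define MH_TWOLEVEL 0x80u
#define MH_PIE      0x200000u

/* Hypothetical helper: 1 if `magic` marks a 64-bit Mach-O in either byte order. */
int is_macho64_magic(uint32_t magic) {
    return magic == MH_MAGIC_64 || magic == MH_CIGAM_64;
}

/* Hypothetical helper: writes the names of the flags listed above that are
 * set in `flags` into `out`, joined by '|'. Other bits are ignored. */
void format_mh_flags(uint32_t flags, char *out, size_t cap) {
    static const struct { uint32_t bit; const char *name; } known[] = {
        { MH_NOUNDEFS, "MH_NOUNDEFS" },
        { MH_DYLDLINK, "MH_DYLDLINK" },
        { MH_TWOLEVEL, "MH_TWOLEVEL" },
        { MH_PIE,      "MH_PIE"      },
    };
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (!(flags & known[i].bit)) continue;
        size_t used = strlen(out);
        /* snprintf truncates safely if the buffer is too small. */
        snprintf(out + used, cap - used, "%s%s", used ? "|" : "", known[i].name);
    }
}
```

If exactly the four flags above are set, the raw field value is 0x00200085, and `format_mh_flags` turns it back into the four names.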

Load Commands

Each load command has its own corresponding structure.

LC_SEGMENT_64

/*
 * The 64-bit segment load command indicates that a part of this file is to be
 * mapped into a 64-bit task's address space.  If the 64-bit segment has
 * sections then section_64 structures directly follow the 64-bit segment
 * command and their size is reflected in cmdsize.
 */
struct segment_command_64 { /* for 64-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT_64 */
	uint32_t	cmdsize;	/* includes sizeof section_64 structs */
	char		segname[16];	/* segment name */
	uint64_t	vmaddr;		/* memory address of this segment */
	uint64_t	vmsize;		/* memory size of this segment */
	uint64_t	fileoff;	/* file offset of this segment */
	uint64_t	filesize;	/* amount to map from the file */
	int32_t		maxprot;	/* maximum VM protection */
	int32_t		initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};

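Segments that use the segment_command_64 structure

Segment: __PAGEZERO

__PAGEZERO exists to trap NULL pointer dereferences: the range is mapped with no access rights, so any access to it faults.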

2025-04-18 13.29.35.png

#define LC_SEGMENT_64 0x19 // a 64-bit segment

// vm_prot.h
typedef int             vm_prot_t;
#define VM_PROT_NONE    ((vm_prot_t) 0x00)
// read / write / execute
#define VM_PROT_READ    ((vm_prot_t) 0x01)      /* read permission */
#define VM_PROT_WRITE   ((vm_prot_t) 0x02)      /* write permission */
#define VM_PROT_EXECUTE ((vm_prot_t) 0x04)      /* execute permission */
...
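
| Field | Value | Description |
| --- | --- | --- |
| cmd | 0x19 | Load command type (LC_SEGMENT_64) |
| cmdsize | 0x48 | Size of this load command: 0x48 = 0x00000068 - 0x00000020, i.e. it spans file offsets 0x20 to 0x68, right after the 32-byte header |
| segname | 0x5F5F504147455A45524F000000000000 | Segment name, here __PAGEZERO. In ASCII: 5F = '_', 50 = 'P', 41 = 'A', ..., 4F = 'O' |
| vmaddr | 0 | Start address of the segment in virtual memory (8 bytes, uint64_t) |
| vmsize | 0x0000000100000000 | Size of the segment in virtual memory: 2^32 = 4 GB, so the first 4 GB of the 64-bit address space is __PAGEZERO |
| fileoff | 0 | Offset within the file (the on-disk view) |
| filesize | 0 | Size occupied in the file (the on-disk view); the segment takes no space on disk |
| maxprot | 0 | Maximum VM protection; nothing is set, so the range cannot be read, written, or executed |
| initprot | 0 | Initial VM protection; nothing is set |
| nsects | 0 | Number of sections in the segment, here 0 |
| flags | 0 | Flags; none set |
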
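The cmdsize and segname values in the table can be checked with a short sketch. The two sizes are derived by adding up the fields of segment_command_64 and section_64 as listed in loader.h; `segment_cmdsize` and `copy_fixed_name` are hypothetical helpers, not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes that follow from the field lists in loader.h:
 * segment_command_64 = 2*4 + 16 + 4*8 + 4*4 = 72 bytes,
 * section_64         = 2*16 + 2*8 + 8*4    = 80 bytes. */
enum { SEGMENT_COMMAND_64_SIZE = 72, SECTION_64_SIZE = 80 };

/* Hypothetical helper: expected cmdsize of an LC_SEGMENT_64 that is
 * followed by `nsects` section_64 structures. */
uint32_t segment_cmdsize(uint32_t nsects) {
    return SEGMENT_COMMAND_64_SIZE + nsects * SECTION_64_SIZE;
}

/* Hypothetical helper: copies a 16-byte segname/sectname field into `out`.
 * The field is NUL-padded but has no terminator when the name uses all
 * 16 bytes, so one is always appended. */
void copy_fixed_name(const uint8_t field[16], char out[17]) {
    assert(field != NULL && out != NULL);
    memcpy(out, field, 16);
    out[16] = '\0';
}
```

With no sections, `segment_cmdsize(0)` gives 72 = 0x48, matching __PAGEZERO. A segment with 9 sections would have a cmdsize of 72 + 9 * 80 = 792 bytes.
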
Segment: __TEXT (code)

__TEXT describes the code segment.

2025-04-18 13.59.07.png

It also uses the segment_command_64 structure. Its initprot includes VM_PROT_EXECUTE, declaring that this part may be executed. The segment contains 9 sections.

Section: __text

Each section is described by the following structure:

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};

2025-04-18 14.12.42.png

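#define S_REGULAR		0x0	/* regular section */
#define S_ATTR_PURE_INSTRUCTIONS 0x80000000	/* section contains only true machine instructions */
#define S_ATTR_SOME_INSTRUCTIONS 0x00000400	/* section contains some machine instructions */

| Field | Value | Description |
| --- | --- | --- |
| sectname | 0x5F5F7465787400000000000000000000 | Section name: __text |
| segname | 0x5F5F5445585400000000000000000000 | Name of the segment the section belongs to: __TEXT |
| addr | 0x0000000100005F04 | Start address in virtual memory |
| size | 0x0000000000000564 | Size of the section in bytes |
| offset | 0x5F04 | File offset of the code; this differs from app to app |
| align | 4 | Alignment, stored as a power-of-2 exponent |
| reloff | 0 | File offset of the static-link relocation entries; non-zero values can be seen in, for example, __objc_const inside a .a file |
| nreloc | 0 | Number of static-link relocation entries |
| flags | 0x80000400 | Section type and attributes; see loader.h |
| reserved1 | | Reserved; for symbol stub and symbol pointer sections, the starting index into the indirect symbol table |
| reserved2 | | Reserved; for symbol stub sections, the size of each stub in bytes |
| reserved3 | | Reserved |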

2025-04-18 14.29.09.png

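__PAGEZERO takes up the first 0x0000000100000000 bytes of the virtual address space, and the header and load commands take up the beginning of the file. The app's machine code therefore starts at file offset 0x5F04, which corresponds to VM address 0x0000000100005F04. The screenshot above confirms this.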
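
The relation between a VM address and a file offset can be sketched as a lookup over the segments: find the segment whose mapped range contains the address, then add the distance from vmaddr to fileoff. `seg_range` and `vmaddr_to_fileoff` are hypothetical names, and the __TEXT size used in the usage note is an assumed value, not one read from the sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four segment_command_64 fields needed for address translation. */
typedef struct {
    uint64_t vmaddr, vmsize, fileoff, filesize;
} seg_range;

/* Hypothetical helper: maps a VM address to a file offset.
 * Returns 0 and stores the offset in *out on success, or -1 if no segment
 * backs the address with file content (as is the case for __PAGEZERO). */
int vmaddr_to_fileoff(const seg_range *segs, size_t count,
                      uint64_t addr, uint64_t *out) {
    assert(segs != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (addr < segs[i].vmaddr) continue;
        uint64_t delta = addr - segs[i].vmaddr;
        if (delta < segs[i].filesize) {
            *out = segs[i].fileoff + delta;
            return 0;
        }
    }
    return -1;
}
```

With __PAGEZERO as {0, 0x100000000, 0, 0} and an assumed __TEXT of {0x100000000, 0x8000, 0, 0x8000}, the address 0x100005F04 maps to file offset 0x5F04, while any address inside __PAGEZERO has no file offset at all.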

Section: __stubs

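Stubs for dynamically linked symbols. For a symbol stub section, reserved2 is the size of each stub in bytes, here 12, and reserved1 is the starting index into the indirect symbol table. The number of stubs is the section size divided by reserved2. The section's address in the binary is 0x0000000100006468.

2025-04-18 15.19.43.png

Looking at 0x0000000100006468:

2025-04-18 15.21.09.png

This section holds the stubs for the symbols that must be resolved at run time from system libraries and other dynamic libraries.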

Section: __stub_helper

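Loading dynamic libraries involves binding symbols lazily, for example the external symbols reached through __stubs above. __stub_helper contains the helper code that lets this binding complete.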

Section: __objc_stubs

__objc_stubs is a section in iOS binaries that contains stub functions for Objective-C calls. These stubs are used for debugging and analyzing Objective-C code

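iOS apps compiled with recent versions of Xcode can generate stubs for msgSend calls, where each stub is just a call to the actual msgSend address after setting a specific selector.

This looks like an optimization in newer SDKs that moves the selector setup out of each call site and into a shared stub. Since every stub still ends in the real msgSend, message lookup itself is not skipped. To be explored further later.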

Section: __objc_methname

Holds the Objective-C method names.

#define	S_CSTRING_LITERALS	0x2	/* section with only literal C strings */

2025-04-18 15.58.12.png

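Section: __objc_classname

Holds the Objective-C class names; it is organized much like __objc_methname.

Section: __objc_methtype

Holds the Objective-C method signatures (type encodings).

Locating what is actually stored in the Data part: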

2025-04-19 16.25.52.png

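Section: __cstring

Holds C string literals.

Section: __unwind_info

Stores the information used for stack unwinding when exceptions are handled.

Segment: __DATA (data)

Describes how the data part is organized. It also contains a number of sections.

Section: __got

Non-lazy pointers: dyld binds the symbols in this table immediately at load time.

2025-04-18 17.39.32.png

dyld_stub_binder is responsible for binding symbols and objc_msgSend sends messages, so binding these two lazily would be pointless.

Section: __la_symbol_ptr

The counterpart of __got: lazy pointers. Every pointer in this table initially points into __TEXT.__stub_helper.

Section: __cfstring

Core Foundation strings.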

Section: __objc_classlist

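Records all the classes in the app. The section stores a list of pointers, each holding the address of a class structure. Metaclasses are not listed separately; each one is reached through its class's isa pointer.

2025-04-18 20.40.30.png

The address here is 0x100008090. Subtracting the leading 0x100000000 (__PAGEZERO) gives file offset 0x8090.

2025-04-18 20.41.38.png

The value stored there is 0x00000001000091A0, described as a pointer. Following it to 0x91A0 leads into __DATA.__objc_data, where the actual Objective-C class is stored.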

2025-04-18 20.49.16.png

Section: __objc_protolist

2025-04-18 21.00.38.png

0x1000080A8 => 0x0000000100009298, which lands in __DATA.__data

2025-04-18 21.00.48.png

2025-04-18 21.03.23.png

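Section: __objc_imageinfo

Mainly used to tell whether the Objective-C version is 1.0 or 2.0.

Section: __objc_const

Holds the data that stays constant once the Objective-C runtime has initialized, such as method_t structures.

Section: __objc_selrefs

Records which selector (SEL) strings are referenced.

Section: __objc_classrefs

Records which classes are referenced.

Section: __objc_superrefs

Objective-C superclass references.

Section: __objc_ivar

Stores the program's ivar data (the instance variable offsets).

Section: __objc_data

Holds the data Objective-C classes need. Its main content is pointers into __objc_const, which are used to find each class's details.

Section: __data

Initialized mutable data.

Segment: __LINKEDIT

Its fileoff is 0xc000 and its filesize is 0x7850, so it ends at 0xc000 + 0x7850 = 0x13850. As the figure below shows, everything from Dynamic Loader Info through Code Signature falls inside this range. It holds the information on which symbols to bind from dynamic libraries, the symbol table, and the binary's code signature. So the actual content after the load commands is __TEXT, __DATA, and __LINKEDIT; __PAGEZERO is only a placeholder.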

# The size command reports the same four segments for a Mach-O file
$ size -x -m path/to/macho-execute

2025-04-18 18.24.32.png

2025-04-18 16.26.29.png

2025-04-18 16.28.03.png

Commands that use other structures

Command: LC_DYLD_INFO_ONLY

Describes what dyld needs in order to load the image: rebase info, which symbols to bind from dynamic libraries (regular, weak, or lazy binding), and the exported symbols.

/*
 * The dyld_info_command contains the file offsets and sizes of
 * the new compressed form of the information dyld needs to
 * load the image.  This information is used by dyld on Mac OS X
 * 10.6 and later.  All information pointed to by this command
 * is encoded using byte streams, so no endian swapping is needed
 * to interpret it.
 */
struct dyld_info_command {
	uint32_t   cmd;			/* LC_DYLD_INFO or LC_DYLD_INFO_ONLY */
	uint32_t   cmdsize;		/* sizeof(struct dyld_info_command) */
	uint32_t   rebase_off;		/* file offset to rebase info */
	uint32_t   rebase_size;		/* size of rebase info */
	uint32_t   bind_off;		/* file offset to binding info */
	uint32_t   bind_size;		/* size of binding info */
	uint32_t   weak_bind_off;	/* file offset to weak binding info */
	uint32_t   weak_bind_size;	/* size of weak binding info */
	uint32_t   lazy_bind_off;	/* file offset to lazy binding info */
	uint32_t   lazy_bind_size;	/* size of lazy binding info */
	uint32_t   export_off;		/* file offset to export info */
	uint32_t   export_size;		/* size of export info */
};
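
The "byte streams" mentioned in the header comment are sequences of opcodes whose integer operands are stored as ULEB128 values: 7 bits per byte, least significant group first, with the high bit set on every byte except the last. The sketch below decodes one such value; `read_uleb128` is a hypothetical helper name, not a dyld API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one ULEB128 value from buf[0..len). Returns the number of bytes
 * consumed, or 0 if the input is truncated or does not fit in 64 bits. */
size_t read_uleb128(const uint8_t *buf, size_t len, uint64_t *value) {
    assert(buf != NULL && value != NULL);
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++) {
        uint64_t chunk = buf[i] & 0x7f;
        /* Only one bit of room is left at shift 63, none beyond it. */
        if (shift > 63 || (shift == 63 && chunk > 1)) return 0;
        result |= chunk << shift;
        if (!(buf[i] & 0x80)) {      /* high bit clear: last byte */
            *value = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;                        /* ran out of input mid-value */
}
```

For example, the three bytes E5 8E 26 decode to 0x65 + (0x0E << 7) + (0x26 << 14) = 624485. Because the encoding is defined byte by byte, no endian swapping is needed.
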

Command: LC_SYMTAB

Describes the Mach-O file's symbol table: where the symbol entries and the string table are located in the file.

/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * "stab" style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
	uint32_t	cmd;		/* LC_SYMTAB */
	uint32_t	cmdsize;	/* sizeof(struct symtab_command) */
	uint32_t	symoff;		/* symbol table offset */
	uint32_t	nsyms;		/* number of symbol table entries */
	uint32_t	stroff;		/* string table offset */
	uint32_t	strsize;	/* string table size in bytes */
};
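
To show how symoff and stroff work together: symoff points at nsyms fixed-size entries, and each entry's n_strx is an offset into the string table at stroff. The sketch below mirrors struct nlist_64 from nlist.h with a local typedef so that it compiles anywhere; `nlist64_entry` and `symbol_name` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct nlist_64 from <mach-o/nlist.h> (16 bytes per entry). */
typedef struct {
    uint32_t n_strx;   /* offset of the name in the string table */
    uint8_t  n_type;   /* type flags */
    uint8_t  n_sect;   /* section number, or 0 */
    uint16_t n_desc;   /* extra description bits */
    uint64_t n_value;  /* symbol value, usually an address */
} nlist64_entry;

/* Hypothetical helper: returns the name of symbol `i`, or NULL if the index
 * or string offset is out of range or the name is not terminated. */
const char *symbol_name(const nlist64_entry *syms, uint32_t nsyms,
                        const char *strtab, uint32_t strsize, uint32_t i) {
    assert(syms != NULL && strtab != NULL);
    if (i >= nsyms || syms[i].n_strx >= strsize) return NULL;
    const char *name = strtab + syms[i].n_strx;
    /* Reject names that run past the end of the string table. */
    if (memchr(name, '\0', strsize - syms[i].n_strx) == NULL) return NULL;
    return name;
}
```

In a real file, `syms` would be the bytes at symoff and `strtab` the bytes at stroff, both inside __LINKEDIT.
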

Command: LC_DYSYMTAB

Describes the dynamic symbol table: how the entries of the LC_SYMTAB symbol table are grouped into local, externally defined, and undefined symbols, and where the indirect symbol table is located.

Command: LC_LOAD_DYLINKER

Names the dynamic linker (dyld) that should load this file.

/*
 * A program that uses a dynamic linker contains a dylinker_command to identify
 * the name of the dynamic linker (LC_LOAD_DYLINKER).  And a dynamic linker
 * contains a dylinker_command to identify the dynamic linker (LC_ID_DYLINKER).
 * A file can have at most one of these.
 * This struct is also used for the LC_DYLD_ENVIRONMENT load command and
 * contains string for dyld to treat like environment variable.
 */
struct dylinker_command {
	uint32_t	cmd;		/* LC_ID_DYLINKER, LC_LOAD_DYLINKER or
					   LC_DYLD_ENVIRONMENT */
	uint32_t	cmdsize;	/* includes pathname string */
	union lc_str    name;		/* dynamic linker's path name */
};

2025-04-18 16.41.23.png

Command: LC_UUID

A 128-bit random number generated by the static linker, used to identify the Mach-O file.

/*
 * The uuid load command contains a single 128-bit unique random number that
 * identifies an object produced by the static link editor.
 */
struct uuid_command {
	uint32_t	cmd;		/* LC_UUID */
	uint32_t	cmdsize;	/* sizeof(struct uuid_command) */
	uint8_t		uuid[16];	/* the 128-bit uuid */
};
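
The 16 raw bytes are usually shown in the 8-4-4-4-12 hexadecimal form, which is also what `dwarfdump --uuid` prints when matching a binary to its dSYM. A minimal formatting sketch, with `format_uuid` as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats the 16 uuid bytes as 8-4-4-4-12 uppercase
 * hex. `out` needs room for 36 characters plus the terminator. */
void format_uuid(const uint8_t uuid[16], char out[37]) {
    assert(uuid != NULL && out != NULL);
    snprintf(out, 37,
             "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
             "%02X%02X%02X%02X%02X%02X",
             uuid[0], uuid[1], uuid[2],  uuid[3],  uuid[4],  uuid[5],
             uuid[6], uuid[7], uuid[8],  uuid[9],  uuid[10], uuid[11],
             uuid[12], uuid[13], uuid[14], uuid[15]);
}
```

The bytes are printed in file order; no byte swapping is involved.
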

Command: LC_VERSION_MIN_IPHONEOS

Specifies the minimum OS version the binary can run on, together with the SDK version it was built with.

/*
 * The version_min_command contains the min OS version on which this
 * binary was built to run.
 */
struct version_min_command {
	uint32_t	cmd;		/* LC_VERSION_MIN_MACOSX or
					   LC_VERSION_MIN_IPHONEOS or
					   LC_VERSION_MIN_WATCHOS or
					   LC_VERSION_MIN_TVOS */
	uint32_t	cmdsize;	/* sizeof(struct min_version_command) */
	uint32_t	version;	/* X.Y.Z is encoded in nibbles xxxx.yy.zz */
	uint32_t	sdk;		/* X.Y.Z is encoded in nibbles xxxx.yy.zz */
};

Command: LC_SOURCE_VERSION

Records the version of the sources the binary was built from. This is not the iOS SDK version; the SDK version is stored in the sdk field of the LC_VERSION_MIN_* command.

/*
 * The source_version_command is an optional load command containing
 * the version of the sources used to build the binary.
 */
struct source_version_command {
	uint32_t  cmd;		/* LC_SOURCE_VERSION */
	uint32_t  cmdsize;	/* 16 */
	uint64_t  version;	/* A.B.C.D.E packed as a24.b10.c10.d10.e10 */
};

Command: LC_MAIN

The entry point of the application.

/*
 * The entry_point_command is a replacement for thread_command.
 * It is used for main executables to specify the location (file offset)
 * of main().  If -stack_size was used at link time, the stacksize
 * field will contain the stack size need for the main thread.
 */
struct entry_point_command {
	uint32_t  cmd;		/* LC_MAIN only used in MH_EXECUTE filetypes */
	uint32_t  cmdsize;	/* 24 */
	uint64_t  entryoff;	/* file (__TEXT) offset of main() */
	uint64_t  stacksize;	/* if not zero, initial stack size */
};

2025-04-18 16.52.55.png

The entryoff value is 0x6120. Looking at that offset shows that it is indeed the address of the _main function.

2025-04-18 16.53.19.png

Command: LC_ENCRYPTION_INFO_64

/*
 * The encryption_info_command contains the file offset and size of an
 * encrypted segment.
 */
struct encryption_info_command {
	uint32_t	cmd;		/* LC_ENCRYPTION_INFO */
	uint32_t	cmdsize;	/* sizeof(struct encryption_info_command) */
	uint32_t	cryptoff;	/* file offset of encrypted range */
	uint32_t	cryptsize;	/* file size of encrypted range */
	uint32_t	cryptid;	/* which encryption system,
					   0 means not-encrypted yet */
};

The 64-bit command uses struct encryption_info_command_64, which has the same fields plus a trailing uint32_t pad.

The encrypted range has Crypt Offset 0x4000 and Crypt Size 0x4000, so it ends at 0x4000 + 0x4000 = 0x8000. As the figures below show, what actually gets encrypted is the content of the code segment.

2025-04-18 17.11.15.png

2025-04-18 17.11.34.png

Command: LC_LOAD_DYLIB

There are several of these commands, one for each dynamic library, system or otherwise, that the app links against.

/*
 * Dynamically linked shared libraries are identified by two things.  The
 * pathname (the name of the library as found for execution), and the
 * compatibility version number.  The pathname must match and the compatibility
 * number in the user of the library must be greater than or equal to the
 * library being used.  The time stamp is used to record the time a library was
 * built and copied into user so it can be use to determined if the library used
 * at runtime is exactly the same as used to build the program.
 */
struct dylib {
	union lc_str  name;			/* library's path name */
	uint32_t timestamp;			/* library's build time stamp */
	uint32_t current_version;		/* library's current version number */
	uint32_t compatibility_version;	/* library's compatibility vers number */
};

/*
 * A dynamically linked shared library (filetype == MH_DYLIB in the mach header)
 * contains a dylib_command (cmd == LC_ID_DYLIB) to identify the library.
 * An object that uses a dynamically linked shared library also contains a
 * dylib_command (cmd == LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, or
 * LC_REEXPORT_DYLIB) for each library it uses.
 */
struct dylib_command {
	uint32_t	cmd;		/* LC_ID_DYLIB, LC_LOAD_{,WEAK_}DYLIB,
					   LC_REEXPORT_DYLIB */
	uint32_t	cmdsize;	/* includes pathname string */
	struct dylib	dylib;		/* the library identification */
};

2025-04-18 17.17.48.png

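The name field gives the library's load path.

Command: LC_RPATH

Some of the dylib names above contain the @rpath variable. The values that @rpath can expand to are specified here.

Command: LC_FUNCTION_STARTS

Describes where functions begin. It points to the start of the Function Starts data in the link-edit segment. Function Starts defines a table of function start addresses, which lets debuggers and other tools easily decide whether an address falls inside a function.

Command: LC_DATA_IN_CODE

Uses a struct linkedit_data_command to point to an array of data_in_code_entry. Each element of the array describes one region inside the code segment that holds data rather than instructions.

Command: LC_CODE_SIGNATURE

Describes the code signature. It also shows that a binary's signature is stored inside the file itself.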

Data

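The Load Commands describe how the Mach-O file is organized, for example how long the code part is. This is much like passing a length alongside an array in C. To extend the idea: just as network protocols control data transfer through packet formats, these commands control how the Data that follows them is parsed.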
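
The "length alongside an array" idea shows up directly in how load commands are walked: every command begins with cmd and cmdsize, so a reader can hop from one command to the next without knowing each command's full layout. Below is a minimal sketch, assuming the buffer holds the Load Commands area in the host's byte order (the MH_MAGIC_64 case); `load_command_head` and `find_load_command` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every load command starts with these two fields
 * (struct load_command in loader.h). */
typedef struct {
    uint32_t cmd;
    uint32_t cmdsize;
} load_command_head;

/* Hypothetical helper: scans up to `ncmds` commands in cmds[0..sizeofcmds)
 * and returns a pointer to the first one whose type is `wanted`, or NULL.
 * Stops at the first malformed command instead of reading out of bounds. */
const uint8_t *find_load_command(const uint8_t *cmds, uint32_t sizeofcmds,
                                 uint32_t ncmds, uint32_t wanted) {
    assert(cmds != NULL);
    uint32_t offset = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - offset < sizeof(load_command_head)) break;
        load_command_head head;
        memcpy(&head, cmds + offset, sizeof head);  /* avoids unaligned reads */
        if (head.cmdsize < sizeof head || head.cmdsize > sizeofcmds - offset)
            break;
        if (head.cmd == wanted) return cmds + offset;
        offset += head.cmdsize;                     /* hop to the next command */
    }
    return NULL;
}
```

In a real file, `cmds` would point just past the 32-byte mach_header_64, and `ncmds` and `sizeofcmds` would come from the header fields of the same names.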

References

  1. macOS linking feature: Two-Level Namespace
  2. ghidra-issues
  3. Mach-O file study notes