Understanding Mach-O Header
At the start of every Mach-O file, there is a header. It contains basic information about the rest of the file.
Header structure for 32-bit architectures. The structure is defined in <mach-o/loader.h>, which lives under usr/include/mach-o/loader.h in the macOS SDK.
It is declared as follows:
struct mach_header {
uint32_t magic; /* mach magic number identifier */
cpu_type_t cputype; /* cpu specifier */
cpu_subtype_t cpusubtype; /* machine specifier */
uint32_t filetype; /* type of file */
uint32_t ncmds; /* number of load commands */
uint32_t sizeofcmds; /* the size of all the load commands */
uint32_t flags; /* flags */
};
Header structure for 64-bit architectures. It is identical to the 32-bit version except for the trailing reserved field.
struct mach_header_64 {
uint32_t magic; /* mach magic number identifier */
cpu_type_t cputype; /* cpu specifier */
cpu_subtype_t cpusubtype; /* machine specifier */
uint32_t filetype; /* type of file */
uint32_t ncmds; /* number of load commands */
uint32_t sizeofcmds; /* the size of all the load commands */
uint32_t flags; /* flags */
uint32_t reserved; /* reserved */
};
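To make the layout concrete, here is a minimal sketch that copies the first bytes of a file image into a 64-bit header struct. It assumes the bytes are already in memory and in host byte order. `struct mach_header_64_local` and `read_header_64` are hypothetical names that mirror `struct mach_header_64`, so the sketch compiles without the Apple SDK headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of struct mach_header_64 from <mach-o/loader.h>.
   cpu_type_t and cpu_subtype_t are 32-bit signed integers there. */
struct mach_header_64_local {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Copies the header out of a raw buffer.
   Returns 0 on success, -1 if the buffer is too small or a pointer is NULL.
   No byte swapping is done; see the magic number section for that. */
int read_header_64(const uint8_t *buf, size_t len, struct mach_header_64_local *out)
{
    if (buf == NULL || out == NULL || len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out); /* memcpy avoids unaligned access */
    return 0;
}
```

Using `memcpy` instead of casting the buffer pointer sidesteps alignment and strict-aliasing problems when the buffer comes from `fread` or a network read.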
Magic Number Identifier (magic)
For 32-bit
#define MH_MAGIC 0xfeedface /* the mach magic number */
#define MH_CIGAM 0xcefaedfe /* NXSwapInt(MH_MAGIC) */
For 64-bit
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
You can see two variants of the magic number: MAGIC and its byte-reversed form CIGAM ("MAGIC" spelled backwards).
Reading CIGAM / CIGAM_64 means the file was written in the opposite byte order (endianness) from the machine reading it, so every multi-byte field must be byte-swapped before it is used.
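A minimal sketch of how a parser might use the four constants, assuming `magic` is the first 4 bytes of the file read as a host-order `uint32_t`. The `struct magic_info` type and the `classify_magic` and `swap32` helpers are hypothetical names for this example.

```c
#include <stdint.h>

#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu

/* Hypothetical result type for this sketch. */
struct magic_info {
    int valid;   /* 1 if the value is one of the four Mach-O magics */
    int is64;    /* 1 if the 64-bit header layout follows */
    int swapped; /* 1 if every multi-byte field must be byte-swapped */
};

struct magic_info classify_magic(uint32_t magic)
{
    struct magic_info info = {0, 0, 0};
    switch (magic) {
    case MH_MAGIC:    info.valid = 1; break;
    case MH_CIGAM:    info.valid = 1; info.swapped = 1; break;
    case MH_MAGIC_64: info.valid = 1; info.is64 = 1; break;
    case MH_CIGAM_64: info.valid = 1; info.is64 = 1; info.swapped = 1; break;
    default:          break; /* not a thin Mach-O file */
    }
    return info;
}

/* Portable 32-bit byte swap, applied to each field when info.swapped is set. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

For example, an arm64 binary begins with the bytes cf fa ed fe. A little-endian host reads them as 0xfeedfacf (MH_MAGIC_64, no swap), while a big-endian host reads 0xcffaedfe (MH_CIGAM_64) and must swap.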
CPU Type (cputype)
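The CPU Type field identifies the architecture the binary targets. cpu_type_t is a typedef for a 32-bit signed integer (integer_t), and the constants below come from <mach/machine.h>. The 64-bit variants are formed by OR-ing the base type with the CPU_ARCH_ABI64 capability bit (0x01000000).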
#define CPU_TYPE_ANY ((cpu_type_t) -1)
#define CPU_TYPE_VAX ((cpu_type_t) 1)
/* skip ((cpu_type_t) 2) */
/* skip ((cpu_type_t) 3) */
/* skip ((cpu_type_t) 4) */
/* skip ((cpu_type_t) 5) */
#define CPU_TYPE_MC680x0 ((cpu_type_t) 6)
#define CPU_TYPE_X86 ((cpu_type_t) 7)
#define CPU_TYPE_I386 CPU_TYPE_X86 /* compatibility */
#define CPU_TYPE_X86_64 (CPU_TYPE_X86 | CPU_ARCH_ABI64)
/* skip CPU_TYPE_MIPS ((cpu_type_t) 8) */
/* skip ((cpu_type_t) 9) */
#define CPU_TYPE_MC98000 ((cpu_type_t) 10)
#define CPU_TYPE_HPPA ((cpu_type_t) 11)
#define CPU_TYPE_ARM ((cpu_type_t) 12)
#define CPU_TYPE_ARM64 (CPU_TYPE_ARM | CPU_ARCH_ABI64)
#define CPU_TYPE_ARM64_32 (CPU_TYPE_ARM | CPU_ARCH_ABI64_32)
#define CPU_TYPE_MC88000 ((cpu_type_t) 13)
#define CPU_TYPE_SPARC ((cpu_type_t) 14)
#define CPU_TYPE_I860 ((cpu_type_t) 15)
/* skip CPU_TYPE_ALPHA ((cpu_type_t) 16) */
/* skip ((cpu_type_t) 17) */
#define CPU_TYPE_POWERPC ((cpu_type_t) 18)
#define CPU_TYPE_POWERPC64 (CPU_TYPE_POWERPC | CPU_ARCH_ABI64)
/* skip ((cpu_type_t) 19) */
/* skip ((cpu_type_t) 20 */
/* skip ((cpu_type_t) 21 */
/* skip ((cpu_type_t) 22 */
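The listing uses CPU_ARCH_ABI64 without showing its value. Here is a small sketch that reproduces a few of the definitions and maps them to names. The capability values match <mach/machine.h>; the `cpu_type_name` helper and its strings are hypothetical.

```c
#include <stdint.h>

typedef int32_t cpu_type_t; /* integer_t in <mach/machine.h> */

/* Capability bits OR-ed into the base CPU type. */
#define CPU_ARCH_ABI64    0x01000000
#define CPU_ARCH_ABI64_32 0x02000000 /* ILP32 ABI on 64-bit hardware */

#define CPU_TYPE_X86       ((cpu_type_t) 7)
#define CPU_TYPE_X86_64    (CPU_TYPE_X86 | CPU_ARCH_ABI64)
#define CPU_TYPE_ARM       ((cpu_type_t) 12)
#define CPU_TYPE_ARM64     (CPU_TYPE_ARM | CPU_ARCH_ABI64)
#define CPU_TYPE_ARM64_32  (CPU_TYPE_ARM | CPU_ARCH_ABI64_32)
#define CPU_TYPE_POWERPC   ((cpu_type_t) 18)
#define CPU_TYPE_POWERPC64 (CPU_TYPE_POWERPC | CPU_ARCH_ABI64)

/* Returns a short name for a few common CPU types, or "unknown". */
const char *cpu_type_name(cpu_type_t t)
{
    switch (t) {
    case CPU_TYPE_X86:       return "i386";
    case CPU_TYPE_X86_64:    return "x86_64";
    case CPU_TYPE_ARM:       return "arm";
    case CPU_TYPE_ARM64:     return "arm64";
    case CPU_TYPE_ARM64_32:  return "arm64_32";
    case CPU_TYPE_POWERPC:   return "ppc";
    case CPU_TYPE_POWERPC64: return "ppc64";
    default:                 return "unknown";
    }
}
```

So an arm64 header stores cputype 0x0100000C (12 with the ABI64 bit set), and an x86_64 header stores 0x01000007.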
CPU SubType (cpusubtype)
This field specifies the exact machine (CPU variant) the code can run on. I won't paste the whole list from the source file here.
However, a few values are worth mentioning.
#define CPU_SUBTYPE_ANY ((cpu_subtype_t) -1)
The definition in the header file was hard for me to understand, so I will quote it verbatim here:
When selecting a slice, ANY will pick the slice with the best
grading for the selected cpu_type_t, unlike the “ALL” subtypes,
which are the slices that can run on any hardware for that cpu type.
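One practical detail when reading cpusubtype: in <mach/machine.h> the high byte is reserved for capability bits (CPU_SUBTYPE_MASK, 0xff000000), such as CPU_SUBTYPE_LIB64 (0x80000000). A minimal sketch, assuming you want to compare only the machine part; `subtype_base` and `subtype_is_any` are hypothetical helpers.

```c
#include <stdint.h>

typedef int32_t cpu_subtype_t; /* integer_t in <mach/machine.h> */

#define CPU_SUBTYPE_MASK  0xff000000u /* capability / feature bits */
#define CPU_SUBTYPE_LIB64 0x80000000u /* 64-bit libraries */
#define CPU_SUBTYPE_ANY   ((cpu_subtype_t) -1)

/* Check for ANY first: masking -1 would yield a meaningless 0x00ffffff. */
int subtype_is_any(cpu_subtype_t s)
{
    return s == CPU_SUBTYPE_ANY;
}

/* Strips the capability bits so subtypes can be compared numerically. */
uint32_t subtype_base(cpu_subtype_t s)
{
    return (uint32_t) s & ~CPU_SUBTYPE_MASK;
}
```

With this, a header whose cpusubtype is 0x80000003 (LIB64 plus subtype 3) compares equal to a plain 3.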
File Types (filetype)
This field tells us what kind of file the Mach-O represents, and it also determines how the rest of the file is laid out. Let us examine a few. See mach-o/loader.h for the full list.

| File Type | Value | Description |
| --- | --- | --- |
| MH_OBJECT | 0x1 | Intermediate relocatable object produced by a compiler or assembler. This is used by .o files |
| MH_EXECUTE | 0x2 | A standard executable file |
| MH_DYLIB | 0x6 | A dynamically linked shared library (.dylib) |
| MH_BUNDLE | 0x8 | A dynamically bound bundle (.bundle), typically loaded at runtime as a plug-in |
| MH_DSYM | 0xa | A companion file storing debug symbol information (dSYM). Services like Firebase use these files to symbolicate crash reports, mapping addresses back to the class and function names that led to a crash |

filetype holds exactly one of these values; it is not a bitmask, so file types cannot be OR'ed together. For both 32-bit and 64-bit binaries it is a 32-bit unsigned integer.

Three other constants are often listed next to the file types, but loader.h defines them as bits of the flags field (see Flags below), not as filetype values. Unlike file types, flags can be OR'ed together.

| Flag | Value | Description |
| --- | --- | --- |
| MH_APP_EXTENSION_SAFE | 0x02000000 | The code was linked for use in an app extension (.appex) |
| MH_SIM_SUPPORT | 0x08000000 | Allows the build-version load commands to name Simulator platforms, i.e. iOS, tvOS and watchOS builds that can run on the Simulator |
| MH_DYLIB_IN_CACHE | 0x80000000 | Only for dylibs: set when the dylib is part of the dyld shared cache rather than loose on disk. Think UIKit or Foundation frameworks |
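Because filetype is a single enumerated value, code should compare it with == (or a switch) rather than test bits. A minimal sketch covering the five file types above; `filetype_name` and its strings are hypothetical.

```c
#include <stdint.h>

/* filetype values from <mach-o/loader.h> (subset). */
#define MH_OBJECT  0x1
#define MH_EXECUTE 0x2
#define MH_DYLIB   0x6
#define MH_BUNDLE  0x8
#define MH_DSYM    0xa

/* filetype holds exactly one value, so a switch on equality is correct;
   a bit test such as (filetype & MH_EXECUTE) would also match 0x6 (MH_DYLIB). */
const char *filetype_name(uint32_t filetype)
{
    switch (filetype) {
    case MH_OBJECT:  return "object";
    case MH_EXECUTE: return "executable";
    case MH_DYLIB:   return "dylib";
    case MH_BUNDLE:  return "bundle";
    case MH_DSYM:    return "dsym";
    default:         return "unknown";
    }
}
```

The comment in the code shows why the distinction matters: 0x6 & 0x2 is non-zero, so treating filetype as flags would misclassify every dylib as an executable.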
Number of load commands (ncmds)
Before explaining this field, let us answer an important question:
What are load commands?
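Loading is the process of bringing a program into main memory (RAM) so that it can be executed, and load commands specify how to do it. In a Mach-O file they are variable-length records placed immediately after the header; the kernel and the dynamic linker (dyld) read them to map segments, find the entry point, and locate the libraries to link. In general, loading involves the following steps: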
- Find how much address space is required by the executable
- Allocate the address space, in separate segments if required
- Read the program into the segments in the address space
- Zero out any bss space at the end of the program if the virtual memory system doesn’t do it automatically
- Create a stack segment if needed
- Setup any runtime information such as arguments or environment variables
- Start the program.
Now, back to the ncmds field. It holds the total number of load commands in the Mach-O file.
Size of load commands (sizeofcmds)
Defines how many bytes the load commands occupy in the Mach-O binary.
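ncmds and sizeofcmds work together: a parser steps through ncmds commands and checks that they fit in sizeofcmds bytes. Every load command starts with a `cmd` and a `cmdsize` field (`struct load_command` in loader.h), where cmdsize includes those 8 bytes. A minimal sketch, assuming the command bytes are already in memory and in host byte order; `walk_load_commands` and `struct load_command_prefix` are hypothetical names, and treating a size mismatch as an error is this sketch's choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the two leading fields shared by every load command. */
struct load_command_prefix {
    uint32_t cmd;     /* command type, e.g. LC_SEGMENT_64 */
    uint32_t cmdsize; /* total size of this command, prefix included */
};

/* Walks ncmds commands stored in the sizeofcmds bytes at `cmds`.
   Returns 0 if the commands exactly fill sizeofcmds, -1 otherwise. */
int walk_load_commands(const uint8_t *cmds, uint32_t sizeofcmds, uint32_t ncmds)
{
    uint32_t offset = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command_prefix lc;
        if (sizeofcmds - offset < sizeof lc)
            return -1; /* not enough bytes left for the prefix */
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return -1; /* cmdsize too small, or runs past the end */
        offset += lc.cmdsize; /* next command starts right after this one */
    }
    return offset == sizeofcmds ? 0 : -1;
}
```

The invariant offset <= sizeofcmds holds after every step, so the unsigned subtraction never wraps around.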
Flags (flags)
These are bit flags that indicate optional features of the Mach-O file. We won't discuss them much in this post.
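Since flags are independent bits, several can be set at once and each is tested with a bitwise AND. A minimal sketch using a few values from <mach-o/loader.h>; the `has_flag` helper is hypothetical.

```c
#include <stdint.h>

/* A few flag bits from <mach-o/loader.h>. */
#define MH_NOUNDEFS           0x1        /* no undefined references */
#define MH_DYLDLINK           0x4        /* input for the dynamic linker */
#define MH_TWOLEVEL           0x80       /* two-level namespace bindings */
#define MH_PIE                0x200000   /* load at a random address */
#define MH_APP_EXTENSION_SAFE 0x02000000
#define MH_SIM_SUPPORT        0x08000000
#define MH_DYLIB_IN_CACHE     0x80000000u

/* Returns 1 if `bit` is set in `flags`, 0 otherwise. */
int has_flag(uint32_t flags, uint32_t bit)
{
    return (flags & bit) != 0;
}
```

For example, a flags value of 0x00200085 decomposes into MH_PIE | MH_TWOLEVEL | MH_DYLDLINK | MH_NOUNDEFS.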
Reserved (reserved)
This field exists only in 64-bit Mach-O binaries. As the name suggests, it is reserved for future use.
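One visible effect of the extra field is the header size: every member is 4 bytes, so the 32-bit header is 28 bytes and the 64-bit header is 32 bytes. Load commands start immediately after the header, so their file offset depends on which layout the magic number selected. A minimal sketch with hypothetical local struct mirrors and a hypothetical `load_commands_offset` helper:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of mach_header and mach_header_64. */
struct mach_header_32_local {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
};

struct mach_header_64_local {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved; /* the only difference from the 32-bit layout */
};

/* File offset of the first load command for each header layout. */
size_t load_commands_offset(int is64)
{
    return is64 ? sizeof(struct mach_header_64_local)
                : sizeof(struct mach_header_32_local);
}
```

All members are 4-byte integers, so the compiler inserts no padding and the sizes match the on-disk layout.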