本文内容基于 glibc-2.43 版本

内存分配

64 位 Linux 内存布局如下图所示

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
# 64-bit Linux (x86_64) Userspace Virtual Memory Layout
#
# High addresses
# 0xffffffffffffffff +---------------------------------------------------+
# | (non-canonical / guard) |
# 0xffff800000000000 +---------------------------------------------------+
# | Kernel space |
# | direct map, vmalloc, modules, kernel text, ... |
# | (not accessible from user mode) |
# 0x0000800000000000 +---------------------------------------------------+
# | (non-canonical / hole) |
# 0x00007fffffffffff +---------------------------------------------------+
# | User space (canonical) |
# | |
# | [stack] grows down |
# | | |
# | v |
# | +------------------------------+ |
# | | Stack | |
# | +------------------------------+ |
# | | argv/envp/auxv (startup) | |
# | +------------------------------+ |
# | |
# | (mmap region: shared libs, anon mmaps, files) |
# | +------------------------------+ |
# | | libc.so / ld-linux.so | |
# | | other .so, mapped files | |
# | | anonymous mappings | |
# | +------------------------------+ |
# | ^ |
# | | grows down (typical) |
# | |
# | [heap] grows up |
# | ^ |
# | | |
# | +------------------------------+ |
# | | Heap | (brk/sbrk) |
# | +------------------------------+ |
# | |
# | +------------------------------+ |
# | | .bss (zero-inited) | |
# | | .data (inited) | |
# | | .rodata | |
# | | .text (code) | |
# | +------------------------------+ |
# | | ELF header / program headers | |
# 0x0000000000400000 +--+------------------------------+-----------------+
# | (PIE main exe may be randomized elsewhere) |
# 0x0000000000000000 +---------------------------------------------------+
# Low addresses
#
#
# Notes:
# - Addresses are approximate; ASLR randomizes stack, mmap base, and PIE executables.
# - x86_64 uses "canonical" addresses (typically 48-bit, newer CPUs support 57-bit w/ 5-level paging).
# - Kernel/user split shown is a common layout; exact boundaries differ by config.

在编写 C/C++程序时,往往需要动态分配内存,现代内存分配器同时管理 heap 空间和 mmap 空间,位置关系如上图所示,简单地说,malloc 背后小块分配 heap,大块分配 mmap(参考 malloc.c 源代码

简单的内存分配器运行原理如下(仅考虑堆空间)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Low addresses                         High addresses
+-----------------------------------------------------+
| [chunk A] [chunk B] [chunk C] ... |
+-----------------------------------------------------+


chunk pointer(start address)
|
v
+----------------------+---------------------------+
| header | user data |
+----------------------+---------------------------+
| size | flags | next | ...... payload ...... |
+----------------------+---------------------------+
^ ^ ^
| | +-- Points to the next free chunk in the free list
| +--------- e.g.: flags such as whether the chunk is in use (inuse)
+------------------ Total size of the chunk (including header + payload)


Free List Structure
free_head
|
v
+----------+ +----------+ +----------+
| chunk X | --> | chunk Y | --> | chunk Z | --> NULL
+----------+ +----------+ +----------+

malloc 的基本动作在 Free List 里找

  • 遍历 Free List:找到 size 足够的 chunk
  • 标记为 inuse,并返回 &chunk-> user_data

free(p) 的基本动作把 chunk 放回 Free List

  • 通过 p 回到 chunk header(p 减去 header 大小)
  • 标记为 free,并插入 Free List

这样的运行机制,mallocfree 都需要遍历链表,效率较低 ,而且频繁地进行 mallocfree 操作,容易造成内存碎片,浪费内存。

ptmalloc 运行机制

ptmalloc 是 GNU C Library (glibc) 中默认的内存分配器,它基于 Doug Lea 的 dlmalloc 实现,并针对多线程环境进行了扩展。

其设计核心在于

  • 减少系统调用次数,加快内存分配和释放速度;
  • 通过内存合并和复用,降低内存碎片化程度;
  • 支持多线程并发分配,避免锁竞争;

数据结构

ptmalloc 采用三级管理结构,分别为

1
2
3
4
5
6
7
8
9
10
┌───────────────────────────────────────
│ 第1层: Arena(线程内存区)
│ └── 管理多个 Heap,含独立锁(mutex)
├───────────────────────────────────────
│ 第2层: Heap(堆)
│ └── 通过 brk 或 mmap 获得的连续内存区域
├───────────────────────────────────────
│ 第3层: Chunk(内存块)
│ └── 实际分配/释放的最小单元
└───────────────────────────────────────

内存块(malloc_chunk)

ptmalloc 将堆内存划分为 chunk,使用 malloc_chunk 数据结构描述(参考 malloc.c 源代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
/*
----------------------- Chunk representations -----------------------
*/


/*
This struct declaration is misleading (but accurate and necessary).
It declares a "view" into memory allowing access to necessary
fields at known offsets from a given base. See explanation below.
*/

struct malloc_chunk {

INTERNAL_SIZE_T mchunk_prev_size; /* Size of previous chunk (if free). */
INTERNAL_SIZE_T mchunk_size; /* Size in bytes, including overhead. */

struct malloc_chunk* fd; /* double links -- used only if free. */
struct malloc_chunk* bk;

/* Only used for large blocks: pointer to next larger size. */
struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
struct malloc_chunk* bk_nextsize;
};


/*
malloc_chunk details:

(The following includes lightly edited explanations by Colin Plumb.)

Chunks of memory are maintained using a `boundary tag' method as
described in e.g., Knuth or Standish. (See the paper by Paul
Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
survey of such techniques.) Sizes of free chunks are stored both
in the front of each chunk and at the end. This makes
consolidating fragmented chunks into bigger chunks very fast. The
size fields also hold bits representing whether chunks are free or
in use.

An allocated chunk looks like this:


chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk, if unallocated (P clear) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of chunk, in bytes |A|M|P|
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| User data starts here... .
. .
. (malloc_usable_size() bytes) .
. |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (size of chunk, but used for application data) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of next chunk, in bytes |A|0|1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Where "chunk" is the front of the chunk for the purpose of most of
the malloc code, but "mem" is the pointer that is returned to the
user. "Nextchunk" is the beginning of the next contiguous chunk.

Chunks always begin on even word boundaries, so the mem portion
(which is returned to the user) is also on an even word boundary, and
thus at least double-word aligned.

Free chunks are stored in circular doubly-linked lists, and look like this:

chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of previous chunk, if unallocated (P clear) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`head:' | Size of chunk, in bytes |A|0|P|
mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Forward pointer to next chunk in list |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Back pointer to previous chunk in list |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Unused space (may be 0 bytes long) .
. .
. |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`foot:' | Size of chunk, in bytes |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of next chunk, in bytes |A|0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The P (PREV_INUSE) bit, stored in the unused low-order bit of the
chunk size (which is always a multiple of two words), is an in-use
bit for the *previous* chunk. If that bit is *clear*, then the
word before the current chunk size contains the previous chunk
size, and can be used to find the front of the previous chunk.
The very first chunk allocated always has this bit set,
preventing access to non-existent (or non-owned) memory. If
prev_inuse is set for any given chunk, then you CANNOT determine
the size of the previous chunk, and might even get a memory
addressing fault when trying to do so.

The A (NON_MAIN_ARENA) bit is cleared for chunks on the initial,
main arena, described by the main_arena variable. When additional
threads are spawned, each thread receives its own arena (up to a
configurable limit, after which arenas are reused for multiple
threads), and the chunks in these arenas have the A bit set. To
find the arena for a chunk on such a non-main arena, heap_for_ptr
performs a bit mask operation and indirection through the ar_ptr
member of the per-heap header heap_info (see arena.c).

Note that the `foot' of the current chunk is actually represented
as the prev_size of the NEXT chunk. This makes it easier to
deal with alignments etc but can be very confusing when trying
to extend or adapt this code.

The two exceptions to all this are:

1. The special chunk `top' doesn't bother using the
trailing size field since there is no next contiguous chunk
that would have to index off it. After initialization, `top'
is forced to always exist. If it would become less than
MINSIZE bytes long, it is replenished.

2. Chunks allocated via mmap, which have the second-lowest-order
bit M (IS_MMAPPED) set in their size fields. Because they are
allocated one-by-one, each must contain its own trailing size
field. If the M bit is set, the other bits are ignored
(because mmapped chunks are neither in an arena, nor adjacent
to a freed chunk). The M bit is also used for chunks which
originally came from a dumped heap via malloc_set_state in
hooks.c.
*/

关键字段说明如下

  • mchunk_prev_size:仅当前一个物理相邻的 chunk 为空闲时,该字段才有效,表示前一个 chunk 的大小;若前一个 chunk 已分配,则该字段被前一个 chunk 复用为用户数据区;
  • mchunk_size:表示当前 chunk 的大小(包括 chunk 头部),低 3 位用作标志位
    • P (PREV_INUSE):第 0 位,表示前一个 chunk 是否在使用中(1 = 已分配,0 = 空闲);
    • M (IS_MMAPPED):第 1 位,表示该 chunk 是否通过 mmap 分配(1 = mmap,0 = 堆内存);
    • N (NON_MAIN_ARENA):第 2 位,表示该 chunk 是否属于非主 arena(1 = 非主 arena,0 = 主 arena);
  • fd/bk:仅当 chunk 空闲时有效,用于将空闲 chunk 链接到双向链表 Bins 中。

空闲链表(Bins)

为了加快分配速度,ptmalloc 使用多种 bins 管理不同大小的空闲 chunk,分别为 Fast Bins, Unsorted Bins, Small Bins, Large Bins

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
┌───────────────────────────────────────────────────────
│ Fast Bins (fastbins)
│ ├── 数量: 10 个 (索引 0-9)
│ ├── 大小: 16, 24, 32, 40, 48, 56, 64, 72, 80, 88 字节
│ └── 特性: 管理小内存;LIFO单链表;不合并相邻空闲块,分配/释放速度快
├───────────────────────────────────────────────────────
│ Unsorted Bin (unsorted bin)
│ ├── 数量: 1 个
│ └── 特性: 释放后的 chunk 首先放入这里,"缓存"机制(局部性原理)
├───────────────────────────────────────────────────────
│ Small Bins (smallbins)
│ ├── 数量: 62 个 (索引 2-63)
│ ├── 大小: 16, 24, ..., 504 字节 (间隔8字节)
│ └── 特性: FIFO循环双向链表,自动合并
├───────────────────────────────────────────────────────
│ Large Bins (largebins)
│ ├── 数量: 64 个 (索引 64-127)
│ ├── 大小: 512+ 字节,非均匀分布
│ │ 512-1024: 64字节间隔, 1024-2048: 128字节间隔...
│ └── 特性: 双向链表,按大小排序,支持 best-fit
└────────────────────────────────────────────────────────

位于堆顶部的 chunk 特殊标记为 Top Chunk,当所有的 bins 中都找不到合适的 chunk 时,ptmalloc 会尝试从 Top Chunk 中分配内存,如果内存不足,ptmalloc 会通过 sbrk() 扩展堆(主 arena)或创建新的 heap(非主 arena)

线程内存区(Arena)

为了支持多线程,ptmalloc 引入 arena

  • 主 arena:启动进程时创建,使用 sbrk() 管理内存;
  • 非主 arena:当多线程并发分配时,ptmalloc 会创建多个非主 arena,每个 arena 使用 mmap() 分配独立的内存池;

当每个线程首次分配内存时,会绑定到一个 arena,后续分配/释放在该 arena 中进行,减少锁竞争。

内存分配流程

当调用 malloc(size) 函数分配内存时,调用的是 malloc.c 文件中的 __libc_malloc 函数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#if IS_IN (libc)
weak_alias (__malloc_info, malloc_info)

strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc)
strong_alias (__libc_free, __free) strong_alias (__libc_free, free)
strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc)
strong_alias (__libc_memalign, __memalign)
weak_alias (__libc_memalign, memalign)
strong_alias (__libc_realloc, __realloc) strong_alias (__libc_realloc, realloc)
strong_alias (__libc_valloc, __valloc) weak_alias (__libc_valloc, valloc)
strong_alias (__libc_pvalloc, __pvalloc) weak_alias (__libc_pvalloc, pvalloc)
strong_alias (__libc_mallinfo, __mallinfo)
weak_alias (__libc_mallinfo, mallinfo)
strong_alias (__libc_mallinfo2, __mallinfo2)
weak_alias (__libc_mallinfo2, mallinfo2)
strong_alias (__libc_mallopt, __mallopt) weak_alias (__libc_mallopt, mallopt)

weak_alias (__malloc_stats, malloc_stats)
weak_alias (__malloc_usable_size, malloc_usable_size)
weak_alias (__malloc_trim, malloc_trim)
#endif
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
malloc(size) 调用流程:

├─ 1. 转换请求大小为 chunk 大小
│ └─ 将请求大小向上对齐到 `CHUNK_HDR_SZ`,确保 chunk 地址对齐到 8/16 字节 + 头部开销

├─ 2. 判断大小范围
│ │
│ ├─ ≤ MAX_FAST_SIZE (88字节): Fast Bin 路径
│ │ └─ 从对应 fastbin 取第一个 chunk,返回
│ │
│ ├─ ≤ MIN_LARGE_SIZE (512字节): Small Bin 路径
│ │ └─ 从对应 smallbin 取 chunk(FIFO)
│ │
│ ├─ ≤ 128KB (默认): Large Bin 路径
│ │ └─ 遍历 unsorted bin → 排序到 bins → best-fit 查找
│ │
│ └─ > 128KB: 直接 mmap 分配
│ └─ 独立匿名映射,单独管理,munmap 释放

├─ 3. 如果 bins 中无合适 chunk
│ └─ 使用 Top Chunk(arena 顶部的剩余空间)
│ └─ 如果 Top Chunk 不够 → 扩展堆(brk/mmap)

└─ 4. 切割 chunk(如果需要)
└─ 剩余部分形成新的空闲 chunk,放入 unsorted bin

内存回收流程

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
free(ptr) 调用流程:

├─ 1. 检查 ptr 有效性(非NULL、对齐、在堆范围内)

├─ 2. 获取 chunk 指针和 arena
│ └─ mem2chunk(ptr) → chunk; arena_for_chunk(chunk)

├─ 3. 判断 chunk 类型
│ │
│ ├─ IS_MMAPPED: 直接 munmap 释放
│ │
│ └─ 普通 chunk:
│ ├─ 检查相邻 chunk 是否空闲(合并)
│ │ └─ 前一 chunk 空闲: 合并到前一个
│ │ └─ 后一 chunk 空闲: 合并到当前
│ │
│ ├─ 判断合并后大小
│ │ ├─ ≤ MAX_FAST_SIZE: 放入 fast bin(不合并时)
│ │ └─ > MAX_FAST_SIZE: 放入 unsorted bin
│ │
│ └─ 尝试向后合并 top chunk
│ └─ 如果相邻 top chunk,合并到 top chunk

└─ 4. 触发 malloc_consolidate(特定条件)
└─ 将 fast bins 中的 chunk 合并并移入 unsorted bin

线程缓存(Tcache - glibc 2.26+)

参考资料

The GNU C Library (glibc)

glibc.git

深入理解 ptmalloc 的运作机制

ptmalloc 源码分析