Implement an allocator for memory regions of different lengths.
The allocator registers 4MB memory chunks as memory regions and selects a
free segment from one of them for each allocation.
4KB is the minimum allocation unit, and free segments in the same chunk can be
merged into a larger free segment following the rule of buddy allocation. As a
result, the size of every allocated segment is a power of 2. This may waste
some space but reduces fragmentation.
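The rounding rule can be illustrated with a minimal sketch. The helper names
size_to_order() and order_to_size() are hypothetical and only assume what is
stated here: order 1 is a 4KB segment, each order doubles the size, and order
11 covers a whole 4MB chunk.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_UNIT  4096u   /* 4KB minimum allocation unit */
#define MAX_ORDER 11      /* order 11 covers a whole 4MB chunk */

/* Size in bytes of a segment of the given order: 4KB << (order - 1). */
static size_t order_to_size(int order)
{
    assert(order >= 1 && order <= MAX_ORDER);
    return (size_t)MIN_UNIT << (order - 1);
}

/*
 * Smallest order whose segment can hold len bytes.
 * Returns 0 when len is zero or does not fit into one chunk.
 */
static int size_to_order(size_t len)
{
    size_t seg = MIN_UNIT;
    int order = 1;

    if (len == 0)
        return 0;
    while (seg < len) {
        seg <<= 1;
        order++;
        if (order > MAX_ORDER)
            return 0;
    }
    return order;
}
```

For instance, a 4097-byte request is rounded up to order 2 (8KB), which is
where the wasted space mentioned above comes from.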
In each chunk, a complete binary tree (which is actually stored as an array) is
used to maintain free segments. Each node records the order of the largest
segment that can be allocated from its subtree. Here's a miniature example.
A chunk with all segments free:
level 4                           4(0x1)
level 3             3(0x2)                      3(0x3)
level 2      2(0x4)        2(0x5)        2(0x6)        2(0x7)
level 1  1(0x8) 1(0x9) 1(0xa) 1(0xb) 1(0xc) 1(0xd) 1(0xe) 1(0xf)
After allocating a 16KB (order=3) memory region:
level 4                           3(0x1)
level 3             0(0x2)                      3(0x3)
level 2      2(0x4)        2(0x5)        2(0x6)        2(0x7)
level 1  1(0x8) 1(0x9) 1(0xa) 1(0xb) 1(0xc) 1(0xd) 1(0xe) 1(0xf)
It takes 1 comparison to determine whether a chunk can satisfy a request, and
at most 11 loop iterations to find the leftmost free segment that meets the
requirements.
The value of each node is at most 11, so an 8-bit integer is enough to store
it and only 2048 bytes are required for each tree. And since the entire tree
lives in a contiguous piece of memory and no rotations are needed, it's far
more efficient than a self-balancing tree of the same size.
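The per-chunk tree can be sketched as follows. This is an illustration under
stated assumptions, not the code of this patch: the names chunk_tree,
tree_init(), tree_alloc() and tree_free() are hypothetical, offsets are
expressed in 4KB units, and slot 0 of the array is left unused so that the
children of node i are 2i and 2i+1.

```c
#include <stdint.h>

#define MAX_ORDER  11                  /* root order: 4KB << 10 = 4MB */
#define TREE_NODES (1u << MAX_ORDER)   /* 2048 one-byte slots, slot 0 unused */

struct chunk_tree {
    uint8_t order[TREE_NODES];  /* largest free order within each subtree */
};

static void tree_init(struct chunk_tree *t)
{
    unsigned level = MAX_ORDER;

    t->order[0] = 0;
    for (unsigned i = 1; i < TREE_NODES; i++) {
        if (i > 1 && (i & (i - 1)) == 0)
            level--;            /* a new level starts at each power of two */
        t->order[i] = (uint8_t)level;
    }
}

/* Returns the offset (in 4KB units) of the leftmost free segment of the
 * requested order, or -1 if this chunk cannot satisfy the request. */
static int tree_alloc(struct chunk_tree *t, unsigned order)
{
    unsigned node = 1, level = MAX_ORDER;

    if (order < 1 || order > MAX_ORDER || t->order[1] < order)
        return -1;              /* the single comparison against the root */

    while (level > order) {     /* walk down, preferring the left child */
        node *= 2;
        if (t->order[node] < order)
            node++;
        level--;
    }
    t->order[node] = 0;         /* mark the segment as allocated */

    unsigned offset = (node - (1u << (MAX_ORDER - level))) << (level - 1);

    for (unsigned p = node >> 1; p >= 1; p >>= 1) {
        uint8_t l = t->order[2 * p], r = t->order[2 * p + 1];
        t->order[p] = l > r ? l : r;
    }
    return (int)offset;
}

/* Frees a segment previously returned by tree_alloc() with the same order. */
static void tree_free(struct chunk_tree *t, unsigned offset, unsigned order)
{
    unsigned node = (1u << (MAX_ORDER - order)) + (offset >> (order - 1));
    unsigned level = order;

    t->order[node] = (uint8_t)order;
    for (unsigned p = node >> 1; p >= 1; p >>= 1) {
        uint8_t l = t->order[2 * p], r = t->order[2 * p + 1];

        level++;
        if (l == level - 1 && r == level - 1)
            t->order[p] = (uint8_t)level;   /* both buddies free: merge */
        else
            t->order[p] = l > r ? l : r;
    }
}
```

As in the miniature example, allocation only zeroes the chosen node and
refreshes its ancestors. The descendants keep their old values, which is safe
in this sketch because the walk never descends into a node whose value is 0.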
The 4MB chunks are linked as a list, and each allocation searches them from
head to tail. If no existing chunk can satisfy the allocation, the allocator
registers another 4MB chunk and adds it to the tail.
+---------+ +---------+ +---------+
|4MB chunk| ---> |4MB chunk| ---> |4MB chunk|
+---------+ +---------+ +---------+
In most cases, small IOs get their memory regions from the first or second
chunk and never traverse much of the list, and even if we really send a lot
of large IOs, the cost of the traversal is rarely critical.
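The head-to-tail search with growth at the tail might look like the sketch
below. All names (pool, chunk, pool_grow(), pool_alloc()) are hypothetical.
To keep the example short, each chunk is modelled by a plain counter of free
4KB units instead of the binary tree, and calloc() stands in for registering
a real 4MB memory region.

```c
#include <stdlib.h>

#define MAX_ORDER       11
#define UNITS_PER_CHUNK 1024u   /* 4MB / 4KB */

struct chunk {
    struct chunk *next;
    unsigned free_units;        /* crude stand-in for the per-chunk tree */
};

struct pool {
    struct chunk *head, *tail;
    unsigned nr_chunks;
};

/* Stand-in for the tree allocation: 0 on success, -1 if the chunk is full. */
static int chunk_alloc(struct chunk *c, unsigned order)
{
    unsigned units = 1u << (order - 1);

    if (c->free_units < units)
        return -1;
    c->free_units -= units;
    return 0;
}

/* "Registers" a new chunk and appends it to the tail of the list. */
static struct chunk *pool_grow(struct pool *p)
{
    struct chunk *c = calloc(1, sizeof(*c));

    if (!c)
        return NULL;
    c->free_units = UNITS_PER_CHUNK;
    if (p->tail)
        p->tail->next = c;
    else
        p->head = c;
    p->tail = c;
    p->nr_chunks++;
    return c;
}

/* Returns the chunk that served the request, or NULL on failure. */
static struct chunk *pool_alloc(struct pool *p, unsigned order)
{
    struct chunk *c;

    if (order < 1 || order > MAX_ORDER)
        return NULL;            /* larger requests bypass the chunks */

    for (c = p->head; c; c = c->next)   /* always search from the head */
        if (chunk_alloc(c, order) == 0)
            return c;

    c = pool_grow(p);                   /* nothing fits: add a chunk */
    if (c && chunk_alloc(c, order) == 0)
        return c;
    return NULL;
}
```

Because the search always restarts at the head, space freed in early chunks is
reused before later chunks are touched, which is why small requests tend to
stay near the front of the list.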
Finally, a chunk can obviously provide a memory region of at most 4MB. If a
larger memory region is needed, the allocator registers/deregisters a memory
region directly, bypassing the chunks.
Signed-off-by: wanghonghao <wanghonghao@bytedance.com>