Department of Computer Science & Engineering
WUCSE-2004-58
Heap Defragmentation in Bounded Time (Extended Abstract)
Authors: Cholleti, Sharath; Defoe, Delvin; Cytron, Ron K.
October 7, 2004
Abstract: Knuth's buddy system is an attractive algorithm for managing storage
allocation, and it can be made to operate in real time. However, the issue of
defragmentation for heaps that are managed by the buddy system has not been
studied. In this paper, we present strong bounds on the amount of storage
necessary to avoid defragmentation. We then present an algorithm for
defragmenting buddy heaps and present experiments from applying that
algorithm to real and synthetic benchmarks.
Our algorithm is
within a factor of two of optimal in terms of the time required to
defragment the heap so as to respond to a single allocation request.
Our experiments show
our algorithm to be much more efficient than extant defragmentation
algorithms.
Department of Computer Science & Engineering - Washington University in St. Louis
Campus Box 1045 - St. Louis, MO - 63130 - ph: (314) 935-6160
Heap Defragmentation in Bounded Time∗ (Extended Abstract)
Sharath R. Cholleti, Delvin Defoe and Ron K. Cytron
Department of Computer Science and Engineering
Washington University
Saint Louis, MO 63130
Abstract

Knuth's buddy system is an attractive algorithm for managing storage allocation, and it can be made to operate in real time. However, the issue of defragmentation for heaps that are managed by the buddy system has not been studied. In this paper, we present strong bounds on the amount of storage necessary to avoid defragmentation. We then present an algorithm for defragmenting buddy heaps and present experiments from applying that algorithm to real and synthetic benchmarks. Our algorithm is within a factor of two of optimal in terms of the time required to defragment the heap so as to respond to a single allocation request. Our experiments show our algorithm to be much more efficient than extant defragmentation algorithms.

1 Introduction

When an application starts, the storage allocator usually obtains a large block of storage, called the heap, from the operating system, which the allocator uses to satisfy the application's allocation requests. In real-time or embedded systems the heap size is usually fixed a priori, as the application's needs are known. Storage allocators are characterized by how they allocate and keep track of free storage blocks. There are various types of allocators, including unstructured lists, segregated lists, and buddy allocators [9]. In this paper we study defragmentation of the buddy allocator, whose allocation time is otherwise reasonably bounded [5] and thus suitable for real-time applications.

Over time, the heap becomes fragmented, so that the allocator might fail to satisfy an allocation request for lack of sufficient, contiguous storage. As a remedy, one can either start with a sufficiently large heap so as to avoid fragmentation problems, or devise an algorithm that can rearrange storage in bounded time to satisfy the allocation request. We consider both of those options in this paper, presenting the first tight bounds on the necessary storage and the first algorithm that defragments a buddy heap in reasonably bounded time.

Our paper is organized as follows. Section 1.1 explains the buddy allocator and defragmentation; Section 2 shows how much storage is necessary for an application-agnostic, defragmentation-free buddy allocator; Section 3 gives the worst-case storage relocation necessary for defragmentation; and Section 4 presents an algorithm that performs within twice the cost of an optimal defragmentation algorithm. Section 5 presents various experimental results of our defragmentation algorithm and compares its efficiency with extant defragmentation approaches.

1.1 Buddy Allocator

In the binary buddy system [7], separate lists are maintained for available blocks of size 2^k bytes, 0 ≤ k ≤ m, where 2^m bytes is the heap size.

∗Sponsored by DARPA under contract F3333333; contact author [email protected]
Initially the entire block of 2^m bytes is available. When a block of 2^k bytes is requested, and no blocks of that size are available, then a larger block is split into two equal parts repeatedly until a block of 2^k bytes is obtained.

When a block is split into two equal-sized blocks, these blocks are called buddies (of each other). If these buddies become free at a later time, then they can be coalesced into a larger block. The most useful feature of this method is that, given the address and size of a block, the address of its buddy can be computed very easily, with just a bit flip. For example, the buddy of the block of size 16 beginning at binary location xx...x10000 is xx...x00000 (where the x's represent either 0 or 1).

Address-Ordered Buddy Allocator  The address-ordered policy selects a block of requested or greater size with the lowest address. If there are no free blocks of the requested size, then the allocator searches for a larger free block, starting from the lowest address and continuing until it finds a sufficiently large block to divide to get a required-size block. For example, if a 1-byte block is requested in Figure 1(c), block b2 (of 4 bytes) is split to obtain a 1-byte block even though there is a 1-byte block available in the heap. Using this policy, the lower addresses of the heap tend to get preference, leaving the other end unused unless the heap gets full or fragmented a lot. Our analysis of the binary-buddy heap requirement is based on an allocator that uses this policy, which we call an Address-Ordered Binary Buddy Allocator (AOBBA).

Address-Ordered Best-Fit Buddy Allocator  In this policy, a block of the smallest possible size equal to or greater than the required size is selected, with preference to the lowest-address block. When a block of the required size is not available, a larger block is selected and split repeatedly until a block of the required size is obtained. For example, if a 1-byte block is requested in Figure 1(c), block b7 is chosen. If a 2-byte block is requested, then block b2 is chosen. Our implementation of the binary buddy is based on this policy.

[Figure 1: Buddy Example. Panels (a)-(d) show a 16-byte heap with blocks b1-b9 at sizes 1 through 16 bytes, each marked occupied or free.]

1.2 Defragmentation

Defragmentation can be defined as moving already-allocated storage blocks to some other address so as to create contiguous free blocks that can be coalesced to form a larger free block. Generally, defragmentation is performed when the allocator cannot satisfy a request for a block. This can be performed by garbage collection [9]. A garbage collector tries to separate the live and the deallocated or unused blocks; by combining the unused blocks, a larger block is usually formed, which the allocator can use to satisfy allocation requests. Some programming languages, like C and C++, need the program to specify when to deallocate an object, whereas programming languages like Java find which objects are live and which are not by using some garbage-collection technique. Generally, garbage collection is done either by identifying all the live objects and assuming all the unreachable objects to be dead, as in mark-and-sweep collectors [9], or by tracking objects when they die, as in reference counting [9] and contaminated garbage collection [1].

In our study, we assume we are given both the allocation and deallocation requests.
For this study it does not matter how the deallocated blocks are found, whether by an explicit deallocation request using free() for C programs or by some garbage-collection technique for Java programs. For real-time purposes we get the traces for Java programs using contaminated garbage collection [1], which keeps track of objects when they die, instead of mark-and-sweep collectors, which might take an unreasonably long time. Our work makes use of the following definitions:

1. Maxlive, M, is defined as the maximum number of bytes live at any instant during the program's execution.

2. Max-blocksize, n, is the size of the largest block the program can allocate.

2 Defragmentation-Free Buddy Allocator

We examine the storage requirements for a binary-buddy allocator so that heap defragmentation is never necessary to satisfy an allocation request. If the amount of storage a program can use at any instant is known, assuming worst-case fragmentation, then a system with that much storage suffices.

Some previous work on bounding the storage requirement is found in [8]. Even though the bound given in that paper is for a system in which blocks allocated are always a power of 2, the allocator assumed is not a buddy allocator.

Theorem 2.1  M(log M + 1)/2 bytes of storage are necessary and sufficient for a defragmentation-free buddy allocator, where M is the maxlive and the max-blocksize.

Proof: See [2] for a detailed proof. The necessary part is proven by constructing a program that requires that number of bytes to avoid defragmentation. The sufficiency part is proven by observing that if each buddy list has M bytes total, then sufficient storage of any size is available without defragmentation.

Theorem 2.2  The tight bound of M(log M + 1)/2 bytes holds for any buddy-style storage manager (i.e., not just those that are address-ordered).

Proof: There is insufficient room to present the proof in this paper; see [4]. The proof shows that however an allocator picks the next block to allocate, there is a program that can force the allocator to behave like an address-ordered allocator.

Discussion  While a bound of O(M log M) is usually quite satisfactory for most problems, consider its ramifications for embedded, real-time systems. For a system that has at most M bytes of live storage at any time, the bound implies that a factor of log M extra storage is required to avoid defragmentation. Even if M is only in the kilobytes, log M is a factor of 10, meaning that the system needs roughly 10x as much storage as it really uses to avoid defragmentation (for M = 1 KiB, the exact bound of Theorem 2.1 is 1024 · (10 + 1)/2 = 5632 bytes). Inflating the RAM or other storage requirements of an embedded system by that amount could make the resulting product noncompetitive in terms of cost.

3 Worst-Case Relocation with a Heap of M Bytes

Given that it is unlikely that sufficient storage would be deployed to avoid defragmentation, we next attack this problem from the opposite side. We find the amount of storage that has to be relocated in the worst case, if the allocator has M bytes and the maxlive of the program is also exactly M bytes. By definition, a program can run in its maxlive storage; however, having only that much storage places as much pressure as possible on a defragmentor. The work that must be done is stated in the theorems below, with the proofs found in [2].
Theorem 3.1  With a heap of M bytes and maxlive of M bytes, to allocate an s-byte block, where s < M, s log s bytes must be relocated in the worst case.

Theorem 3.2  With a heap of M bytes and maxlive of M bytes, to allocate an s-byte block, s − 1 blocks must be relocated in the worst case.

Discussion  With the smallest possible heap, defragmentation may have to move O(s log s) bytes by moving O(s) blocks to satisfy an allocation of size s. If an application reasonably bounds its maximum storage request, then the minimum work required to satisfy an allocation is also reasonably bounded in s. Unfortunately, finding the "right" blocks to move in this manner is not easy, and has been shown to be NP-hard [8]. Thus, a heap of just M bytes is not suitable for real-time, embedded applications, and a heap of M log M bytes, while avoiding defragmentation, is too costly for embedded applications.

4 Greedy Heuristic with 2M Heap

In this section we consider a practical solution to the above problems. We consider a heap slightly bigger than optimal (twice maxlive) and consider a heuristic for defragmentation that is linear in the amount of work required to relocate storage. Here, we use the terms chunk and block (of storage). A 2^k-byte chunk is contiguous storage in the heap which consists of either free or occupied blocks (objects). Our heuristic defragmentation algorithm is based on the following lemma [2]:

Lemma 4.1  With a 2M-byte heap, when allocation for a block of 2^k bytes is requested, there is a 2^k-byte chunk with less than 2^(k−1) bytes of live storage.

If there is an allocation request for a 2^k-byte block and there is no free block of 2^k bytes, then, from Lemma 4.1, less than 2^(k−1) bytes have to be relocated to create a free 2^k-byte block. But to relocate these blocks we might have to empty some other blocks by repeated relocations if there are no appropriate free blocks.

How much storage must these recursive relocations move? For example, consider an allocation request for a 256-byte block when there is no such free block. According to Lemma 4.1, there is a 256-byte chunk in which less than 128 bytes are live. Assume there is a 64-byte live block that has to be moved out of the 256-byte chunk, but suppose there is no free block of 64 bytes. Then a 64-byte chunk has to be emptied. Let that chunk contain a block of 16 bytes. This 16-byte block cannot be moved into either the 256-byte or the 64-byte chunk, as those chunks are being emptied in the first place. The result of accounting for the above work is captured by the following [2]:

Theorem 4.2  With a 2M-byte heap, the greedy approach of selecting a chunk with the minimum amount of live storage for relocation relocates less than twice the amount of storage relocated by an optimal strategy.

Corollary 4.3  With a 2M-byte heap, the amount of storage relocated to allocate an s-byte block is less than s bytes.

[Figure 2: Data Structure. A binary tree over a 16-byte heap; each node holds the number of live bytes in the chunk of storage it covers, down to the individual bytes at the leaves.]
Heap Manager Algorithm  A naive method for finding the minimally occupied chunk involves processing the entire storage pool, but the complexity of such a search is unacceptable at O(M). We therefore use the data structure shown in Figure 2 to speed our search. Each level of the tree has a node associated with each chunk of storage that could be allocatable at that level. In each node, an integer shows the number of live bytes in the subtree rooted at that node. In Figure 2, the heap is 16 bytes. The level labeled "2" keeps track of the number of live bytes in each 2-byte chunk. Above that, each node at the 4-byte level is the sum of its children, showing how many bytes are live at the 4-byte level, and so on.

When a 2-byte block is allocated, the appropriate node at the 2-byte level is updated, but that information must propagate up the tree using parent pointers, taking O(log M) time. To find a minimally occupied block of the required size, only that particular level is searched in the data structure, decreasing the time complexity of the search to O(M/s), where s is the requested block size (the search has to go through 2M/s numbers to find a minimum). We have used hardware to reduce this kind of search to near-constant time [5], and we expect a similar result could be obtained here.

The algorithm's details are straightforward and appear in [4], wherein proofs can be found of the following theorems:

Theorem 4.4  With a 2M-byte heap, defragmentation according to the Heap Manager Algorithm (Section 4) to satisfy a single allocation request for an s-byte block takes O(M·s^0.695) time.

Theorem 4.5  With an M-byte heap, defragmentation according to the Heap Manager Algorithm (Section 4) to satisfy an allocation request for an s-byte block takes O(M·s) time.

Discussion  A constant increase in heap size (from M to 2M) allows the heuristic to operate polynomially, but its complexity may be unsuitable for real-time systems. The work that must be done is reasonably bounded in terms of the allocation-request size s, but more research is needed to find heuristics that operate in similar time.

5 Experimental Results

Storage requirements as well as allocation and deallocation patterns vary from one program to another. Hence, storage fragmentation and the need for defragmentation vary as well. To facilitate experimentation, we implemented a simulator that takes the allocation and deallocation information from a program trace and simulates the effects of allocation, deallocation, and defragmentation using the buddy algorithm and our heuristic described in Section 4.

For benchmarks, we used the Java SPEC benchmarks [3]. To illustrate the efficiency of our defragmentation algorithm, we compare our results with the following approaches currently in common practice.

Left-First Compaction  This approach compacts all the storage to the lower end, based on address, by filling up the holes with the closest block (to the right) of less than or equal size.

Right-First Compaction  This is similar to left-first compaction in moving all the storage to the left end of the heap, but it picks the blocks by scanning from the right end, where we expect storage is less congested due to our address-ordered policy for allocation.

Compaction Without Using Buddy Properties  This method has been implemented to compare our defragmentation algorithm to a naive compaction method of sliding the blocks to one end of the heap, without following the buddy block properties, similar to general non-buddy allocators.
5.1 Defragmentation with 2M-byte Heap

We explored the defragmentation in various benchmarks with a 2M-byte heap using the algorithm described in Section 4, which is based on Theorem 4.2. To our surprise, we found that none of the benchmarks needed any defragmentation when the heap size is 2M bytes! So, having twice the maxlive storage, the address-ordered best-fit buddy allocator is able to avoid any relocation of the storage. Even some randomly generated program traces did not need any relocation with a heap of size 2M bytes. These results actually confirm what is known in practice: most allocation algorithms do not need defragmentation [6].

However, our theoretical results show that there are some programs for which defragmentation will be problematic, and real-time systems must be concerned with accommodating worst-case behavior.

5.2 Defragmentation with M-byte Heap

A 2M-byte heap induced no defragmentation for our benchmarks, so we next experiment with an exact-sized heap of M bytes. Note that with an M-byte heap, there is no guarantee about how much storage is relocated when compared to the optimal. From Figure 3 we see that very few programs required defragmentation. Among the programs that needed defragmentation, except for Jess of size 1 and Javac of size 10, the amount of storage relocated by our defragmentation algorithm is very insignificant. But compared to our algorithm, which relocates storage only to satisfy a particular request without defragmenting the whole heap, all other compaction methods perform badly. The amount of storage relocated by our defragmentation algorithm is summed over all the relocations necessary, whereas for the compaction methods the amount is only for one compaction, which is done when the allocator fails to satisfy an allocation request for the first time. Among all the compaction methods, only right-first compaction performed reasonably well. The other two methods, left-first compaction and naive compaction, relocated significantly more storage, sometimes close to M bytes.

The above results, which showed the weakness of the compaction methods when compared to our defragmentation algorithm, indicate the value of localized defragmentation to satisfy a single allocation request instead of defragmenting the whole heap. If defragmentation is needed for very few allocations (according to the amount of storage relocated as shown in Figure 3, and given the number of relocations as shown in Figure 5), there is no point in doing extra work by compacting the whole heap, either in anticipation of satisfying future allocation requests without any defragmentation or for some other reason.

5.3 Minimally Occupied vs Random Block Selection

Our heuristic selects the minimally occupied block for relocation. In this section we compare that strategy against what happens when the relocated block is chosen at random and the heap is sufficiently small so as to cause defragmentation (M bytes). We report results only for those benchmarks that needed defragmentation under those conditions.

From Figure 4 we see that, out of the 6 Java SPEC benchmark programs which needed some defragmentation, for 3 programs (check(1), db(1) and jess(100)) the same amount of relocation is required. For 2 programs (jess(1) and jess(10)), selecting the minimally occupied block is better, and for 1 program (javac(10)), selecting a random block is better.

From Figure 5, 2 Java SPEC benchmark programs (check(1) and db(1)) needed the same number of relocations, while 4 others (jess(1), javac(10), jess(10) and jess(100)) needed a different number of relocations. For all those programs, selecting the minimally occupied block is better. Note that even though random selection needed more relocations for javac(10), the amount of storage relocated is less.

The above results indicate that random block selection might be a good alternative, and it avoids searching for the minimally occupied block.
[Figure 3: Java Test Programs – Amount of Relocation. For each benchmark program (size), a log-scale bar chart of bytes: maxlive, overall memory allocated, relocation by our algorithm, left-first compaction, right-first compaction, and naive compaction.]
[Figure 4: Minimally Occupied vs Random – Amount of Relocation. For check(1), db(1), jess(1), javac(10), jess(10) and jess(100), a log-scale chart of bytes: maxlive, overall memory allocated, and bytes relocated under minimally occupied versus random partially occupied block selection.]
Figure 5: Minimally Occupied vs Random – Number of Relocations

Program(size):              check1  db1  jess1  javac10  jess10  jess100
Minimally Occupied:         1       1    8      2        3       3
Random Partially Occupied:  1       1    27     30       7       4
We have not as yet determined the theoretical time bound when random blocks are chosen for relocation.

6 Conclusions and Future Work

We presented a tight bound for the storage required for a defragmentation-free buddy allocator. While the bound could be appropriate for desktop computers, for embedded systems there is an inflationary factor of log M for the storage required to avoid defragmentation. While a factor of 2 might be tolerable, a factor of log M is most likely prohibitive for embedded applications.

We presented a greedy algorithm that allocates an s-byte block using less than twice the optimal relocation, on a 2M-byte (twice maxlive) heap. Even though our benchmark programs did not need any relocation with 2M, there is some relocation involved when the heap size is M. The results show better performance of a localized defragmentation algorithm, which defragments just a small portion of the heap to satisfy a single request, over the conventional compaction algorithms.

We compared our greedy algorithm with a random-selection heuristic to find that our greedy algorithm performed better, as expected. But since the overall fragmentation is low, the random heuristic takes less time and might work well in practice; a theoretical bound on its time complexity is needed.

As the effectiveness of localized defragmentation, by relocation, is established in this paper, it is a good idea to concentrate on studying such algorithms instead of defragmenting the entire heap. We proposed only one algorithm based on a heap-like data structure, so further study could involve designing and implementing better data structures and algorithms to improve on the current ones.

References

[1] Dante J. Cannarozzi, Michael P. Plezbert, and Ron K. Cytron. Contaminated garbage collection. In Proceedings of the ACM SIGPLAN '00 Conference on Programming Language Design and Implementation, pages 264-273, 2000.

[2] Sharath Reddy Cholleti. Storage allocation in bounded time. Master's thesis, Washington University, 2002.

[3] SPEC Corporation. Java SPEC benchmarks. Technical report, SPEC, 1999. Available by purchase from SPEC.

[4] Delvin Defoe, Sharath Cholleti, and Ron K. Cytron. Heap defragmentation in bounded time. Technical report, Washington University, 2004.

[5] Steven M. Donahue, Matthew P. Hampton, Morgan Deters, Jonathan M. Nye, Ron K. Cytron, and Krishna M. Kavi. Storage allocation for real-time, embedded systems. In Thomas A. Henzinger and Christoph M. Kirsch, editors, Embedded Software: Proceedings of the First International Workshop, pages 131-147. Springer Verlag, 2001.

[6] Mark S. Johnstone and Paul R. Wilson. The memory fragmentation problem: Solved? In International Symposium on Memory Management, October 1998.

[7] Donald E. Knuth. The Art of Computer Programming, volume 1: Fundamental Algorithms. Addison-Wesley, Reading, Massachusetts, 1973.

[8] J. M. Robson. Bounds for some functions concerning dynamic storage allocation. Journal of the ACM, 21(3):491-499, July 1974.

[9] Paul R. Wilson. Uniprocessor garbage collection techniques (long version). Submitted to ACM Computing Surveys, 1994.