Purely Functional Data Structures

Chris Okasaki

September 1996 CMU-CS-96-177

School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Thesis Committee: Peter Lee, Chair Robert Harper Daniel Sleator Robert Tarjan, Princeton University

Copyright © 1996 Chris Okasaki

This research was sponsored by the Advanced Research Projects Agency (ARPA) under Contract No. F19628- 95-C-0050.

The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of ARPA or the U.S. Government.

Keywords: functional programming, data structures, lazy evaluation, amortization

For Maria

Abstract

When a C programmer needs an efficient data structure for a particular prob- lem, he or she can often simply look one up in any of a number of good text- books or handbooks. Unfortunately, programmers in functional languages such as Standard ML or Haskell do not have this luxury. Although some data struc- tures designed for imperative languages such as C can be quite easily adapted to a functional setting, most cannot, usually because they depend in crucial ways on as- signments, which are disallowed, or at least discouraged, in functional languages. To address this imbalance, we describe several techniques for designing functional data structures, and numerous original data structures based on these techniques, including multiple variations of lists, queues, double-ended queues, and heaps, many supporting more exotic features such as random access or efficient catena- tion.

In addition, we expose the fundamental role of lazy evaluation in amortized functional data structures. Traditional methods of amortization break down when old versions of a data structure, not just the most recent, are available for further processing. This property is known as persistence, and is taken for granted in functional languages. On the surface, persistence and amortization appear to be incompatible, but we show how lazy evaluation can be used to resolve this conflict, yielding amortized data structures that are efficient even when used persistently. Turning this relationship between lazy evaluation and amortization around, the notion of amortization also provides the first practical techniques for analyzing the time requirements of non-trivial lazy programs.

Finally, our data structures offer numerous hints to programming language de- signers, illustrating the utility of combining strict and lazy evaluation in a single language, and providing non-trivial examples using polymorphic recursion and higher-order, recursive modules.

Acknowledgments

Without the faith and support of my advisor, Peter Lee, I probably wouldn’t even be a graduate student, much less a graduate student on the eve of finishing. Thanks for giving me the freedom to turn my hobby into a thesis.

I am continually amazed by the global nature of modern research and how e- mail allows me to interact as easily with colleagues in Aarhus, Denmark and York, England as with fellow students down the hall. In the case of one such colleague, Gerth Brodal, we have co-authored a paper without ever having met. In fact, sorry Gerth, but I don’t even know how to pronounce your name!

I was extremely fortunate to have had excellent English teachers in high school. Lori Huenink deserves special recognition; her writing and public speaking classes are undoubtedly the most valuable classes I have ever taken, in any subject. In the same vein, I was lucky enough to read my wife’s copy of Lyn Dupré’s BUGS in Writing just as I was starting my thesis. If your career involves writing in any form, you owe it to yourself to buy a copy of this book.

Thanks to Maria and Colin for always reminding me that there is more to life than grad school. And to Amy and Mark, for uncountable dinners and other out- ings. We’ll miss you. Special thanks to Amy for reading a draft of this thesis.

And to my parents: who would have thought on that first day of kindergarten that I'd still be in school 24 years later?

Contents

Abstract Acknowledgments

1 Introduction

1.1 Functional vs. Imperative Data Structures

1.2 Strict vs. Lazy Evaluation

1.3. Contributions

1.4 Source Language

1.5 Terminology

1.6 Overview

2 Lazy Evaluation and $-Notation

2.1 Streams

3 Amortization and Persistence via Lazy Evaluation 3.1 Traditional Amortization

3.1.1 Example: Queues

3.2 Persistence: The Problem of Multiple Futures

3.2.1 Execution Traces and Logical Time 3.3. Reconciling Amortization and Persistence

3.3.1. The Role of Lazy Evaluation

3.3.2 A Framework for Analyzing Lazy Data Structures

vii

6

3.4 The Banker’s Method Justifying the Banker’s Method 3.4.2 Example: Queues The Physicist’s Method Example: Queues

3.5.2 Example: Bottom-Up Mergesort with Sharing 3.6 Related Work

Eliminating Amortization

Real-Time Queues

Bottom-Up Mergesort with Sharing

Lazy Rebuilding Batched Rebuilding Global Rebuilding Lazy Rebuilding Double-Ended Queues Output-restricted Deques 5.4.2. Banker’s Deques 5.4.3. Real-Time Deques

Numerical Representations

Positional Number Systems Binary Representations Binary Random-Access Lists 6.2.2 Binomial Heaps Segmented Binary Numbers

Segmented Binomial Random-Access Lists and Heaps

CONTENTS

CONTENTS i

64. Skew Binary Numbers: <.. 4-4 6 & geo Y Se ae ee ee RE EM es 6.4.1. Skew Binary Random-Access Lists ................0.4. 6:4.2°~- Skew Binomial Heaps: «42.0 si eee GB ae ee Back WY ES SR

G25) IDISCUSSION. 4.25. z, G GW diode GSS Se Ee SA: Cea cw I ee Ea ae

66> Related Work. <6. 44.502 4 4054 4 aR So ed oe ER oh ee eRe

7 Data-Structural Bootstrapping

7.1 Structural Decomposition... 2... 2.0.2... 0000 ee eee 7.1.1. Non-Uniform Recursion and Standard ML ............... eebi2s. MOMGWUCS ReVISIICd. oi. o78. Osea. Sd, es Sew Se eed, ew aR ea Se

dD. sirucniral ADStrACHON <.<.k-< ature me abe NS A Gow QAR we Ma ees 721A Lists With Efficient Catenation =. 2 6. <n. ¢ 6 dea a Soe SS ee Se 7.2.2. Heaps With Efficient Merging ..................00.-.4

do Related Works 6 2.205 bop Ss ee Pe ee ES a ag BBR ae SE ad

8 Implicit Recursive Slowdown

81. Recursive Slowdown: 4 a4 4-4 44 aoe HSM RE Ae See EE ee SS 8.2 Implicit Recursive Slowdown... ..........2 00 eee eee ee ee 8.3 Supportinga Decrement Function ................2 000004 - SA Queues and: DEques: 6 gg ce lx ty Sa a! Geet ange eee me B ahom % Saale wa ee Die 8.5 Catenable Double-Ended Queues... ......0..0. 0.2.2.0... ee ee 6:0. Related Work: 3.20.5 n-ne eoee Bek a ee a ER pes Se ee §

9 Conclusions

9.1. Functional Programming 2... :.. 2c ed ee ee eee ee eee 02. . Persistent Data Structures 2-229 .c2 4 A, ees Bw Reread ok Se Oe Es 9.3. Programming Language Design ..............2.0-.0 0202004 4. “Open Problems, % 4.4 4.<o.4) 6a tet tee ye ale Daal te Yeo! 264 BS eG, Git

A The Definition of Lazy Evaluation in Standard ML Ps! COVINA Ee we Songrk lee, ura ool ay bt ge Belo ce ah tee ac WA ay SO Ale he We Bd a By teed, a ty AG. sStalG Semantics e223) 4 Geet at ta A Se ee dad ee Sle ote Beg As. Dynamie Semantics...) os % oe be eek eR Pe OEE eed ee a Pe eS

AsA. SRECUTSION fi 4.08, 28 scene Shoe be eae alee 6 8 ee ee be eee A

xii CONTENTS

Bibliography 137

Index 147

Chapter 1

Introduction

Efficient data structures have been studied extensively for over thirty years, resulting in a vast literature from which the knowledgeable programmer can extract efficient solutions to a stun- ning variety of problems. Much of this literature purports to be language-independent, but unfortunately it is language-independent only in the sense of Henry Ford: Programmers can use any language they want, as long as it’s imperative.' Only a small fraction of existing data structures are suitable for implementation in functional languages, such as Standard ML or Haskell. This thesis addresses this imbalance by specifically considering the design and analysis of functional data structures.

1.1 Functional vs. Imperative Data Structures

The methodological benefits of functional languages are well known [Bac78, Hug89, HJ94], but still the vast majority of programs are written in imperative languages such as C. This apparent contradiction is easily explained by the fact that functional languages have historically been slower than their more traditional cousins, but this gap is narrowing. Impressive advances have been made across a wide front, from basic compiler technology to sophisticated analyses and optimizations. However, there is one aspect of functional programming that no amount of cleverness on the part of the compiler writer is likely to mitigate the use of inferior or inappropriate data structures. Unfortunately, the existing literature has relatively little advice to offer on this subject.

Why should functional data structures be any more difficult to design and implement than imperative ones? There are two basic problems. First, from the point of view of designing and implementing efficient data structures, functional programming’s stricture against destructive

"Henry Ford once said of the available colors for his Model T automobile, “[Customers] can have any color they want, as long as it’s black.”

2 Introduction

updates (assignments) is a staggering handicap, tantamount to confiscating a master chef’s knives. Like knives, destructive updates can be dangerous when misused, but tremendously effective when used properly. Imperative data structures often rely on assignments in crucial ways, and so different solutions must be found for functional programs.

The second difficulty is that functional data structures are expected to be more flexible than their imperative counterparts. In particular, when we update an imperative data structure we typically accept that the old version of the data structure will no longer be available, but, when we update a functional data structure, we expect that both the old and new versions of the data structure will be available for further processing. A data structure that supports multiple versions is called persistent while a data structure that allows only a single version at a time is called ephemeral [DSST89]. Functional programming languages have the curious property that all data structures are automatically persistent. Imperative data structures are typically ephemeral, but when a persistent data structure is required, imperative programmers are not surprised if the persistent data structure is more complicated and perhaps even asymptotically less efficient than an equivalent ephemeral data structure.

Furthermore, theoreticians have established lower bounds suggesting that functional pro- gramming languages may be fundamentally less efficient than imperative languages in some situations [BAG92, Pip96]. In spite of all these points, this thesis shows that it is often possible to devise functional data structures that are asymptotically as efficient as the best imperative solutions.

1.2 Strict vs. Lazy Evaluation

Most (sequential) functional programming languages can be classified as either strict or lazy, according to their order of evaluation. Which is superior is a topic debated with religious fervor by functional programmers. The difference between the two evaluation orders is most apparent in their treatment of arguments to functions. In strict languages, the arguments to a function are evaluated before the body of the function. In lazy languages, arguments are evaluated in a demand-driven fashion; they are initially passed in unevaluated form and are evaluated only when (and if!) the computation needs the results to continue. Furthermore, once a given argument is evaluated, the value of that argument is cached so that if it is ever needed again, it can be looked up rather than recomputed. This caching is known as memoization [Mic68].

Each evaluation order has its advantages and disadvantages, but strict evaluation is clearly superior in at least one area: ease of reasoning about asymptotic complexity. In strict lan- guages, exactly which subexpressions will be evaluated, and when, is for the most part syn- tactically apparent. Thus, reasoning about the running time of a given program is relatively straightforward. However, in lazy languages, even experts frequently have difficulty predicting when, or even if, a given subexpression will be evaluated. Programmers in such languages

1.3 Contributions 3

Running Times of Supported Functions

banker’s queues snoc/head/tail: O(1)

physicist’s queues snoclhead/tail: O(1)

real-time queues snoc/head/tail: O(1)1

bootstrapped queues head: O(1)', snoc/tail: O(log* n) implicit queues snocl/head/tail: O(1)

banker’s deques cons/head/tail/snoc/last/init: O(1) real-time deques cons/head/tail/snoc/last/init: O(1)t implicit deques cons/head/tail/snoc/last/init: O(1)

catenable lists cons/snoc/head/tail/++: O(1) simple catenable deques cons/head/tail/snoc/last/init: O(1), +: O(log n) catenable deques cons/head/tail/snoc/last/init/+: O(1)

skew-binary random-access lists | cons/head/tail: O(1)", lookup/update : O(log n) skew binomial heaps insert: O(1)", merge/findMin/deleteMin : O(log n insert/mergelfindMin: O(1)', deleteMin: O(log n) sortable collections add: O(log n), sort: O(n) add: O(log n)*, sort: O(n)t

Worst-case running times marked with {. All other running times are amortized.

Table 1.1: Summary of Implementations

are often reduced to pretending the language is actually strict to make even gross estimates of running time!

Both evaluation orders have implications for the design and analysis of data structures. As we will see in Chapters 3 and 4, strict languages can describe worst-case data structures, but not amortized ones, and lazy languages can describe amortized data structures, but not worst-case ones. To be able to describe both kinds of data structures, we need a programming language that supports both evaluation orders. Fortunately, combining strict and lazy evaluation in a single language is not difficult. Chapter 2 describes $-notation a convenient way of adding lazy evaluation to an otherwise strict language (in this case, Standard ML).

1.3 Contributions

This thesis makes contributions in three major areas:

e Functional programming. Besides developing a suite of efficient data structures that are useful in their own right (see Table 1.1), we also describe general approaches to

4 Introduction

designing and analyzing functional data structures, including powerful new techniques for reasoning about the running time of lazy programs.

e Persistent data structures. Until this research, it was widely believed that amortization was incompatible with persistence [DST94, Ram92]. However, we show that memoiza- tion, in the form of lazy evaluation, is the key to reconciling the two. Furthermore, as noted by Kaplan and Tarjan [KT96b], functional programming is a convenient medium for developing new persistent data structures, even when the data structure will eventu- ally be implemented in an imperative language. The data structures and techniques in this thesis can easily be adapted to imperative languages for those situations when an imperative programmer needs a persistent data structure.

e Programming language design. Functional programmers have long debated the relative merits of strict and lazy evaluation. This thesis shows that both are algorithmically im- portant and suggests that the ideal functional language should seamlessly integrate both. As a modest step in this direction, we propose $-notation, which allows the use of lazy evaluation in a strict language with a minimum of syntactic overhead.

1.4 Source Language

All source code will be presented in Standard ML [MTH90], extended with primitives for lazy evaluation. However, the algorithms can all easily be translated into any other functional language supporting both strict and lazy evaluation. Programmers in functional languages that are either entirely strict or entirely lazy will be able to use some, but not all, of the data structures in this thesis.

In Chapters 7 and 8, we will encounter several recursive data structures that are difficult to describe cleanly in Standard ML because of the language’s restrictions against certain sophisti- cated and difficult-to-implement forms of recursion, such as polymorphic recursion and recur- sive modules. When this occurs, we will first sacrifice executability for clarity and describe the data structures using ML-like pseudo-code incorporating the desired forms of recursion. Then, we will show how to convert the given implementations to legal Standard ML. These examples should be regarded as challenges to the language design community to provide a programming language capable of economically describing the appropriate abstractions.

1.5 Terminology

Any discussion of data structures is fraught with the potential for confusion, because the term data structure has at least four distinct, but related, meanings.

1.6 Overview AS)

e An abstract data type (that is, a type and a collection of functions on that type). We will refer to this as an abstraction.

e A concrete realization of an abstract data type. We will refer to this as an implementa- tion, but note that an implementation need not be actualized as code a concrete design is sufficient.

e An instance of a data type, such as a particular list or tree. We will refer to such an instance generically as an object or a version. However, particular data types typically have their own nomenclature. For example, we will refer to stack or queue objects simply as stacks or queues.

e A unique identity that is invariant under updates. For example, in a stack-based in- terpreter, we often speak informally about “the stack” as if there were only one stack, rather than different versions at different times. We will refer to this identity as a persis- tent identity. This issue mainly arises in the context of persistent data structures; when we speak of different versions of the same data structure, we mean that the different versions share a common persistent identity.

Roughly speaking, abstractions correspond to signatures in Standard ML, implementations to structures or functors, and objects or versions to values. There is no good analogue for persistent identities in Standard ML.”

The term operation is similarly overloaded, meaning both the functions supplied by an abstract data type and applications of those functions. We reserve the term operation for the latter meaning, and use the terms operator or function for the former.

1.6 Overview

This thesis is structured in two parts. The first part (Chapters 2-4) concerns algorithmic aspects of lazy evaluation. Chapter 2 sets the stage by briefly reviewing the basic concepts of lazy evaluation and introducing $-notation.

Chapter 3 is the foundation upon which the rest of the thesis is built. It describes the mediating role lazy evaluation plays in combining amortization and persistence, and gives two methods for analyzing the amortized cost of data structures implemented with lazy evaluation.

Chapter 4 illustrates the power of combining strict and lazy evaluation in a single language. It describes how one can often derive a worst-case data structure from an amortized data struc- ture by systematically scheduling the premature execution of lazy components.

>The persistent identity of an ephemeral data structure can be reified as a reference cell, but this is insufficient for modelling the persistent identity of a persistent data structure.

6 Introduction

The second part of the thesis (Chapters 5—8) concerns the design of functional data struc- tures. Rather than cataloguing efficient data structures for every purpose (a hopeless task!), we instead concentrate on a handful of general techniques for designing efficient functional data structures and illustrate each technique with one or more implementations of fundamental ab- stractions such as priority queues, random-access structures, and various flavors of sequences.

Chapter 5 describes lazy rebuilding, a lazy variant of global rebuilding [Ove83]. Lazy re- building is significantly simpler than global rebuilding, but yields amortized rather than worst- case bounds. By combining lazy rebuilding with the scheduling techniques of Chapter 4, the worst-case bounds can be recovered.

Chapter 6 explores numerical representations, implementations designed in analogy to rep- resentations of numbers (typically binary numbers). In this model, designing efficient insertion and deletion routines corresponds to choosing variants of binary numbers in which adding or subtracting one take constant time.

Chapter 7 examines data-structural bootstrapping [Buc93]. Data-structural bootstrapping comes in two flavors: structural decomposition, in which unbounded solutions are bootstrapped from bounded solutions, and structural abstraction, in which efficient solutions are boot- strapped from inefficient solutions.

Chapter 8 describes implicit recursive slowdown, a lazy variant of the recursive-slowdown technique of Kaplan and Tarjan [KT95]. As with lazy rebuilding, implicit recursive slowdown is significantly simpler than recursive slowdown, but yields amortized rather than worst-case bounds. Again, we can recover the worst-case bounds using scheduling.

Finally, Chapter 9 concludes by summarizing the implications of this work on functional programming, on persistent data structures, and on programming language design, and by describing some of the open problems related to this thesis.

Chapter 2

Lazy Evaluation and $-Notation

Lazy evaluation is an evaluation strategy employed by many purely functional programming languages, such as Haskell [Ht 92]. This strategy has two essential properties. First, the evalu- ation of a given expression is delayed, or suspended, until its result is needed. Second, the first time a suspended expression is evaluated, the result is memoized (i.e., cached) so that the next time it is needed, it can be looked up rather than recomputed.

Supporting lazy evaluation in a strict language such as Standard ML requires two primi- tives: one to suspend the evaluation of an expression and one to resume the evaluation of a suspended expression (and memoize the result). These primitives are often called delay and force. For example, Standard ML of New Jersey offers the following primitives for lazy eval- uation:

type a susp val delay : (unit a) a susp val force : a susp > a

These primitives are sufficient to encode all the algorithms in this thesis. However, program- ming with these primitives can be rather inconvenient. For instance, to suspend the evaluation of some expression e, one writes delay (fn () => e). Depending on the use of whitespace, this introduces an overhead of 13-17 characters! Although acceptable when only a few expressions are to be suspended, this overhead quickly becomes intolerable when many expressions must be delayed.

To make suspending an expression as syntactically lightweight as possible, we instead use $-notation to suspend the evaluation of some expression e, we simply write $e. $e is called a suspension and has type 7 susp, where 7 is the type of e. The scope of the $ operator extends as far to the right as possible. Thus, for example, $f x parses as $(f x) rather than ($f) x and $r+y parses as $(r+y) rather than ($z)+y. Note that $e is itself an expression and can be suspended by writing $$e, yielding a nested suspension of type 7 susp susp.

8 Lazy Evaluation and $-Notation

If s is a suspension of type 7 susp, then force s evaluates and memoizes the contents of s and returns the resulting value of type 7. However, explicitly forcing a suspension with a force operation can also be inconvenient. In particular, it often interacts poorly with pattern matching, requiring a single case expression to be broken into two or more nested case ex- pressions, interspersed with force operations. To avoid this problem, we integrate $-notation with pattern matching. Matching a suspension against a pattern of the form $p first forces the suspension and then matches the result against p. At times, an explicit force operator is still useful. However, it can now be defined in terms of $ patterns.

fun force ($x) = x

To compare the two notations, consider the standard take function, which extracts the first n elements of a stream. Streams are defined as follows:

datatype a StreamCell = Nil | Cons of a x a Stream withtype a Stream = a StreamCell susp

Using delay and force, take would be written

fun take (n, s) = delay (fn () => case n of 0=> Nil | _ => case force s of Nil > Nil | Cons (a, s’) = Cons (2, take (n—1, s’)))

In contrast, using $-notation, take can be written more concisely as

fun take (n, s) = $cease (n, s) of (O, _) => Nil | (. $Nil) > Nil | (_, $Cons (2, s’)) = Cons (2, take (n—1, s’))

In fact, it is tempting to write take even more concisely as

fun take (0, _) = $Nil | take (_, $Nil) = $Nil | take (n, $Cons (2, s)) = $Cons (z, take (n—1, s))

However, this third implementation is not equivalent to the first two. In particular, it forces its second argument when take is applied, rather than when the resulting stream is forced.

The syntax and semantics of $-notation are formally defined in Appendix A.

2.1 Streams 9

2.1 Streams

As an extended example of lazy evaluation and $-notation in Standard ML, we next develop a small streams package. These streams will also be used by several of the data structures in subsequent chapters.

Streams (also known as lazy lists) are very similar to ordinary lists, except that every cell is systematically suspended. The type of streams is

datatype a StreamCell = Nil | Cons of a x a Stream withtype a Stream = a StreamCell susp

A simple stream containing the elements 1, 2, and 3 could be written

$Cons (1, $Cons (2, $Cons (3, $Nil)))

It is illuminating to contrast streams with simple suspended lists of type a list susp. The computations represented by the latter type are inherently monolithic once begun by forcing the suspended list, they run to completion. The computations represented by streams, on the other hand, are often incremental forcing a stream executes only enough of the computation to produce the outermost cell and suspends the rest. This behavior is common among datatypes such as streams that contain nested suspensions.

To see this difference in behavior more clearly, consider the append function, written s + t. On suspended lists, this function might be written

fun s + ¢ = $(force s @ force ?)

Once begun, this function forces both its arguments and then appends the two lists, producing the entire result. Hence, this function is monolithic. On streams, the append function is written

fun s + ¢ =$case s of $Nil = force ¢ | $Cons (z, s’) = Cons (2, s’ + t)

Once begun, this function forces the first cell of s (by matching against a $ pattern). If this cell is Nil, then the first cell of the result is the first cell of t, so the function forces t. Otherwise, the function constructs the first cell of the result from the first element of s and this is the key point the suspension that will eventually calculate the rest of the appended list. Hence, this function is incremental. The take function described earlier is similarly incremental.

However, consider the function to drop the first n elements of a stream.

fun drop (n, s) = let fun drop’ (0, s’) = force s’ | drop’ (n, $Nil) = Nil

10 Lazy Evaluation and $-Notation

| drop’ (n, $Cons (x, s’)) = drop’ (n—1, s’) in $drop’ (n, s) end

This function is monolithic because the recursive calls to drop’ are never delayed calcu- lating the first cell of the result requires executing the entire drop function. Another common monolithic stream function is reverse.

fun reverse s = let fun reverse’ ($Nil, r) = r | reverse’ ($Cons (z, s), r) = reverse’ (s, Cons (x, $r)) in $reverse’ (s, Nil) end

Here the recursive calls to reverse’ are never delayed, but note that each recursive call creates a new suspension of the form $r. It might seem then that reverse does not in fact do all of its work at once. However, suspensions such as these, whose bodies are manifestly values (i.e., composed entirely of constructors and variables, with no function applications), are called trivial. A good compiler would create these suspensions in already-memoized form, but even if the compiler does not perform this optimization, trivial suspensions always evaluate in O(1) time.

Although monolithic stream functions such as drop and reverse are common, incremental functions such as + and take are the raison d’étre of streams. Each suspension carries a small but significant overhead, so for maximum efficiency laziness should be used only when there is a good reason to do so. If the only uses of lazy lists in a given application are monolithic, then that application should use simple suspended lists rather than streams.

Figure 2.1 summarizes these stream functions as a Standard ML module. Note that the type of streams is defined using Standard ML’s withtype construction, but that older versions of Standard ML do not allow withtype declarations in signatures. This feature will be sup- ported in future versions of Standard ML, but if your compiler does not allow it, then a sim- ple workaround is to delete the Stream type and replace every occurrence of 7 Stream with T StreamCell susp. By including the StreamCell datatype in the signature, we have delib- erately chosen to expose the internal representation in order to support pattern matching on streams.

2.2 Historical Notes

Lazy Evaluation Wadsworth [Wad71] first proposed lazy evaluation as an optimization of normal-order reduction in the lambda calculus. Vuillemin [Vui74] later showed that, under certain restricted conditions, lazy evaluation is an optimal evaluation strategy. The formal semantics of lazy evaluation has been studied extensively [Jos89, Lau93, OLT94, AFM' 95].

2.2 Historical Notes 11

signature STREAM =

sig datatype a StreamCell = Nil | Cons of a x a Stream withtype a Stream = a StreamCell susp

val + : a Stream x a Stream a Stream (* stream append *) valtake : int x a@ Stream > a Stream valdrop : int x a Stream a Stream val reverse : a Stream a Stream end

structure Stream : STREAM =

sig datatype a StreamCell = Nil | Cons of a x a Stream withtype a Stream = a StreamCell susp

fun s + ¢ = $case s of $Nil => force ¢ | $Cons (a, s’) = Cons (2, s’ + t) fun take (n, s) = $case (n, s) of (0, _) > Nil | (_, $Nil) > Nil | (, $Cons (a, s’)) = Cons (, take (n—1, s’)) fun drop (n, s) = let fun drop’ (0, $c) = ¢ | drop’ (n, $Nil) = Nil | drop’ (n, $Cons (a, s’)) = drop’ (n—1, s’) in $drop’ (n, s) end fun reverse s = let fun reverse’ ($Nil, r) = r | reverse’ ($Cons (2, s), 7) = reverse’ (s, Cons (x, $r)) in $reverse’ (s, Nil) end

Figure 2.1: A small streams package.

Streams Landin introduced streams in [Lan65], but without memoization. Friedman and Wise [FW76] and Henderson and Morris [HM76] extended Landin’s streams with memoiza- tion.

Memoization Michie [Mic68] coined the term memoization to denote the augmentation of functions with a cache of argument-result pairs. (The argument field is dropped when memoiz- ing suspensions by regarding suspensions as nullary functions.) Hughes [Hug85] later applied

12 Lazy Evaluation and $-Notation memoization, in the original sense of Michie, to functional programs.

Algorithmics Both components of lazy evaluation delaying computations and memoizing the results have a long history in algorithm design, although not always in combination. The idea of delaying the execution of potentially expensive computations (often deletions) is used to good effect in hash tables [WV86], priority queues [ST86b, FT87], and search trees [DSST89]. Memoization, on the other hand, is the basic principle of such techniques as dynamic program- ming [Bel57] and path compression [HU73, TvL84].

Syntax for Lazy Evaluation Early versions of CAML [W*90], a close cousin of Standard ML, offered support for lazy evaluation similar to the $-notation proposed here. Rather than providing a single lazy constructor, however, CAML allowed any data constructor to be tagged as lazy, after which all applications of the constructor would be evaluated lazily. Although this is more flexible than $-notation, it also leads to programs that are significantly harder to read. With $-notation, it is syntactically apparent which subexpressions are to be evaluated strictly and which are to be evaluated lazily, but in CAML, this information can only be determined by referring back to the type declarations.

Chapter 3

Amortization and Persistence via Lazy Evaluation

Over the past fifteen years, amortization has become a powerful tool in the design and analysis of data structures. Implementations with good amortized bounds are often simpler and faster than implementations with equivalent worst-case bounds. Unfortunately, standard techniques for amortization apply only to ephemeral data structures, and so are unsuitable for designing or analyzing functional data structures, which are automatically persistent.

In this chapter, we review the two traditional techniques for analyzing amortized data struc- tures the banker’s method and the physicist’s method and show where they break down for persistent data structures. Then, we demonstrate how lazy evaluation can mediate the con- flict between amortization and persistence. Finally, we adapt the banker’s and physicist’s meth- ods to analyze lazy amortized data structures.

The resulting techniques are both the first techniques for designing and analyzing persis- tent amortized data structures and the first practical techniques for analyzing non-trivial lazy programs.

3.1 Traditional Amortization

The notion of amortization arises from the following observation. Given a sequence of oper- ations, we may wish to know the running time of the entire sequence, but not care about the running time of any individual operation. For instance, given a sequence of n operations, we may wish to bound the total running time of the sequence by O(n) without insisting that each individual operation run in O(1) time. We might be satisfied if a few operations run in O(log 7) or even O(n) time, provided the total cost of the sequence is only O(n). This freedom opens

14 Amortization and Persistence via Lazy Evaluation

up a wide design space of possible solutions, and often yields new solutions that are simpler and faster than worst-case solutions with equivalent bounds. In fact, for some problems, such as the union-find problem [TvL84], there are amortized solutions that are asymptotically faster than any possible worst-case solution (assuming certain modest restrictions) [Blu86].

To prove an amortized bound, one defines the amortized cost of each operation and then proves that, for any sequence of operations, the total amortized cost of the operations is an upper bound on the total actual cost, Le.,

m m ae a; = ye i i=l i=l

where a; is the amortized cost of operation 7, ¢; is the actual cost of operation 7, and m is the total number of operations. Usually, in fact, one proves a slightly stronger result: that at any intermediate stage in a sequence of operations, the accumulated amortized cost is an upper bound on the accumulated actual cost, Le.,

j j rea ae for any 7. The difference between the accumulated amortized costs and the accumulated actual

costs is called the accumulated savings. Thus, the accumulated amortized costs are an upper bound on the accumulated actual costs whenever the accumulated savings is non-negative.

Amortization allows for occasional operations to have actual costs that exceed their amor- tized costs. Such operations are called expensive. Operations whose actual costs are less than their amortized costs are called cheap. Expensive operations decrease the accumulated savings and cheap operations increase it. The key to proving amortized bounds is to show that expen- sive operations occur only when the accumulated savings are sufficient to cover the cost, since otherwise the accumulated savings would become negative.

Tarjan [Tar85] describes two techniques for analyzing ephemeral amortized data structures: the banker’s method and the physicist’s method. In the banker’s method, the accumulated sav- ings are represented as credits that are associated with individual locations in the data structure. These credits are used to pay for future accesses to these locations. The amortized cost of any operation is defined to be the actual cost of the operation plus the credits allocated by the operation minus the credits spent by the operation, 1.e.,

aj =i +G—-—G

where c; is the number of credits allocated by operation 7, and ¢; is the number of credits spent by operation 7. Every credit must be allocated before it is spent, and no credit may be spent more than once. Therefore, > c¢; > >°>€;, which in turn guarantees that }7a; > >°t;, as desired. Proofs using the banker’s method typically define a credit invariant that regulates

3.1 Traditional Amortization 15

the distribution of credits in such a way that, whenever an expensive operation might occur, sufficient credits have been allocated in the right locations to cover its cost.

In the physicist’s method, one describes a function © that maps each object d to a real number called the potential of d. The function ® is typically chosen so that the potential is initially zero and is always non-negative. Then, the potential represents a lower bound on the accumulated savings.

Let d; be the output of operation 7 and the input of operation z + 1. Then, the amortized cost of operation z is defined to be the actual cost plus the change in potential between d;_; and d;, 1.€.,

ay = by + @(d;) = (d;_1)

The accumulated actual costs of the sequence of operations are

Mit; = Dhi(ai + O(d;1) ©(d))) = i ai dia1(®(dj-1) = ®(d;)) = Vi ai + O(do) O(d;)

Sums such as }>(®(d;_1) ®(d;)), where alternating positive and negative terms cancel each other out, are called telescoping series. Provided ® is chosen in such a way that ®(d) is zero and (d;) is non-negative, then ®(d;) > ®(do) and > a; > ¥> t;, so the accumulated amortized costs are an upper bound on the accumulated actual costs, as desired.

Remark: This is a somewhat simplified view of the physicist’s method. In real analyses, one often encounters situations that are difficult to fit into the framework as described. For example, what about functions that take or return more than one object? However, this simplified view suffices to illustrate the relevant issues. ©

Clearly, the two methods are very similar. We can convert the banker’s method to the physi- cist’s method by ignoring locations and taking the potential to be the total number of credits in the object, as indicated by the credit invariant. Similarly, we can convert the physicist’s method to the banker’s method by converting potential to credits, and placing all credits on the root. It is perhaps surprising that the knowledge of locations in the banker’s method offers no extra power, but the two methods are in fact equivalent [Tar85, Sch92]. The physicist’s method is usually simpler, but it is occasionally convenient to take locations into account.

Note that both credits and potential are analysis tools only; neither actually appears in the program text (except maybe in comments). 3.1.1 Example: Queues

We next illustrate the banker’s and physicist’s methods by analyzing a simple functional im- plementation of the queue abstraction, as specified by the signature in Figure 3.1.

16 Amortization and Persistence via Lazy Evaluation

signature QUEUE = sig type a Queue exception EMPTY

valempty : a Queue

val isEmpty : a Queue bool

val snoc : @ Queue X a a Queue

val head : @ Queue > a (* raises EMPTY if queue is empty *)

val tail : @ Queue a Queue (* raises EMPTY if queue is empty *) end

Figure 3.1: Signature for queues.

(Etymological note: snoc is cons spelled backward and means “cons on the right”.)

A common representation for purely functional queues [Gri81, HM81, Bur82] is as a pair of lists, / and R, where fF contains the front elements of the queue in the correct order and R contains the rear elements of the queue in reverse order. For example, a queue containing the integers 1...6 might be represented by the lists #’ = [1,2,3] and &k =[6,5,4]. This representation is described by the following datatype:

datatype a Queue = Queue of {F: a list, R : a list}

In this representation, the head of the queue is the first element of F’, so head and tail return and remove this element, respectively.

fun head (Queue {F=2::f,R=r})=2 fun tail (Queue {F= 2 :: f, R= r}) = Queue {F=/,R=r}

Remark: To avoid distracting the reader with minor details, we will commonly ignore error cases when presenting code fragments. For example, the above code fragments do not describe the behavior of head or tail on empty queues. We will always include the error cases when presenting complete implementations. o7

Now, the last element of the queue is the first element of , so snoc simply adds a new element at the head of R.

fun snoc (Queue {F=f,R =r}, 7) =Queue {F=f,R=2:: r}

3.1 Traditional Amortization 17

Elements are added to # and removed from fF’, so they must somehow migrate from one list to the other. This is accomplished by reversing # and installing the result as the new fF’ whenever F would otherwise become empty, simultaneously setting the new F& to []. The goal is to maintain the invariant that f’ is empty only if ? is also empty (1.e., the entire queue is empty). Note that if /’ were empty when Ff was not, then the first element of the queue would be the last element of 2, which would take O(n) time to access. By maintaining this invariant, we guarantee that head can always find the first element in O(1) time.

snoc and tazl must now detect those cases that would otherwise result in a violation of the invariant, and change their behavior accordingly.

fun snoc (Queue {F=[],...}, 7) = Queue {F = [x], R=[]}

| snoc (Queue {F=f, R=r}, x) = Queue {F=f/,R=2:: r} fun tail (Queue {F = [z], R= r}) = Queue {F=rev r, R=[]}

| tail (Queue {F= 2 :: f, R= r})= Queue {F=/,R=r}

Note the use of the record wildcard (...) in the first clause of snoc. This is Standard ML pattern-matching notation meaning “the remaining fields of this record are irrelevant’. In this case, the F field is irrelevant because we know by the invariant that if Fis [], then so is R.

A cleaner way to write these functions is to consolidate the invariant-maintenance duties of snoc and tail into a single pseudo-constructor. Pseudo-constructors, sometimes called smart constructors [Ada93], are functions that replace ordinary constructors in the construction of data, but that check and enforce an invariant. In this case, the pseudo-constructor queue re- places the ordinary constructor Queue, but guarantees that F' is empty only if R is also empty.

fun queue {F =[], R= r} = Queue {F=rev r, R=[]} | queue {F=/,R=r}=Queue {F=/,R=r}

fun snoc (Queue {F=/,R =r}, 7) =queue {F=/,R=2:: r} fun tail (Queue {F=2 :: f, R=r})=queue {F=f/,R=r}

The complete code for this implementation is shown in Figure 3.2. Every function except tail takes O(1) worst-case time, but tail takes O(n) worst-case time. However, we can show that snoc and tail each take only O(1) amortized time using either the banker’s method or the physicist’s method.

Using the banker’s method, we maintain a credit invariant that the rear list always contains a number of credits equal to its length. Every snoc into a non-empty queue takes one actual step and allocates a credit to the new element of the rear list, for an amortized cost of two. Every tail that does not reverse the rear list takes one actual step and neither allocates nor spends any credits, for an amortized cost of one. Finally, every taz! that does reverse the rear list takes m + 1 actual steps, where m is the length of the rear list, and spends the m credits contained by that list, for an amortized cost of m+ 1—m=l.

18 Amortization and Persistence via Lazy Evaluation

structure BatchedQueue : QUEUE = struct datatype a Queue = Queue of {F: a list, R : @ list} (* Invariant: F is empty only if R is also empty *)

exception EMPTY

val empty = Queue {F=[], R=[]} fun isEmpty (Queue {F = f, R=r}) =null f

fun queue {F =[], R=7r) = Queue {F=rev r, R=[]} | queue g = Queue ¢

fun snoc (Queue {F=/, R=r), 2) =queue {F=/,R=2:: r}

fun head (Queue {F =[], ... }) = raise EMPTY | head (Queue {F=2::f,...})=2 fun tail (Queue {F=[], ... }) = raise EMPTY | tail (Queue {F=2 :: f, R=r})=queue {F=/,R=r}

end

Figure 3.2: A common implementation of purely functional queues [Gri81, HM81, Bur82].

Using the physicist’s method, we define the potential function ® to be the length of the rear list. Then every snoc into a non-empty queue takes one actual step and increases the potential by one, for an amortized cost of two. Every tail that does not reverse the rear list takes one actual step and leaves the potential unchanged, for an amortized cost of one. Finally, every tail that does reverse the rear list takes m-+ | actual steps and sets the new rear list to [ ], decreasing the potential by m, for an amortized cost of m+ 1—m=1.

In this simple example, the proofs are virtually identical. Even so, the physicist’s method is slightly simpler for the following reason. Using the banker’s method, we must first choose a credit invariant, and then decide for each function when to allocate or spend credits. The credit invariant provides guidance in this decision, but does not make it automatic. For instance, should snoc allocate one credit and spend none, or allocate two credits and spend one? The net effect is the same, so this freedom is just one more potential source of confusion. On the other hand, using the physicist’s method, we have only one decision to make the choice of the potential function. After that, the analysis is mere calculation, with no more freedom of choice.

3.2 Persistence: The Problem of Multiple Futures 19

3.2 Persistence: The Problem of Multiple Futures

In the above analyses, we implicitly assumed that queues were used ephemerally (i.e., in a single-threaded fashion). What happens if we try to use these queues persistently?

Let g be the result of inserting n elements into an initially empty queue, so that the front list of g contains a single element and the rear list contains n | elements. Now, suppose we use q persistently by taking its tail n times. Each call of tail q takes n actual steps. The total actual cost of this sequence of operations, including the time to build g, is n? + n. If the operations truly took O(1) amortized time each, then the total actual cost would be only O(n). Clearly, using these queues persistently invalidates the O(1) amortized time bounds proved above. Where do these proofs go wrong?

In both cases, a fundamental requirement of the analysis is violated by persistent data struc- tures. The banker’s method requires that no credit be spent more than once, while the physi- cist’s method requires that the output of one operation be the input of the next operation (or, more generally, that no output be used as input more than once). Now, consider the second call to tai! q in the example above. The first call to tail ¢ spends all the credits on the rear list of g, leaving none to pay for the second and subsequent calls, so the banker’s method breaks. And the second call to tail q reuses q rather than the output of the first call, so the physicist’s method breaks.

Both these failures reflect the inherent weakness of any accounting system based on ac- cumulated savings that savings can only be spent once. The traditional methods of amor- tization operate by accumulating savings (as either credits or potential) for future use. This works well in an ephemeral setting, where every operation has only a single logical future. But with persistence, an operation might have multiple logical futures, each competing to spend the same savings.

3.2.1 Execution Traces and Logical Time

What exactly do we mean by the “logical future” of an operation?

We model logical time with execution traces, which give an abstract view of the history of a computation. An execution trace is a directed graph whose nodes represent “interesting” operations, usually just update operations on the data type in question. An edge from v to v’ indicates that operation v’ uses some result of operation v. The logical history of operation v, denoted %, is the set of all operations on which the result of v depends (including v itself). In other words, @ is the set of all nodes w such that there exists a path (possibly of length 0) from w to v. A logical future of a node v is any path from v to a terminal node (1.e., a node with out-degree zero). If there is more than one such path, then node v has multiple logical

20 Amortization and Persistence via Lazy Evaluation

futures. We will sometimes refer to the logical history or logical future of an object, meaning the logical history or logical future of the operation that created the object.

Execution traces generalize the notion of version graphs [DSST89], which are often used to model the histories of persistent data structures. In a version graph, nodes represent the various versions of a single persistent identity and edges represent dependencies between versions. Thus, version graphs model the results of operations and execution traces model the operations themselves. Execution traces are often more convenient for combining the histories of several persistent identities (perhaps not even of the same data type) or for reasoning about operations that do not return a new version (e.g., queries) or that return several results (e.g., splitting a list into two sublists).

For ephemeral data structures, the out-degree of every node in a version graph or execu- tion trace is typically restricted to be at most one, reflecting the limitation that objects can be updated at most once. To model various flavors of persistence, version graphs allow the out-degree of every node to be unbounded, but make other restrictions. For instance, version graphs are often limited to be trees (forests) by restricting the in-degree of every node to be at most one. Other version graphs allow in-degrees of greater than one, but forbid cycles, making every graph a dag. We make none of these restrictions on execution traces. Nodes with in- degree greater than one correspond to operations that take more than one argument, such as list catenation or set union. Cycles arise from recursively defined objects, which are supported by many lazy languages. We even allow multiple edges between a single pair of nodes, as might occur if a list is catenated with itself.

We will use execution traces in Section 3.4.1 when we extend the banker’s method to cope with persistence.

3.3. Reconciling Amortization and Persistence

In the previous section, we saw that traditional methods of amortization break in the presence of persistence because they assume a unique future, in which the accumulated savings will be spent at most once. However, with persistence, multiple logical futures might all try to spend the same savings. In this section, we show how the banker’s and physicist’s methods can be repaired by replacing the notion of accumulated savings with accumulated debt, where debt measures the cost of unevaluated lazy computations. The intuition is that, although savings can only be spent once, it does no harm to pay off debt more than once.

3.3.1 The Role of Lazy Evaluation

Recall that an expensive operation is one whose actual costs are greater than its (desired) amor- tized costs. For example, suppose some application f x is expensive. With persistence, a

3.3 Reconciling Amortization and Persistence 21

malicious adversary might call f x arbitrarily often. (Note that each operation is a new logi- cal future of x.) If each operation takes the same amount of time, then the amortized bounds degrade to the worst-case bounds. Hence, we must find a way to guarantee that if the first application of f to x is expensive, then subsequent applications of f to x will not be.

Without side-effects, this is impossible under call-by-value (1.e., strict evaluation) or call- by-name (i.e., lazy evaluation without memoization), because every application of f to x will take exactly the same amount of time. Therefore, amortization cannot be usefully combined with persistence in languages supporting only these evaluation orders.

But now consider call-by-need (i.e., lazy evaluation with memoization). If x contains some suspended component that is needed by f, then the first application of f to x will force the (potentially expensive) evaluation of that component and memoize the result. Subsequent op- erations may then access the memoized result directly. This is exactly the desired behavior!

Remark: In retrospect, the relationship between lazy evaluation and amortization is not surprising. Lazy evaluation can be viewed as a form of self-modification, and amortization often involves self-modification [ST85, ST86b]. However, lazy evaluation is a particularly disciplined form of self-modification not all forms of self-modification typically used in amortized ephemeral data structures can be encoded as lazy evaluation. In particular, splay- ing [ST85] does not appear to be amenable to this technique. 2

3.3.2 A Framework for Analyzing Lazy Data Structures

We have just shown that lazy evaluation is necessary to implement amortized data structures purely functionally. Unfortunately, analyzing the running times of programs involving lazy evaluation is notoriously difficult. Historically, the most common technique for analyzing lazy programs has been to pretend that they are actually strict. However, this technique is completely inadequate for analyzing lazy amortized data structures. We next describe a basic framework to support such analyses. In the remainder of this chapter, we will adapt the banker’s and physicist’s methods to this framework, yielding both the first techniques for analyzing persistent amortized data structures and the first practical techniques for analyzing non-trivial lazy programs.

We classify the costs of any given operation into several categories. First, the unshared cost of an operation is the actual time it would take to execute the operation under the assumption that every suspension in the system at the beginning of the operation has already been forced and memoized (i.e., under the assumption that force always takes O(1) time, except for those suspensions that are created and forced within the same operation). The shared cost of an operation is the time that it would take to execute every suspension created but not evaluated by the operation (under the same assumption as above). The complete cost of an operation is

22 Amortization and Persistence via Lazy Evaluation

the sum of its shared and unshared costs. Note that the complete cost is what the actual cost of the operation would be if lazy evaluation were replaced with strict evaluation.

We further partition the total shared costs of a sequence of operations into realized and unrealized costs. Realized costs are the shared costs for suspensions that are executed during the overall computation. Unrealized costs are the shared costs for suspensions that are never executed. The total actual cost of a sequence of operations is the sum of the unshared costs and the realized shared costs unrealized costs do not contribute to the actual cost. Note that the amount that any particular operation contributes to the total actual cost is at least its unshared cost, and at most its complete cost, depending on how much of its shared cost is realized.

We account for shared costs using the notion of accumulated debt. Initially, the accumu- lated debt is zero, but every time a suspension is created, we increase the accumulated debt by the shared cost of the suspension (and any nested suspensions). Each operation then pays off a portion of the accumulated debt. The amortized cost of an operation is the unshared cost of the operation plus the amount of accumulated debt paid off by the operation. We are not allowed to force a suspension until the debt associated with the suspension is entirely paid off. This treatment of debt is reminiscent of a Jayaway plan, in which one reserves an item and then makes regular payments, but receives the item only when it is entirely paid off.

There are three important moments in the life cycle of a suspension: when it is created, when it is entirely paid off, and when it is executed. The proof obligation is to show that the second moment precedes the third. If every suspension is paid off before it is forced, then the total amount of debt that has been paid off is an upper bound on the realized shared costs, and therefore the total amortized cost (i.e., the total unshared cost plus the total amount of debt that has been paid off) is an upper bound on the total actual cost (i.e., the total unshared cost plus the realized shared costs). We will formalize this argument in Section 3.4.1.

One of the most difficult problems in analyzing the running time of lazy programs is rea- soning about the interactions of multiple logical futures. We avoid this problem by reasoning about each logical future as if it were the only one. From the point of view of the operation that creates a suspension, any logical future that forces the suspension must itself pay for the suspension. If two logical futures wish to force the same suspension, then both must pay for the suspension individually they may not cooperate and each pay only a portion of the debt. An alternative view of this restriction is that we are allowed to force a suspension only when the debt for that suspension has been paid off within the logical history of current operation. Using this method, we will sometimes pay off a debt more than once, thereby overestimating the total time required for a particular computation, but this does no harm and is a small price to pay for the simplicity of the resulting analyses.

3.4 The Banker’s Method 23

3.4 The Banker’s Method

We adapt the banker’s method to account for accumulated debt rather than accumulated savings by replacing credits with debits. Each debit represents a constant amount of suspended work. When we initially suspend a given computation, we create a number of debits proportional to its shared cost and associate each debit with a location in the object. The choice of location for each debit depends on the nature of the computation. If the computation is monolithic (i.e., once begun, it runs to completion), then all debits are usually assigned to the root of the result. On the other hand, if the computation is incremental (i.e., decomposable into fragments that may be executed independently), then the debits may be distributed among the roots of the partial results.

The amortized cost of an operation is the unshared cost of the operation plus the number of debits discharged by the operation. Note that the number of debits created by an operation is not included in its amortized cost. The order in which debits should be discharged depends on how the object will be accessed; debits on nodes likely to be accessed soon should be discharged first. To prove an amortized bound, we must show that, whenever we access a location (possibly triggering the execution of a suspension), all debits associated with that location have already been discharged (and hence the suspended computation has been paid for). This guarantees that the total number of debits discharged by a sequence of operations is an upper bound on the realized shared costs of the operations. The total amortized costs are therefore an upper bound on the total actual costs. Debits leftover at the end of the computation correspond to unrealized shared costs, and are irrelevant to the total actual costs.

Incremental functions play an important role in the banker’s method because they allow debits to be dispersed to different locations in a data structure, each corresponding to a nested suspension. Then, each location can be accessed as soon as its debits are discharged, without waiting for the debits at other locations to be discharged. In practice, this means that the initial partial results of an incremental computation can be paid for very quickly, and that subsequent partial results may be paid for as they are needed. Monolithic functions, on the other hand, are much less flexible. The programmer must anticipate when the result of an expensive monolithic computation will be needed, and set up the computation far enough in advance to be able to discharge all its debits by the time its result is needed.

3.4.1 Justifying the Banker’s Method

In this section, we justify the claim that the total amortized cost is an upper bound on the total actual cost. The total amortized cost is the total unshared cost plus the total number of debits discharged (counting duplicates); the total actual cost is the total unshared cost plus the realized shared costs. Therefore, we must show that the total number of debits discharged is an upper bound on the realized shared costs.

24 Amortization and Persistence via Lazy Evaluation

We can view the banker’s method abstractly as a graph labelling problem, using the execu- tion traces of Section 3.2.1. The problem is to label every node in a trace with three (multi)sets s(v), a(v), and r(v) such that

(1) v ~v' > s(v) Ns(v') = 9 (I) a(v) c wes s(w) (ID re) SU ealw)

s(v) is a set, but a(v) and r(v) may be multisets (i.e., may contain duplicates). Conditions II and III ignore duplicates.

s(v) is the set of debits allocated by operation v. Condition I states that no debit may be allocated more than once. a(v) is the multiset of debits discharged by v. Condition II insists that no debit may be discharged before it is created, or more specifically, that an operation can only discharge debits that appear in its logical history. Finally, r(v) is the multiset of debits realized by v (that is, the multiset of debits corresponding to the suspensions forced by v). Condition II requires that no debit may be realized before it is discharged, or more specifically, that no debit may realized unless it has been discharged within the logical history of the current operation.

Why are a(v) and r(v) multisets rather than sets? Because a single operation might dis- charge the same debits more than once or realize the same debits more than once (by forcing the same suspensions more than once). Although we never deliberately discharge the same debit more than once, it could happen if we were to combine a single object with itself. For example, suppose in some analysis of a list catenation function, we discharge a few debits from the first argument and a few debits from the second argument. If we then catenate a list with itself, we might discharge the same few debits twice.

Given this abstract view of the banker’s method, we can easily measure various costs of a computation. Let V be the set of all nodes in the execution trace. Then, the total shared cost is >-vev |5(v)| and the total number of debits discharged is )7,,-y |a(v)|. Because of memoization, the realized shared cost is not >,,<y |r(wv)|, but rather | U,<y r(v)|, where L) discards duplicates. By Condition II, we know that U,<y r(v) © Unev a(v). Therefore,

| Uvev r(v)| ss | Uvev a(v)| < dveV la(v)|

So the realized shared cost is bounded by the total number of debits discharged, and the total actual cost is bounded by the total amortized cost, as desired.

Remark: This argument once again emphasizes the importance of memoization. Without

memoization (i.e., if we were using call-by-name rather than call-by-need), the total realized

cost would be >.< |r(v)|, and there is no reason to expect this sum to be less than >, <y |a(v)]. ©

3.4 The Banker’s Method 25

3.4.2 Example: Queues

We next develop an efficient persistent implementation of queues, and prove that every opera- tion takes only O(1) amortized time using the banker’s method.

Based on the discussion in the previous section, we must somehow incorporate lazy eval- uation into the design of the data structure, so we replace the pair of lists in the previous implementation with a pair of streams.! To simplify later operations, we also explicitly track the lengths of the two streams.

datatype a Queue = Queue {F : a Stream, LenF : int, R : a Stream, LenR : int}

Note that a pleasant side effect of maintaining this length information is that we can trivially support a constant-time szze function.

Now, waiting until the front list becomes empty to reverse the rear list does not leave suf- ficient time to pay for the reverse. Instead, we periodically rotate the queue by moving all the elements of the rear stream to the end of the front stream, replacing f with fF’ + reverse R and setting the new rear stream to empty ($Nil). Note that this transformation does not affect the relative ordering of the elements.

When should we rotate the queue? Recall that reverse is a monolithic function. We must therefore set up the computation far enough in advance to be able to discharge all its debits by the time its result is needed. The reverse computation takes || steps, so we will allocate | R| debits to account for its cost. (For now we ignore the cost of the + operation). The earliest the reverse suspension could be forced is after |f’| applications of tail, so if we rotate the queue when || ~ |F'| and discharge one debit per operation, then we will have paid for the reverse by the time it is executed. In fact, we will rotate the queue whenever becomes one longer than F’, thereby maintaining the invariant that |F’'| > |R|. Incidentally, this guarantees that F is empty only if # is also empty. The major queue functions can now be written as follows:

fun snoc (Queue {F = f, LenF = /enF’, R =r, LenR = lenR}, x) =

queue {F = f, LenF = lenF’, R = $Cons (2, r), LenR = /enR+1} fun head (Queue {F = $Cons (z, f),...}) =2

fun tail (Queue {F = $Cons (z, f), LenF = /enF', R = r, LenR = lenR}) =

queue {F = f, LenF = len —1, R=r, LenR = lenR}

where the pseudo-constructor gueue guarantees that |F'| > | RJ.

fun queue (q as {F =f, LenF = /enF’, R= r, LenR = lenR}) = if /enR < lenF then Queue q else Queue {F = f + reverse r, LenF = lenf'+lenR, R = $Nil, LenR = 0}

The complete code for this implementation appears in Figure 3.3.

' Actually, it would be enough to replace only the front list with a stream, but we replace both for simplicity.

26 Amortization and Persistence via Lazy Evaluation

structure BankersQueue : QUEUE = struct datatype a Queue = Queue {F : a Stream, LenF : int, R : a Stream, LenR : int} Invariants: |F| > |R|, LenF = |F|, LenR = |R| *)

exception EMPTY

val empty = Queue {F = $Nil, LenF = 0, R = $Nil, LenR = 0}

fun isEmpty (Queue {LenF = lenF’,...}) =(lenF = 0)

fun queue as {F = f, LenF = lenF’, R=r, LenR = lenR}) = if lenR < lenF then Queue ¢

else Queue {F = f + reverse r, LenF = lenF'+lenR, R = $Nil, LenR = 0}

fun snoc (Queue {F = f, LenF = /enF’, R= r, LenR = lenR}, x) = queue {F = f, LenF = lenF’, R = $Cons (2, r), LenR = lenR+1}

fun head (Queue {F = $Nil, ... }) = raise EMPTY | head (Queue {F = $Cons (x, f),...}) = 2 fun tail (Queue {F = $Nil, ... }) = raise EMPTY | tail (Queue {F = $Cons (x, f), LenF = lenF’, R =r, LenR = lenR}) = queue {F = f, LenF = lenF—1, R=r, LenR = lenR} end

Figure 3.3: Amortized queues using the banker’s method.

To understand how this implementation deals efficiently with persistence, consider the fol- lowing scenario. Let g be some queue whose front and rear streams are both of length m, and let q; = tail q;-1, for 0 <1 <m-+1. The queue is rotated during the first application of tail, and the reverse suspension created by the rotation is forced during the last application of tac. This reversal takes m steps, and its cost is amortized over the sequence q@ ... Gm. (For now, we are concerned only with the cost of the reverse we ignore the cost of the +.)

Now, choose some branch point k, and repeat the calculation from q to g,41. (Note that q, is used persistently.) Do this d times. How often is the reverse executed? It depends on whether the branch point is before or after the rotation. Suppose é is after the rotation. In fact, suppose k; = m so that each of the repeated branches is a single tai/. Each of these branches forces the reverse Suspension, but they each force the same suspension, so the reverse is executed only once. Memoization is crucial here without memoization the reverse would be re-executed each time, for a total cost of m(d + 1) steps, with only m + 1 + d operations over which to amortize this cost. For large d, this would result in an O(m) amortized cost per operation, but memoization gives us an amortized cost of only O(1) per operation.

It is possible to re-execute the reverse however. Simply take k = 0 (i.e., make the branch

3.4 The Banker’s Method 27

point just before the rotation). Then the first tai! of each branch repeats the rotation and creates a new reverse suspension. This new suspension is forced in the last faz! of each branch, executing the reverse. Because these are different suspensions, memoization does not help at all. The total cost of all the reversals is m - d, but now we have (m + 1)(d + 1) operations over which to amortize this cost, yielding an amortized cost of O(1) per operation. The key is that we duplicate work only when we also duplicate the sequence of operations over which to amortize the cost of that work.

This informal argument shows that these queues require only O(1) amortized time per operation even when used persistently. We formalize this proof using the banker’s method.

By inspection, the unshared cost of every queue operation is O(1). Therefore, to show that the amortized cost of every queue operation is O(1), we must prove that discharging O(1) debits per operation suffices to pay off every suspension before it is forced. (In fact, only snoc and ¢az! must discharge any debits.)

Let d(z) be the number of debits on the ith node of the front stream and let D(z) = >-;<o @(7) be the cumulative number of debits on all nodes up to and including the 7th node. We maintain the following debit invariant:

D(i) < min(2i,|F| —|R])

The 27 term guarantees that all debits on the first node of the front stream have been discharged (since d(0) = D(0) < 2-0 = 0), so this node may be forced at will (for instance, by head or tail). The |F'| |A| term guarantees that all debits in the entire queue have been discharged whenever the streams are of equal length (i.e., just before the next rotation).

Theorem 3.1 The snoc and tail operations maintain the debit invariant by discharging one and two debits, respectively.

Proof: Every snoc operation that does not cause a rotation simply adds a new element to the rear stream, increasing |R| by one and decreasing |F'| |R| by one. This will cause the invariant to be violated at any node for which D(i) was previously equal to |F'| |R]. We can restore the invariant by discharging the first debit in the queue, which decreases every subsequent cumulative debit total by one. Similarly, every taz/ that does not cause a rotation simply removes an element from the front stream. This decreases |F'| by one (and hence |F'| |R| by one), but, more importantly, it decreases the index 7 of every remaining node by one, which in turn decreases 27 by two. Discharging the first two debits in the queue restores the invariant. Finally, consider a snoc or tai! that causes a rotation. Just before the rotation, we are guaranteed that all debits in the queue have been discharged, so, after the rotation, the only debits are those generated by the rotation itself. If |#'] = m and |R| = m + | at the time of the rotation, then there will be m debits for the append and m + | debits for the reverse. The append function is incremental so we place one of its debits on each of the first m nodes. On

28 Amortization and Persistence via Lazy Evaluation

the other hand, the reverse function is monolithic so we place all m + | of its debits on node m, the first node of the reversed stream. Thus, the debits are distributed such that

| el pa. eee d(i)=4 m+1 ifi=m and DOS on 0 fy 2

This distribution violates the invariant at both node 0 and node m, but discharging the debit on the first node restores the invariant.

The format of this argument is typical. Debits are distributed across several nodes for incremental functions, and all on the same node for monolithic functions. Debit invariants measure, not just the number of debits on a given node, but the number of debits along the path from the root to the given node. This reflects the fact that accessing a node requires first accessing all its ancestors. Therefore, the debits on all those nodes must be zero as well.

This data structure also illustrates a subtle point about nested suspensions the debits for a nested suspension may be allocated, and even discharged, before the suspension is physi- cally created. For example, consider how + (append) works. The suspension for the second node in the stream is not physically created until the suspension for the first node is forced. However, because of memoization, the suspension for the second node will be shared when- ever the suspension for the first node is shared. Therefore, we consider a nested suspension to be implicitly created at the time that its enclosing suspension is created. Furthermore, when considering debit arguments or otherwise reasoning about the shape of an object, we ignore whether a node has been physically created or not. Rather we reason about the shape of an object as if all nodes were in their final form, 1.e., as if all suspensions in the object had been forced.

3.5 The Physicist’s Method

Like the banker’s method, the physicist’s method can also be adapted to work with accumulated debt rather than accumulated savings. In the traditional physicist’s method, one describes a potential function ® that represents a lower bound on the accumulated savings. To work with debt instead of savings, we replace ® with a function WV that maps each object to a potential representing an upper bound on the accumulated debt (or at least, an upper bound on this object’s portion of the accumulated debt). Roughly speaking, the amortized cost of an operation is then the complete cost of the operation (i.e., the shared and unshared costs) minus the change in potential. Recall that an easy way to calculate the complete cost of an operation is to pretend that all computation is strict.

Any changes in the accumulated debt are reflected by changes in the potential. If an op- eration does not pay any shared costs, then the change in potential is equal to its shared cost,

3.5 The Physicist’s Method 29

so the amortized cost of the operation is equal to its unshared cost. On the other hand if an operation does pay some of its shared cost, or shared costs of previous operations, then the change in potential is smaller than its shared cost (1.e., the accumulated debt increases by less than the shared cost), so the amortized cost of the operation is greater than its unshared cost. However, the change in potential may never be more than the shared cost the amortized cost of an operation may not be less than its unshared cost.

We can justify the physicist’s method by relating it back to the banker’s method. Recall that in the banker’s method, the amortized cost of an operation was its unshared cost plus the number of debits discharged. In the physicist’s method, the amortized cost is the complete cost minus the change in potential, or, in other words, the unshared cost plus the difference between the shared cost and the change in potential. If we consider one unit of potential to be equivalent to one debit, then the shared cost is the number of debits by which the accumulated debt could have increased, and the change in potential is the number of debits by which the accumulated debt did increase. The difference must have been made up by discharging some debits. Therefore, the amortized cost in the physicist’s method can also be viewed as the unshared cost plus the number of debits discharged.

Sometimes, we wish to force a suspension in an object when the potential of the object is not zero. In that case, we add the object’s potential to the amortized cost. This typically happens in queries, where the cost of forcing the suspension cannot be reflected by a change in potential because the operation does not return a new object.

The major difference between the banker’s and physicist’s methods is that, in the banker’s method, we are allowed to force a suspension as soon as the debits for that suspension have been paid off, without waiting for the debits for other suspensions to be discharged, but in the physicist’s method, we can force a shared suspension only when we have reduced the entire accumulated debt of an object, as measured by the potential, to zero. Since potential measures only the accumulated debt of an object as a whole and does not distinguish between different locations, we must pessimistically assume that the entire outstanding debt is associated with the particular suspension we wish to force. For this reason, the physicist’s method appears to be less powerful than the banker’s method. The physicist’s method is also weaker in other ways. For instance, it has trouble with operations that take multiple objects as arguments or return multiple objects as results, for which it is difficult to define exactly what “change in potential” means. However, when it applies, the physicist’s method tends to be much simpler than the banker’s method.

Since the physicist’s method cannot take advantage of the piecemeal execution of nested suspensions, there is no reason to prefer incremental functions to monolithic functions. In fact, a good hint that the physicist’s method might be applicable is if all or most suspensions are monolithic.

30 Amortization and Persistence via Lazy Evaluation

3.5.1 Example: Queues

We next adapt our implementation of queues to use the physicist’s method. Again, we show that every operation takes only O(1) amortized time.

Because there is no longer any reason to prefer incremental suspensions over monolithic suspensions, we use suspended lists instead of streams. In fact, the rear list need not be sus- pended at all, so we represent it with an ordinary list. Again, we explicitly track the lengths of the lists and guarantee that the front list is always at least as long as the rear list.

Since the front list is suspended, we cannot access its first element without executing the entire suspension. We therefore keep a working copy of a prefix of the front list. This working copy is represented as an ordinary list for efficient access, and is non-empty whenever the front list is non-empty. The final datatype is

datatype «a Queue = Queue of {W : a list, F : a list susp, LenF : int, R : a list, LenR : int} The major functions on queues may then be written

fun snoc (Queue {W = w, F=/, LenF = lenF, R= r, LenR = lenR}, 1) = queue {W = w, F=/, LenF = /lenF, R= 2x :: r, LenR = /enR+1}

fun head (Queue {W =x:: w,...})=2

fun tail (Queue {W = 2 :: w, F=f, LenF = lenF’, R= rr, LenR = lenR})= queue {W = w, F = $tl (force f), LenF = /enF —1, R = r, LenR = lenR})

The pseudo-constructor guewe must enforce two invariants: that PR is no longer than /’, and that W is non-empty whenever /’ is non-empty.

fun checkW {W =[], F=/, LenF= len’, R= 1, LenR = lenR}) =

Queue {W = force f, F=f, LenF = lenF’, R= r, LenR = lenR})

| checkW q = Queue ¢

fun checkR (q as {W = w, F=/, LenF= lenF', R= r, LenR = lenk})=

if /enR < lenF then

else let val w’ = force f

in {W = w’, F= $(w’ @ rev r), LenF = lenF'+lenR, R = [], LenR = 0} end

fun queue g = checkW (checkR q)

The complete implementation of these queues appears in Figure 3.4.

To analyze these queues using the physicist’s method, we choose a potential function WV in such a way that the potential will be zero whenever we force the suspended list. This happens in two situations: when W becomes empty and when # becomes longer than fF’. We therefore choose W to be

W(q) = min(2|W1,1F | |Rl)

3.5 The Physicist’s Method 31

structure PhysicistsQueue : QUEUE = struct datatype a Queue = Queue of {W : a list, F : a list susp, LenF : int, R : a list, LenR : int} (* Invariants: W is a prefix of force F, W = [] only if F = $[], *) |F| > |R|, LenF = |F|, LenR = |R| *)

exception EMPTY

val empty = Queue {W =[], F=$[], LenF = 0, R =[], LenR = 0} fun isEmpty (Queue {LenF = lenF’,...}) =(lenF = 0)

fun checkW {W =[], F=f, LenF= lenF’, R= r, LenR = lenR}) = Queue {W = force f, F=f, LenF = lenF’, R = r, LenR = lenR}) | checkW gq = Queue ¢ fun checkR (q¢ as {W = w, F=f, LenF= lenF’, R=r, LenR = lenR}) =

if lenR < lenF then gq else let val w’ = force f in {W = w’, F=$(w’ @ rev r), LenF = lenF' + lenR, R=[], LenR = 0} end fun queue g = checkW (checkR q)

fun snoc (Queue {W = w, F=f, LenF= lenF’, R= 1, LenR = lenR}, x) = queue {W = w, F=f, LenF= lenF, R= :: r, LenR = lenR+1}

fun head (Queue {W =[], ... }) = raise EMPTY | head (Queue {W=2:: w,...}J=a2 fun tail (Queue {W =[], ... }) = raise EMPTY | tail (Queue {W = 2 :: w, F=f, LenF= lenF’, R= 1, LenR = lenR}) = queue {W = w, F = $tl (force f), LenF = lenF—1, R= r, LenR = lenR})

end

Figure 3.4: Amortized queues using the physicist’s method.

Theorem 3.2 The amortized costs of snoc and tail are at most two and four, respectively.

Proof: Every snoc that does not cause a rotation simply adds a new element to the rear list, increasing |R| by one and decreasing |F'| |R| by one. The complete cost of the snoc is one, and the decrease in potential is at most one, for an amortized cost of at most | (—1) = 2. Every tail that does not cause a rotation removes the first element from the working list and lazily removes the same element from the front list. This decreases |W| by one and |F'| |R| by one, which decreases the potential by at most two. The complete cost of taz/ is two, one for the unshared costs (including removing the first element from W) and one for the shared cost of lazily removing the head of fF’. The amortized cost is therefore at most 2 (—2) = 4.

Finally, consider a snoc or tail that causes a rotation. In the initial queue, |/’| = |R| so

32 Amortization and Persistence via Lazy Evaluation

W = 0. Just before the rotation, |F'| = m and |R| = m + 1. The shared cost of the rotation is 2m + | and the potential of the resulting queue is 2m. The amortized cost of snoc is thus 1+ (2m + 1) 2m = 2. The amortized cost of tail is 2+ (2m + 1) 2m = 3. (The difference is that ¢a7/ must also account for the shared cost of removing the first element of F’.)

Finally, we consider two variations of these queues that on the surface appear to be mod- est improvements, but which actually break the amortized bounds. These variations illustrate common mistakes in designing persistent amortized data structures.

In the first variation, we observe that checkR forces F during a rotation and installs the result in W. Wouldn’t it be “‘lazier’, and therefore better, to never force / until W becomes empty? The answer is no, and a brief consideration of the potential function reveals why. If W were very short, then the potential would only increase to 2|W| after the rotation. This increase would not be large enough to offset the large shared cost of the rotation. Another way of looking at it is that, if |W] = 1 at the time of the rotation, then the front list could be forced during the very next taz!, which does not leave enough time to pay for the rotation.

In the second variation, we observe that during a tail, we replace F’ with $#/ (force F). Creating and forcing suspensions have non-trivial overheads that, even if O(1), can contribute to a large constant factor. Wouldn’t it be “lazier’, and therefore better, to not change /’, but instead to merely decrement LenF to indicate that the element has been removed? The answer is again no, because the removed elements would be discarded all at once when the front list was finally forced. This would contribute to the unshared cost of the operation, not the shared cost, making the unshared cost linear in the worst case. Since the amortized cost can never be less than the unshared cost, this would also make the amortized cost linear.

3.5.2 Example: Bottom-Up Mergesort with Sharing

The majority of examples in the remaining chapters use the banker’s method rather than the physicist’s method. Therefore, we give a second example of the physicist’s method here.

Imagine that you want to sort several similar lists, such as xs and x :: xs, or xs @ zs and ys @ zs. For efficiency, you wish to take advantage of the fact that these lists share common tails, so that you do not repeat the work of sorting those tails. We call an abstract data type for this problem a sortable collection.

Figure 3.5 gives a signature for sortable collections. Note that the new function, which creates an empty collection, is parameterized by the “less than” relation on the elements to be sorted.

Now, if we create a sortable collection xs’ by adding each of the elements in xs, then we can sort both xs and z :: xs by calling sort xs’ and sort (add (, xs’)).

3.5 The Physicist’s Method 33

signature SORTABLE = sig type a Sortable

val new : {Less : a x a bool} + a Sortable (x sort in increasing order by Less *) val add: a x a Sortable + a Sortable val sort : a Sortable a list

end

Figure 3.5: Signature for sortable collections.

One possible representation for sortable collections is balanced binary search trees. Then add takes O(log n) worst-case time and sort takes O(n) time. We achieve the same bounds, but in an amortized sense, using bottom-up mergesort.

Bottom-up mergesort first splits a list into n ordered segments, where each segment initially contains a single element. It then merges equal-sized segments in pairs until only one segment of each size remains. Finally, segments of unequal size are merged, from smallest to largest.

Suppose we take a snapshot just before the final cleanup phase. Then the sizes of all segments are distinct powers of 2, corresponding to the one bits of n. This is the representation we will use for sortable collections. Then similar collections will share all the work of bottom- up mergesort except for the final cleanup phase merging unequal-sized segments. The complete representation is a suspended list of segments, each of which is an a list, together with the comparison function and the size.

type a Sortable = {Less : a x a bool, Size : int, Segments : a list list susp}

The individual segments are stored in increasing order of size, and the elements in each segment are stored in increasing order as determined by the comparison function.

The fundamental operation on segments is merge, which merges two ordered lists. Except for being parameterized on /ess, this function is completely standard.

fun merge less (rs, ys) = let fun mrg ([], ys) = ys | mrg (xs, []) = xs | mrg (x :: 2s, ys: ys) =if less (x, y) then x :: mrg (xs, y :: ys) else y :: mrg (x :: vs, ys) in mrg (xs, ys) end

To add a new element, we create a new singleton segment. If the smallest existing segment is also a singleton, we merge the two segments and continue merging until the new segment

34 Amortization and Persistence via Lazy Evaluation

is smaller than the smallest existing segment. This merging is controlled by the bits of n. If the lowest bit of n is zero, then we simply cons the new segment onto the segment list. If the lowest bit is one, then we merge the two segments and repeat. Of course, all this is done lazily.

fun add (x, {Less = less, Size = size, Segments = segs }) = let fun addSeg (seg, segs, size) = if size mod 2 = 0 then seg :: segs else addSeg (merge /ess (seg, hd segs), tl segs, size div 2) in {Less = less, Size = size+1, Segments = $addSeg ([2], force segs, size)} end

Finally, to sort a collection, we merge the segments from smallest to largest.

fun sort {Less = less, Segments = segs, ... } = let fun mergeAll (xs, []) = xs | mergeAll (xs, seg :: segs) = mergeAll (merge less (xs, seq), segs) in mergeAll ([], force segs) end

Remark: mergeAll can be viewed as computing [] Msp X--- Xs,

where s; is the th segment and is left-associative, infix notation for merge. This is a specific instance of a very common program schema, which can be written

e311 O::: PL,

for any c and left-associative ©. Other instances of this schema include summing a list of integers (c = 0 and @ = +) or finding the maximum of a list of natural numbers (c = 0 and © = max). One of the greatest strengths of functional languages is the ability to define schemas like this as higher-order functions (i.e., functions that take functions as arguments or return functions as results). For example, the above schema might be written

fun foldl (f, c, []J)=c | foldl (/, c, x :: vs) = foldl (f, f (c, x), xs)

Then sort could be written fun sort {Less = less, Segments = segs, ... } = foldl (merge less, [], force segs)

This also takes advantage of the fact that merge is written as a curried function. A curried function is a multiargument function that can be partially applied (i.e., applied to just some of its arguments). The result is a function that takes the remaining arguments. In this case, we have applied merge to just one of its three arguments, /ess. The remaining two arguments will be supplied by fold!. o)

3.5 The Physicist’s Method 35

structure BottomUpMergeSort : SORTABLE = struct type « Sortable = {Less : a x a > bool, Size : int, Segments : a list list susp}

fun merge less (xs, ys) = let fun mrg ([ J, ys) = ys | mrg (es, []) = «s | mrg (@ :: es, ys: ys) =if less (a, y) then x :: mrg (as, y :: ys) else y :: mrg (a :: 7s, ys) in mrg (as, ys) end

fun new {Less = less} = {Less = less, Size = 0, Segments = $[ ]} fun add (x, {Less = less, Size = size, Segments = segs}) = let fun addSeg (seg, segs, size) = if size mod 2 = 0 then seg :: segs else addSeg (merge less (seg, hd segs), tl segs, size div 2) in {Less = less, Size = size+1, Segments = $addSeg ([], force segs, size)} end fun sort {Less = less, Segments = segs, ... } = let fun mergeAll (xs, [ ]) = xs | mergeAll (xs, seg :: segs) = mergeAll (merge less (xs, seg), segs) in mergeAll ([], force segs) end end

Figure 3.6: Sortable collections based on bottom-up mergesort.

The complete code for this implementation of sortable collections appears in Figure 3.6.

We show that add takes O(log n) amortized time and sort takes O(n) amortized time using the physicist’s method. We begin by defining the potential function VY, which is completely determined by the size of the collection:

W(n) = 2n —2 S— b(n mod 2! + 1) 1=0 where 0; is the ith bit of n. Note that U(n) is bounded above by 2n and that U(n) = 0 exactly when n = 2" | for some k.

We first calculate the complete cost of add. Its unshared cost is one and its shared cost is the cost of performing the merges in addSeg. Suppose that the lowest & bits of n are one (i.e., b; = 1 fori < k and 6; = 0). Then addSeg performs & merges. The first combines two lists of size 1, the second combines two lists of size 2, and so on. Since merging two lists of size m takes 2m steps, addSeg takes (1+1)+(2+2)+---+(28-1 42-1) = 2°08) 2") = 2(2*—-1) steps. The complete cost of add is therefore 2(2* 1) + 1 = 2**1 1,

36 Amortization and Persistence via Lazy Evaluation

Next, we calculate the change in potential. Let n’ = n + 1 and let U’ be the 7th bit of n’. Then,

W(n') Vin) = 2n!’ 20%, Oi (n’ mod 2' +1)—(2n- 2 eo b(n mod See 1)) 2 + 25025 (b;(n mod + 1) Bi(n’ mod 2’ + 1)) = 242055 4(2)

where 6(7) = 0;(m mod 2' + 1) bi(n’ mod 2' + 1). We consider three cases: i < k, i = k, and2 > k.

e (i < k): Since b; = 1 and b! = 0, d(k) = n mod 2' + 1. But n mod 2’ = 2' 1 so eee

e (i = k): Since b, = 0 and bi, = 1, 6(k) = —(n' mod 2* + 1). But n’ mod 2* = 0 so Khas

e (i > k): Since b, = b;, 6(k) = bi(n mod 2' n’ mod 2"). But n’ mod 2' = (n + 1) mod 2' = n mod 2' + 1 so 6(2) = &(—1) = —b.

Therefore, W(n')—V(n) = 2+ 20%) (i) Ss De Dye a Oy elo) = 242(2* 1) - 208, & =: OM 20h!

where #’ is the number of one bits in n’. Then the amortized cost of add is

Ga a oR S28 1

Since B’ is O(log n), so is the amortized cost of add.

Finally, we calculate the amortized cost of sort. The first action of sort is to force the suspended list of segments. Since the potential is not necessarily zero, this adds U(n) to the amortized cost of the operation. It next merges the segments from smallest to largest. The worst case is when n = 2* 1, so that there is one segment of each size from | to 2*-. Merging these segments takes

(ta 2) ce Ciro) Uae) apes 2 eee 22) 2505 f= S(t 1) <M 4) —(b-1) = n=

steps altogether. The amortized cost of sort is therefore O(n) + U(n) = O(n).

3.6 Related Work 37

3.6 Related Work

Debits Some analyses using the traditional banker’s method, such as Tarjan’s analysis of path compression [Tar83], include both credits and debits. Whenever an operation needs more credits than are currently available, it creates a credit-debit pair and immediately spends the credit. The debit remains as an obligation that must be fulfilled. Later, a surplus credit may be used to discharge the credit.?, Any debits that remain at the end of the computation add to the total actual cost. Although there are some similarities between the two kinds of debits, there are also some clear differences. For instance, with the debits introduced in this chapter, any debits leftover at the end of the computation are silently discarded.

It is interesting that debits arise in Tarjan’s analysis of path compression since path com- pression is essentially an application of memoization to the find function.

Amortization and Persistence Until this work, amortization and persistence were thought to be incompatible. Several researchers [DST94, Ram92] had noted that amortized data struc- tures could not be made efficiently persistent using existing techniques for adding persistence to ephemeral data structures, such as [DSST89, Die89], for reasons similar to those cited in Sec- tion 3.2. Ironically, these techniques produce persistent data structures with amortized bounds, but the underlying data structure must be worst-case. (These techniques have other limitations as well. Most notably, they cannot be applied to data structures supporting functions that com- bine two or more versions. Examples of offending functions include list catenation and set union.)

The idea that lazy evaluation could reconcile amortization and persistence first appeared, in rudimentary form, in [Oka95c]. The theory and practice of this technique was further devel- oped in [Oka95a, Oka96b].

Amortization and Functional Data Structures In his thesis, Schoenmakers [Sch93] studies amortized data structures in a strict functional language, concentrating on formal derivations of amortized bounds using the traditional physicist’s method. He avoids the problems of per- sistence by insisting that data structures only be used in a single-threaded fashion.

Queues Gries [Gri81, pages 250-251] and Hood and Melville [HM81] first proposed the queues in Section 3.1.1. Burton [Bur82] proposed a similar implementation, but without the restriction that the front list be non-empty whenever the queue is non-empty. (Burton combines head and tail into a single operation, and so does not require this restriction to support head efficiently.) The queues in Section 3.4.2 first appeared in [Oka96b].

>There is a clear analogy here to the spontaneous creation and mutual annihilation of particle-antiparticle pairs in physics. In fact, a better name for these debits might be “anticredits”.

38 Amortization and Persistence via Lazy Evaluation

Time-Analysis of Lazy Programs Several researchers have developed theoretical frame- works for analyzing the time complexity of lazy programs [BH89, San90, San95, Wad88]. However, these frameworks are not yet mature enough to be useful in practice. One difficulty is that these frameworks are, in some ways, too general. In each of these systems, the cost of a program is calculated with respect to some context, which is a description of how the result of the program will be used. However, this approach is often inappropriate for a methodology of program development in which data structures are designed as abstract data types whose behavior, including time complexity, is specified in isolation. In contrast, our analyses prove results that are independent of context (1.e., that hold regardless of how the data structures are used).

Chapter 4

Eliminating Amortization

Most of the time, we do not care whether a data structure has amortized bounds or worst-case bounds; our primary criteria for choosing one data structure over another are overall efficiency and simplicity of implementation (and perhaps availability of source code). However, in some application areas, it is important to bound the running times of individual operations, rather than sequences of operations. In these situations, a worst-case data structure will often be preferable to an amortized data structure, even if the amortized data structure is simpler and faster overall. Raman [Ram92] identifies several such application areas, including

e Real-time systems: In real-time systems, predictability is more important than raw speed [Sta88]. If an expensive operation causes the system to miss a hard deadline, it does not matter how many cheap operations finished well ahead of schedule.

e Parallel systems: If one processor in a synchronous system executes an expensive oper- ation while the other processors execute cheap operations, then the other processors may sit idle until the slow processor finishes.

e Interactive systems: Interactive systems are similar to real-time systems users often value consistency more than raw speed [But83]. For instance, users might prefer 100 1- second response times to 99 0.25-second response times and 1 25-second response time, even though the latter scenario is twice as fast.

Remark: Raman also identified a fourth application area persistent data structures. As dis- cussed in the previous chapter, amortization was thought to be incompatible with persistence. But, of course, we now know this to be untrue. ©

Does this mean that amortized data structures are of no interest to programmers in these areas? Not at all. Since amortized data structures are often simpler than worst-case data struc- tures, it is sometimes easier to design an amortized data structure, and then convert it to a worst-case data structure, than to design a worst-case data structure from scratch.

40 Eliminating Amortization

In this chapter, we describe scheduling a technique for converting many lazy amortized data structures to worst-case data structures by systematically forcing lazy components in such a way that no suspension ever takes very long to execute. Scheduling extends every object with an extra component, called a schedule, that regulates the order in which the lazy components of that object are forced.

4.1 Scheduling

Amortized and worst-case data structures differ mainly in when the computations charged to a given operation occur. In a worst-case data structure, all computations charged to an operation occur during the operation. In an amortized data structure, some computations charged to an operation may actually occur during later operations. From this, we see that virtually all nominally worst-case data structures become amortized when implemented in an entirely lazy language because many computations are unnecessarily suspended. To describe true worst- case data structures, we therefore need a strict language. If we want to describe both amortized and worst-case data structures, we need a language that supports both lazy and strict evaluation. Given such a language, we can also consider an intriguing hybrid approach: worst-case data structures that use lazy evaluation internally. We will obtain such data structures by beginning with lazy amortized data structures and modifying them in such a way that every operation runs in the allotted time.

In a lazy amortized data structure, any specific operation might take longer than the stated bounds. However, this only occurs when the operation forces a suspension that has been paid off, but that takes a long time to execute. To achieve worst-case bounds, we must guarantee that every suspension executes in less than the allotted time.

Define the intrinsic cost of a suspension to be the amount of time it takes to force the suspension under the assumption that all other suspensions on which it depends have already been forced and memoized, and therefore each take only O(1) time to execute. (This is similar to the definition of the unshared cost of an operation.) The first step in converting an amortized data structure to a worst-case data structure is to reduce the intrinsic cost of every suspension to less than the desired bounds. Usually, this involves rewriting expensive monolithic functions as incremental functions. However, just being incremental is not always good enough the granularity of each incremental function must be sufficiently fine. Typically, each fragment of an incremental function will have an O(1) intrinsic cost.

Even if every suspension has a small intrinsic cost, however, some suspensions might still take longer than the allotted time to execute. This happens when one suspension depends on another suspension, which in turn depends on a third, and so on. If none of the suspensions have been previously executed, then forcing the first suspension will result in a cascade of

4.2 Real-Time Queues 41

forces. For example, consider the following computation: (-- -((s1 ++ sg) ++ $3) +++) HH 3%

++ is the canonical incremental function on streams. It does only one step of the append at a time, and each step has an O(1) intrinsic cost. However, it also forces the first node of its left argument. In this example, forcing the first node of the stream returned by the outermost + forces the first node of the stream returned by the next +, and so on. Altogether, this takes O(k) time to execute (or even more if the first node of s; is expensive to force).

The second step in converting an amortized data structure to a worst-case data structure is to avoid cascading forces by arranging that, whenever we force a suspension, any other suspensions on which it depends have already been forced and memoized. Then, no suspension takes longer than its intrinsic cost to execute. We accomplish this by systematically scheduling the execution of each suspension so that each is ready by the time we need it. The trick is to regard paying off debt as a literal activity, and to force each suspension as it is paid for.

We extend every object with an extra component, called the schedule, that, at least concep- tually, contains a pointer to every unevaluated suspension in the object. (Some of the suspen- sions in the schedule may have already been evaluated in a different logical future, but forcing these suspensions a second time does no harm since it can only make our algorithms run faster than expected, not slower.) Every operation, in addition to whatever other manipulations it performs on an object, forces the first few suspensions in the schedule. The exact number of suspensions forced is governed by the amortized analysis; typically, every suspension takes O(1) time to execute, so we force a number of suspensions proportional to the amortized cost of the operation. Depending on the data structure, maintaining the schedule can be non-trivial. For this technique to apply, adding new suspensions to the schedule, or retrieving the next suspension to be forced, cannot require more time than the desired worst-case bounds.

4.2 Real-Time Queues

As an example of this technique, we convert the amortized banker’s queues of Section 3.4.2 to worst-case queues. Queues such as these that support all operations in O(1) worst-case time are called real-time queues [HM81].

In the original data structure, queues are rotated using + and reverse. Since reverse is monolithic, our first task is finding a way to perform rotations incrementally. This can be done by executing one step of the reverse for every step of the +. We define a function rotate such that

rotate (f, r, a) =f + reverse r + a

Then rotate (f, r, $Nil) = f + reverse r

42 Eliminating Amortization

The extra argument a is called an accumulating parameter and is used to accumulate the partial results of reversing r. It is initially empty.

Rotations occur when |R| = |F'| + 1, so initially |r| = |f| + 1. This relationship is preserved throughout the rotation, so when f is empty, r contains a single element. The base case is therefore

rotate ($Nil, $Cons (y, $Nil), a) = ($Nil) + reverse ($Cons (y, $Nil)) + a = $Cons (y, a)

In the recursive case,

rotate ($Cons (x, f), $Cons (y, 7), a) = ($Cons (x, f)) + reverse ($Cons (y, 7)) + a = $Cons (x, f + reverse ($Cons (y, 7)) + a) $Cons (zx, f + reverse r + $Cons (y, a)) $Cons (z, rotate (f, r, $Cons (y, a)))

The complete code for rotate is

fun rotate (f, r, a) = $case (f, r) of ($Nil, $Cons (y, _)) > Cons (y, a) | ($Cons (x, f’), $Cons (y, r’)) = Cons (2, rotate (f’, r’, $Cons (y, a)))

Note that the intrinsic cost of every suspension created by rotate is O(1). Just rewriting the pseudo-constructor queue to call rotate (f, r, $Nil) instead f + reverse r, and making no other changes, already drastically improves the worst-case behavior of the queue operations from O(n) to O(log n) (see [Oka95c]), but we can further improve the worst-case behavior to O(1) using scheduling.

We begin by adding a schedule to the datatype. The original datatype is datatype a Queue = Queue {F : a Stream, LenF : int, R : a Stream, LenR : int}

We add a new field S of type a Stream that represents a schedule for forcing the nodes of F’, S is some suffix of F’ such that all the nodes before S in F’ have already been forced and memoized. To force the next suspension in fF’, we simply inspect the first node of S.

Besides adding 5, we make two further changes to the datatype. First, to emphasize the fact that the nodes of # need not be scheduled, we change F from a stream to a list. This involves minor changes to rotate. Second, we eliminate the length fields. As we will see shortly, we no longer need the length fields to determine when ? becomes longer than /’ instead, we will obtain this information from the schedule. The new datatype is thus

datatype «a Queue = Queue of {F : a stream, R : a list, S : a stream}

Now, the major queue functions are simply

4.2 Real-Time Queues 43

structure RealTimeQueue : QUEUE = struct datatype a Queue = Queue of {F : a stream, R : a list, S : a stream} Invariant: |S| = |F| |R| *) exception EMPTY val empty = Queue {F = $Nil, R =[], S = $Nil} fun isEmpty (Queue {F=/f,...}) =null f

fun rotate ({, r, a) = $case (f, r) of ($Nil, $Cons (y, -)) = Cons (y, @) | ($Cons (a, f’), $Cons (y, 7’)) = Cons (x, rotate (f’, r’, $Cons (y, «)))

fun queue {F =f, R= r, S = $Cons (z, s)} = Queue {F=/,R=r,S=s} | queue {F =f, R=r, S = $Nil} = let val f’ = rotate (f, r, $Nil) in Queue {F=/f’,R=[], S=/’} end

fun snoc (Queue {F= f/f, R=7r,S=s}, x) =queue {F=f,R=2::r,S=s}

fun head (Queue {F = $Nil, ... }) = raise EMPTY | head (Queue {F = $Cons (z, f),...}) = 2 fun tail (Queue {F = $Nil, ... }) = raise EMPTY | tail (Queue {F = $Cons (x, f), R= r, S=s})=queue {F=f,R=r,S=s} end

Figure 4.1: Real-time queues based on scheduling [Oka95c].

fun snoc (Queue {F=/,R=r,S=s}, 7)=queue {F=/,R=2::r,S=s} fun head (Queue {F = $Cons (z, f),...}) =2 fun tail (Queue {F = $Cons (z, f), R=r, S=s})=queue {F=/,R=r,S=s}

The pseudo-constructor queue maintains the invariant that |.S| = | F'| | R| (which incidentally guarantees that |F'| > |R| since |S| cannot be negative). snoc increases || by one and tail decreases |F'| by one, so when quewe is called, |S| = |F'| |R| + 1. If S is non-empty, then we restore the invariant by simply taking the tail of S. If S is empty, then 2 is one longer than F’, so we rotate the queue. In either case, inspecting S to determine whether or not it is empty forces and memoizes the next suspension in the schedule.

fun queue {F = f, R=r, S = $Cons (z, s)} = Queue {F=f,R=r,S=s} | queue {F=f, R=r, S = $Nil} = let val f’ = rotate (f, r, $Nil) in Queue {F=/f’,R=[],S=/’} end

The complete code for this implementation appears in Figure 4.1.

44 Eliminating Amortization

In the amortized analysis, the unshared cost of every queue operation is O(1). Therefore, every queue operation does only O(1) work outside of forcing suspensions. Hence, to show that all queue operations run in O(1) worst-case time, we must prove that no suspension takes more than O(1) time to execute.

Only three forms of suspensions are created by the various queue functions.

e $Nil is created by empty and queue (in the initial call to rotate). This suspension is trivial and therefore executes in O(1) time regardless of whether it has been forced and memoized previously.

$Cons (y, a) is created in the second line of rotate and is also trivial.

Every call to rotate immediately creates a suspension of the form

$case (f/, 7, a) of ($Nil, [y], 2) = Cons (y, a) | ($Cons (z, f’), ys: r’, a) = Cons (z, rotate (f’, r’, $Cons (y, a)))

The intrinsic cost of this suspension is O(1). However, it also forces the first node of f, creating the potential for a cascade of forces. But note that f is a suffix of the front stream that existed just before the previous rotation. The treatment of the schedule $ guarantees that every node in that stream was forced and memoized prior to the rotation. Forcing the first node of f simply looks up that memoized value in O(1) time. The above suspension therefore takes only O(1) time altogether.

Since every suspension executes in O(1) time, every queue operation takes only O(1) worst- case time.

Hint to Practitioners: These queues are not particularly fast when used ephemerally, because of overheads associated with memoizing values that are never looked at again, but are the fastest

known real-time implementation when used persistently.

4.3 Bottom-Up Mergesort with Sharing

As a second example, we modify the sortable collections from Section 3.5.2 to support add in O(log n) worst-case time and sort in O(n) worst-case time.

The only use of lazy evaluation in the amortized implementation is the suspended call to addSeg in add. This suspension is clearly monolithic, so the first task is to perform this

4.3 Bottom-Up Mergesort with Sharing 45

computation incrementally. In fact, we need only make merge incremental; since addSeg takes only O(log n) steps, we can afford to execute it strictly. We therefore represent segments as streams rather than lists, and eliminate the suspension on the collection of segments. The new type for the Segments field is thus a Stream list rather than a list list susp.

Rewriting merge, add, and sort to use this new type is straightforward, except that sort must convert the final sorted stream back to a list. This is accomplished by the stream ToList conversion function.

fun streamToList ($Nil) = [ ] | streamToList ($Cons (x, xs)) = x :: streamToList zs

The new version of merge, shown in Figure 4.2, performs one step of the merge at a time, with an O(1) intrinsic cost per step. Our second goal is to execute enough merge steps per add to guarantee that any sortable collection contains only O(n) unevaluated suspensions. Then sort executes at most O(n) unevaluated suspensions in addition to its own O(n) work. Executing these unevaluated suspensions takes at most O(n) time, so sort takes only O(n) time altogether.

In the amortized analysis, the amortized cost of add was approximately 2’, where B’ is the number of one bits in n’ = n + 1. This suggests that add should execute two suspensions per one bit, or equivalently, two suspensions per segment. We maintain a separate schedule for each segment. Each schedule is an a Stream list containing the partial results of the merge sequence that created this segment. The complete type is therefore

type a Schedule = a Stream list type a Sortable = {Less : a x a bool, Size : int, Segments : (a Stream x a Schedule) list}

To execute one merge step from a schedule, we call the function ezec/.

fun execl []=[] | execl (($Nil) :: sched) = execl sched | execl (($Cons (2, rs)) :: sched) = xs :: sched

In the second clause, we reach the end of one stream and execute the first step of the next stream. This cannot loop because only the first stream in a schedule can ever be empty. The function erec2PerSeg invokes exec! twice per segment.

fun exec2PerSeg [] = [] | exec2PerSeg ((xs, sched) :: segs) = (as, execl (exec! sched)) :: exec2PerSeg segs

Now, add calls exec2PerSeqg, but it is also responsible for building the schedule for the new segment. If the lowest & bits of n are one, then adding a new element will trigger & merges, of the form

((so M81) Ms) M-+- XM sy

46 Eliminating Amortization

where so is the new singleton segment and s, ...s,; are the first k segments of the existing col- lection. The partial results of this computation are sj ...s;, where sj = So and s; = s/_, M s;. Since the suspensions in s, depend on the suspensions in s;,_,, we must schedule the execution of s’_, before the execution of s’. The suspensions in s; also depend on the suspensions in s;, but we guarantee that s; ...s;, have been completely evaluated at the time of the call to add.

The final version of add, that creates the new schedule and executes two suspensions per segment, is

fun add (x, {Less = less, Size = size, Segments = segs }) = let fun addSeg (xs, segs, size, rsched) = if size mod 2 = 0 then (s, rev (ws :: rsched)) :: segs else let val ((2s’, [ ]) :: segs’) = segs in addSeg (merge /ess (xs, xs’), segs’, size div 2, xs :: rsched) val segs’ = addSeg ($Cons (x, $Nil), segs, szze, [ ]) in {Less = less, Size = size+1, Segments = exec2PerSeg segs’ } end

The accumulating parameter rsched collects the newly merged streams in reverse order. There- fore, we reverse it back to the correct order on the last step. The pattern match in line 4 asserts that the old schedule for that segment is empty, i.e., that it has already been completely exe- cuted. We will see shortly why this true.

The complete code for this implementation is shown in Figure 4.2. add has an unshared cost of O(log n) and sort has an unshared cost of O(n), so to prove the desired worst-case bounds, we must show that the O(log n) suspensions forced by add take O(1) time each, and that the O(n) unevaluated suspensions forced by sort take O(n) time altogether.

Every merge step forced by add (through exec2PerSeg and exec/ ) depends on two other streams. If the current step is part of the stream s’, then it depends on the streams s/_, and s;. The stream s’_, was scheduled before s‘, so s;_, has been completely evaluated by the time we begin evaluating s’. Furthermore, s; was completely evaluated before the add that created s'. Since the intrinsic cost of each merge step is O(1), and the suspensions forced by each step have already been forced and memoized, every merge step forced by add takes only O(1) worst-case time.

The following lemma establishes both that any segment involved in a merge by addSeg has been completely evaluated and that the collection as a whole contains at most O(n) unevaluated suspensions.

Lemma 4.1 Jn any sortable collection of size n, the schedule for a segment of size m = # contains a total of at most 2m 2(n mod m + 1) elements.

Proof: Consider a sortable collection of size n, where the lowest & bits of n are ones (i.e., n can be written c2*t! + (2* 1), for some integer c). Then add produces a new segment of size

4.3 Bottom-Up Mergesort with Sharing 47

structure ScheduledBottomUpMergeSort : SORTABLE = struct type a Schedule = a Stream list type a Sortable = {Less : a x a bool, Size : int, Segments : (a Stream x a Schedule) list}

fun merge less (xs, ys) = let fun mrg ($Nil, ys) = ys | mrg (xs, $Nil) = zs | mrg (as as $Cons (x, xs’), ys as $Cons (y, ys’)) = if less (x, y) then $Cons (x, mrg (xs’, ys)) else $Cons (y, mrg (as, ys’)) in mrg (zs, ys) end

fun execl []=[] | execl (($Nil) :: sched) = execl sched | execl (($Cons (x, xs)) :: sched) = xs :: sched fun exec2PerSeg [] =[ ] | exec2PerSeg ((xs, sched) :: segs) = (as, execl (execl sched)) :: exec2PerSeg segs

fun new {Less = less} = {Less = less, Size = 0, Segments = [ ]} fun add (x, {Less = less, Size = size, Segments = segs}) = let fun addSeg (ws, segs, size, rsched) = if size mod 2 = 0 then (zs, rev (vs :: rsched)) :: segs else let val ((xs’, [ ]) :: segs’) = segs in addSeg (merge less (xs, xs’), segs’, size div 2, xs :: rsched) val segs’ = addSeg ($Cons (x, $Nil), segs, size, [ ]) in {Less = less, Size = size+1, Segments = exec2PerSeg segs’ } end fun sort {Less = less, Segments = segs, ... } = let fun mergeAll (xs, [ ]) = xs | mergeAll (xs, (as’, sched) :: segs) = mergeAll (merge less (xs, xs’), segs) fun streamToList ($Nil) = [ ] | streamToList ($Cons (2, xs)) = x :: streamToList 2s in streamToList (mergeAll ($Nil, segs)) end end

Figure 4.2: Scheduled bottom-up mergesort.

m = 2", whose schedule contains streams of sizes 1, 2, 4,... , 2°. The total size of this schedule is 2'+1 1 = 2m-—1. After executing two steps, the size of the schedule is 2m 3. The size of the new collectionis n’ = n+1 = c2**!+2*. Since 2m—3 < 2m—2(n' mod m+1) = 2m—2, the lemma holds for this segment.

Every segment of size m’ larger than m is unaffected by the add, except for the execution

48 Eliminating Amortization

of two steps from the segment’s schedule. The size of the new schedule is bounded by

2m! 2(n mod m’ + 1) 2 = 2m" 2(n' mod m’ + 1),

so the lemma holds for these segments as well.

Now, whenever the & lowest bits of n are ones (i.e., whenever the next add will merge the first k segments), we know by Lemma 4.1 that, for any segment of size m = 2’, wherez < k, the number of elements in that segment’s schedule is at most

2m 2(n mod m+ 1) = 2m 2((m 1) 4+ 1) =0

In other words, that segment has been completely evaluated.

Finally, the combined schedules for all segments comprise at most

25-d,(2' (n mod yee 1)) =2n - 25° di(n mod + 1)

i=0 i=0

elements, where 0; is the zth bit of n. Note the similarity to the potential function from the physicist’s analysis in Section 3.5.2. Since this total is bounded by 2n, the collection as a whole contains only O(n) unevaluated suspensions, and therefore sort takes only O(n) worst- case time.

4.4 Related Work

Eliminating Amortization Dietz and Raman [DR91, DR93, Ram92] have devised a frame- work for eliminating amortization based on pebble games, where the derived worst-case algo- rithms correspond to winning strategies in some game. Others have used ad hoc techniques similar to scheduling to eliminate amortization from specific data structures such as relaxed heaps [DGST88] and implicit binomial queues [CMP88]. The form of scheduling described here was first applied to queues in [Oka95c] and later generalized in [Oka96b].

Queues The queue implementation in Section 4.2 first appeared in [Oka95c]. Hood and Melville [HM81] presented the first purely functional implementation of real-time queues, based on a technique known as global rebuilding [Ove83], which will be discussed further in the next chapter. Their implementation does not use lazy evaluation and is more complicated than ours.

Chapter 5

Lazy Rebuilding

The next four chapters describe general techniques for designing functional data structures. We begin in this chapter with lazy rebuilding, a variant of global rebuilding [Ove83].

5.1 Batched Rebuilding

Many data structures obey balance invariants that guarantee efficient access. The canonical ex- ample is balanced binary search trees, which improve the worst-case running time of many tree operations from the O(n) required by unbalanced trees to O(log n). One approach to main- taining a balance invariant is to rebalance the structure after every update. For most balanced structures, there is a notion of perfect balance, which is a configuration that minimizes the cost of subsequent operations. However, since it is usually too expensive to restore perfect balance after every update, most implementations settle for approximations of perfect balance that are at most a constant factor slower. Examples of this approach include AVL trees [AVL62] and red-black trees [GS78].

However, provided no update disturbs the balance too drastically, an attractive alternative is to postpone rebalancing until after a sequence of updates, and then to rebalance the entire structure, restoring it to perfect balance. We call this approach batched rebuilding. Batched rebuilding yields good amortized time bounds provided that (1) the data structure is not rebuilt too often, and (2) individual updates do not excessively degrade the performance of later op- erations. More precisely, condition (1) states that, if one hopes to achieve a bound of O( f(n)) amortized time per operation, and the global transformation requires O(g(n)) time, then the global transformation cannot be executed any more frequently than every c - g(n)/f(n) oper- ations, for some constant c. For example, consider binary search trees. Rebuilding a tree to perfect balance takes O(n) time, so if one wants each operation to take O(log n) amortized

50 Lazy Rebuilding

time, then the data structure must not be rebuilt more often than every c- n/ log n operations, for some constant c.

Assume that a data structure is to be rebuilt every c- g(n)/f(n) operations, and that an individual operation on a newly rebuilt data structure requires O( f(n)) time (worst-case or amortized). Then, condition (2) states that, after up to c- g(n)/f(n) updates to a newly rebuilt data structure, individual operations must still take only O( f()) time (e., the cost of an indi- vidual operation can only degrade by a constant factor). Update functions satisfying condition (2) are called weak updates.

For example, consider the following approach to implementing a delete function on binary search trees. Instead of physically removing the specified node from the tree, leave it in the tree but mark it as deleted. Then, whenever half the nodes in the tree have been deleted, make a global pass removing the deleted nodes and restoring the tree to perfect balance. Does this approach satisfy both conditions, assuming we want deletions to take O(log n) amortized time?

Suppose a tree contains n nodes, up to half of which are marked as deleted. Then removing the deleted nodes and restoring the tree to perfect balance takes O(n) time. We execute the transformation only every sn delete operations, so condition (1) is satisfied. In fact, condition (1) would allow us to rebuild the data structure even more often, as often as every c-n/ log n operations. The naive delete algorithm finds the desired node and marks it as deleted. This takes O(log n) time, even if up to half the nodes have been marked as deleted, so condition (2) is satisfied. Note that even if half the nodes in the tree are marked as deleted, the average depth per active node is only about one greater than it would be if the deleted nodes had been physically removed. This degrades each operation by only a constant additive factor, whereas condition (2) allows for each operation to be degraded by a constant multiplicative factor. Hence, condition (2) would allow us to rebuild the data structure even less often.

In the above discussion, we described only deletions, but of course binary search trees typically support insertions as well. Unfortunately, insertions are not weak because they can create a deep path very quickly. However, a hybrid approach is possible, in which insertions are handled by local rebalancing after every update, as in AVL trees or red-black trees, but deletions are handled via batched rebuilding.

As a second example of batched rebuilding, consider the batched queues of Section 3.1.1. The global rebuilding transformation reverses the rear list into the front list, restoring the queue to a state of perfect balance in which every element is contained in the front list. As we have already seen, batched queues have good amortized efficiency, but only when used ephemerally. Under persistent usage, the amortized bounds degrade to the cost of the rebuilding transforma- tion because it is possible to trigger the transformation arbitrarily often. In fact, this is true for all data structures based on batched rebuilding.

5.2 Global Rebuilding 51

5.2. Global Rebuilding

Overmars [Ove83] developed a technique for eliminating the amortization from batched re- building. He called this technique global rebuilding. The basic idea is to execute the rebuilding transformation incrementally, performing a few steps per normal operation. This can be use- fully viewed as running the rebuilding transformation as a coroutine. The tricky part of global rebuilding is that the coroutine must be started early enough that it can finish by the time the rebuilt structure is needed.

Concretely, global rebuilding is accomplished by maintaining two copies of each object. The primary, or working, copy is the ordinary structure. The secondary copy is the one that is gradually being rebuilt. All queries and updates operate on the working copy. When the secondary copy is completed, it becomes the new working copy and the old working copy is discarded. A new secondary copy might be started immediately, or the object may carry on for a while without a secondary structure, before eventually starting the next rebuilding phase.

There is a further complication to handle updates that occur while the secondary copy is being rebuilt. The working copy will be updated in the normal fashion, but the secondary copy must be updated as well or the effect of the update will be lost when the secondary copy takes over. However, the secondary copy will not in general be represented in a form that can be efficiently updated. Thus, these updates to the secondary copy are buffered and executed, a few at a time, after the secondary copy has been rebuilt, but before it takes over as the working copy.

Global rebuilding can be implemented purely functionally, and has been several times. For example, the real-time queues of Hood and Melville [HM81] are based on this technique. Unlike batched rebuilding, global rebuilding has no problems with persistence. Since no one operation is particularly expensive, arbitrarily repeating operations has no effect on the time bounds. Unfortunately, global rebuilding is often quite complicated. In particular, representing the secondary copy, which amounts to capturing the intermediate state of a coroutine, can be quite messy.

5.3. Lazy Rebuilding

The implementation of queues in Section 3.5.1, based on the physicist’s method, is closely related to global rebuilding, but there is an important difference. As in global rebuilding, this implementation keeps two copies of the front list, the working copy W and the secondary copy F’, with all queries being answered by the working copy. Updates to F (i.e., tail operations) are buffered, to be executed during the next rotation, by writing

... F=$tl (force f)...

52 Lazy Rebuilding

In addition, this implementation takes care to start (or at least set up) the rotation long before its result is needed. However, unlike global rebuilding, this implementation does not execute the rebuilding transformation (i.e., the rotation) concurrently with the normal operations; rather, it pays for the rebuilding transformation concurrently with the normal operations, but then exe- cutes the transformation all at once at some point after it has been paid for. In essence, we have replaced the complications of explicitly or implicitly coroutining the rebuilding transformation with the simpler mechanism of lazy evaluation. We call this variant of global rebuilding lazy rebuilding.

The implementation of queues in Section 3.4.2, based on the banker’s method, reveals a further simplification possible under lazy rebuilding. By incorporating nested suspensions into the basic data structure for instance, by using streams instead of lists we can often eliminate the distinction between the working copy and the secondary copy and employ a single structure that combines aspects of both. The “working” portion of that structure is the part that has already been paid for, and the “secondary” portion is the part that has not yet been paid for.

Global rebuilding has two advantages over batched rebuilding: it is suitable for implement- ing persistent data structures and it yields worst-case bounds rather than amortized bounds. Lazy rebuilding shares the first advantage, but, at least in its simplest form, yields amortized bounds. However, if desired, worst-case bounds can often be recovered using the schedul- ing techniques of Chapter 4. For example, the real-time queues in Section 4.2 combine lazy rebuilding with scheduling to achieve worst-case bounds. In fact, when lazy rebuilding is combined with scheduling, it can be viewed as an instance of global rebuilding in which the coroutines are reified in a particularly simple way using lazy evaluation.

5.4 Double-Ended Queues

As further examples of lazy rebuilding, we next present several implementations of double- ended queues, also known as deques. Deques differ from FIFO queues in that elements can be both inserted and deleted from either end of the queue. A signature for deques appears in Figure 5.1. This signature extends the signature for queues with three new functions: cons (in- sert an element at the front), /ast (return the rearmost element), and init (remove the rearmost element).

Remark: Note that the signature for queues is a strict subset of the signature for deques the same names have been chosen for the types, exceptions, and overlapping functions. Because deques are thus a strict extension of queues, Standard ML will allow us to use a deque module wherever a queue module is expected. ©

5.4 Double-Ended Queues 53

signature DEQUE = sig type a Queue

exception EMPTY

valempty : a Queue val isEmpty : a Queue bool

(* insert, inspect, and remove the front element *)

val cons > @ X a Queue a Queue

val head : @ Queue > a (* raises EMPTY if queue is empty *) val tail : @ Queue a Queue (* raises EMPTY if queue is empty *)

(* insert, inspect, and remove the rear element *)

val snoc : @ Queue X a a Queue

val last : @ Queue > a (* raises EMPTY if queue is empty *)

val init : a Queue + a Queue (* raises EMPTY if queue is empty *) end

Figure 5.1: Signature for double-ended queues.

5.4.1 Output-restricted Deques

First, note that extending the queue implementations from Chapters 3 and 4 to support cons, in addition to snoc, is trivial. A queue that supports insertions at both ends, but deletions from only one end, is called an output-restricted deque.

For example, we can implement cons for the banker’s queues of Section 3.4.2 as follows:

fun cons (x, Queue {F=/, LenF = lenF’, R=, LenR = lenk}) = Queue {F = $Cons (z, f), LenF = lenF'+1, R = r, LenR = lenR}

Note that we invoke the true constructor Quewe rather than the pseudo-constructor queue be- cause adding an element to /’ cannot possibly make F’ shorter than R.

Similarly, we can easily extend the real-time queues of Section 4.2.

fun cons (x, Queue {F=/,R=r,S=s})= Queue {F = $Cons (z, f), R= r, S = $Cons (z, s)})

We add «x to S only to maintain the invariant that |S| = |F'| |R|. Again, we invoke the true constructor Queue rather than the pseudo-constructor quewe.

54 Lazy Rebuilding

5.4.2 Banker’s Deques

Deques can be represented in essentially the same way as queues, as two streams (or lists) f’ and f, plus some associated information to help maintain balance. For queues, the notion of perfect balance is for all the elements to be in the front stream. For deques, the notion of perfect balance is for the elements to be evenly divided between the front and rear streams. Since we cannot afford to restore perfect balance after every operation, we will settle for guaranteeing that neither stream is more than about c times longer than the other, for some constant c > 1. Specifically, we maintain the following balance invariant:

|F|<e/RJ+1 A |R| <clF|41

The “+1” in each term allows for the only element of a singleton queue to be stored in either stream. Note that both streams will be non-empty whenever the queue contains at least two elements. Whenever the invariant would otherwise be violated, we restore the queue to perfect balance by transferring elements from the longer stream to the shorter stream until both streams have the same length.

Using these ideas, we can adapt either the banker’s queues of Section 3.4.2 or the physicist’s queues of Section 3.5.1 to obtain deques that support every operation in O(1) amortized time. Because the banker’s queues are slightly simpler, we choose to begin with that implementation.

The type of double-ended queues is precisely the same as for ordinary queues. datatype a Queue = Queue {F : a Stream, LenF : int, R : a Stream, LenR : int} The functions on the front element are defined as follows:

fun cons (Queue {F = f, LenF = len’, R =r, LenR = lenR}, x) = queue {F = $Cons (z, f), LenF = lenF'+1, R = r, LenR = lenR} fun head (Queue {F = $Nil, R = $Cons (z, _),...} =z | head (Queue {F = $Cons (z, f),...}) =z fun tail (Queue {F = $Nil, R = $Cons (z, _), ... } = empty | tail (Queue {F = $Cons (2, f), LenF = lenf', R= r, LenR = lenk}) = queue {F = f, LenF = /enF'—1, R= r, LenR = lenR}

The first clauses of head and tail handle singleton queues where the single element is stored in the rear stream. The functions on the rear element snoc, last, and init are defined symmetrically on # rather than fF’.

The interesting portion of this implementation is the pseudo-constructor queue, which re- stores the queue to perfect balance when one stream becomes too long by first truncating the longer stream to half the combined length of both streams and then transferring the re- maining elements of the longer stream onto the back of the shorter stream. For example, if

5.4 Double-Ended Queues 55

|F'| > c|R| +1, then queue replaces F with take (i, F) and R with R + reverse (drop (i, F)), where 7 = |(|F| + |R|)/2]. The full definition of queue is

fun queue (q as {F =f, LenF = lenF’, R= r, LenR = lenR}) = if lenF > celenR + 1 then let val 7 = (lenF + lenR) div 2 val 7 = lenF + lenR i val f’ = take (7, f) val r’ = r + reverse (drop (7, f)) in Queue {F =f’, LenF =i, R=7’, LenR = j} end else if /enR > cxlenF +1 then let val 7 = (lenF + lenR) div 2 val 7 = lenF + lenR 1% val f’ = f + reverse (drop (j, )) val r’ = take (, r) in Queue {F =f’, LenF = 7, R= 7’, LenR = j } end else Queue ¢

The complete implementation appears in Figure 5.2.

Remark: Because of the symmetry of this implementation, we can reverse a deque in O(1) time by simply swapping the roles of / and R.

fun reverse (Queue {F = f, LenF = len’, R= r, LenR = lenk}) = Queue {F= r, LenF = lenk, R= f, LenR = lenF'}

Many other implementations of deques share this property [Hoo92b, CG93]. Rather than es- sentially duplicating the code for the functions on the front element and the functions on the rear element, we could define the functions on the rear element in terms of reverse and the corresponding functions on the front element. For example, we could implement znzt as

fun init ¢ = reverse (tail (reverse ¢)) Of course, init will be slightly faster if implemented directly. e

To analyze these deques, we again turn to the banker’s method. For both the front and rear streams, let d() be the number of debits on element of the stream, and let D(z) = Y7'_5 d(J). We maintain the debit invariants that, for both the front and rear streams,

D(i) < min(ct + t,es + 1-1)

where s = min(|F'|,|R]) and ¢ = max(|F'|,|R|). Since D(0) = 0 for both streams, we can always access the first and last elements of the queue via head or last.

Theorem 5.1 cons and tail (symmetrically, snoc and init) maintain the debit invariants on both the front and rear streams by discharging at most | and c + 1 debits per stream, respec- tively.

56 Lazy Rebuilding

functor BankersDeque (val c : int) : DEQUE = (* c>T1 ¥) struct datatype a Queue = Queue {F : a Stream, LenF : int, R : a Stream, LenR : int} Invariants: |F| < c|R| + 1, |R| < c|F] + 1, LenF = |F|, LenR = |R| +) exception EMPTY

val empty = Queue {F = $Nil, LenF = 0, R = $Nil, LenR = 0} fun isEmpty (Queue {LenF = /enF’, LenR = lenR, ...}) = (lenF'+lenR = 0)

fun queue (q as {F = f, LenF= lenF’, R=r, LenR = lenR}) = if lenF > cxlenR + 1 then let val 7 = (lenF' + lenR) div 2 val j = len + lenR i val f’ = take (7, f) val r’ = r + reverse (drop (7, f')) in Queue {F = f’, LenF = 7, R= r’, LenR = j} end else if lenR > cxlenF + 1 then let val 1 = (lenF' + lenR) div 2 val j = lenF + lenR i val {’ = f ++ reverse (drop (j, r)) val r’ = take (j, r) in Queue {F = f’, LenF = 7, R= r’, LenR = j} end else Queue ¢

fun cons (Queue {F = f, LenF = lenF’, R= r, LenR = lenR}, x) = queue {F = $Cons (x, f), LenF = lenF'+1, R = r, LenR = lenR} fun head (Queue {F = $Nil, R = $Nil, ... }) = raise EMPTY | head (Queue {F = $Nil, R = $Cons (x, _),...} =2 | head (Queue {F = $Cons (z, f),...}) =2 fun tail (Queue {F = $Nil, R = $Nil, ... }) = raise EMPTY | tail (Queue {F = $Nil, R = $Cons (a, _), ... } = empty | tail (Queue {F = $Cons (2, f), LenF = len’, R =r, LenR = lenR}) = queue {F = f, LenF = lenF—1, R=r, LenR = lenR}

... snoc, last, and init defined symmetrically. .. end

Figure 5.2: An implementation of deques based on lazy rebuilding and the banker’s method.

5.4 Double-Ended Queues 57

Proof: Similar to the proof of Theorem 3.1 on page 27.

By inspection, every operation has an O(1) unshared cost, and by Theorem 5.1, every oper- ation discharges no more than O(1) debits. Therefore, every operation runs in O(1) amortized time.

5.4.3. Real-Time Deques

Real-time deques support every operation in O(1) worst-case time. We obtain real-time deques from the deques of the previous section by scheduling both the front and rear streams.

As always, the first step in applying the scheduling technique is to convert all monolithic functions to incremental functions. In the previous implementation, the rebuilding transfor- mation rebuilt F' and F# as take (1, F) and R + reverse (drop (i, F)) (or vice versa). take and + are already incremental, but reverse and drop are monolithic. We therefore rewrite R + reverse (drop (i, F')) as rotateDrop (R, i, F) where rotateDrop performs c steps of the drop for every step of the + and eventually calls rotateRev, which in turn performs c steps of the reverse for every remaining step of the +. rotateDrop can be implemented as

fun rotateDrop (r, i, f) = if 7 < c then rotateRev (r, drop (7, f), $Nil) else let val ($Cons (z, r’)) = r in $Cons (x, rotateDrop (r’, i c, drop (c, f))) end

Initially, | f| = clr| + 1+ where 1 < k < c. Every call to rotateDrop drops c elements of f and processes one element of , except the last, which drops 7 mod ¢ elements of f and leaves r unchanged. Therefore, at the time of the first call to rotateRev, |f| = clr|+1+k—(¢ mod c). It will be convenient to insist that |f| > c|r|, so we require that 1 + k (2 mod c) > 0. This is guaranteed only if ¢ is two or three, so these are the only values of ¢ that we allow. Then we can implement rotate Rev as

fun rotateRev ($Nil, f, a) = reverse f + a | rotateRev ($Cons (2, r), f, a) = $Cons (x, rotateRev (r, drop (c, f), reverse (take (c, f)) + a))

Note that rotateDrop and rotateRev make frequent calls to drop and reverse, which were exactly the functions we were trying to eliminate. However, now drop and reverse are always called with arguments of bounded size, and therefore execute in O(1) steps.

Once we have converted the monolithic functions to incremental functions, the next step is to schedule the execution of the suspensions in #’ and #. We maintain a separate schedule for each stream and execute a few suspensions per operation from each schedule. As with the real- time queues of Section 4.2, the goal is to ensure that both schedules are completely evaluated

58 Lazy Rebuilding

before the next rotation. Assume that both streams have length m immediately after a rotation. How soon can the next rotation occur? It will occur soonest if all the insertions occur on one end and all the deletions occur on the other end. If z is the number of insertions and d is the number of deletions, then the next rotation will occur when

m+i>cm-—d)4+1

Rewriting both sides yields t+ed>m(ce-1)4+1

The next rotation will occur sooner for c = 2 than for c = 3, so substitute 2 for c. 1+2d>m4+1

Therefore, executing one suspension per stream per insertion and two suspensions per stream per deletion is enough to guarantee that both schedules are completely evaluated before the next rotation.

The complete implementation appears in Figure 5.3.

5.5 Related Work

Global Rebuilding Overmars introduced global rebuilding in [Ove83]. It has since been used in many situations, including real-time queues [HM81], real-time deques [Hoo082, GT86, Sar86, CG93], catenable deques [BT95], and the order maintenance problem [DS87].

Deques Hood [Ho0082] first modified the real-time queues of [HM81] to obtain real-time deques based on global rebuilding. Several other researchers later duplicated this work [GT86, Sar86, CG93]. These implementations are all similar to techniques used to simulate multihead Turing machines [Sto70, FMR72, LS81]. Hoogerwoord [Ho092b] proposed amortized deques based on batched rebuilding, but, as always with batched rebuilding, his implementation is not efficient when used persistently. The real-time deques in Figure 5.3 first appeared in [Oka95c].

Coroutines and Lazy Evaluation Streams (and other lazy data structures) have frequently been used to implement a form of coroutining between the producer of a stream and the con- sumer of a stream. Landin [Lan65] first pointed out this connection between streams and coroutines. See [Hug89] for some compelling applications of this feature.

5.5 Related Work

functor RealTimeDeque (val c : int) : DEQUE = (* c=2orc=3 *) struct datatype a Queue = Queue {F : a Stream, LenF : int, SF : a Stream, R: a Stream, LenR : int, SR : a Stream} Invariants: |F| < c|R| + 1, |R| < c|F] + 1, LenF = |F|, LenR = |R| +)

exception EMPTY

val empty = Queue {F = $Nil, LenF = 0, SF = $Nil, R = $Nil, LenR = 0, SR = $Nil} fun isEmpty (Queue {LenF = /enF’, LenR = lenR, ...}) = (lenF'+lenR = 0)

fun exec! ($Cons (z, s)) =s | execl s=s fun exec2 s = exec! (execl s)

fun rotateRev ($Nil, /, a) = reverse f + a | rotateRev ($Cons (x, r), f, @) = $Cons (2, rotateRev (1, drop (c, f), reverse (take (c, f)) + a)) fun rotateDrop (7, 7, {) = if i < c then rotateRev (1, drop (i, f), $Nil) else let val ($Cons (x, r’)) = r in $Cons (x, rotateDrop (r’, 7 c, drop (c, f))) end

fun queue as {F = f, LenF = lenF’, SF = sf, R= r, LenR = lenR, SR = sr})= if lenF > c*lenR+ 1 then let val 7 = (lenF' + lenR) div 2 val j = len + lenR i val f’ = take (7, f) val r’ = rotateDrop (7, r, f) in Queue {F = f’, LenF= i, SF=/’, R=r’, LenR=j, SR=1"} end else if fenR > cxlenF + 1 then let val i = (lenF’ + lenR) div 2 val j = lenF + lenR i val {’ = rotateDrop (j, f, 7) val r’ = take (j, 1) in Queue {F= f’, LenF =i, SF=/f’, R=r’, LenR=j7, SR=r’} end else Queue ¢

fun cons (Queue {F = f, LenF = lenF', SF= sf, R=r, LenR= lenR, SR= sr}, x)= queue {F = $Cons (2, f), LenF = lenF'+1, SF = execl sf, R=r, LenR = lenR, SR =execl sr} fun head (Queue {F = $Nil, R = $Nil, ... }) = raise EMPTY | head (Queue {F = $Nil, R = $Cons (x, _),...} =2 | head (Queue {F = $Cons (z, f),...}) = 2 fun tail (Queue {F = $Nil, R = $Nil, ... }) = raise EMPTY | tail (Queue {F = $Nil, R = $Cons (a, _), ... } = empty | tail (Queue {F = $Cons (x, f), LenF = lenF’, SF= sf, R=r, LenR = lenR, SR = sr}) = queue {F = f, LenF = lenF'—1, SF = exec2 sf, R=r, LenR = lenR, SR = exec2 sr}

... snoc, last, and init defined symmetrically. .. end

Figure 5.3: Real-time deques via lazy rebuilding and scheduling.

59

60

Lazy Rebuilding

Chapter 6

Numerical Representations

Consider the usual representations of lists and natural numbers, along with several typical functions on each data type.

datatype a List = datatype Nat = Nil Zero | Cons of a x a List | Succ of Nat fun tail (Cons (x, xs)) = xs fun pred (Succ n) = n fun append (Nil, ys) = ys fun plus (Zero, n) =n | append (Cons (z, xs), ys) = | plus (Succ m, n) = Cons (x, append (xs, ys)) Succ (plus (m, 7))

Other than the fact that lists contain elements and natural numbers do not, these two imple- mentations are virtually identical. This suggests a strong analogy between representations of the number n and representations of container objects of size n. Functions on the container strongly resemble arithmetic functions on the number. For example, inserting an element re- sembles incrementing a number, deleting an element resembles decrementing a number, and combining two containers resembles adding two numbers. This analogy can be exploited to design new implementations of container abstractions simply choose a representation of nat- ural numbers with certain desired properties and define the functions on the container objects accordingly. Call an implementation designed in this fashion a numerical representation.

The typical representation of lists can be viewed as a numerical representation based on unary numbers. However, numerical representations based on binary numbers are also com- mon; the best known of these is the binomial queues of Vuillemin [Vui78]. Incrementing a unary number takes O(1) time, so inserting an element into a unary representation also usu- ally takes O(1) time. However, adding two unary numbers takes O(n) time, so combining two containers in a unary representation takes O(n) time. Binary numbers improve the time

62 Numerical Representations

required for addition (and hence the time required to combine two containers) to O(log n), but also slow the time required to increment a number or insert an element to O(logn). In this chapter, we consider several variations of binary numbers that achieve the best of both worlds by supporting the increment function in O(1) time and addition in O(log n) time. Numerical representations based on these variations naturally support inserting an element in O(1) time and combining two containers in O(log n) time.

Example abstractions for which numerical representations are particularly useful include random-access lists (also known as flexible arrays) and heaps (also known as priority queues).

6.1 Positional Number Systems

A positional number system [Knu73] is a notation for writing a number as a sequence of digits bo... 6m-1. The digit by is called the least significant digit and the digit 6,,_; is called the most significant digit. Except when writing ordinary, decimal numbers, we will always write sequences of digits from least significant to most significant.

Each digit b; has weight w;, so the value of the sequence bp. . . 6,1 1s as b;w;. For any given positional number system, the sequence of weights is fixed, as is the set of digits D; from which each 6; is chosen. For unary numbers, w; = 1 and D; = {1} for all 2, and for binary

numbers w; = 2' and D; = {0,1}. (By convention, we write all digits in typewriter font except for ordinary, decimal digits.) A number is said to be written in base B if w; = B' and D; = {0,...,B —1}. Usually, but not always, weights are increasing sequences of powers,

and the set D; is the same for every digit.

A number system is said to be redundant if there is more than one way to represent some numbers. For example, we can obtain a redundant system of binary numbers by taking w; = 2' and D; = {0,1,2}. Then the decimal number 13 can be written 1011, or 1201, or 122. If we allow trailing 0s, then almost all positional number systems are redundant, since f&. . . by,—1 is always equivalent to bo... 5,,_10. Therefore, we disallow trailing Os.

Computer representations of positional number systems can be dense or sparse. A dense representation is simply a list (or some other kind of sequence) of digits, including those digits that happen to be 0. A sparse representation, on the other hand, includes only non-zero digits. It must then include information on either the rank (i.e., the index) or the weight of each digit. For example, Figure 6.1 shows two different representations of binary numbers in Standard ML— one dense and one sparse along with several representative functions on each.

6.1 Positional Number Systems

structure Dense =

struct datatype Digit = Zero | One type Nat = Digit list (* increasing order of significance, no trailing Zeros *) fun inc [ ] = [One]

inc (Zero :: ds) = One :: ds

inc (One :: ds) = Zero ::inc ds = (* carry *)

fun dec [One] = [ ]

dec (One :: ds) = Zero :: ds

dec (Zero :: ds) =One:: dec ds_ borrow *)

fun add (ds, [ ]) = ds

add ([ ], ds) = ds

add (d :: ds,, Zero :: dsy) = d :: add (dsj, ds2)

add (Zero :: ds,, d:: dsy) = d :: add (dsj, ds2)

add (One :: ds;, One :: ds2) = Zero :: inc (add (ds,, ds2)) carry *)

structure SparseBy Weight = struct type Nat = int list (* increasing list of weights, each a power of two *)

(* add a new weight to a list, recurse if weight is already present *) fun carry (w, []) =[w] | carry (w, ws as w’ :: rest) =if w < w’ then w :: ws else carry (2*w, rest)

borrow from a digit of weight w, recurse if weight is not present *) fun borrow (w, ws as w’ :: rest) =if w = w’ then rest else w :: borrow (2«w, ws)

fun inc ws = carry (1, ws) fun dec ws = borrow (1, ws)

fun add (ws, [ ]) = ws | add ([], ws) = ws | add (m as wy, 3: ws1, NAS Wy 1: WS2) = if w, < wy» then w, :: add (ws,, 7) else if w. < w, then :: add (m, ws2) else carry (2* w,, add (ws ,, ws2)) end

Figure 6.1: Two implementations of binary numbers.

63

64 Numerical Representations

6.2 Binary Representations

Given a positional number system, we can implement a numerical representation based on that number system as a sequence of trees. The number and sizes of the trees representing a collection of size n are governed by the representation of n in the positional number system. For each weight w;, there are 5; trees of that size. For example, the binary representation of 73 is 1001001, so a collection of size 73 in a binary numerical representation would comprise three trees, of sizes 1, 8, and 64, respectively.

Trees in numerical representations typically exhibit a very regular structure. For example, in binary numerical representations, all trees have sizes that are powers of 2. Three com- mon kinds of trees that exhibit this structure are complete binary leaf trees [KD96], binomial trees [Vui78], and pennants [SS90].

Definition 6.1 (Complete binary leaf trees) A complete binary tree of rank 0 is a leaf and a complete binary tree of rank r > 0 is a node with two children, each of which is a complete binary tree of rank r |. A leaf tree is a tree that contains elements only at the leaves, unlike ordinary trees that contain elements at every node. A complete binary tree of rank r has 2”*! —1 nodes, but only 2” leaves. Hence, a complete binary leaf tree of rank r contains 2” elements.

Definition 6.2 (Binomial trees) A binomial tree of rank r is a node with r children c, ...c,, where c¢; is a binomial tree of rank r 7. Alternatively, a binomial tree of rank r > 0 isa binomial tree of rank r 1 to which another binomial tree of rank r 1 has been added as the leftmost child. From the second definition, it is easy to see that a binomial tree of rank r contains 2” nodes.

Definition 6.3 (Pennants) A pennant of rank 0 is a single node and a pennant of rank r > 0 is anode with a single child that is a complete binary tree of rank r 1. The complete binary tree contains 2” | elements, so the pennant contains 2” elements.

Figure 6.2 illustrates the three kinds of trees. Which kind of tree is superior for a given data structure depends on the properties the data structure must maintain, such as the order in which elements should be stored in the trees. A key factor in the suitability of a particular kind of tree for a given data structure is how easily the tree supports functions analogous to carries and borrows in binary arithmetic. When simulating a carry, we link two trees of rank r to form a tree of rank r + 1. Symmetrically, when simulating a borrow, we unlink a tree of rank r > 0 to obtain two trees of rank r |. Figure 6.3 illustrates the link operation (denoted ©) on each of the three kinds of trees. Assuming that elements are not rearranged, each of the three kinds of trees can be linked or unlinked in O(1) time.

We next describe two existing data structures in terms of this framework: the one-sided flex- ible arrays of Kaldewaij and Dielissen [KD96], and the binomial queues of Vuillemin [Vui78, Bro78].

6.2 Binary Representations 65

(a) (b) (c)

Figure 6.2: Three trees of rank 3: (a) a complete binary leaf tree, (b) a binomial tree, and (c) a pennant.

\A-AD LA A-A-Ad

Figure 6.3: Linking two trees of rank r to obtain a tree of rank r + 1 for (a) complete binary leaf trees, (b) binomial trees, and (c) pennants.

66 Numerical Representations

signature RANDOMACCESSLIST = sig

type a RList

exception EMPTY and INDEX

valempty : a RList val isEmpty : a RList > bool val cons : a xX @ RList > a RList

val head : a RList a (* raises EMPTY if list is empty *) val tail : a RList > a RList (* raises EMPTY if listis empty *)

val lookup : a RList x int a (* raises INDEX if out of bounds *) valupdate : a RList x int x a aRList (* raises INDEX ifout of bounds *) end

Figure 6.4: Signature for random-access lists.

6.2.1 Binary Random-Access Lists

A random-access list, also called a one-sided flexible array, is a data structure that supports array-like lookup and update functions, as well as the usual cons, head, and tail functions on lists. A signature for random-access lists is shown in Figure 6.4.

Kaldewaij and Dielissen [KD96] describe an implementation of random-access lists in terms of leftist left-perfect leaf trees. We can easily translate their implementation into the framework of numerical representations as a binary representation using complete binary leaf trees. A binary random-access list of size n thus contains a complete binary leaf tree for each 1 in the binary representation of n. The rank of each tree corresponds to the rank of the corre- sponding digit; if the 7th bit of n is 1, then the random-access list contains a tree of size 2°. For this example, we choose a dense representation, so the type of binary random-access lists is

datatype a Tree = Leaf of a | Node of int x a Tree x a Tree datatype «a Digit = Zero | One of a Tree type a RList = a Digit list

The integer in each node is the size of the tree. This number is redundant since the size of every tree is completely determined by the size of its parent or by its position in the list of digits, but we include it anyway for convenience. Trees are stored in increasing order of size, and the order of elements (both within and between trees) is left-to-right. Thus, the head of the random-access list is the leftmost leaf of the smallest tree. Figure 6.5 shows a binary random- access list of size 7. Note that the maximum number of trees in a list of size n is |log(n + 1) |

6.2 Binary Representations 67

ih fA

1 2 3.4 5 6

Figure 6.5: A binary random-access list containing the elements 0... 6.

and the maximum depth of any tree is | log n|.

Now, insertion into a binary random-access list (i.e., cons) is analogous to incrementing a binary number. Recall the increment function on dense binary numbers:

fun inc [ ] = [One] | inc (Zero :: ds) = One :: ds | inc (One :: ds) = Zero :: ine ds

To insert an element with cons, we first convert the element into a leaf, and then insert the leaf into the list of trees using a helper function ins Tree that follows the rules of znc.

fun cons (x, ts) = insTree (Leaf x, ts)

fun insTree (¢, [ ]) = [One ¢] | insTree (t, Zero :: ts) = One t :: ts | insTree (t,, One tz :: ts) = Zero :: insTree (link (4, f2), ts)

The link helper function is a pseudo-constructor for Node that automatically calculates the size of the new tree from the sizes of its children.

Deleting an element from a binary random-access list (using tail) is analogous to decre- menting a binary number. Recall the decrement function on dense binary numbers:

fun dec [One] = [ ] | dec (One :: ds) = Zero :: ds | dec (Zero :: ds) = One :: dec ds

Essentially, this function resets the first 1 to 0, while setting all the preceding Os to 1s. The analogous operation on lists of trees is borrow Tree. When applied to a list whose first digit has rank r, borrowTree returns a pair containing a tree of rank r, and the new list without that tree.

fun borrowTree [One 1] = (¢, [ ]) | borrowTree (One ¢ :: ts) = (t, Zero :: ts) | borrowTree (Zero :: ts) = let val (Node (_, t,, tz), ts’) = borrowTree ts in (¢,, One t :: ts’) end

68 Numerical Representations

The head and tail functions “borrow” the leftmost leaf using borrow Tree and then either return that leaf’s element or discard the leaf, respectively.

fun head ts = let val (Leaf x, _) = borrowTree ts in x end fun tail ¢s = let val (_, ts’) = borrowTree ts in ts’ end

The lookup and update functions do not have analogous arithmetic operations. Rather, they take advantage of the organization of binary random-access lists as logarithmic-length lists of logarithmic-depth trees. Looking up an element is a two-stage process. We first search the list for the correct tree, and then search the tree for the correct element. The helper function lookup Tree uses the size field in each node to determine whether the zth element is in the left subtree or the right subtree.

fun lookup (Zero :: ts, 7) = lookup (¢s, 7) | lookup (One ¢ :: ts, ¢) = if 1 < size ¢ then lookupTree (¢, 7) else lookup (ts, 7 size ¢)

fun lookupTree (Leaf x, 0) = | lookupTree (Node (w, t1, t2), i) = if i < w div 2 then lookupTree (¢,, 7) else lookupTree (f2, « w div 2)

update works in same way but also reconstructs the path from the root to the updated leaf. This reconstruction is called path copying [ST86a] and is necessary for persistence.

fun update (Zero :: ts, i, y) = Zero :: update (ts, 7, y) | update (One f :: ts, i, y) = if i < size ¢ then One (updateTree (¢, 7, y)) :: ts else One ¢ :: update (ts, 7 size t, y)

fun updateTree (Leaf x, 0, y) = Leaf y | updateTree (Node (w, t, t2), ¢, y) = if i < w div 2 then Node (w, updateTree (¢,, 7, y), t2) else Node (w, ¢,, updateTree (t2, « w div 2, y))

The complete code for this implementation is shown in Figure 6.6.

cons, head, and tail perform at most O(1) work per digit and so run in O(log n) worst- case time. lookup and update take at most O(log n) time to find the right tree, and then at most O(log n) time to find the right element in that tree, for a total of O(log n) worst-case time.

6.2.2 Binomial Heaps

Binomial queues [Vui78, Bro78] are a classical implementation of mergeable priority queues. To avoid confusion with FIFO queues, we will henceforth refer to priority queues as heaps and binomial queues as binomial heaps. Heaps support four main functions: inserting an element

6.2 Binary Representations 69

structure BinaryRandomAccessList : RANDOMACCESSLIST =

struct datatype a Tree = Leaf of a | Node of int x a Tree x a Tree (x int is size of tree *) datatype a Digit = Zero | One of a Tree type a RList = a Digit list

exception EMPTY and INDEX

val empty = [] fun isEmpty ts = null és

fun size (Leaf x) = 1

size (Node (w, i, f2))=w

fun link (¢,, t2) = Node (size t,+size ty, t, t)

fun insTree (¢, [ ]) = [One ¢]

insTree (7, Zero :: ts) = One ¢ :: ts

insTree (4, One é2 :: ts) = Zero :: insTree (link (71, f), ts)

fun borrowTree [ | = raise EMPTY

borrowTree [One ¢] = (¢, [ ])

borrowTree (One ¢ :: ts) = (#4, Zero :: ts)

borrowTree (Zero :: ts) = let val (Node (_, ¢1, #2), ts’) = borrowTree ts in (¢,, One fy :: ts’) end

fun cons (x, ts) = insTree (Leaf x, ts) fun head ts = let val (Leaf x, _) = borrowTree ts in x end fun tail ¢s = let val (_, ts’) = borrowTree ts in ts’ end

fun lookupTree (Leaf x, 0) =z lookupTree (Leaf x, 7) = raise INDEX lookupTree (Node (w, t, f2), 4) = if 7 < w div 2 then lookupTree (t,, 7) else lookupTree (, ¢ w div 2) fun updateTree (Leaf x, 0, y) = Leaf y updateTree (Leaf x, 7, y) = raise INDEX updateTree (Node (w, t, tf), 1, y) = if 7 < w div 2 then Node (w, updateTree (%, 7, y), h) else Node (w, ¢,, updateTree (2, 1 w div 2, y))

fun lookup ([ ], 7) = raise INDEX lookup (Zero :: ts, 2) = lookup (¢s, 7) lookup (One ¢ :: ts, 7) = if 7 < size ¢ then lookupTree (¢, 7) else lookup (ts, 7 size ¢) fun update ([], 7, y) = raise INDEX update (Zero :: ts, 7, y) = Zero :: update (¢s, 7, y) update (One ¢ :: ts, 7, y) = if 7 < size ¢ then One (updateTree (¢, 7, y)) :: ts else One ¢ :: update (ts, 7 size t, y)

end

Figure 6.6: Binary random-access lists.

70 Numerical Representations

signature ORDERED = sig type T (* type of ordered elements *) val leq: T x T— bool total ordering relation *) end

signature HEAP =

sig structure Elem : ORDERED type Heap exception EMPTY

val empty : Heap valisEmpty : Heap bool

val insert : Elem.T x Heap Heap val merge : Heap x Heap Heap

val findMin :Heap— Elem.T raises EMPTY if heap is empty *) val deleteMin : Heap + Heap (* raises EMPTY if heap is empty *) end

Figure 6.7: Signature for heaps.

(insert), merging two heaps (merge), finding the minimum element (findMin), and deleting the minimum element (deleteMin). A Standard ML signature for heaps appears Figure 6.7.

Remark: Heaps are similar to the sortable collections of Section 3.5.2, but use a different mechanism for specifying the desired comparison function. For sortable collections, the com- parison function is supplied when a new object is created, and every object can have a different comparison function. This approach is very flexible, but causes problems in the presence of an function that combines two objects, such as merge. If the two objects being merged have different comparison functions, which should the resulting object keep? To avoid this ambigu- ity, we fix the comparison function (and therefore the type of elements being compared) when the Standard ML structure implementing heaps is created. Then, we can be sure that any two objects being merged shared the same comparison function. ©

In the framework of numerical representations, binomial heaps are a binary representation with heap-ordered, binomial trees. A tree is heap-ordered if the element at every node is smaller than the elements at any of its children, with ties broken arbitrarily. As with binary random-access lists, binomial heaps contain one tree for each 1 in the binary representation of the size of the heap, and the trees have the same weights as their matching digits.

6.2 Binary Representations 71

Assuming an ORDERED structure lem that specifies the element type and comparison function, the types of binomial trees and binomial heaps can be written as follows:

datatype Tree = Node of int x Elem.T x Tree list type Heap = Tree list

This time, we have chosen a sparse representation, where the integer at each node is the rank of the tree. For reasons that will become clear later, we maintain the list of trees representing a heap in increasing order of rank, but maintain the list of trees representing the children of a node in decreasing order of rank.

Remark: The rank information on each node that is not a root is redundant since the ith child of a node of rank r always has rank r 1. However, we maintain this information anyway because doing so simplifies the code slightly. ©

The fundamental operation on binomial trees is /ink, which compares the roots of two trees of rank r and makes the tree with the larger root a child of the tree with the smaller root, producing a tree of rank r + 1.

fun link (ty as Node (r, L5 C1)s ty as Node (_, v2, €2)) = if Elem.leq (21, 22) then Node (r+1, 21, t2 :: c,) else Node (r+1, 2%, t, :: c2)

Since the children of a tree are maintained in decreasing order of rank, adding the new child to the list takes only O(1) time.

Now, inserting an element into a binomial heap is similar to the increment function on sparse binary numbers. Whenever we find two trees of the same rank, we link them and reinsert the linked tree into the list. This corresponds to a carry in binary arithmetic. We use the ins 7ree helper function to insert new trees into the list of trees; insert builds a new singleton tree and calls ins Tree.

fun insTree (¢, []) = [¢] | insTree (11, ts as t) :: rest) = if rank ¢, < rank ¢, then f, :: ts else insTree (link (4, t2), rest)

fun insert (x, ts) = insTree (Node (0, «x, [ ]), ts)

merge is similar to addition on sparse binary numbers, where again we link trees of equal rank whenever there is a carry.

fun merge (ts,, [ ]) = ts, | merge ([], ts2) = tsy | merge (¢; :: ts1, fg 1: tsg) = if rank ¢; < rank ¢, then ¢, :: merge (ts1, ty :: ts2) else if rank ¢ < rank ¢, then ¢, :: merge (t, :: ts1, ts) else insTree (link (¢;, f2), merge (¢s1, ts2))

72 Numerical Representations

Since every tree is heap-ordered, we know that the minimum element within any given tree is the root. However, we do not know which tree has the minimum root, so findMin scans all the roots in the heap.

fun findMin [t] = root ¢ | findMin :: ts) =let val x = root ¢ val y = findMin ts in if Elem.leq (x, y) then x else y end

Finally, deleteMin begins by removing the tree with the minimum root. (In the case of ties, we should take care to remove the tree with the same root as returned by findMin.) Once we have discarded the root of this tree, we are left with two lists of trees: one representing the children of the discarded tree, and one representing the remaining trees in the heap. To obtain a single heap, we simply merge these two lists, but since the lists are maintained in opposite orders, we first reverse the list of children.

fun deleteMin ts = let fun getMin [t] = (¢, []) | getMin :: ts) = let val (t’, ts’) = getMin ts in if Elem.leq (root t, root t’) then (¢, ts) else (¢’, ¢ :: ts’) end val (Node (_, «, ts), #s2) = getMin ts in merge (rev ts,, fs2) end

The complete implementation of binomial heaps appears in Figure 6.8. Since heaps contain no more than |log(n + 1)| trees, and binomial trees have no more than [log n| children, each of these functions takes O(log n) worst-case time.

6.3 Segmented Binary Numbers

We next explore two variations of binary numbers that allow a number to be incremented or decremented in O(1) worst-case time. Basing a numerical representation on these variations, rather than ordinary binary numbers, reduces the running time of many insertion and deletion functions from O(log n) to O(1). First, we present a somewhat complicated representation and sketch the design of random-access lists and heaps based on this representation. In the next section, we present a much simpler representation that is usually superior in practice.

The problem with ordinary binary numbers is that carries and borrows can cascade. For example, incrementing 2* | causes k carries in binary arithmetic. Symmetrically, decrement- ing causes k borrows. Segmented binary numbers solve this problem by allowing multiple carries or borrows to be executed in a single step.

6.3 Segmented Binary Numbers 73

functor BinomialHeap (structure E : ORDERED) : HEAP = struct structure Elem = E

datatype Tree = Node of int x Elem.T x Tree list (* the integer is the rank of the tree *) type Heap = Tree list

exception EMPTY

val empty = [] fun isEmpty ts = null és

fun rank (Node (r, x, ¢)) =r fun root (Node (r, %, c))=«% fun link (4 as Node (7, Tt, €1)s ly as Node CG ©, €2)) = if Elem.leq (2, 22) then Node (r+1, 2, tg :: ¢,) else Node (r+1, x2, t :: fun insTree (¢, [ ]) = [¢] | insTree (¢,, ts as f2 :: rest) = if rank ¢, < rank f then ¢, :: ts else insTree (link (¢,, 2), rest)

fun insert (x, ts) = insTree (Node (0, z, [ ]), ts) fun merge (¢s,, []) = ts,

| merge ({], ts2) = tsz

| merge (t, :: ts, t2 1: ts2) = if rank t, < rank fj then ¢, :: merge (#51, ty :: ts2) else if rank t2 < rank t then @ :: merge (t, :: ts1, ts) else insTree (link (4, é), merge (¢s1, ts2))

fun findMin [ ] = raise EMPTY | findMin [t] = root t | findMin :: ts) = let val x = root t val y = findMin ts in if Elem.leq (x, y) then z else y end

fun deleteMin [ ] = raise EMPTY | deleteMin ts = let fun getMin [¢] = (, []) | getMin :: ts) = let val (¢’, ts’) = getMin ts in if Elem.leq (root ¢, root t’) then (t, ts) else (¢’, t :: ts’) end val (Node (_, 2, ts), ts2) = getMin ts in merge (rev ts, ts.) end end

Figure 6.8: Binomial heaps [Vui78, Bro78].

74 Numerical Representations

Note that incrementing a binary number takes / steps whenever the number begins with a block of & 1s. Similarly, decrementing a binary number takes / steps whenever the number be- gins with a block of & 0s. Segmented binary numbers group contiguous sequences of identical digits into blocks so that we can execute a carry or borrow on an entire block in a single step. We represent segmented binary numbers as alternating blocks of Os and 1s using the following datatype:

datatype DigitBlock = Zeros of int | Ones of int type Nat = DigitBlock list

where the integer in each Digit Block represents the block’s length. Note that since we have forbidden trailing Os, the last block (if any) always contains 1s.

We use the pseudo-constructors zeros and ones to add new blocks to the front of a list of blocks. These pseudo-constructors merge adjacent blocks of the same digit and discard empty blocks. In addition, the zeros pseudo-constructor discards any trailing block of Os.

fun zeros (7, []) =[]

zeros (i, Zeros 7 :: blks) = Zeros (i+)) :: blks zeros (0, blks) = blks

zeros (i, blks) = Zeros i :: blks

fun ones (i, Ones 7 :: b/ks) = Ones (i+7) :: blks ones (0, blks) = blks ones (7, blks) = Ones 7 :: blks

Now, to increment a segmented binary number, we inspect the first block of digits (if any). If the first block contains « Os, then we replace the first 0 with a 1, creating a new singleton block of 1s and shrinking the block of Os by one. If the first block contains 7 1s, then we perform 7 carries in a single step by changing the 1s to Os and incrementing the next digit.

fun inc [ ] = [Ones 1] | inc (Zeros 7 :: blks) = ones (1, zeros (i—1, blks)) | inc (Ones i :: blks) = Zeros i :: inc blks

In the third line, we know the recursive call to inc cannot loop because the next block, if any, must contain Os. In the second line, the pseudo-constructors deal gracefully with the special cases that occur when the leading block contains a single 0.

Decrementing a segmented binary number is almost exactly the same, but with the roles of Os and 1s reversed.

fun dec (Ones 7 :: 6/ks) = zeros (1, ones (i—1, b/ks)) | dec (Zeros i :: blks) = Ones i :: dec blks

6.3 Segmented Binary Numbers 75

6.3.1 Segmented Binomial Random-Access Lists and Heaps

In both the binary random-access lists of Section 6.2.1 and the binomial heaps of Section 6.2.2, we linked two trees into a new, larger tree for every carry. In a cascade of & carries, we linked

a new singleton tree with existing trees of sizes 2°, 2',..., 2°! to obtain a new tree of size 2". Similarly, in binary random-access lists, a cascade of borrows decomposes a tree of size 2" into a singleton tree and k trees of sizes 2°, 2',...,2'-!.

Segmented binary numbers support fast carries and borrows, but to take advantage of this in a numerical representation, we must choose a tree representation that will allow us to link and unlink many trees in a single step. Of the three kinds of trees described earlier, only binomial trees support this behavior. A node of rank r consists of an element and a sequence of trees of ranks 0,..., 1. Therefore, we can combine an element and a sequence of trees into a new tree or decompose a tree into an element and a sequence of trees in O(1) time.

Adapting the earlier implementations of binary random-access lists and binomial heaps to use segmented binary arithmetic rather than ordinary binary arithmetic, and in the case of binary random-access lists, to use binomial trees rather than complete binary leaf trees, is tedious, but mostly straightforward, except for the following issues:

e To link and unlink multiple trees in a single step, we must use the same representation for the sequence of trees corresponding to a block of 1s (called a segment) and for the children of a node. So, for example, we cannot maintain one in increasing order of rank and the other in decreasing order of rank as we did for binomial heaps. For both segmented binomial heaps and segmented binomial random-access lists, we need easy access to the smallest tree in a segment, but we also need easy access to the largest child of a node. Therefore, we represent both kinds of sequences as real-time deques.

e For binomial heaps, the cascade of links that produces a new tree also compares the roots of trees as it goes to find the minimum element in the tree. For segmented binomial heaps, we do not have time to search a segment for the root with the minimum element, SO we insist that the smallest tree in any segment always have the minimum root. Then, whenever we create a new tree from a new element and a segment of trees of ranks 0,...,7 1, we simply compare the new element with the first root in the segment (1.e., the root of the rank 0 tree). The smaller element becomes the new root and the larger element becomes the rank 0 child of the root. Whenever we add a new tree of rank r to a segment whose smallest tree has rank r + 1, we decompose the tree of rank r + | into two trees of rank r. We then keep the tree with the smallest root, and link the remaining two trees into a new tree of rank r + 1.

With these changes segmented binomial random-access lists support cons, head, and tail in O(1) worst-case time, and lookup and update in O(log n) worst-case time. Segmented bino-

76 Numerical Representations

mial heaps support insert in O(1) worst-case time, and merge, findMin, and deleteMin in O(log n) worst-case time.

6.4 Skew Binary Numbers

Numerical representations based on segmented binary numbers rather than ordinary binary numbers improve the asymptotic behavior of certain operations from O(log n) to O(1), while retaining the same asymptotic behavior for all other operations. Unfortunately, such data struc- tures are too complicated to be useful in practice. We next consider another number system, skew binary numbers, that usually achieves similar asymptotic benefits, but that is simpler and faster in practice.

In skew binary numbers [Mye83, Oka95b], the weight w; of the ith digit is 2'+! 1, rather than 2' as in ordinary binary numbers. Digits may be 0, 1, or 2 (i.e., D; = {0,1,2}). For example, the decimal number 92 could be written 002101 (least-significant digit first).

This number system is redundant, but, if we add the further constraint that only the lowest non-0 digit may be 2, then we regain unique representations. Such a number is said to be in canonical form. Henceforth, we will assume that all skew binary numbers are in canonical form.

Theorem 6.1 (Myers [Mye83]) Every natural number has a unique skew binary canonical form.

Recall that the weight of digit 7 is 2‘*' 1 and note that | + 2(2'*' 1) = 2'+? 1. This implies that we can increment a skew binary number whose lowest non-0 digit is 2 by resetting the 2 to 0 and incrementing the next digit from 0 to 1 or from 1 to 2. (The next digit cannot already be 2.) Incrementing a skew binary number that does not contain a 2 is even easier simply increment the lowest digit from 0 to 1 or from 1 to 2. In both cases, the result is still in canonical form. And, assuming we can find the lowest non-0 digit in O(1) time, both cases take only O(1) time!

We cannot use a dense representation for skew binary numbers since scanning for the lowest non-0 digit would take more than O(1) time. Instead, we choose a sparse representation, so that we always have immediate access to the lowest non-0 digit.

type Nat = int list

The integers represent either the rank or weight of each non-0 digit. For now, we use weights. The weights are stored in increasing order, except that the smallest two weights may be identi- cal, indicating that the lowest non-0 digit is 2. Given this representation, we implement inc as follows:

6.4 Skew Binary Numbers 77

fun inc (ws as wy, 1 we 1: rest) = if w, = w. then (1+w),+wy) :: rest else 1 :: ws | inc ws = 1 :: ws

The first clause checks whether the first two weights are equal and then either combines the weights into the next larger weight (incrementing the next digit) or adds a new weight of 1 (incrementing the smallest digit). The second clause handles the case that ws is empty or contains only a single weight. Clearly, inc runs in only O(1) worst-case time.

Decrementing a skew binary number is just as easy as incrementing a number. If the lowest digit is non-0, then we simply decrement that digit from 2 to 1 or from 1 to 0. Otherwise, we decrement the lowest non-0 digit and reset the previous 0 to 2. This can be implemented as follows:

fun dec (1 :: ws) = ws | dec (w :: ws) = (w div 2) :: (w div 2) :: ws

In the second line, note that if w = 2*+! 1, then | w/2| = 2* 1. Clearly, dec also runs in only O(1) worst-case time.

6.4.1 Skew Binary Random-Access Lists

We next design a numerical representation for random-access lists, based on skew binary num- bers. The basic representation is a list of trees, with one tree for each 1 digit and two trees for each 2 digit. The trees are maintained in increasing order of size, except that the smallest two trees are the same size when the lowest non-0 digit is 2.

The sizes of the trees should correspond to the weights in skew binary numbers, so a tree representing the 7th digit should have size 2't! 1. Up until now, we have mainly considered trees whose sizes are powers of two, but we have also encountered a kind of tree whose sizes have the desired form: complete binary trees. Therefore, we represent skew binary random- access lists as lists of complete binary trees.

To support head efficiently, the first element in the random-access list should be the root of the first tree, so we store the elements within each tree in left-to-right preorder and with the elements in each tree preceding the elements in the next tree.

In previous examples, we have stored a size or rank in every node, even when that infor- mation was redundant. For this example, we adopt the more realistic approach of maintaining size information only for the root of each tree in the list, and not for every subtree as well. The type of skew binary random-access lists is therefore

datatype a Tree = Leaf of a | Node of a x a Tree x a Tree type a RList = (int x a Tree) list

78 Numerical Representations

Now, we can define cons in analogy to ine.

fun cons (2, ts as (wy), t,) 1: (we, tg) 1: rest) = if w, = wy then (1+w ,+w., Node (2, t,, t)) :: rest) else (1, Leaf x) :: ts | cons (x, ts) = (1, Leaf x) :: ts

head and tail inspect and remove the root of the first tree. fai/ returns the children of the root (if any) back to the front of the list, where they represent a new 2 digit.

fun head ((1, Leaf x) :: ts) =z | head ((w, Node (x, 1, t2)) :: ts) =a fun tail (1, Leaf x) :: ts) = ts | tail ((w, Node (x, t1, &2)) :: ts) = (w div 2, t,) :: (w div 2, tg) :: ts

To lookup an element, we first search the list for the right tree, and then search the tree for the right element.

fun lookup ((w, ¢) :: ts, 7) =if 7 < w then lookupTree (w, t, 7) else lookup (ts, i—w)

fun lookupTree (1, Leaf x, 0) = x | lookupTree (w, Node (2, 4), t2), 0) = 2 | lookupTree (w, Node (2, t, t2), i) = if i < w div 2 then lookupTree (w div 2, t,, i—1) else lookupTree (w div 2, f2, 2 1 w div 2)

Note that in the penultimate line, we subtract one from 7 because we have skipped over x. In the last line, we subtract 1+ | w/2| from i because we have skipped over «x and all the elements in tj. update and update Tree are defined similarly, and are shown in Figure 6.9, which contains the complete implementation.

It is easy to verify that cons, head, and tail run in O(1) worst-case time. Like binary random-access lists, skew binary random-access lists are logarithmic-length lists of logarithmic- depth trees. Hence, lookup and update run in O(log n) worst-case time. In fact, every unsuc- cessful step of lookup or update discards at least one element, so this bound can be improved slightly to O(min(7, logn)).

Hint to Practitioners: Skew binary random-access lists are a good choice for applications that take advantage of both the list-like aspects and the array-like aspects of random-access lists.

Although there are better implementations of lists, and better implementations of (persistent) arrays, none are better at both [Oka95b].

6.4 Skew Binary Numbers 79

structure SkewBinaryRandomAccessList : RANDOMACCESSLIST = struct datatype a Tree = Leaf of a | Node of a x a Tree x a Tree type a RList = (int x a Tree) list (* integer is the weight of the tree *)

exception EMPTY and INDEX

val empty = [] fun isEmpty ts = null és

fun cons (2, ts as (wy), t)) 2: (we, tg) 1: ts‘) = if w, = wy then (1+w,+w:, Node (2, #, t2)) :: ts’) else (1, Leaf x) :: ts cons (x, ¢s) = (1, Leaf x) :: ts fun head [ ] = raise EMPTY head ((1, Leaf x) :: ts) =a head ((w, Node (2, 4, #2)) :: ts) =a fun tail [ ] = raise EMPTY tail (1, Leaf x) :: ts) = ts tail ((w, Node (2, t, #2)) :: ts) = (w div 2, #,) =: (w div 2, tg) :: ts

fun lookupTree (1, Leaf «, 0) =z lookupTree (1, Leaf x, 7) = raise INDEX lookupTree (w, Node (a, t, 2), 0)=2 lookupTree (w, Node (z, t, t2), #) = if i < w div 2 then lookupTree (w div 2, t, i—1) else lookupTree (w div 2, t2, 1 1 w div 2) fun updateTree (1, Leaf x, 0, y) = Leaf y | updateTree (1, Leaf 2, i, y) = raise INDEX | updateTree (w, Node (2, t, 2), 0, y) = Node (y, t,, f2) | updateTree (w, Node (2, t1, &), 7, y) = if ¢ < w div 2 then Node (2, updateTree (w div 2, t, 7-1, y), &) else Node (a, ¢,, updateTree (w div 2, fz, 1 1 w div 2, y))

fun lookup ([ J, 7) = raise INDEX | lookup ((w, t) :: ts, 7) =if ¢ < w then lookupTree (w, ft, 7) else lookup (ts, i—w) fun update ([], 7, y) = raise INDEX | update ((w, f) :: ts, 7, y)= if « < w then updateTree (w, t, 7, y) :: ts else (w, ¢) :: update (¢s, i—w, y)

end

Figure 6.9: Skew binary random-access lists.

80 Numerical Representations

6.4.2 Skew Binomial Heaps

Finally, we consider a hybrid numerical representation for heaps based on both skew binary numbers and ordinary binary numbers. Incrementing a skew binary number is both quick and simple, and serves admirably as a template for the ¢nsert function. Unfortunately, addition of two arbitrary skew binary numbers is awkward. We therefore base the merge function on ordinary binary addition, rather than skew binary addition.

A skew binomial tree is a binomial tree in which every node is augmented with a list of up to r elements, where r is the rank of the node in question.

datatype Tree = Node of int x Elem.T x Elem.T list x Tree list

Unlike ordinary binomial trees, the size of a skew binomial tree is not completely determined by its rank; rather the rank of a skew binomial tree determines a range of possible sizes.

Lemma 6.2 If ¢ is a skew binomial tree of rank r, then 2 < |t| < 2"*! —1.

Proof: By induction. ¢ has r children ¢,...t,, where t; is a skew binomial tree of rank r—1, and 2’-" < |t;| < 2"-*++ 1. In addition, the root of ¢ is augmented with a list of k elements, where 0 < k < r. Therefore, |t] > 1+0+ 57%) 2' = 1+ (2" —1) = 2" and le] <1 tr 4g (241-1) =1l4trt(artt—-r—-2) =2"1- 1,

Note that a tree of rank r is always larger than a tree of rank r lL.

Skew binomial trees may be linked or skew linked. The link function combines two trees of rank r to form a tree of rank r + | by making the tree with the larger root a child of the tree with the smaller root.

fun link (¢,; as Node (, 1, 751, ¢1), t2 as Node (_, 22, %s2, €2)) = if Elem.leq (21, 22) then Node (r+1, 21, 751, t2 :: ¢,) else Node (r+1, 2%, rsa, t, %: €2)

The skewLink function combines two trees of rank r with an additional element to form a tree of rank r + | by first linking the two trees, and then comparing the root of the resulting tree with the additional element. The smaller of the two elements remains as the root, and the larger is added to the auxiliary list of elements.

fun skewLink (x, t,, f2) = let val Node (r, y, ys, c) = link (4, #2) in if Elem.leq (x, y) then Node (r, z, y :: ys, c) else Node (r, y, x :: ys, c) end

A skew binomial heap is represented as a list of heap-ordered skew binomial trees of in- creasing rank, except that the first two trees may share the same rank. Since skew binomial trees of the same rank may have different sizes, there is no longer a direct correspondence between the trees in the heap and the digits in the skew binary number representing the size of

6.4 Skew Binary Numbers 81

the heap. For example, even though the skew binary representation of 4 is 11, a skew binomial heap of size 4 may contain one rank 2 tree of size 4; two rank | trees, each of size 2; a rank 1 tree of size 3 and a rank 0 tree; or a rank | tree of size 2 and two rank O trees. However, the maximum number of trees in a heap is still O(log n).

The big advantage of skew binomial heaps is that we can insert a new element in O(1) time. We first compare the ranks of the two smallest trees. If they are the same, we skew link the new element with these two trees. Otherwise, we simply add a new singleton tree to the list.

fun insert (x, ts as t, :: ty 3: rest) = if rank ¢, = rank ¢, then skewLink (x, ¢,, f2) :: rest else Node (0, x, [], [ ]) :: ts | insert (x, ts) = Node (0, z, [], []) :: ts

We implement merge in terms of two helper functions, ins Tree and merge Trees, that behave exactly like their counterparts from ordinary binomial heaps, performing a regular link (not a skew link!) whenever they find two trees of equal rank. Since merge Trees expects lists of strictly increasing rank, merge normalizes its two arguments to remove any leading duplicates before calling merge Trees.

fun normalize [ ] = [ ] | normalize :: ts) = insTree (¢, ts) fun merge (ts, ts2) = mergeTrees (normalize ts,, normalize ts)

findMin also behaves exactly like its counterpart from ordinary binomial heaps; since it ignores the rank of each tree, it is unaffected by the possibility that the first two trees might have the same rank. It simply scans the list of trees for the minimum root.

fun findMin [t] = root ¢ | findMin :: ts) =let val x = root ¢ val y = findMin ts in if Elem.leq (x, y) then x else y end

Finally, deleteMin on skew binomial heaps is similar to its counterpart for ordinary binomial heaps except that it must deal with the list of auxiliary elements that has been added to every node. We first find and remove the tree with the minimum root. After discarding this root, we merge the children of this root with the remaining trees. To do so, we must first reverse the list of children, since it is stored in decreasing order, and normalize the list of trees, since the first rank might be duplicated. Finally, we reinsert each of the elements from the auxiliary list.

fun deleteMin ts = let fun getMin [t] = (¢, []) | getMin :: ts) = let val (¢’, ts’) = getMin ts in if Elem.leq (root t, root t’) then (¢, ts) else (t’, ¢ :: ts’) end

82 Numerical Representations

val (Node (_, x, xs, c), ts’) = getMin ts fun insertAll ({ ], ts) = ts | insertAll (x :: xs, ts) = insertAll (zs, insert (x, ts)) in insertAll (#s, mergeTrees (rev c, normalize ts’)) end

Figure 6.10 presents the complete implementation of skew binomial heaps.

insert clearly runs in O(1) worst-case time, while merge, findMin, and deleteMin run in the same time as their counterparts for ordinary binomial queues, i.e., O(log n) worst-case time each. Note that the various phases of deleteMin finding the tree with the minimum root, reversing the children, merging the children with the remaining trees, and reinserting the auxiliary elements take O(log n) time each.

6.5 Discussion

In designing numerical representations, we draw analogies between container data structures and representations of natural numbers. However, this analogy can also be extended to other kinds of numbers. For example, difference lists [SS86] in Prolog support a notion of lists with negative length; appending a list of length 15 and a list of length —10 results in a list of length 5. This behavior is also possible using the catenable lists of Hughes [Hug86], which are the functional counterpart of difference lists.!

As another example, Brodal and Okasaki [BO96] support a delete function on heaps using two primitive heaps, one containing positive elements and one containing negative elements. The negative elements are ones that have been deleted, but that have not yet been physically removed from the positive heap. In this representation, it is possible to delete elements that have not yet been inserted. If the negative heap is larger than the positive heap, then the overall “size” of the heap is negative.

Can this analogy between data structures and representations of numbers be extended even further, to non-integral numbers? We know of no such examples, but it is intriguing to speculate on possible uses for such data structures. For instance, might a numerical representation based on floating point numbers be useful in approximation algorithms?

6.6 Related Work

Numerical Representations Data structures that can be cast as numerical representations are surprisingly common, but only rarely is the connection to a variant number system noted explicitly [GMPR77, Mye83, CMP88, KT96b].

'Thanks to Phil Wadler for this observation.

6.6 Related Work 83

functor SkewBinomialHeap (structure E : ORDERED) : HEAP = struct structure Elem = E datatype Tree = Node of int x Elem.T x Elem.T list x Tree list type Heap = Tree list

exception EMPTY

val empty = [ ] fun isEmpty ts = null és

fun rank (Node (r, 2, xs, c)) =r fun root (Node (r, x, «8, c))=% fun link (¢; as Node (7, 21, “51, €1), & aS Node (_, 2, 252, C2)) = if Elem.leq (21, 22) then Node (r+1, 21, %51, t2 :: c,) else Node (r+1, 22, %so, t :: fun skewLink (x, tj, &) = let val Node (r, y, ys, c) = link (4, #2) in if Elem.leq (x, y) then Node (vr, x, y :: ys, c) else Node (r, y, x :: ys, c) end fun insTree (¢, [ ]) = [¢] | insTree (¢, t2 :: ts) =if rank t, < rank ft; then ¢, :: tg :: ts else insTree (link (¢,, f2), ts) fun mergeTrees (ts,, []) = ts, | mergeTrees ([ ], ¢s2) = ts2 | mergeTrees (t, :: t81, 2 :: ts2) =if rank t, < rank tf; then ¢, :: mergeTrees (ts1, f2 :: ts2) else if rank & < rank ¢, then f2 :: mergeTrees (t, :: #s1,ts2) else insTree (link (¢1, #2), mergeTrees (ts 1, ts2)) fun normalize [ ] = [ ] | normalize :: ts) = insTree (t, ts) fun insert (x, ts as f 2: fg 1: rest) = if rank ¢,; =rank tf then skewLink (x, ¢,, t) :: rest else Node (0, x, [], [ ]) :: ts | insert (a, ts) = Node (0, «, [], []) :: ts fun merge (ts,, ts2) = mergeTrees (normalize ts;, normalize ts) fun findMin [ ] = raise EMPTY | findMin [t] = root t | findMin :: ts) = let val x = root t and y = findMin ts in if Elem.leq (x, y) then x else y end

fun deleteMin [ ] = raise EMPTY | deleteMin ts = let fun getMin [t] = (¢, []) | getMin :: ts) = let val (¢’, ts’) = getMin ts in if Elem.leq (root ¢, root t’) then (t, ts) else (¢’, t :: ts’) end val (Node (_, x, xs, c), ts’) = getMin ts fun insertAll ({ ], ts) = ts | insertAll (a :: xs, ts) = insertAll (as, insert (a, ts)) in insertAll (zs, mergeTrees (rev c, normalize ts’)) end end

Figure 6.10: Skew binomial heaps.

84 Numerical Representations

Random-Access Lists Random-access lists are usually implemented in purely functional languages as balanced trees, such as AVL trees [Mye84], Braun trees [Hoo92a, Pau91], or leftist left-perfect leaf trees [KD96]. Such trees easily support O(log n) lookups and updates (O(log 7) in the case of Braun trees), but require O(log n) time for cons or tail.

Myers [Mye83] describes the first implementation of random-access lists based on skew binary numbers. He augments a standard singly-linked list with auxiliary pointers allowing one to skip arbitrarily far ahead in the list. The number of elements skipped by each auxiliary pointer is controlled by the digits of a skew binary number. His scheme supports cons, head, and tail in O(1) time, and lookup in O(log n) time, but requires O(7) time for update. The difficulty with updates is that his scheme contains many redundant pointers. Removing those redundant pointers yields a structure isomorphic to the skew binary random-access lists of Section 6.4.1, which first appeared in [Oka95b].

Kaplan and Tarjan [KT95] recently introduced the algorithmic notion of recursive slow- down, and used it to design a new, purely functional implementation of real-time deques. A pleasant accidental property of their data structure is that it also supports random access in O(log d) worst-case time, where d is the distance from the desired element to the nearest end of the deque (i.e. d = min(z?,n 1 2)). We will consider a simplification of their data structure in Chapter 8.

Finger search trees [GMPR77, Tsa85] support not only random access in O(log d) worst- case time, but also insertions and deletions at arbitrary locations. Kaplan and Tarjan apply their methods to purely functional finger search trees in [KT96b].

Binomial Heaps Binomial heaps were introduced by Vuillemin [Vui78] and extensively studied by Brown [Bro78]. King [Kin94] showed that binomial heaps could be implemented elegantly in a purely functional language (in his case, Haskell).

Fagerberg [Fag96] describes a generalization of binomial heaps in which the set D; of allowable digits at position 7 in a sequence of digits can be different for each 2. Varying the choices for each D; allows a tradeoff between the costs of insert and meld, and the cost of deleteMin.

Skew binomial heaps were originally presented, in a slightly different form, in [BO96].

Chapter 7

Data-Structural Bootstrapping

The term bootstrapping refers to “pulling yourself up by your bootstraps”. This seemingly nonsensical image is representative of a common situation in computer science: problems whose solutions require solutions to (simpler) instances of the same problem.

For example, consider loading an operating system from disk or tape onto a bare computer. Without an operating system, the computer cannot even read from the disk or tape! One solu- tion is a bootstrap loader, a very tiny, incomplete operating system whose only purpose is to read in and pass control to a somewhat larger, more capable operating system that in turn reads in and passes control to the actual, desired operating system. This can be viewed as a instance of bootstrapping a complete solution from an incomplete solution.

Another example is bootstrapping a compiler. A common activity is to write the compiler for a new language in the language itself. But then how do you compile that compiler? One solution is to write a very simple, inefficient interpreter for the language in some other, existing language. Then, using the interpreter, you can execute the compiler on itself, thereby obtain- ing an efficient, compiled executable for the compiler. This can be viewed as an instance of bootstrapping an efficient solution from an inefficient solution.

In his thesis [Buc93], Adam Buchsbaum describes two algorithmic design techniques he collectively calls data-structural bootstrapping. The first technique, structural decomposition, involves bootstrapping complete data structures from incomplete data structures. The second technique, structural abstraction, involves bootstrapping efficient data structures from ineffi- cient data structures. In this chapter, we reexamine data-structural bootstrapping, and describe several functional data structures based on these techniques.

86 Data-Structural Bootstrapping

7.1 Structural Decomposition

Structural decomposition is a technique for bootstrapping complete data structures from in- complete data structures. Typically, this involves taking an implementation that can handle objects only up to some bounded size (perhaps even zero), and extending it to handle objects of unbounded size.

Consider typical recursive datatypes such as lists and binary leaf trees:

datatype a List = Nil | Cons of a x a List datatype a Tree = Leaf of a | Node of a Tree x a Tree

In some ways, these can be regarded as instances of structural decomposition. Both consist of a simple implementation of objects of some bounded size (zero for lists and one for trees) and a rule for recursively decomposing larger objects into smaller objects until eventually each object is small enough to be handled by the bounded case.

However, both of these definitions are particularly simple in that the recursive component in each definition is identical to the type being defined. For instance, the recursive component in the definition of a List is also a List. Such a datatype is called uniformly recursive.

In general, we reserve the term structural decomposition to describe recursive data struc- tures that are non-uniform. For example, consider the following definition of sequences:

datatype a Seq = Empty | Seq of a x (a x a) Seq

Here, a sequence is either empty or a single element together with a sequence of pairs of elements. The recursive component (a x a) Seq is different from a Seq so this datatype is non-uniform. (In Chapter 8, we will consider an implementation of queues that is similar to this definition of sequences.)

Why might such a non-uniform definition be preferable to a uniform definition? The more sophisticated structure of non-uniform types often supports more efficient algorithms than their uniform cousins. For example, compare the following szze functions on lists and sequences.

fun sizeL Nil = 0 fun sizeS Empty = 0 | sizeL (Cons (x, xs)) = 1 + sizeL xs | sizeS (Seq (a, ps)) = 142 * sizeS ps

The function on lists runs in O(n) time whereas the function on sequences runs in O(log n) time.

7.1.1 Non-Uniform Recursion and Standard ML

Although Standard ML allows the definition of non-uniform recursive datatypes, the type sys- tem disallows the definition of most useful functions on such datatypes. For instance, consider

7.1 Structural Decomposition 87

the sizeS function on sequences. This function definition would be rejected by Standard ML because the type system requires that all recursive calls in the body of a recursive function have the same type as the enclosing function (i.e., recursive function definitions must be uniform). The sizeS function violates this restriction because the enclosing sizeS has type a Seq int but the inner sizeS has type (a x a) Seq > int.

It is usually possible to convert a non-uniform type into a uniform type by introducing a new datatype to collapse the different instances into a single type. For example, by collapsing elements and pairs, the Seq type could be written

datatype a ElemOrPair = Elem of a | Pair of a ElemOrPair x a ElemOrPair datatype a Seq = Empty | Seq of a ElemOrPair x a Seq

Then the stzeS function would be perfectly legal as written; both the enclosing sizeS and the inner sizeS would have type a Seq > int.

Although necessary to satisfy the Standard ML type system, this solution is unsatisfactory in at least two ways. First, the programmer must manually insert Hlem and Pair constructors everywhere. This is tedious and error-prone. Second, and more importantly, this definition of Seq is not isomorphic to the earlier, non-uniform definition of Seq. In particular, the first defi- nition ensures that the outermost Seq constructor contains a single element, the second a pair of elements, the third a pair of pairs of elements, and so on. However, the second definition makes no such restriction; elements and pairs may be freely mixed. If such a restriction is desired, the programmer must establish it as a system invariant. But if the programmer accidentally violates this invariant say, by using an element where a pair is expected the type system will be of no help in catching the error.

For these reasons, we will often present code as if Standard ML supported non-uniform recursive function definitions, also known as polymorphic recursion [Myc84]. This code will not be executable but will be easier to read. We will then sketch the coercions necessary to eliminate the polymorphic recursion and make the code executable.

7.1.2 Queues Revisited

Consider the use of + in the banker’s queues of Section 3.4.2. During a rotation, the front stream is replaced by fF + reverse FR. After a series of rotations, / will have the form

(---((f + reverse r1) + reverse rg)+ +++ reverse rp)

Append is well-known to be inefficient in left-associative contexts like this because it repeat- edly processes the elements of the leftmost streams. For example, in this case, the elements of f will be processed & times (once by each +4), and the elements of r; will be processed k —1 +1 times (once by reverse and once for each following +). In general, left-associative appends

88 Data-Structural Bootstrapping

can easily lead to quadratic behavior. In this case, fortunately, the total cost of the appends is still linear because each r; is at least twice as long as the one before. Still, this repeated pro- cessing does sometimes make these queues slow in practice. In this section, we use structural decomposition to eliminate this inefficiency.

Given that /’ has the above form and writing # as r, we can decompose a queue into three parts: f, r, and the collection m = {reverse r1,..., reverse r;,}. Previously, f, r, and each reverse r; was a Stream, but now we can represent f and r as ordinary lists and each reverse 1; as a suspended list. This eliminates the vast majority of suspensions and avoids almost all of the overheads associated with lazy evaluation. But how should we represent the collection m? As we will see, this collection is accessed in FIFO order, so using structural decomposition we can represent it as a queue of suspended lists. As with any recursive type, we need a base case, so we will represent empty queues with a special constructor The new representation is therefore

datatype a Queue = Empty | Queue of {F: a list, M: a list susp Queue, LenFM : int, R : a list, LenR : int}

Len F’M is the combined length of F’ and all the suspended lists in © (i.e., what used to be simply Lenf’ in the old representation). can never be longer than this combined length. In addition, /’ must always be non-empty. (In the old representation, /’ could be empty if the entire queue was empty, but now we represent that case separately.)

As always, the queue functions are simple to describe.

fun snoc (Empty, x) = Queue {F = [x], M = Empty, LenFM = 1, R =[], LenR = 0} | snoc (Queue {F = f, M= m, LenFM = lenFM, R= r, LenR = lenkR}, x) = queue {F = {, M= m, LenFM = lenFM, R= x :: r, LenR = lenR+1}

fun head (Queve {F=2::/,...}) =x fun tail (Queue {F = x :: f, M=m, LenFM = lenFM, R= r, LenR = lenR}) = queue {F = f, M = m, LenFM = lenFM—1, R=r, LenR = lenk}

The real action is in the pseudo-constructor queue. If & is too long, queue creates a suspension to reverse # and adds the suspension to /. After checking the length of R, quewe invokes a helper function checkF that guarantees that F' is non-empty. If both / and M are empty, then the entire queue is empty. Otherwise, if f is empty we remove the first suspension from M, force it, and install the resulting list as the new Ff’.

fun queue (q as {F= /, M= m, LenFM = lenF'M, R=r, LenR = lenR}) = if lenR < lenF'M then checkF q else checkF {F = f, M = snoc (m, $rev r), LenFM = lenFM+lenR, R =[], LenR = 0}

'A slightly more efficient alternative is to represent queues up to some fixed size simply as lists.

7.1 Structural Decomposition 89

structure BootstrappedQueue : QUEUE = = (* assumes polymorphic recursion! *) struct datatype o Queue = Empty | Queue of {F: a list, M : a list susp Queue, LenFM : int, R : q list, LenR : int }

exception EMPTY

val empty = Empty fun isEmpty Empty | isEmpty (Queue _) = false

fun queue as {F = f, M = m, LenFM = lenF'M, R= r, LenR = lenR}) = if lenR < lenF’M then checkF gq else checkF {F = f, M = snoc (m, $rev r), LenFM = lenF MtlenR, R =[], LenR = 0}

and checkF {F =[], M = Empty, ... } = Empty checkF {F =[], M = m, LenFM = lenF'M, R=r, LenR = lenR}) =

Queue {F = force (head m), M = tail m, LenFM = lenF'M, R =r, LenR = lenR} checkF ¢ = Queue ¢

and snoc (Empty, «) = Queue {F = [x], M = Empty, LenFM = 1, R=[], LenR = 0}

snoc (Queue {F = f, M = m, LenFM = lenFM, R=r, LenR = lenR}, x) = queue {F = f, M=m, LenFM = lenFM, R=2 :: r, LenR = lenR+1}

and head Empty = raise EMPTY

head (Queue {F=2::f,...})=2

and tail Empty = raise EMPTY

tail (Queue {F = x :: f, M =m, LenFM = lenF'M, R =r, LenR = lenR}) = queue {F = f, M =m, LenFM = lenFM—1, R= r, LenR = lenR}

end

Figure 7.1: Bootstrapped queues based on structural decomposition.

and checkF {F = [], M = Empty, ... } = Empty | checkF {F =[], M= m, LenFM = lenFM, R= r, LenR = lenR}) = Queue {F = force (head m), M = tail m, LenFM = lenFM, R=r, LenR = lenR} | checkF ¢ = Queue ¢

Note that queue and checkF call snoc and tail, which in turn call queue. These functions must therefore all be defined mutually recursively. The complete implementation appears in Figure 7.1.

Remark: To implement these queues without polymorphic recursion, we redefine the datatype as

90 Data-Structural Bootstrapping

datatype a ElemOrList = Elem of a | List of a ElemOrList list susp datatype a Queue = Empty | Queue of {F : a ElemOrList list, M : a Queue, LenFM : int, R: a ElemOrList list, LenR : int }

Then snoc and head add and remove the em constructor when inserting or inspecting an ele- ment, and queue and checkF add and remove the List constructor when inserting or removing a list from M. ©

These queues create a suspension to reverse the rear list at exactly the same time as banker’s queues, and force the suspension one operation earlier than banker’s queues. Thus, since the re- verse computation contributes only O(1) amortized time to each operation on banker’s queues, it also contributes only O(1) amortized time to each operation on bootstrapped queues. How- ever, the running time of the snoc and tai! operations is not constant! Note that snoc calls queue, which in turn might call snoc on M. In this way we might get a cascade of calls to snoc, one at each level of the queue. However, successive lists in M at least double in