Problem:
Given a list xs, answer a large number of questions of the form, what is the accumulation of the elements whose indices are in the range [l, u]?.
Requirements:
Notice that we have only two requirements for the elements of xs:
- we must be able to accumulate them; and
- the neutral element, e, exists (e is the result of queries on empty intervals).
These two suggest that the elements are Monoids.
Strictly speaking, the accumulation function need not be associative, but, if it is, we can build a segment trees from the list.
The Naive Solution:
Our first solution simply mconcats the interval's elements.
queryInterval :: (Monoid a) => [a] -> (Int, Int) -> a
queryInterval xs (l, u) = mconcat . take (u-l+1) . drop l $ xs
The queryInterval function requires no additional space and takes time proportional to the interval's size. Therefore, the time necessary to query all intervals is O(n^{3}). For large values of n, this is clearly unacceptable. We examine a faster algorithm in the next section.
Simple Segment Trees:
A list stores elements; i.e. it stores information of the form, the value of the n^{th} element is something. A segment tree stores intervals; i.e. it stores information of the form, the accumulation of the elements with indices between l and u is something.
An example will clarify matters. Here is the interval tree for [1..7] accumulated using addition.
-- A binary tree.
data (Monoid a) => Tree a = Branch a (Tree a) (Tree a) | Leaf a
-- A 'SegmentTree' is a binary tree and the bounds of its
-- corresponding interval.
data (Monoid a) => SegmentTree a = SegmentTree (Tree a) (Int, Int)
There are two types of nodes:
- Each leaf holds a single element. Reading the leaves left to right yields the initial list;
- Each branch holds an interval which is the union of its children. The value of a branch is the accumulation of the values of its children. Each branch has exactly two children. Combining the branches on a level left-to-right results in an interval which perfectly overlaps the initial list.
We construct a segment tree recursively by starting with the list and repeatedly splitting it in half.
For the sake of efficiency, we only chop off the beginning of the list. The list's head is always where it should be, but its end remains fixed.
-- Build the 'SegmentTree' for the given list. Time: O(n*log n)
mkTree :: (Monoid a) => [a] -> SegmentTree a
mkTree xs = SegmentTree (go xs listBounds) listBounds
where
listBounds = (0, length xs - 1)
go ys (l, u)
-- invariant: head ys == xs !! l
| l == u = Leaf (head ys)
| otherwise =
let m = (u-l) `div` 2
leftc = go ys (l, l+m)
rightc = go (drop (m+1) ys) (l+m+1, u)
in Branch (getCargo leftc `mappend` getCargo rightc)
leftc rightc
getCargo (Branch x _ _) = x
getCargo (Leaf x) = x
To query a tree, we start at the root and descend matching the nodes onto the specified interval.
-- Query the 'SegmentTree' for the specified closed interval. Time:
-- O(log n)
queryTree :: (Monoid a) => SegmentTree a -> (Int, Int) -> a
queryTree (SegmentTree t (s, e)) (l, u) = go t (s, e)
where
-- we're querying for (l, u)
go t (s, e)
| (l > e) || (u < s) = mempty
| (l <= s) && (u >= e) = getCargo t
| otherwise = let (Branch _ leftc rightc) = t
m = (e-s) `div` 2
lv = go leftc (s, s+m)
rv = go rightc (s+m+1, e)
in lv `mappend` rv
During the descent, three distinct situations arise:
- if the queried interval is invalid, the result is mempty;
- if the queried interval completely overlaps the node, the result is the node's value; and
- if the queried interval is included in the current node, we rerun the function on its two halves and accumulate the results.
Resources
- TopCoder article on RMQ and LCA Even though the article is not about segment trees per se (it details a few RMQ and LCA algorithms), the section on them is an excellent down-to-earth introduction.
- Wikipedia entry on segment trees The page is a little to obfuscated for my tastes, but, for the sake of completeness, the link is here.
- Monoids and Finger Trees apfelmus's article is a good example of how to use Monoids to abstract seemingly unrelated operations; he uses them to unify priority queue operations. Similarly, we use Monoids to abstract all valid accumulation operations on intervals.