Thursday, August 13, 2009

Mutable heaps in C++ (for updates to priority_queue priorities)

Posted by Danny Tarlow
I'm usually very happy with the STL C++ data structures and libraries, so I was a bit surprised to find that the built-in heap doesn't support updating a value. This comes up, for example, in iterative algorithms where in each iteration we make a change to a small number of variables at the front of a queue. The changes may affect some other non-front variable values as well, but we want to reuse most of the sorting work that we've done in previous iterations. In this case, we keep the same heap from iteration to iteration and just call update() on the changed values when necessary at a cost of O(log N) for binary heaps.

After not finding what I needed in the core STL, I turned to the Boost C++ libraries and found the mutable_queue. The class lies way far off in the generic programming way of doing things, but I won't comment on that.

Anyway, to define a mutable_queue, there are four template arguments that need to be filled in:
  template <class IndexedType, 
            class RandomAccessContainer = std::vector<IndexedType>, 
            class Comp = std::less<typename RandomAccessContainer::value_type>,
            class ID = identity_property_map >
  class mutable_queue { ...
Conveniently, though, when we are working with basic types, we can just give it one template argument, and the defaults will work for the other template parts. This worked great for me when I tried it out with a simple mutable_queue<double>, and it was pretty easy to get it running and working correctly.

What I need, however, is a mutable_queue of Node * objects. So it seemed fairly straightforward to fill in the template arguments:
typedef Node * entry_t;
typedef vector<entry_t> storage_t;
typedef CompareNodeStar comp_t;
typedef identity_property_map prop_map_t;
typedef mutable_exposed_queue<entry_t, storage_t,
    comp_t, prop_map_t> queue_t;

queue_t *q = new queue_t(N, comp_t(), prop_map_t());
Which says give me a queue of Node *'s, use a vector of Node *'s as the underlying storage, compare elements using the CompareNodeStar comparison function that I defined, and use the default property_map*.

Unfortunately, this gives me errors, as do several other attempts at filling in property_map types. *By the way, here's what a property map is:
The property map interface consists of a set of concepts (see definition of "concept" in [1] and [2]) that define a syntax for mapping key objects to corresponding value objects. Since the property map operations are global functions (actually they don't have to be global, but they are always called unqualified and may be found via argument dependent lookup), it is possible to overload the map functions such that nearly arbitrary property map types and key types can be used. The interface for property maps consists of three functions: get(), put(), and operator[].
At this point, I'd already sunk a couple hours into the boost library, and I was getting a bit frustrated -- why should I have to learn about the internal details of property_maps when all I want to do is apply the data structure in a fairly natural way? Yes, maybe I'm being lazy, and granted I don't use C++ as much as I used to, but it seems to me that this should be easier than it is. (Maybe somebody can come along and point out the correct way of doing this -- I put in a reasonable effort to figure it out, but by no means am I confident that I'm even on the right track).

The other source of my frustration is probably that heaps are not that complicated. To update a heap, all you really have to do is figure out whether the element is ok where it is, needs to be pushed up, or needs to be pushed down. So I changed tactics and found a less generic (but still pretty generic) heap update implementation, called HEAPPlus:
http://www.cse.ust.hk/~lihw/code/heapplus/

For my purposes, I like this implementation quite a bit better. It uses the same style of interface as STL heaps, where you have a vector as the core storage structure, then heaps are just a set of algorithms that maintain a proper ordering of elements in the vector. All that heapplus.h does is define update_heap(start_iterator, end_iterator, edit_position_iterator, new_value).

This worked (almost) on the first try for me for doubles, Nodes, and Node *'s, but it's still not exactly what I want. I want to be able to change the value of an element in the heap separately from the call to updating the heap. This comes up in the iterative algorithm example, where the iterative algorithm maintains pointers to the Node elements, so it can change their priority directly. Granted, you could have the algorithm change the priority, then pass a pointer to the changed node as the new_value, but that seems redundant to me. It's more natural for me to think of leaving the last argument off: update_heap_pos(start_iterator, end_iterator, edit_position_iterator).

So that's what I did. I took the HEAPPlus implementation and defined a couple new functions to support update_heap_pos, then I wrote some simple examples to show the usage. Here's the test file, test.cpp:
// test.cpp
// Used to test heapplus.h
// lihw
// 2006.12.18
// ChangeLog:
//  2009.08.10  - Added tests for Node and Node * heaps, and added Comparator
//                argument to print_heap, so that pop_heap works as expected.
//                (Danny Tarlow: dtarlow@cs.toronto.edu)
//  2006.12.18  - Initially created (lihw)

#include <iostream>
#include <ostream>
#include <functional>
#include <algorithm>
#include <vector>
#include "heapplus.h"

#include <ctime>
#include <cstdlib>

using namespace std;


template<typename Container>
inline
void PRINT_ELEMENTS(const Container& container, const char* title= 0) 
{
    if (title)
        std::cout << title << std::endl;

    typename Container::const_iterator iter, beg = container.begin(),
             end = container.end();
    for (iter = beg; iter != end; ++iter) 
        std::cout << *iter << " ";
    std::cout << std::endl;
}

template<typename Container, typename Compare>
void print_heap(const Container& container, Compare comp, const char* title = 0)
{
    if (title)
        std::cout << title << std::endl;

    std::vector<typename Container::value_type> temp(container.begin(),
            container.end());
    while (temp.size()) {
        std::cout << temp[0] << " ";
        std::pop_heap(temp.begin(), temp.end(), comp);
        temp.pop_back();
    }
    std::cout << std::endl;
}


/**
 *  A very simple node class just to test the update_heap_pos functions
 * and to show an example usage.  Each node lies on an x-y grid, and it
 * has a priority associated with it, which we assume will be computed and
 * and modified by some procedure outside the scope of these functions.
 *
 * This example shows how to maintain a queue of grid points, ordered by
 * priority, where updates take at worst O(lg N) time.
 **/

class Node {

  friend ostream &operator<<(ostream &os, Node &n);
  friend bool operator<(Node a, Node b);
  friend bool operator==(Node &a, Node &b);
  friend ostream &operator<<(ostream &os, Node *n);

public:

  Node() : x_(0), y_(0), priority_(0.0) { }

  Node(int x, int y) : x_(x), y_(y), priority_(0.0) { }

  double Priority() { return priority_; }

  void SetPriority(double priority) {
    priority_ = priority;
  }

protected:
  double priority_;
  int x_;
  int y_;
};

ostream &operator<<(ostream &os, Node &n) {
  os << "(" << n.x_ << "," << n.y_ << "):" << n.Priority();
  return os;
}
bool operator<(Node a, Node b) {
  return (a.Priority() < b.Priority());
}
bool operator==(Node &a, Node &b) {
  return (a.x_ == b.x_ && a.y_ == b.y_);
}
ostream &operator<<(ostream &os, Node *n) {
  os << "(" << n->x_ << "," << n->y_ << "):" << n->Priority();
  return os;
}
struct CompareNodeStar {
  bool operator()(Node *a, Node *b) {
    return (a->Priority() < b->Priority());
  }
};


int main(void) {

    int side = 4;
    int N = side * side;

    vector<double> doubleheap(0);
    vector<Node> nodeheap(0);
    vector<Node *> nodestarheap(0);

    for (int x = 0; x < side; x++) {
      for (int y = 0; y < side; y++) {
 Node node = Node(x, y);
 Node *nodestar = new Node(x, y);
 double priority = (x - side / 2.0) * (y - side / 4.0); 

 node.SetPriority(priority);
 nodestar->SetPriority(priority);

 doubleheap.push_back(priority);
 nodeheap.push_back(node);
 nodestarheap.push_back(nodestar);
      }
    }

    cout << "heap<double>" << endl;
    //PRINT_ELEMENTS(doubleheap, "Initial heap");
    print_heap(doubleheap, less<double>());


    make_heap(doubleheap.begin(), doubleheap.end());
    //PRINT_ELEMENTS(doubleheap, "Made heap");
    print_heap(doubleheap, less<double>());

    doubleheap[4] = 100;
    //PRINT_ELEMENTS(doubleheap, "edited");
    print_heap(doubleheap, less<double>());

    vector<double>::iterator itdouble = doubleheap.begin() + 4;
    update_heap_pos(doubleheap.begin(), doubleheap.end(), itdouble);
    //PRINT_ELEMENTS(doubleheap, "Updated and sorted");
    print_heap(doubleheap, less<double>());



    cout << endl;
    cout << "heap<Node>" << endl;
    print_heap(nodeheap, less<Node>());

    make_heap(nodeheap.begin(), nodeheap.end());
    print_heap(nodeheap, less<Node>());

    nodeheap[4].SetPriority(100);
    print_heap(nodeheap, less<Node>());

    vector<Node>::iterator it = nodeheap.begin() + 4;
    update_heap_pos(nodeheap.begin(), nodeheap.end(), it);
    print_heap(nodeheap, less<Node>());



    cout << endl;
    cout << "heap<Node *>" << endl;
    //PRINT_ELEMENTS(nodestarheap, "Initial heap");
    print_heap(nodestarheap, CompareNodeStar());

    make_heap(nodestarheap.begin(), nodestarheap.end(), CompareNodeStar());
    //PRINT_ELEMENTS(nodestarheap, "Made heap");
    print_heap(nodestarheap, CompareNodeStar());

    nodestarheap[4]->SetPriority(100);
    //PRINT_ELEMENTS(nodestarheap, "edited");
    print_heap(nodestarheap, CompareNodeStar());

    vector<Node *>::iterator itstar = nodestarheap.begin() + 4;
    update_heap_pos(nodestarheap.begin(), nodestarheap.end(), itstar, CompareNodeStar());
    //PRINT_ELEMENTS(nodestarheap, "Updated and sorted");
    print_heap(nodestarheap, CompareNodeStar());


    return 0;
}

And here is the modified heapplus.h file:
// Heap Plus implementation 1.01alpha
// lihw 
// 2006.12.18
// ChangeLog:
//  2009.08.10  - Added update_heap_pos functions so that we can adjust
//                heap values outside of update_heap functions -- e.g., if
//                we have external pointers into the heap entries -- then
//                call update_heap on the position only, regardless of the value.
//                (Danny Tarlow: dtarlow@cs.toronto.edu)
//  2006.12.18  - Initially created (lihw)

#ifndef HEAPPLUS_H_
#define HEAPPLUS_H_

#include <iterator>

namespace std {

template<typename _RandomAccessIterator, typename _Distance, typename _Tp, typename _Compare>
inline void
__up_heap(_RandomAccessIterator __first, _RandomAccessIterator __end, _RandomAccessIterator __pos,
        _Distance, _Tp __value,  _Compare __comp)
{
    _Distance __parent = (__pos - __first - 1) /  2;
    _Distance __index = __pos - __first;

    while (__index > 0 && __comp(*(__first + __parent), __value)) {
        *(__first + __index) = *(__first + __parent);
        __index = __parent;
        __parent = (__parent - 1) / 2;
    }
       
    if (__pos != (__first + __index))   
        *(__first + __index) = __value;
}

template<typename _RandomAccessIterator, typename _Distance, typename _Tp, typename _Compare>
inline void
__down_heap(_RandomAccessIterator __first, _RandomAccessIterator __last, _RandomAccessIterator __pos,
        _Distance, _Tp __value, _Compare __comp)
{
    _Distance __len = __last - __first;
    _Distance __index = __pos - __first;
    _Distance __left = __index * 2 + 1;
    _Distance __right = __index * 2 + 2;
    _Distance __largest;

    while (__index < __len) {
        if (__left < __len && __comp(*(__first + __right), *(__first + __left))) {
            __largest = __left;
        } else {
            __largest = __right;
        }
        if (__largest < __len && __comp(__value, *(__first + __largest))) {
            *(__first + __index) = *(__first + __largest);
            __index = __largest;
            __left = __index * 2 + 1;
            __right = __index * 2 + 2;
        } else
            break;
    }

    if (__pos != (__first + __index))   
        *(__first + __index) = __value;
}

template<typename _RandomAccessIterator, typename _Distance, typename _Tp, 
    typename _Compare>
inline void
__update_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
        _RandomAccessIterator __pos, _Distance, _Tp __value, _Compare __comp) 
{
    *(__pos) = __value;
    
    _Distance __index = (__pos - __first);
    _Distance __parent = (__index - 1) / 2;

    if (__index > 0 && __comp(*(__first + __parent), __value))
        __up_heap(__first, __last, __pos, _Distance(), __value, __comp);
    else
        __down_heap(__first, __last, __pos, _Distance(), __value, __comp);
}

template<typename _RandomAccessIterator, typename _Distance, typename _Compare>
inline void
__update_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
        _RandomAccessIterator __pos, _Distance, _Compare __comp) 
{
    _Distance __index = (__pos - __first);
    _Distance __parent = (__index - 1) / 2;

    if (__index > 0 && __comp(*(__first + __parent), *(__pos)))
      __up_heap(__first, __last, __pos, _Distance(), *(__pos), __comp);
    else
      __down_heap(__first, __last, __pos, _Distance(), *(__pos), __comp);
}

template<typename _RandomAccessIterator, typename _Tp>
inline void
update_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
        _RandomAccessIterator __pos, _Tp __value) 
{
      typedef typename iterator_traits<_RandomAccessIterator>::value_type _ValueType;
      typedef typename iterator_traits<_RandomAccessIterator>::difference_type _DistanceType;

      __update_heap(__first, __last, __pos, _DistanceType(), __value, less<_ValueType>());
}

template<typename _RandomAccessIterator, typename _Tp, typename _Compare>
inline void
update_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
        _RandomAccessIterator __pos, _Tp __value, _Compare __comp) 
{
      __update_heap(__first, __last, __pos, __value, __comp);
}


template<typename _RandomAccessIterator>
inline void
update_heap_pos(_RandomAccessIterator __first, _RandomAccessIterator __last,
  _RandomAccessIterator __pos) {
      typedef typename iterator_traits<_RandomAccessIterator>::value_type _ValueType;
      typedef typename iterator_traits<_RandomAccessIterator>::difference_type _DistanceType;

      __update_heap(__first, __last, __pos, _DistanceType(), less<_ValueType>());
}


template<typename _RandomAccessIterator, typename _Compare>
inline void
update_heap_pos(_RandomAccessIterator __first, _RandomAccessIterator __last,
  _RandomAccessIterator __pos, _Compare __comp) {
      typedef typename iterator_traits<_RandomAccessIterator>::value_type _ValueType;
      typedef typename iterator_traits<_RandomAccessIterator>::difference_type _DistanceType;

      __update_heap(__first, __last, __pos, _DistanceType(), __comp);
}



}; // namespace std

#endif // !HEAPPLUS_H_

And finally, here is the output from compiling and running the test case:
heap<double>
2 2 -0 1 1 0 0 0 0 -0 -1 -0 -1 -2 -2 -4 
2 2 1 1 0 0 -0 0 0 -0 -0 -1 -1 -2 -2 -4 
2 2 100 1 0 0 -0 0 0 -0 -0 -1 -1 -2 -2 -4 
100 2 2 1 0 0 -0 0 0 -0 -0 -1 -1 -2 -2 -4 

heap<Node>
(0,0):2 (3,3):2 (0,1):-0 (1,0):1 (3,2):1 (2,2):0 (3,1):0 (2,1):0 (2,3):0 (2,0):-0 (3,0):-1 (1,1):-0 (1,2):-1 (0,2):-2 (1,3):-2 (0,3):-4 
(3,3):2 (0,0):2 (3,2):1 (1,0):1 (3,1):0 (2,3):0 (1,1):-0 (2,2):0 (2,1):0 (2,0):-0 (0,1):-0 (1,2):-1 (3,0):-1 (0,2):-2 (1,3):-2 (0,3):-4 
(3,3):2 (0,0):2 (1,0):100 (3,2):1 (3,1):0 (2,3):0 (1,1):-0 (2,2):0 (2,1):0 (2,0):-0 (0,1):-0 (3,0):-1 (1,2):-1 (1,3):-2 (0,2):-2 (0,3):-4 
(1,0):100 (3,3):2 (0,0):2 (3,2):1 (3,1):0 (2,3):0 (1,1):-0 (2,2):0 (2,1):0 (2,0):-0 (0,1):-0 (3,0):-1 (1,2):-1 (1,3):-2 (0,2):-2 (0,3):-4 

heap<Node *>
(0,0):2 (3,3):2 (0,1):-0 (1,0):1 (3,2):1 (2,2):0 (3,1):0 (2,1):0 (2,3):0 (2,0):-0 (3,0):-1 (1,1):-0 (1,2):-1 (0,2):-2 (1,3):-2 (0,3):-4 
(3,3):2 (0,0):2 (3,2):1 (1,0):1 (3,1):0 (2,3):0 (1,1):-0 (2,2):0 (2,1):0 (2,0):-0 (0,1):-0 (1,2):-1 (3,0):-1 (0,2):-2 (1,3):-2 (0,3):-4 
(3,3):2 (0,0):2 (1,0):100 (3,2):1 (3,1):0 (2,3):0 (1,1):-0 (2,2):0 (2,1):0 (2,0):-0 (0,1):-0 (3,0):-1 (1,2):-1 (1,3):-2 (0,2):-2 (0,3):-4 
(1,0):100 (3,3):2 (0,0):2 (3,2):1 (3,1):0 (2,3):0 (1,1):-0 (2,2):0 (2,1):0 (2,0):-0 (0,1):-0 (3,0):-1 (1,2):-1 (1,3):-2 (0,2):-2 (0,3):-4 


Update: Yanbin Lu has taken this code and updated it further. See the software section from his homepage for the details.

5 comments:

Jonathan said...

I'm glad you found something that works (in the past I wrote my own mutable priority heap for the same reason).

I guess you didn't see that I answered your question already: http://stackoverflow.com/questions/1269563/using-boost-graph-library-propertymap-in-boostmutablequeue/1270003

Check out http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.2982 for an interesting way of getting fibonacci-heap like performance on top of pretty much any heap.

Danny Tarlow said...

Hey Jonathan,

Thanks for the reference -- that looks very interesting.

I also appreciate the answer over at stackoverflow, but it doesn't really get to the root of the problem. I double-checked to make sure that my definitions of the entry types matched what I was putting in the queue (Node vs Node *), but it was still giving me errors that I couldn't make much sense of.

I'd be curious to know more about your mutable heap. Did you write it completely from scratch? Have you published it anywhere?

fgb said...

One approach I've taken to create mutable priority queues is to subclass the STL priority_queue which gives you the ability to modify the underlying vector and then use make_heap() to maintain the heap property after making any changes to any of the items on the queue.

Danny Tarlow said...

fgb -- What is the complexity of that update, though? My understanding is that make_heap goes through every element of the vector, causing the update to be at least O(N). If you only change one element, you should be able to update the heap in O(log N) time.

pony said...

Thanks for share.
It's very useful for me.