unordered/doc/roadmap.md
2022-06-01 11:49:09 -07:00


Refactoring Roadmap

Proof of concept for a fast closed-addressing (fca) implementation.

Plan of Refactoring

  • remove ptr_node and ptr_bucket
  • see whether the code can work without extra_node; if it cannot, hard-code it in
  • implement bucket groups as they are in fca, but don't use them directly yet; add them alongside the buckets_ data member in struct table
  • try to remove bucket_info_ from the node structure (breaks all call-sites that use get_bucket() and dependents)
  • make sure fca can successfully handle the multi variants at this stage and supports mutable iterators for map/multimap
  • make a hard break:
    • update code to no longer use one single linked list across all buckets (each bucket contains its own unique list)
    • integrate the bucket_group<Node> structure into the table (update iterator call-sites to include bucket_iterators)
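The bucket groups mentioned in the plan can be sketched in isolation. This sketch assumes that each group covers one machine word's worth of buckets and that a bitmask records which of those buckets are non-empty, so the next occupied bucket can be found without scanning empty ones. The names bucket_group_sketch, mark_nonempty, mark_empty and next_nonempty are hypothetical. Any pointer to the group's buckets, or links between non-empty groups, is omitted.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical simplified bucket group: one bit per bucket, set when the
// bucket is non-empty. Bucket pointers and group links are left out.
struct bucket_group_sketch {
  static constexpr std::size_t N = sizeof(std::size_t) * CHAR_BIT;
  std::size_t bitmask = 0;

  void mark_nonempty(std::size_t i) { bitmask |= std::size_t(1) << i; }
  void mark_empty(std::size_t i) { bitmask &= ~(std::size_t(1) << i); }

  // Index of the first non-empty bucket strictly after position i within
  // this group, or N if there is none.
  std::size_t next_nonempty(std::size_t i) const {
    if (i + 1 >= N) return N;
    std::size_t rest = bitmask & (~std::size_t(0) << (i + 1));
    if (rest == 0) return N;
    std::size_t pos = 0;
    while (!(rest & 1)) {  // find the lowest set bit
      rest >>= 1;
      ++pos;
    }
    return pos;
  }
};
```

A real implementation would more likely use a count-trailing-zeros intrinsic than the loop. The loop keeps the sketch portable.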

Blockers:

  • how to handle the multi variants with the new fca prototype
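One possible approach to the multi variants is sketched below. It assumes that each bucket owns its own singly linked list. An element with an equivalent key is linked in directly after the existing run of equal elements, so equivalent elements stay contiguous and can be walked without a first-in-group flag. This is not the fca prototype's design. The names list_node, insert_equiv and count_equiv are hypothetical, and operator== stands in for the container's key-equality predicate.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-bucket node: a bare singly linked list entry.
template <class T>
struct list_node {
  list_node* next;
  T value;
};

// Insert n into the list headed by *head. If an equal value is present,
// n goes right after the last element of that run of equal values, so
// equivalent elements stay adjacent; otherwise n goes at the front.
template <class T>
void insert_equiv(list_node<T>** head, list_node<T>* n) {
  for (list_node<T>* p = *head; p; p = p->next) {
    if (p->value == n->value) {
      while (p->next && p->next->value == n->value) p = p->next;
      n->next = p->next;
      p->next = n;
      return;
    }
  }
  n->next = *head;
  *head = n;
}

// Count the elements equal to v, relying on equal elements being contiguous.
template <class T>
std::size_t count_equiv(const list_node<T>* head, const T& v) {
  while (head && !(head->value == v)) head = head->next;
  std::size_t c = 0;
  while (head && head->value == v) {
    ++c;
    head = head->next;
  }
  return c;
}
```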

Implementation Differences

Unordered

Node Type

Bullet points:

  • reify node type into a single one
  • come up with an implementation for the multi variants
  • code that touches get_bucket() and *_in_group() member functions may need updating

There are two node types in Unordered, struct node and struct ptr_node, and the node type is selected conditionally based on the Allocator's pointer type:

template <typename A, typename T, typename NodePtr, typename BucketPtr>
struct pick_node2
{
  typedef boost::unordered::detail::node<A, T> node;
  // ...
};

template <typename A, typename T>
struct pick_node2<A, T, boost::unordered::detail::ptr_node<T>*,
  boost::unordered::detail::ptr_bucket*>
{ 
  typedef boost::unordered::detail::ptr_node<T> node;
  // ...
};

template <typename A, typename T> struct pick_node
{
  typedef typename boost::remove_const<T>::type nonconst;

  typedef boost::unordered::detail::allocator_traits<
    typename boost::unordered::detail::rebind_wrap<A,
      boost::unordered::detail::ptr_node<nonconst> >::type>
    tentative_node_traits;

  typedef boost::unordered::detail::allocator_traits<
    typename boost::unordered::detail::rebind_wrap<A,
      boost::unordered::detail::ptr_bucket>::type>
    tentative_bucket_traits;

  typedef pick_node2<A, nonconst, typename tentative_node_traits::pointer,
    typename tentative_bucket_traits::pointer>
    pick;

  typedef typename pick::node node;
  typedef typename pick::bucket bucket;
  typedef typename pick::link_pointer link_pointer;
};

The two node types have the same interface. The only difference is when each is selected: ptr_node is chosen when rebinding the Allocator yields raw pointers (ptr_node<T>* and ptr_bucket*), and node is chosen otherwise, e.g. when the Allocator uses fancy pointers.
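The selection rule can be reduced to a small standalone sketch. It rebinds the allocator to the raw-pointer node type and checks whether the resulting pointer type is a plain pointer. It is simplified: unlike pick_node above, it does not strip const from T or check the bucket pointer type. The names fancy_node, raw_ptr_node, boxed_ptr, fancy_alloc and pick_node_sketch are hypothetical, and fancy_alloc is only complete enough for type computations.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stand-ins for the two Unordered node types.
template <class A, class T> struct fancy_node {};
template <class T> struct raw_ptr_node {};

// Rebind the allocator to the raw-pointer node type and inspect the
// resulting pointer type: a plain T* selects raw_ptr_node, anything else
// falls back to the node type that can store fancy pointers.
template <class A, class T>
struct pick_node_sketch {
  using rebound =
      typename std::allocator_traits<A>::template rebind_alloc<raw_ptr_node<T> >;
  using pointer = typename std::allocator_traits<rebound>::pointer;

  using node = typename std::conditional<
      std::is_same<pointer, raw_ptr_node<T>*>::value,
      raw_ptr_node<T>, fancy_node<A, T> >::type;
};

// Hypothetical allocator whose pointer type is a class, not a raw pointer
// (only used for type computations, never to allocate).
template <class T> struct boxed_ptr { T* p; };
template <class T> struct fancy_alloc {
  using value_type = T;
  using pointer = boxed_ptr<T>;
};

static_assert(std::is_same<pick_node_sketch<std::allocator<int>, int>::node,
                           raw_ptr_node<int> >::value,
              "std::allocator uses raw pointers");
static_assert(std::is_same<pick_node_sketch<fancy_alloc<int>, int>::node,
                           fancy_node<fancy_alloc<int>, int> >::value,
              "a fancy pointer selects the other node type");
```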

Nodes in Unordered store a bucket_info_ member:

template <typename A, typename T>
struct node : boost::unordered::detail::value_base<T>
{
  link_pointer next_;
  std::size_t bucket_info_;
  node() : next_(), bucket_info_(0) {}
  // ...
};

bucket_info_ maps each node back to its corresponding bucket via the member function:

std::size_t get_bucket() const
{
  return bucket_info_ & ((std::size_t)-1 >> 1);
}

The high bit of bucket_info_ is also used to mark the first node of each group of equivalent nodes in the containers:

// Note that nodes start out as the first in their group, as `bucket_info_` defaults to 0.
std::size_t is_first_in_group() const
{ return !(bucket_info_ & ~((std::size_t)-1 >> 1)); }

void set_first_in_group()
{ bucket_info_ = bucket_info_ & ((std::size_t)-1 >> 1); }

void reset_first_in_group()
{ bucket_info_ = bucket_info_ | ~((std::size_t)-1 >> 1); }

One goal of the refactoring is to have a single node type:

template<class T>
struct node {
  node *next;
  T    value;
};

that is used unconditionally. This also requires updating the code that touches bucket_info_, along with the code that touches the *_in_group() member functions.
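Without bucket_info_, a node no longer records its bucket, so code that currently calls get_bucket() would have to recompute the index from the hash. The sketch below assumes per-bucket singly linked lists, with each bucket holding only a head pointer, and plain modulo bucket selection; fca may map hashes to buckets differently. The helpers bucket_index, insert_node and erase_value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// The single node type proposed above.
template <class T>
struct node {
  node* next;
  T value;
};

// Recompute the bucket for a value from its hash; this replaces
// reading the index back out of bucket_info_.
template <class T, class Hash = std::hash<T> >
std::size_t bucket_index(const T& v, std::size_t bucket_count) {
  return Hash()(v) % bucket_count;
}

// Push n onto the front of its bucket's own list.
template <class T>
void insert_node(std::vector<node<T>*>& buckets, node<T>* n) {
  node<T>*& head = buckets[bucket_index(n->value, buckets.size())];
  n->next = head;
  head = n;
}

// Unlink and return the first node holding v, or nullptr if absent.
template <class T>
node<T>* erase_value(std::vector<node<T>*>& buckets, const T& v) {
  node<T>** link = &buckets[bucket_index(v, buckets.size())];
  for (; *link; link = &(*link)->next) {
    if ((*link)->value == v) {
      node<T>* found = *link;
      *link = found->next;
      return found;
    }
  }
  return nullptr;
}
```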

Bucket Type

Bullet points:

  • reify bucket structure into a single one
  • figure out how to add bucket_groups to the table struct

Buckets are similar to nodes in that there are two variations: template<class NodePointer> struct bucket and struct ptr_bucket.

Each bucket exists to hold a pointer to a node. However, the bucket types also declare either enum { extra_node = true }; or enum { extra_node = false };, which determines whether the code should explicitly allocate a default-constructed node whose address is assigned as the dummy node at the end of the bucket array.

extra_node is used when creating and deleting the bucket array, but its intended purpose is not clear.

Iterators

Iterators are currently templated on the type of Node they store. Because fca constructs iterators with two arguments, all the call-sites that instantiate iterators will need to be updated, but this is a straightforward, mechanical change.
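The sketch below illustrates why an iterator over per-bucket lists needs a second argument. The end of one bucket's list no longer leads into the next bucket, so the iterator must also remember where it is in the bucket array. It assumes the second argument is a simple cursor over an array of head pointers. The names bucket_cursor, two_arg_iterator, begin_of and end_of are hypothetical and are not fca's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <class T>
struct node {
  node* next;
  T value;
};

// Hypothetical bucket iterator: current bucket and one-past-the-end.
template <class T>
struct bucket_cursor {
  node<T>* const* pos;
  node<T>* const* last;
};

// Constructed from two arguments: a node and the bucket it lives in.
template <class T>
struct two_arg_iterator {
  node<T>* p;
  bucket_cursor<T> b;

  two_arg_iterator(node<T>* n, bucket_cursor<T> bc) : p(n), b(bc) {}

  T& operator*() const { return p->value; }

  two_arg_iterator& operator++() {
    p = p->next;
    // End of this bucket's list: advance to the next non-empty bucket.
    while (!p && ++b.pos != b.last) p = *b.pos;
    return *this;
  }

  bool operator==(const two_arg_iterator& o) const { return p == o.p; }
  bool operator!=(const two_arg_iterator& o) const { return p != o.p; }
};

template <class T>
two_arg_iterator<T> begin_of(const std::vector<node<T>*>& bs) {
  node<T>* const* first = bs.data();
  node<T>* const* last = first + bs.size();
  while (first != last && !*first) ++first;  // skip empty buckets
  return two_arg_iterator<T>(first != last ? *first : nullptr,
                             bucket_cursor<T>{first, last});
}

template <class T>
two_arg_iterator<T> end_of(const std::vector<node<T>*>& bs) {
  node<T>* const* last = bs.data() + bs.size();
  return two_arg_iterator<T>(nullptr, bucket_cursor<T>{last, last});
}
```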

Iterator types are currently selected via the detail::map and detail::set class templates.

For example, for unordered_map, iterator is defined as:

typedef boost::unordered::detail::map<A, K, T, H, P> types;
typedef typename types::table table;
typedef typename table::iterator iterator;

The iterator is a member typedef of the table, which is types::table. Examining types (aka detail::map<...>), we see:

template <typename A, typename K, typename M, typename H, typename P>
struct map {
  // ...
  typedef boost::unordered::detail::table<types> table;
  // ...
};

Examining the detail::table<types> struct, we see:

template <typename Types>
struct table {
  // ...
  typedef typename Types::iterator iterator;
  // ...
};

Following this chain, we see that the iterator types are ultimately defined in detail::map:

template <typename A, typename K, typename M, typename H, typename P>
struct map
{
  // ...
  typedef boost::unordered::detail::pick_node<A, value_type> pick;
  typedef typename pick::node node;

  typedef boost::unordered::iterator_detail::iterator<node> iterator;
  typedef boost::unordered::iterator_detail::c_iterator<node> c_iterator;
  typedef boost::unordered::iterator_detail::l_iterator<node> l_iterator;
  typedef boost::unordered::iterator_detail::cl_iterator<node>
    cl_iterator;
  // ...
};

detail::set follows the same pattern:

typedef boost::unordered::iterator_detail::c_iterator<node> iterator;
typedef boost::unordered::iterator_detail::c_iterator<node> c_iterator;
typedef boost::unordered::iterator_detail::cl_iterator<node> l_iterator;
typedef boost::unordered::iterator_detail::cl_iterator<node>
  cl_iterator;

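The only difference is that set::iterator is always a c_iterator (a const iterator type), and likewise set::l_iterator is always a cl_iterator, since the elements of a set must not be modified through an iterator.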
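The same aliasing pattern can be shown with a single iterator template parameterized on constness. The sketch assumes one basic_iterator template whose bool parameter decides whether dereferencing yields T& or const T&. A map-like types struct then exposes both variants, while a set-like one points both typedefs at the const variant. basic_iterator, map_types and set_types are hypothetical names, not the iterator_detail templates.

```cpp
#include <cassert>
#include <type_traits>

template <class T>
struct node {
  node* next;
  T value;
};

// Hypothetical iterator: IsConst selects T& or const T& on dereference.
template <class T, bool IsConst>
struct basic_iterator {
  typedef typename std::conditional<IsConst, const T&, T&>::type reference;
  node<T>* p;
  reference operator*() const { return p->value; }
};

// Map-like: mutable and const iterators differ.
template <class V>
struct map_types {
  typedef basic_iterator<V, false> iterator;
  typedef basic_iterator<V, true> c_iterator;
};

// Set-like: both typedefs name the const iterator.
template <class V>
struct set_types {
  typedef basic_iterator<V, true> iterator;
  typedef basic_iterator<V, true> c_iterator;
};

static_assert(std::is_same<set_types<int>::iterator,
                           set_types<int>::c_iterator>::value,
              "set iterator is the const iterator");
static_assert(std::is_same<map_types<int>::iterator::reference, int&>::value,
              "map iterator is mutable");
```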