mirror of https://github.com/boostorg/unordered.git
synced 2025-05-12 05:51:44 +00:00

Fill in more of the unordered container documentation.
[SVN r3042]

This commit is contained in:
parent e9e503be3f
commit f6222b10e2

doc/buckets.qbk (104 lines changed)

@@ -1,28 +1,26 @@
[section:buckets The Data Structure]

The containers are made up of a number of 'buckets', each of which can contain
any number of elements. For example, the following diagram shows an [classref
boost::unordered_set unordered_set] with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration, in practice containers
will have more buckets).

[$../diagrams/buckets.png]

In order to decide which bucket to place an element in, the container applies
`Hash` to the element's key (for `unordered_set` and `unordered_multiset` the
key is the whole element, but it is still referred to as the key so that the
same terminology can be used for sets and maps). This gives a `std::size_t`.
`std::size_t` has a much greater range of values than the number of buckets,
so the container applies another transformation to that value to choose a
bucket to place the element in.
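
As a rough sketch, the transformation might look something like this (for
illustration only - the standard leaves it unspecified, although a simple
implementation might just use the modulus of the bucket count):

    // Illustration only - not the library's actual implementation.
    template <class Key, class Hash>
    std::size_t pick_bucket(Hash const& hash, Key const& k,
        std::size_t bucket_count)
    {
        std::size_t hash_value = hash(k);  // apply the hash function to the key
        return hash_value % bucket_count;  // map the hash value to a bucket index
    }

The `bucket` member function, described below, returns the index that the
container itself would choose for a given key.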

If at a later date the container wants to find an element it just has to apply
the same process to the element's key to discover which bucket to find it in.
This means that you only have to look at the elements within a single bucket.
If the hash function has worked well the elements will be evenly distributed
amongst the buckets.
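
For example, the local iterators described below can be used to look at the
contents of a single bucket (a small sketch, assuming a populated set of
strings):

    boost::unordered_set<std::string> words;
    // ... insert some words ...

    // Only the elements in the bucket that would hold "example" are visited:
    std::size_t index = words.bucket("example");
    for(boost::unordered_set<std::string>::local_iterator
            it = words.begin(index), end = words.end(index);
        it != end; ++it)
    {
        // *it hashes to bucket `index`.
    }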

You can see in the diagram that `A` & `D` have been placed in the same bucket.
This means that when looking in this bucket, up to 2 comparisons have to be

@@ -44,6 +42,10 @@ fast we try to keep these to a minimum.
[``size_type bucket_size(size_type n) const``]
[The number of elements in bucket `n`.]
]
[
[``size_type bucket(key_type const& k) const``]
[Returns the index of the bucket which would contain `k`.]
]
[
[``
local_iterator begin(size_type n);

@@ -65,36 +67,34 @@ The standard gives you two methods to influence the bucket count. First you can
specify the minimum number of buckets in the constructor, and later, by calling
`rehash`.

The other method is the `max_load_factor` member function. The 'load factor'
is the average number of elements per bucket. `max_load_factor` can be used to
give a /hint/ of a value that the load factor should be kept below. The draft
standard doesn't actually require the container to pay much attention to this
value. The only time the load factor is /required/ to be less than the maximum
is following a call to `rehash`. But most implementations will probably try to
keep the number of elements below the maximum load factor, and set the maximum
load factor to something the same as, or near to, your hint - unless your hint
is unreasonably small.
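
For example, the maximum load factor can be read and adjusted like this (a
small sketch; 1.0 is the usual initial value):

    boost::unordered_set<int> x;

    x.max_load_factor(2.0f);  // hint: aim for at most 2 elements per bucket on average
    float maximum = x.max_load_factor();  // probably 2.0, but it is only a hint
    float current = x.load_factor();      // the current average elements per bucket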

It is not specified anywhere how member functions other than `rehash` affect
the bucket count, although `insert` is only allowed to invalidate iterators
when the insertion causes the load factor to reach the maximum, which will
typically mean that `insert` only changes the number of buckets when an
insertion causes this.

In a similar manner to using `reserve` for `vector`s, it can be a good idea to
call `rehash` before inserting a large number of elements. This will get the
expensive rehashing out of the way and let you store iterators, safe in the
knowledge that they won't be invalidated. If you are inserting `n` elements
into container `x`, you could first call:

    x.rehash((x.size() + n) / x.max_load_factor() + 1);

[blurb Note: `rehash`'s argument is the number of buckets, not the number of
elements, which is why the new size is divided by the maximum load factor. The
`+ 1` is required because the container is allowed to resize when the load
factor is equal to the maximum load factor.]
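
As a worked example (a sketch using made-up numbers): with a maximum load
factor of 1.0, preparing an empty set for 1000 insertions requests at least
1001 buckets:

    boost::unordered_set<std::string> x;
    std::size_t n = 1000;

    // (0 + 1000) / 1.0 + 1 = 1001 buckets requested up front, so the
    // following inserts shouldn't trigger another rehash.
    x.rehash((x.size() + n) / x.max_load_factor() + 1);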

[table Methods for Controlling Bucket Size
[[Method] [Description]]

@@ -119,20 +119,14 @@ certainly does.

]

[/ I'm not at all happy with this section. So I've commented it out.]

[/ h2 Rehash Techniques]

[/If the container has a load factor much smaller than the maximum, `rehash`
might decrease the number of buckets, reducing the memory usage. This isn't
guaranteed by the standard but this implementation will do it.

If you want to stop the table from ever rehashing due to an insert, you can
set the maximum load factor to infinity (or perhaps a load factor that it'll
never reach - say `x.max_size()`). As you can only give a 'hint' for the maximum

@@ -144,6 +138,6 @@ maybe the implementation should cope with that).
If you do this and want to make the container rehash, `rehash` will still work.
But be careful that you only ever call it with a sufficient number of buckets
- otherwise it's very likely that the container will decrease the bucket
count to an overly small amount.]

[endsect]

@@ -1,8 +1,9 @@
[section:comparison Comparison with Associative Containers]

* The elements in an unordered container are organised into buckets, in an
unpredictable order. There are member functions to access these buckets,
which were described earlier.
* The unordered associative containers don't support any comparison operators.
* Instead of being parameterized by an ordering relation `Compare`,
the unordered associative containers are parameterized by a function object
`Hash` and an equivalence relation `Pred`. The member types and accessor

@@ -18,6 +18,118 @@ but not the equality predicate, while if you were to change the behaviour
of the equality predicate you would have to change the hash function to match
it.

For example, if you wanted to use the
[@http://www.isthe.com/chongo/tech/comp/fnv/ FNV-1 hash] you could write:

    ``[classref boost::unordered_set]``<std::string, hash::fnv_1> words;

An example implementation of FNV-1, and some other hash functions, is supplied
in the examples directory.

Alternatively, you might wish to use a different equality function. If so, make
sure you use a hash function that matches it. For example, a case-insensitive
dictionary:

    struct iequal_to
        : std::binary_function<std::string, std::string, bool>
    {
        bool operator()(std::string const& x,
            std::string const& y) const
        {
            return boost::algorithm::iequals(x, y);
        }
    };

    struct ihash
        : std::unary_function<std::string, std::size_t>
    {
        std::size_t operator()(std::string const& x) const
        {
            std::size_t seed = 0;

            for(std::string::const_iterator it = x.begin();
                it != x.end(); ++it)
            {
                boost::hash_combine(seed, std::tolower(*it));
            }

            return seed;
        }
    };

    struct word_info {
        // ...
    };

    boost::unordered_map<std::string, word_info, ihash, iequal_to>
        idictionary;
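
For illustration, the dictionary might then be used like this (a sketch which
assumes `word_info` is default constructible):

    word_info info;  // assumed default constructible for this sketch
    idictionary.insert(std::make_pair("thunk", info));

    // Lookups ignore case, because both the hash function and the
    // equality predicate do:
    idictionary.find("THUNK");   // finds the element inserted above
    idictionary.count("Thunk");  // == 1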

[h2 Custom Types]

Similarly, a custom hash function can be used for custom types:

    struct point {
        int x;
        int y;
    };

    bool operator==(point const& p1, point const& p2)
    {
        return p1.x == p2.x && p1.y == p2.y;
    }

    struct point_hash
        : std::unary_function<point, std::size_t>
    {
        std::size_t operator()(point const& p) const
        {
            std::size_t seed = 0;
            boost::hash_combine(seed, p.x);
            boost::hash_combine(seed, p.y);
            return seed;
        }
    };

    boost::unordered_multiset<point, point_hash, std::equal_to<point> >
        points;
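
This could then be used as follows (a short usage sketch):

    point p = { 1, 2 };
    points.insert(p);
    points.insert(p);  // a multiset, so duplicates are allowed
    points.count(p);   // == 2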

Although, customizing Boost.Hash is probably a better solution:

    struct point {
        int x;
        int y;
    };

    bool operator==(point const& p1, point const& p2)
    {
        return p1.x == p2.x && p1.y == p2.y;
    }

    std::size_t hash_value(point const& p) {
        std::size_t seed = 0;
        boost::hash_combine(seed, p.x);
        boost::hash_combine(seed, p.y);
        return seed;
    }

    // Now the default functions work.
    boost::unordered_multiset<point> points;

See the Boost.Hash documentation for more detail on how to do this. Remember
that it relies on extensions to the draft standard - so it won't work on other
implementations of the unordered associative containers.

[table Methods for accessing the hash and equality functions
[[Method] [Description]]

[
[``hasher hash_function() const``]
[Returns the container's hash function.]
]
[
[``key_equal key_eq() const``]
[Returns the container's key equality function.]
]
]
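
For example (a small sketch reusing the case-insensitive dictionary from
above), the stored function objects can be retrieved and called directly:

    std::size_t h = idictionary.hash_function()("Boost");  // calls ihash
    bool same = idictionary.key_eq()("Boost", "BOOST");    // true with iequal_to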

[endsect]

@@ -20,9 +20,8 @@ on average. The worst case complexity is linear, but that occurs rarely and
with some care, can be avoided.

Also, the existing containers require a 'less than' comparison object
to order their elements. For some data types this is impossible to implement
or isn't practical. For a hash table you need an equality function
and a hash function for the key.

So the __tr1__ introduced the unordered associative containers, which are