This repository has been archived on 2022-06-22. You can view files and clone it, but cannot push or open issues or pull requests.
usaco-guide/content/4_Gold/Faster_Hash.mdx
Benjamin Qi 088d0260e1 Gold
2020-07-18 20:22:28 -04:00

146 lines
No EOL
4.3 KiB
Text

---
id: faster-hashmap
title: A Faster Hash Table in C++
author: Benjamin Qi
description: "Introduces gp_hash_table."
frequency: 1
prerequisites:
- unordered
---
import { Problem } from "../models";
export const metadata = {
problems: {
three: [
new Problem("Gold", "3SUM", "994", "Normal", false, [], ""),
],
}
};
<Resources>
<Resource source="CF" url="https://codeforces.com/blog/entry/60737" title="Chilli - Order of magnitude faster hash tables" starred>Introduces gp_hash_table</Resource>
<Resource source="GCC" url="https://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/gp_hash_table.html#Resize_Policy566860465" title="gp_hash_table Interface">documentation</Resource>
<Resource source="Benq (from KACTL)" title="HashMap" url="https://github.com/bqi343/USACO/blob/master/Implementations/content/data-structures/STL%20(5)/HashMap.h" starred> </Resource>
</Resources>
<br />
Read / writes are much faster than `unordered_map`. Its actual size is always a power of 2. The documentation is rather confusing, so I'll just summarize the most useful functions here.
```cpp
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
```
## Unordered Set
`gp_hash_table<K,null_type>` functions similarly to `unordered_set<K>`.
## Hacking
`gp_hash_table` is also vulnerable to hacking.
(example of hash function that fails)
To avoid this, we can use one of the custom hash functions mentioned in the Bronze module.
```cpp
template<class K,class V> using ht = gp_hash_table<K,V,chash>;
```
## Resizing
Unordered map has [`reserve`](http://www.cplusplus.com/reference/unordered_map/unordered_map/reserve/). Calling this function before inserting any elements can result in a constant factor speedup.
We can modify the declaration of `gp_hash_table` so that it supports the `resize` function, which operates similarly.
```cpp
template<class K,class V> using ht = gp_hash_table<
K,
null_type,
hash<K>,
equal_to<K>,
direct_mask_range_hashing<>,
linear_probe_fn<>,
hash_standard_resize_policy<
hash_exponential_size_policy<>,
hash_load_check_resize_trigger<>,
true
>
>;
```
These are the same template arguments as the default `gp_hash_table`, except `false` has been changed to `true`. This modification allows us to change the actual size of the hash table.
```cpp
int main() {
ht<int,null_type> g; g.resize(5);
cout << g.get_actual_size() << "\n"; // 8
cout << g.size() << "\n"; // 0
}
```
When calling `g.resize(x)`, `x` is rounded up to the nearest power of 2. Then the actual size of `g` is changed to be equal to `x` (unless `x < g.size()`, in which case an error is thrown).
<Resources>
<Resource source="GCC" url="https://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/hash_standard_resize_policy.html" title="Resize Policy">documentation</Resource>
</Resources>
Furthermore, if we construct `g` with the following arguments:
```cpp
ht<int,null_type> g({},{},{},{},{1<<16});
```
then the actual size of `g` is always at least `1<<16` (regardless of calls to `resize`). The last argument **must** be a power of 2 (or else errors will be thrown).
### Solving ThreeSum
<Problems problems={metadata.problems.three} />
You're supposed to use array since values are small :|
```cpp
#include <bits/stdc++.h>
using namespace std;
void setIO(string name) {
ios_base::sync_with_stdio(0); cin.tie(0);
freopen((name+".in").c_str(),"r",stdin);
freopen((name+".out").c_str(),"w",stdout);
}
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
int N,Q;
long long ans[5000][5000];
vector<int> A;
int main() {
setIO("threesum");
cin >> N >> Q;
A.resize(N); for (int i = 0; i < N; ++i) cin >> A[i];
for (int i = 0; i < N; ++i) {
gp_hash_table<int,int> g({},{},{},{},{1<<13});
// initialize with certain capacity, must be power of 2
for (int j = i+1; j < N; ++j) {
int res = -A[i]-A[j];
auto it = g.find(res);
if (it != end(g)) ans[i][j] = it->second;
g[A[j]] ++;
}
}
for (int i = N-1; i >= 0; --i) for (int j = i+1; j < N; ++j)
ans[i][j] += ans[i+1][j]+ans[i][j-1]-ans[i+1][j-1];
for (int i = 0; i < Q; ++i) {
int a,b; cin >> a >> b;
cout << ans[a-1][b-1] << "\n";
}
// you should actually read the stuff at the bottom
}
```
<IncompleteSection />