usaco-guide/content/6_Plat/Merging.mdx
Benjamin Qi 5935881a8d I/O
2020-07-03 20:54:13 -04:00


---
id: merging
title: "Small-To-Large Merging"
author: Michael Cao, Benjamin Qi
prerequisites:
- Silver - Depth First Search
- Gold - Point Update Range Sum
description: "?"
frequency: 1
---
import { Problem } from "../models";
export const metadata = {
problems: {
sam: [
new Problem("CSES", "Distinct Colors", "1139", "Intro", false, ["Merging"]),
],
general: [
new Problem("Silver", "Wormhole Sort", "992", "Easy", false, ["Merging"]),
new Problem("CF", "Lomsat gelral", "contest/600/problem/E", "Normal", false, ["Merging"]),
new Problem("Plat", "Promotion Counting", "696", "Normal", false, ["Merging", "Indexed Set"], ""),
new Problem("Plat", "Disruption", "842", "Normal", false, ["Merging"]),
new Problem("POI", "Tree Rotations", "https://szkopul.edu.pl/problemset/problem/sUe3qzxBtasek-RAWmZaxY_p/site/?key=statement", "Normal", false, ["Merging", "Indexed Set"], ""),
new Problem("Gold", "Favorite Colors", "1042", "Hard", false, ["DSU"], "Small to large merging is mentioned in the editorial, but we were unable to break solutions that just merged naively. Alternatively, just merge linked lists in $O(1)$ time."),
],
}
};
## Additional Reading
<resources>
<resource source="CPH" title="18.4 - Merging Data Structures"></resource>
<resource source="CF" title="Arpa - Sack (DSU on Tree)" url="blog/entry/44351"></resource>
<resource source="CF" title="tuwuna - Explaining DSU on Trees" url="blog/entry/67696"></resource>
</resources>
## Merging Data Structures
Obviously [linked lists](http://www.cplusplus.com/reference/list/list/splice/) can be merged in $O(1)$ time. But what about sets or vectors?
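As a minimal sketch of the linked-list case, `std::list::splice` moves every node of one list onto the end of another without copying. The helper name `mergeLists` is an assumption made for this example only.

```cpp
#include <cassert>
#include <list>

// Hypothetical helper: appends all nodes of `b` to the end of `a`.
// Splicing a whole list relinks nodes instead of copying them, so it takes O(1) time.
void mergeLists(std::list<int>& a, std::list<int>& b) {
    assert(&a != &b);      // splicing a list into itself is not allowed
    a.splice(a.end(), b);  // afterwards `b` is empty
}
```

After the call, `a` contains the elements of both lists in order and `b` is empty.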
<problems-list problems={metadata.problems.sam} />
Let's consider a tree rooted at node $1$, where each node has a color.
For each node, let's store a set containing only that node's color. We then want to merge the sets in each node's subtree together so that each node ends up with a set consisting of all colors in its subtree. Doing this allows us to solve a variety of problems, such as querying the number of distinct colors in each subtree.
### Naive Solution
Suppose that we want to merge two sets $a$ and $b$ of sizes $n$ and $m$, respectively. One possibility is the following:
```cpp
for (int x: b) a.insert(x);
```
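which runs in $O(m\log (n+m))$ time, yielding a runtime of $O(N^2\log N)$ in the worst case (for example, when the tree is a path and the large set of a child is always inserted into the small set of its parent). If we instead maintain $a$ and $b$ as sorted vectors, we can merge them in $O(n+m)$ time, but $O(N^2)$ is also too slow.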
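The sorted-vector variant mentioned above could look roughly like this. It is a sketch that assumes both inputs are sorted and free of duplicates; `mergeSorted` is a name chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: returns the sorted union of two sorted vectors in O(n + m).
// Values that appear in both inputs are kept only once.
std::vector<int> mergeSorted(const std::vector<int>& a, const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

Each call is linear, but repeating it along a long path still touches the growing vector once per node, which is where the quadratic total comes from.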
### Better Solution
With just one additional line of code, we can significantly speed this up.
```cpp
if (a.size() < b.size()) swap(a,b);
for (int x: b) a.insert(x);
```
Note that [swap](http://www.cplusplus.com/reference/utility/swap/) exchanges two sets in $O(1)$ time. Thus, merging a smaller set of size $m$ into the larger one of size $n$ takes $O(m\log n)$ time.
**Claim:** The solution runs in $O(N\log^2N)$ time.
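**Proof:** When merging two sets, you move elements from the smaller set to the larger set. If the size of the smaller set is $X$ and the two sets are disjoint, then the size of the resulting set is at least $2X$. Thus, an element that has been moved $Y$ times will be in a set of size at least $2^Y$, and since the maximum size of a set is $N$ (the root), each element will be moved at most $O(\log N)$ times. Each move is a single insertion costing $O(\log N)$, so all $N$ elements together take $O(N\log^2 N)$ time. (Duplicate colors can make the merged set smaller than $2X$, but every duplicate permanently removes an element, so the bound is unaffected.)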
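To see the bound on a small case, the sketch below counts how many elements are moved when $n$ singleton sets are merged pairwise in rounds, which is the order that forces the most moves. The function names are assumptions for this example.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Merges b into a small-to-large and returns the number of elements moved.
long long mergeCounting(std::set<int>& a, std::set<int>& b) {
    if (a.size() < b.size()) std::swap(a, b);
    long long moved = b.size();
    for (int x : b) a.insert(x);
    b.clear();
    return moved;
}

// Hypothetical experiment: start with sets {0}, ..., {n-1} and merge
// neighbours at distance 1, then 2, then 4, ... until one set remains.
long long balancedMergeMoves(int n) {
    assert(n >= 1);
    std::vector<std::set<int>> sets(n);
    for (int i = 0; i < n; ++i) sets[i].insert(i);
    long long total = 0;
    for (int step = 1; step < n; step *= 2)
        for (int i = 0; i + step < n; i += 2 * step)
            total += mergeCounting(sets[i], sets[i + step]);
    return total;
}
```

For $n=8$ this performs $4+4+4=12$ moves, i.e. $\frac{n}{2}\log_2 n$, which is consistent with each element moving at most $\log_2 n$ times.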
<spoiler title="Full Code">
```cpp
#include <bits/stdc++.h>
using namespace std;

const int MX = 200005;
vector<int> adj[MX];
set<int> col[MX]; // col[v] = distinct colors in the subtree of v
long long ans[MX];

void dfs(int v, int p) {
    for (int e : adj[v]) {
        if (e == p) continue;
        dfs(e, v);
        // keep the larger set at v so that only the smaller one is iterated
        if (col[v].size() < col[e].size()) {
            swap(col[v], col[e]);
        }
        for (int a : col[e]) {
            col[v].insert(a);
        }
        col[e].clear(); // the child's set is no longer needed
    }
    ans[v] = col[v].size();
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n; cin >> n;
    for (int i = 0; i < n; i++) {
        int x; cin >> x;
        col[i].insert(x); // each node starts with only its own color
    }
    for (int i = 0; i < n - 1; i++) {
        int u, v; cin >> u >> v;
        u--; v--; // convert to 0-indexed
        adj[u].push_back(v); adj[v].push_back(u);
    }
    dfs(0, -1);
    for (int i = 0; i < n; i++) {
        cout << ans[i] << " ";
    }
}
```
</spoiler>
## Generalizing
A set doesn't have to be an `std::set`. Many data structures can be merged, such as `std::map` or `std::unordered_map`. However, `std::swap` doesn't necessarily work in $O(1)$ time; for example, swapping two [arrays](http://www.cplusplus.com/reference/array/array/swap/) takes time linear in the sum of the sizes of the arrays, and the same goes for indexed sets. For two indexed sets `a` and `b`, we can use the member function `a.swap(b)` in place of `swap(a,b)`.
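As one possible illustration with `std::map`, the sketch below merges two maps from color to number of occurrences, iterating only over the smaller one. The name `mergeCounts` and the color-count interpretation are assumptions for this example.

```cpp
#include <cassert>
#include <map>

// Hypothetical helper: adds the counts in `b` to `a`, small-to-large.
// Afterwards `a` holds the combined counts and `b` is empty.
void mergeCounts(std::map<int, int>& a, std::map<int, int>& b) {
    assert(&a != &b);
    if (a.size() < b.size()) a.swap(b); // member swap exchanges the two maps in O(1)
    for (const auto& kv : b) a[kv.first] += kv.second;
    b.clear();
}
```

Because addition is commutative, it does not matter which map ends up as the destination after the swap.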
## Problems
<problems-list problems={metadata.problems.general} />
<spoiler title="Solution to Promotion Counting">
```cpp
#include <bits/stdc++.h>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;

// indexed set: order_of_key(x) returns the number of elements strictly less than x
template<class T> using Tree = tree<T,null_type,less<T>,rb_tree_tag,tree_order_statistics_node_update>;

const int MX = 1e5+5;
#define sz(x) (int)(x).size()

int N, a[MX], ans[MX];
vector<int> child[MX];
Tree<int> d[MX]; // d[x] = values of a[] in the subtree of x

// merge d[v] into d[u], small to large
void comb(int u, int v) {
    if (sz(d[u]) < sz(d[v])) d[u].swap(d[v]); // member swap instead of std::swap
    for (int i: d[v]) d[u].insert(i);
}

void dfs(int x) {
    for (int i: child[x]) {
        dfs(i);
        comb(x,i);
    }
    // d[x] now holds all proper descendants; count those greater than a[x]
    ans[x] = sz(d[x])-d[x].order_of_key(a[x]);
    d[x].insert(a[x]);
}

int main() {
    freopen("promote.in","r",stdin);
    freopen("promote.out","w",stdout);
    cin >> N; for (int i = 1; i <= N; ++i) cin >> a[i];
    for (int i = 2; i <= N; ++i) {
        int p; cin >> p;
        child[p].push_back(i);
    }
    dfs(1);
    for (int i = 1; i <= N; ++i) cout << ans[i] << "\n";
}
```
</spoiler>
(also: same solution w/o indexed set)
<optional-content title="Faster Merging">
It's easy to merge two sets of sizes $n\ge m$ in $O(n+m)$ or $O(m\log n)$ time, but sometimes $O\left(m\log \left(1+\frac{n}{m}\right)\right)$ can be significantly better than both of these. Check "Advanced - Treaps" for more details. Also see [this link](https://codeforces.com/blog/entry/49446) regarding merging segment trees.
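As a rough sanity check of that bound (ignoring constant factors), its two extremes reduce to the familiar costs:

$$m = n:\quad m\log\left(1+\frac{n}{m}\right) = n\log 2 = O(n)$$

$$m = 1:\quad m\log\left(1+\frac{n}{m}\right) = \log(1+n) = O(\log n)$$

More generally, since $\log(1+x)\le x$, the bound is at most $m\cdot\frac{n}{m}=n$, and since $\frac{n}{m}\le n$ it is also at most $m\log(1+n)$, so it is never worse than either simpler method by more than a constant factor.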
</optional-content>