---
id: merging
title: "Small-To-Large Merging"
author: Michael Cao
prerequisites: 
 - Silver - Depth First Search
 - Gold - Point Update Range Sum
description: "?"
frequency: 1
---

import { Problem } from "../models";

export const metadata = {
  problems: {
    sam: [
      new Problem("CSES", "Distinct Colors", "1139", "Intro", false, ["Merging"]),
    ],   
    general: [
      new Problem("CF", "Lomsat gelral", "contest/600/problem/E", "Normal", false, ["Merging"]),
      new Problem("Plat", "Promotion Counting", "696", "Normal", false, ["Merging", "Indexed Set"], ""),
      new Problem("Plat", "Disruption", "842", "Normal", false, ["Merging"]),
      new Problem("POI", "Tree Rotations", "https://szkopul.edu.pl/problemset/problem/sUe3qzxBtasek-RAWmZaxY_p/site/?key=statement", "Normal", false, ["Merging", "Indexed Set"], ""),
      new Problem("Gold", "Favorite Colors", "1042", "Hard", false, ["DSU"], "Small to large merging is mentioned in the editorial, but we were unable to break solutions that just merged naively. Alternatively, just merge linked lists in $O(1)$ time."),
    ],
    treeRot: [
      new Problem("POI", "Tree Rotations 2", "https://szkopul.edu.pl/problemset/problem/b0BM0al2crQBt6zovEtJfOc6/site/?key=statement", "Very Hard", false, [], ""),
    ]
  }
};

## Additional Reading

<resources>
  <resource source="CPH" title="18.4 - Merging Data Structures"></resource>
  <resource source="CF" title="Arpa - Sack (DSU on Tree)" url="blog/entry/44351"></resource>
  <resource source="CF" title="tuwuna - Explaining DSU on Trees" url="blog/entry/67696"></resource>
</resources>

## Merging Sets

<problems-list problems={metadata.problems.sam} />

Let's consider a tree rooted at node $1$, where each node has a color.

For each node, let's store a set containing only that node's color. We want to merge the sets in each node's subtree together so that every node ends up with a set consisting of all colors in its subtree. Doing this allows us to solve a variety of problems, such as querying the number of distinct colors in each subtree. Doing this naively, however, requires $O(N^2)$ insertions in the worst case, such as when the tree is a single path.

However, with just a few lines of code, we can significantly speed this up. Note that [swap](http://www.cplusplus.com/reference/utility/swap/) exchanges two sets in $O(1)$ time.

```cpp
if(a.size() < b.size()){ //for two sets a and b
  swap(a,b);
}
``` 

By merging the smaller set into the larger one, the runtime complexity becomes $O(N\log^2N)$.

### Proof

When merging two sets, you move every element of the smaller set into the larger set. If the size of the smaller set is $X$, then the size of the resulting set is at least $2X$. Thus, an element that has been moved $Y$ times will be in a set of size at least $2^Y$, and since the maximum size of a set is $N$ (the root), each element will be moved at most $O(\log N)$ times, leading to a total of $O(N\log N)$ moves. Each move is an insertion into an `std::set`, which costs $O(\log N)$, so the total complexity is $O(N\log^2 N)$.
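
To make the counting argument concrete, here is a minimal sketch that counts how many single-element moves small-to-large merging performs on a tree. The helper `countMoves` and its input format (a parent array with `parent[i] < i`, where node $0$ is the root and node `i` starts with the set `{i}`) are assumptions made for this illustration only; they are not part of the solution below.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: parent[0] is ignored (node 0 is the root) and
// parent[i] < i for every i >= 1. Node i starts with the one-element set {i}.
// Returns how many single-element insertions small-to-large merging performs.
long long countMoves(const std::vector<int>& parent) {
    int n = (int)parent.size();
    std::vector<std::set<int>> s(n);
    for (int i = 0; i < n; i++) s[i].insert(i);
    long long moves = 0;
    // Because parent[i] < i, visiting nodes in decreasing order guarantees
    // that s[i] already holds its whole subtree when it is merged upward.
    for (int i = n - 1; i >= 1; i--) {
        int p = parent[i];
        assert(0 <= p && p < i);
        // Keep the larger set at the parent, then move the smaller one in.
        if (s[p].size() < s[i].size()) std::swap(s[p], s[i]);
        for (int x : s[i]) {
            s[p].insert(x);
            moves++;
        }
        s[i].clear();
    }
    return moves;
}
```

On a path, every merge moves a single element, so the count is $N-1$. Balanced trees are where elements get moved repeatedly, but each one still moves at most $\lfloor \log_2 N \rfloor$ times.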

<spoiler title="Full Code">

```cpp
#include <bits/stdc++.h>

using namespace std;

const int MX = 200005;

vector<int> adj[MX]; set<int> col[MX]; long long ans[MX];
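// After dfs(v, p) returns, col[v] holds every color in v's subtree.
void dfs(int v, int p){
    for(int e : adj[v]){
        if(e != p){
           dfs(e, v);
           // Keep the larger set in col[v]; swapping two std::sets is O(1).
           if(col[v].size() < col[e].size()){
               swap(col[v], col[e]);
           }
           // Move the smaller set's elements into the larger set.
           for(int a : col[e]){
               col[v].insert(a);
           }
           col[e].clear(); // the child's set is no longer needed
        }
    }
    ans[v] = col[v].size(); // number of distinct colors in v's subtree
}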
int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n; cin >> n;
    for(int i = 0; i < n; i++){
        int x; cin >> x;
        col[i].insert(x);
    }
    for(int i = 0; i < n - 1; i++){
        int u,v; cin >> u >> v;
        u--; v--;
        adj[u].push_back(v); adj[v].push_back(u);
    }
    dfs(0,-1);
    for(int i = 0; i < n; i++){
        cout << ans[i] << " ";
    }
}
```

</spoiler>

## Generalizing

A set doesn't have to be an `std::set`. Many data structures can be merged, such as `std::map` or `std::unordered_map`. However, `std::swap` doesn't necessarily work in $O(1)$ time; for example, swapping two [arrays](http://www.cplusplus.com/reference/array/array/swap/) takes time linear in the sum of the sizes of the arrays, and the same goes for indexed sets. For two indexed sets `a` and `b` we can use `a.swap(b)` in place of `swap(a,b)` (documentation?).
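
As an illustration of merging something other than an `std::set`, the sketch below merges two frequency maps, keeping the larger one and adding the smaller one's counts into it. The name `mergeCounts` and the choice of `std::map<int, int>` as a count table are assumptions for this example rather than something any particular problem requires.

```cpp
#include <cassert>
#include <map>

// Hypothetical helper: adds every count stored in `b` to `a` and empties `b`.
// Only the smaller of the two maps is iterated over.
void mergeCounts(std::map<int, int>& a, std::map<int, int>& b) {
    assert(&a != &b); // merging a map into itself is not supported
    // Member swap exchanges the two maps in O(1).
    if (a.size() < b.size()) a.swap(b);
    for (const auto& kv : b) a[kv.first] += kv.second;
    b.clear();
}
```

Because the swap may exchange the two arguments, the merged result always ends up in `a` and `b` is left empty, regardless of which map was larger to begin with.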

<info-block title="Challenge">

Prove that if you instead merge sets that have size equal to the depths of the subtrees, then small to large merging does $O(N)$ insert calls.

(be specific about what this means?)

</info-block>

## Problems

<problems-list problems={metadata.problems.general} />

<spoiler title="Solution to Promotion Counting">

```cpp
#include <bits/stdc++.h>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/pb_ds/assoc_container.hpp>

using namespace std;
using namespace __gnu_pbds;

template<class T> using Tree = tree<T,null_type,less<T>,rb_tree_tag,tree_order_statistics_node_update>;

const int MX = 1e5+5;
#define sz(x) (int)(x).size()

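int N, a[MX], ans[MX];
vector<int> child[MX];
Tree<int> d[MX]; // d[x] = values stored in the subtree of x

// Merge d[b] into d[a], always moving the smaller set into the larger one.
void comb(int a, int b) {
  if (sz(d[a]) < sz(d[b])) d[a].swap(d[b]);
  for (int i: d[b]) d[a].insert(i);
  d[b].clear();
}

void dfs(int x) {
  for (int i: child[x]) {
    dfs(i);
    comb(x,i);
  }
  // d[x] now holds the strict descendants of x; count those greater than a[x]
  ans[x] = sz(d[x])-d[x].order_of_key(a[x]);
  d[x].insert(a[x]);
}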

int main() {
  freopen("promote.in","r",stdin);
  freopen("promote.out","w",stdout);
  cin >> N; for (int i = 1; i <= N; ++i) cin >> a[i];
  for (int i = 2; i <= N; ++i) {
    int p; cin >> p;
    child[p].push_back(i);
  }
  dfs(1); 
  for (int i = 1; i <= N; ++i) cout << ans[i] << "\n";
}
```

</spoiler>

(also: same solution w/o indexed set)

## Faster Merging (Optional)

It's easy to merge two sets of sizes $n\ge m$ in $O(n+m)$ or $O(m\log n)$ time, but sometimes $O\left(m\log \frac{n}{m}\right)$ can be significantly better than both of these.
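
For reference, the two "easy" merges can be sketched as follows. This is only an illustration under assumptions: the names `mergeLinear` and `mergeByInsert` are made up here, the linear version assumes its inputs are sorted vectors without duplicates, and the $O\left(m\log \frac{n}{m}\right)$ merge is not shown since it needs the BST techniques from the resources below.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// O(n + m): one linear pass over two sorted, duplicate-free vectors.
std::vector<int> mergeLinear(const std::vector<int>& a, const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// O(m log n): insert each of the m elements of `small` into `large`.
void mergeByInsert(std::set<int>& large, const std::set<int>& small) {
    for (int x : small) large.insert(x);
}
```

The linear merge is the better choice when the two sets have similar sizes, while inserting one by one wins when $m$ is much smaller than $n$.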

<resources>
  <resource source="CF" title="Splitting & Merging Segment Trees" url="https://codeforces.com/blog/entry/49446"></resource>
  <resource source="CF" title="Splitting & Merging BSTs" url="https://codeforces.com/blog/entry/67980"></resource>
</resources>

Requires knowledge of BSTs such as treaps or splay trees.

<problems-list problems={metadata.problems.treeRot} />