---
id: merging
title: "Small-To-Large Merging"
author: Michael Cao
prerequisites:
 - Silver - Depth First Search
description: "?"
frequency: 1
---

import { Problem } from "../models";

export const metadata = {
  problems: {
    general: [
      new Problem("CSES", "Distinct Colors", "1139", "Intro", false, ["Merging"]),
      new Problem("CF", "Lomsat gelral", "contest/600/problem/E", "Normal", false, ["Merging"]),
      new Problem("Gold", "Favorite Colors", "1042", "Hard", false, ["DSU"], "Small to large merging is mentioned in the editorial, but we were unable to break solutions that just merged naively."),
      new Problem("Plat", "Disruption", "842", "Hard", false, ["Merging"]),
      new Problem("Plat", "Promotion Counting", "696", "Hard", false, ["Merging"], "Merge indexed sets"),
    ]
  }
};

## Additional Reading

  - CPH 18.4 - Merging Data Structures
  - CF Blogs
    - [Arpa](https://codeforces.com/blog/entry/44351)
    - [tuwuna](https://codeforces.com/blog/entry/67696)

## Merging Sets

Let's consider a tree, rooted at node $1$, where each node has a color (see [CSES Distinct Colors](https://cses.fi/problemset/task/1139)).

For each node, let's store a set containing only that node's color. We want to merge the sets within each node's subtree so that every node ends up with a set consisting of all colors in its subtree. Doing this allows us to solve a variety of problems, such as querying the number of distinct colors in each subtree. Doing this naively, however, can take $O(N^2)$ insertions in the worst case (for example, when the tree is a path).
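
To see where the quadratic bound comes from, here is a minimal sketch that counts the copies made by naive merging on a path $0 - 1 - \dots - (N-1)$ rooted at node $0$, assuming every node has a distinct color. The helper name `naiveCopies` is hypothetical and only for illustration.

```cpp
#include <set>
#include <utility>

// Hypothetical helper: counts how many elements are copied when every
// child set is always copied into its parent's set (no size check).
// The tree is assumed to be a path rooted at node 0 with distinct colors.
long long naiveCopies(int n) {
    std::set<int> cur;  // colors in the subtree of the node processed last
    long long copies = 0;
    for (int v = n - 1; v >= 0; v--) {
        std::set<int> mine = {v};  // node v starts with only its own color
        for (int c : cur) {        // naive: always copy child into parent
            mine.insert(c);
            copies++;
        }
        cur = std::move(mine);
    }
    return copies;
}
```

On this path the node at depth $d$ receives $N - 1 - d$ copied elements, so the total is $0 + 1 + \dots + (N-1) = N(N-1)/2$.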

However, with just a few lines of code, we can significantly speed this up.

```cpp
if(a.size() < b.size()){ //for two sets a and b
  swap(a,b);
}
```
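In other words, by always merging the smaller set into the larger one, the total number of insertions drops to $O(N\log N)$. With `std::set`, each insertion costs $O(\log N)$, so the total running time is $O(N\log^2 N)$.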
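
As a rough sketch, the swap can be wrapped in a small helper that performs the whole merge. The name `mergeInto` and its return value (the number of insert calls) are assumptions made for illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical helper: merges b into a, always iterating over the
// smaller of the two sets. Returns the number of insert calls made.
std::size_t mergeInto(std::set<int>& a, std::set<int>& b) {
    assert(&a != &b);  // merging a set into itself is not meaningful
    if (a.size() < b.size()) {
        std::swap(a, b);  // constant time for std::set
    }
    std::size_t moved = 0;
    for (int x : b) {
        a.insert(x);
        ++moved;
    }
    b.clear();
    return moved;
}
```

After the call, `a` holds the union and `b` is empty, regardless of which set was larger beforehand.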

<details>
<summary> Proof </summary>

When merging two sets, you move elements from the smaller set to the larger set. If the size of the smaller set is $X$, then the size of the resulting set is at least $2X$. Thus, an element that has been moved $Y$ times will be in a set of size at least $2^Y$, and since the maximum size of a set is $N$ (the root), each element will be moved at most $O(\log N)$ times, leading to a total of $O(N\log N)$ moves.
</details>
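
The bound can also be checked empirically. The following is a minimal sketch, assuming a balanced merge pattern: it starts from $N$ singleton sets, merges them pairwise in rounds, and counts how many elements are moved. The function name `countMoves` is hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical experiment: start with n singleton sets {0}, ..., {n-1},
// merge neighbours pairwise in rounds (small into large), and return
// the total number of elements moved.
std::size_t countMoves(int n) {
    std::vector<std::set<int>> sets(n);
    for (int i = 0; i < n; i++) sets[i].insert(i);
    std::size_t moved = 0;
    while (sets.size() > 1) {
        std::vector<std::set<int>> next;
        for (std::size_t i = 0; i + 1 < sets.size(); i += 2) {
            std::set<int>& a = sets[i];
            std::set<int>& b = sets[i + 1];
            if (a.size() < b.size()) std::swap(a, b);
            moved += b.size();  // every element of the smaller set moves once
            a.insert(b.begin(), b.end());
            next.push_back(std::move(a));
        }
        // an unpaired last set is carried over to the next round unchanged
        if (sets.size() % 2 == 1) next.push_back(std::move(sets.back()));
        sets = std::move(next);
    }
    return moved;
}
```

For $N = 8$ each of the three rounds moves $4$ elements, giving $12$ moves in total, which is within the $N\log_2 N = 24$ bound from the proof.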

Additionally, a set doesn't have to be an `std::set`. Many data structures can be merged, such as `std::map` or even adjacency lists.
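
For instance, a `std::map` from color to frequency can be merged the same way. This is only a sketch under the assumption that the values are counts to be added together; `mergeCounts` is a hypothetical name.

```cpp
#include <map>
#include <utility>

// Hypothetical helper: merges the color-frequency map b into a,
// iterating over whichever map has fewer keys.
void mergeCounts(std::map<int, int>& a, std::map<int, int>& b) {
    if (a.size() < b.size()) {
        std::swap(a, b);  // keep the larger map in a
    }
    for (const auto& kv : b) {
        a[kv.first] += kv.second;  // add the count for this color
    }
    b.clear();
}
```

Because addition is commutative, it does not matter which map ends up as the destination after the swap.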

<details>
<summary> Full Code </summary>

```cpp
#include <bits/stdc++.h>

using namespace std;

const int MX = 200005;

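vector<int> adj[MX];  // adjacency list of the tree
set<int> col[MX];     // col[v] = colors seen so far in the subtree of v
long long ans[MX];    // ans[v] = number of distinct colors in the subtree of v

void dfs(int v, int p){
    for(int e : adj[v]){
        if(e != p){
            dfs(e, v);
            // ans[e] is already recorded, so col[e] may be reused freely;
            // keep the larger of the two sets in col[v]
            if(col[v].size() < col[e].size()){
                swap(col[v], col[e]);
            }
            // move every color from the smaller set into the larger one
            for(int a : col[e]){
                col[v].insert(a);
            }
            col[e].clear();
        }
    }
    ans[v] = col[v].size();
}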
int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n; cin >> n;
    for(int i = 0; i < n; i++){
        int x; cin >> x;
        col[i].insert(x);
    }
    for(int i = 0; i < n - 1; i++){
        int u,v; cin >> u >> v;
        u--; v--;
        adj[u].push_back(v); adj[v].push_back(u);
    }
    dfs(0,-1);
    for(int i = 0; i < n; i++){
        cout << ans[i] << " ";
    }
}
```

</details>

<info-block title="Challenge">

Prove that if you instead merge sets that have size equal to the depths of the subtrees, then small to large merging does $O(N)$ insert calls.

(be specific about what this means?)

</info-block>

## Problems

(note about .swap() vs swap)

<problems-list problems={metadata.problems.general} />