---
id: merging
title: "Small-To-Large Merging"
author: Michael Cao, Benjamin Qi
prerequisites:
  - Silver - Depth First Search
  - Gold - Point Update Range Sum
description: "?"
frequency: 1
---

import { Problem } from "../models";

export const metadata = {
  problems: {
    sam: [
      new Problem("CSES", "Distinct Colors", "1139", "Intro", false, ["Merging"]),
    ],
    general: [
      new Problem("Silver", "Wormhole Sort", "992", "Easy", false, ["Merging"]),
      new Problem("CF", "Lomsat gelral", "contest/600/problem/E", "Normal", false, ["Merging"]),
      new Problem("Plat", "Promotion Counting", "696", "Normal", false, ["Merging", "Indexed Set"], ""),
      new Problem("Plat", "Disruption", "842", "Normal", false, ["Merging"]),
      new Problem("POI", "Tree Rotations", "https://szkopul.edu.pl/problemset/problem/sUe3qzxBtasek-RAWmZaxY_p/site/?key=statement", "Normal", false, ["Merging", "Indexed Set"], ""),
      new Problem("Gold", "Favorite Colors", "1042", "Hard", false, ["DSU"], "Small to large merging is mentioned in the editorial, but we were unable to break solutions that just merged naively. Alternatively, just merge linked lists in $O(1)$ time."),
    ],
  }
};

## Additional Reading

## Merging Data Structures

Obviously [linked lists](http://www.cplusplus.com/reference/list/list/splice/) can be merged in $O(1)$ time. But what about sets or vectors?

Let's consider a tree rooted at node $1$, where each node has a color. Initially, each node stores a set containing only its own color, and we want to merge these sets upward so that each node ends up with a set of all colors in its subtree. Doing this allows us to solve a variety of problems, such as querying the number of distinct colors in each subtree.

### Naive Solution

Suppose that we want to merge two sets $a$ and $b$ of sizes $n$ and $m$, respectively. One possibility is the following:

```cpp
for (int x: b) a.insert(x);
```

which runs in $O(m\log (n+m))$ time, yielding a total runtime of $O(N^2\log N)$ over the whole tree in the worst case (for example, when the tree is a path rooted at one end).
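To see where the quadratic worst case comes from, here is a minimal sketch that counts how many insertions the naive merge performs on a path. It assumes a path `0 - 1 - ... - (n-1)` rooted at node `0`, all colors distinct, and that each child's set is always inserted into its parent's set; `naiveMovesOnPath` is a hypothetical helper written only for this illustration.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: naive merging on a path rooted at node 0.
// Node v starts with the single color v (distinct colors, an assumption
// so that every insertion adds a new element). The child's set is always
// inserted into the parent's set, never the other way around.
// Returns the total number of insert calls.
long long naiveMovesOnPath(int n) {
    std::vector<std::set<int>> col(n);
    for (int v = 0; v < n; v++) col[v].insert(v);
    long long moves = 0;
    // Process bottom-up: node v+1 is the only child of node v.
    for (int v = n - 2; v >= 0; v--) {
        for (int x : col[v + 1]) {
            col[v].insert(x);
            moves++;
        }
        col[v + 1].clear();
    }
    return moves;
}
```

On a path with $n$ nodes the count is $1+2+\dots+(n-1)=\frac{n(n-1)}{2}$, and each insertion costs $O(\log n)$, which matches the $O(N^2\log N)$ bound.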
If we instead maintain $a$ and $b$ as sorted vectors, we can merge them in $O(n+m)$ time, but a total of $O(N^2)$ over the whole tree is still too slow.

### Better Solution

With just one additional line of code, we can significantly speed this up.

```cpp
if (a.size() < b.size()) swap(a,b);
for (int x: b) a.insert(x);
```

Note that [swap](http://www.cplusplus.com/reference/utility/swap/) exchanges two sets in $O(1)$ time. Thus, merging a smaller set of size $m$ into the larger one of size $n$ takes $O(m\log n)$ time.

**Claim:** The solution runs in $O(N\log^2N)$ time.

**Proof:** When merging two sets, we move elements from the smaller set into the larger set. If the size of the smaller set is $X$, then the size of the resulting set is at least $2X$. Thus, an element that has been moved $Y$ times will be in a set of size at least $2^Y$, and since the maximum size of a set is $N$ (the root), each element will be moved at most $O(\log N)$ times. Each move is a set insertion costing $O(\log N)$, so the total runtime is $O(N\log^2 N)$.

Here is a solution to CSES Distinct Colors using this idea:

```cpp
#include <bits/stdc++.h>
using namespace std;

const int MX = 200005;

vector<int> adj[MX];
set<int> col[MX];   // col[v]: distinct colors in v's subtree
long long ans[MX];

void dfs(int v, int p){
    for(int e : adj[v]){
        if(e != p){
            dfs(e, v);
            // keep the larger set in col[v]; swapping two std::set objects is O(1)
            if(col[v].size() < col[e].size()){
                swap(col[v], col[e]);
            }
            // move every element of the smaller set into the larger one
            for(int a : col[e]){
                col[v].insert(a);
            }
            col[e].clear();
        }
    }
    ans[v] = col[v].size();
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n; cin >> n;
    for(int i = 0; i < n; i++){
        int x; cin >> x;
        col[i].insert(x);
    }
    for(int i = 0; i < n - 1; i++){
        int u, v; cin >> u >> v;
        u--; v--;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs(0, -1);
    for(int i = 0; i < n; i++){
        cout << ans[i] << " ";
    }
}
```

## Generalizing

A set doesn't have to be an `std::set`. Many data structures can be merged, such as `std::map` or `std::unordered_map`. However, `std::swap` doesn't necessarily work in $O(1)$ time; for example, swapping two [arrays](http://www.cplusplus.com/reference/array/array/swap/) takes time linear in the size of the arrays, and the same goes for indexed sets.
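As an illustration of merging a `std::map` instead of a `std::set`, here is a minimal sketch that combines two color-frequency maps small-to-large, the kind of per-subtree bookkeeping a problem like Lomsat gelral needs. `mergeCounts` is a hypothetical helper, not part of any library; it relies on `std::swap` exchanging two `std::map` objects in $O(1)$.

```cpp
#include <map>
#include <utility>

// Hypothetical helper: merge color -> frequency map b into a, small-to-large.
// After the call, a holds the combined counts and b is empty.
void mergeCounts(std::map<int, int>& a, std::map<int, int>& b) {
    if (a.size() < b.size()) std::swap(a, b);  // O(1): keep the larger map in a
    for (const auto& p : b) a[p.first] += p.second;  // O(|b| log |a|)
    b.clear();
}
```

The swap only changes which map is iterated over; the combined counts are the same either way, so the same amortized $O(\log N)$ moves per element argument applies.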
For two indexed sets `a` and `b`, we can use the member function `a.swap(b)` in place of `swap(a,b)`.

## Problems

The following solution to USACO Platinum Promotion Counting merges indexed sets small-to-large:

```cpp
#include <bits/stdc++.h>
#include <ext/pb_ds/tree_policy.hpp>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;

template <class T> using Tree = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;

const int MX = 1e5+5;
#define sz(x) (int)(x).size()

int N, a[MX], ans[MX];
vector<int> child[MX];
Tree<int> d[MX];  // d[x]: proficiencies in x's subtree

// merge d[y] into d[x], small-to-large; the member swap avoids copying
void comb(int x, int y) {
    if (sz(d[x]) < sz(d[y])) d[x].swap(d[y]);
    for (int i: d[y]) d[x].insert(i);
    d[y].clear();
}

void dfs(int x) {
    for (int i: child[x]) {
        dfs(i);
        comb(x, i);
    }
    // proficiencies are distinct, so this counts subordinates with a larger value
    ans[x] = sz(d[x]) - d[x].order_of_key(a[x]);
    d[x].insert(a[x]);
}

int main() {
    freopen("promote.in","r",stdin);
    freopen("promote.out","w",stdout);
    cin >> N;
    for (int i = 1; i <= N; ++i) cin >> a[i];
    for (int i = 2; i <= N; ++i) {
        int p; cin >> p;
        child[p].push_back(i);
    }
    dfs(1);
    for (int i = 1; i <= N; ++i) cout << ans[i] << "\n";
}
```

It's easy to merge two sets of sizes $n\ge m$ in $O(n+m)$ or $O(m\log n)$ time, but sometimes $O\left(m\log \left(1+\frac{n}{m}\right)\right)$ can be significantly better than both of these. Check "Advanced - Treaps" for more details.

Also see [this link](https://codeforces.com/blog/entry/49446) regarding merging segment trees.
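The segment tree merging referenced in the link above can be sketched roughly as follows. This is a minimal illustration, assuming values lie in a fixed range $[0, V)$ and each sparse tree stores a multiset of values; `SegForest` and its methods are hypothetical names chosen for this example, not taken from the linked post.

```cpp
#include <vector>

// Hypothetical sketch: many sparse segment trees over values [0, V) sharing
// one node pool. cnt[t] = number of stored values in node t's range.
// Node 0 is the shared empty (null) node and is never modified.
struct SegForest {
    int V;
    std::vector<int> lc, rc, cnt;
    explicit SegForest(int V) : V(V), lc(1, 0), rc(1, 0), cnt(1, 0) {}

    int newNode() {
        lc.push_back(0); rc.push_back(0); cnt.push_back(0);
        return (int)cnt.size() - 1;
    }
    // Build a tree holding the single value x, creating O(log V) nodes.
    int single(int x) { return single(x, 0, V - 1); }
    int single(int x, int lo, int hi) {
        int t = newNode();
        cnt[t] = 1;
        if (lo == hi) return t;
        int mid = (lo + hi) / 2;
        int c = (x <= mid) ? single(x, lo, mid) : single(x, mid + 1, hi);
        if (x <= mid) lc[t] = c; else rc[t] = c;
        return t;
    }
    // Merge tree b into tree a and return the resulting root.
    // Each call reaching two non-null nodes absorbs one node of b, so the
    // total work over all merges is bounded by the number of nodes created.
    int merge(int a, int b) { return merge(a, b, 0, V - 1); }
    int merge(int a, int b, int lo, int hi) {
        if (!a) return b;
        if (!b) return a;
        if (lo == hi) { cnt[a] += cnt[b]; return a; }
        int mid = (lo + hi) / 2;
        lc[a] = merge(lc[a], lc[b], lo, mid);
        rc[a] = merge(rc[a], rc[b], mid + 1, hi);
        cnt[a] = cnt[lc[a]] + cnt[rc[a]];
        return a;
    }
    // Number of stored values in [ql, qr].
    int query(int t, int ql, int qr) { return query(t, ql, qr, 0, V - 1); }
    int query(int t, int ql, int qr, int lo, int hi) {
        if (!t || qr < lo || hi < ql) return 0;
        if (ql <= lo && hi <= qr) return cnt[t];
        int mid = (lo + hi) / 2;
        return query(lc[t], ql, qr, lo, mid) + query(rc[t], ql, qr, mid + 1, hi);
    }
};
```

Unlike small-to-large, no swap is needed here: starting from $N$ single-value trees, merging them in any order takes $O(N\log V)$ total. For instance, with proficiencies coordinate-compressed into $[0, N)$, counting the values greater than $a_x$ in a merged subtree is a `query` over $[a_x+1, N-1]$, which is one way to handle Promotion Counting without an indexed set.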