new Problem("Gold", "Favorite Colors", "1042", "Hard", false, ["DSU"], "Small to large merging is mentioned in the editorial, but we were unable to break solutions that just merged naively. Alternatively, just merge linked lists in $O(1)$ time."),
For each node, let's store a set containing only that node's color. We then want to merge the sets of all nodes in its subtree, so that each node ends up with the set of all colors in its subtree. This lets us solve a variety of problems, such as querying the number of distinct colors in each subtree. Done naively, however, the merging takes $O(N^2)$ time.
However, with just a few lines of code, we can significantly speed this up. Note that [swap](http://www.cplusplus.com/reference/utility/swap/) exchanges two sets in $O(1)$ time.
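Here is a minimal sketch of the smaller-into-larger merge, applied to the distinct-colors query mentioned above. The names `mergeInto` and `distinctColors` and the input convention (nodes numbered from 0, node 0 is the root, `parent[v] < v` for every other node) are assumptions made for this example.

```cpp
#include <set>
#include <utility>
#include <vector>

// Merge set b into set a, always iterating over the smaller of the two.
// std::swap on two std::set objects only exchanges internal pointers, so it is O(1).
void mergeInto(std::set<int>& a, std::set<int>& b) {
    if (a.size() < b.size()) std::swap(a, b);
    for (int x : b) a.insert(x);
    b.clear();  // b is no longer needed; release its memory
}

// Hypothetical input format: node 0 is the root, parent[v] < v for every v > 0,
// and color[v] is the color of node v. Returns the number of distinct colors
// in each node's subtree.
std::vector<int> distinctColors(const std::vector<int>& parent,
                                const std::vector<int>& color) {
    int n = static_cast<int>(color.size());
    std::vector<std::set<int>> s(n);
    std::vector<int> ans(n);
    for (int v = 0; v < n; ++v) s[v].insert(color[v]);
    // Since parent[v] < v, visiting nodes in decreasing index order finishes
    // every child before its parent, so no recursion is needed.
    for (int v = n - 1; v >= 0; --v) {
        ans[v] = static_cast<int>(s[v].size());
        if (v > 0) mergeInto(s[parent[v]], s[v]);
    }
    return ans;
}
```

For instance, with parents `{-1, 0, 0, 1, 1}` and colors `{1, 2, 1, 2, 3}`, `distinctColors` returns `{3, 2, 1, 1, 1}`.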
When merging two sets, move the elements of the smaller set into the larger set. If the smaller set has size $X$, then the resulting set has size at least $2X$. Thus, an element that has been moved $Y$ times is in a set of size at least $2^Y$, and since no set can have more than $N$ elements (the root's set), each element is moved at most $O(\log N)$ times. This gives $O(N\log N)$ moves in total; since each insertion into an `std::set` costs $O(\log N)$, the overall runtime is $O(N\log^2 N)$.
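To sanity-check the counting argument, the sketch below (the helper `maxMoves` is hypothetical, not from the original text) starts from $N$ singleton sets, performs random small-to-large merges until one set remains, and records how many times each element is moved. Because the sets are disjoint, a `std::vector` stands in for a set and a merge is just an append. The argument above predicts that no element moves more than $\lfloor\log_2 N\rfloor$ times.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Requires n >= 1. Starts with singleton sets {0}, ..., {n-1}, merges random
// pairs with the small-to-large rule until one set is left, and returns the
// largest number of times any single element was moved.
int maxMoves(int n, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::vector<int>> sets(n);
    for (int i = 0; i < n; ++i) sets[i] = {i};
    std::vector<int> moves(n, 0);
    std::vector<int> alive(n);  // indices of sets that still exist
    std::iota(alive.begin(), alive.end(), 0);
    while (alive.size() > 1) {
        // Pick two distinct live sets uniformly at random.
        std::uniform_int_distribution<std::size_t> pick(0, alive.size() - 1);
        std::size_t i = pick(rng), j = pick(rng);
        if (i == j) continue;
        std::vector<int>& a = sets[alive[i]];
        std::vector<int>& b = sets[alive[j]];
        if (a.size() < b.size()) std::swap(a, b);  // O(1) for std::vector
        for (int x : b) {  // move every element of the smaller set
            a.push_back(x);
            ++moves[x];
        }
        b.clear();
        // Drop set j from the live list (order does not matter).
        alive[j] = alive.back();
        alive.pop_back();
    }
    return *std::max_element(moves.begin(), moves.end());
}
```

The bound holds for every merge order, not just random ones, since it only depends on the set containing an element at least doubling with each move.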
A set doesn't have to be an `std::set`. Many data structures can be merged this way, such as `std::map` or `std::unordered_map`. However, `std::swap` doesn't always run in $O(1)$ time; for example, swapping two [arrays](http://www.cplusplus.com/reference/array/array/swap/) takes time linear in their size, and the same goes for indexed sets. For two indexed sets `a` and `b`, use the member function `a.swap(b)` in place of `swap(a,b)`.
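As an example of merging something other than an `std::set`, here is a sketch that merges two frequency maps (key to number of occurrences) small to large. The alias `Counts` and the function `mergeCounts` are illustrative names, and the structured binding requires C++17.

```cpp
#include <map>
#include <utility>

// Frequency map: key -> number of occurrences.
using Counts = std::map<int, long long>;

// Small-to-large merge of two frequency maps. std::swap on std::map is O(1)
// (it exchanges internal pointers), so only the smaller map is iterated.
void mergeCounts(Counts& a, Counts& b) {
    if (a.size() < b.size()) std::swap(a, b);
    for (const auto& [key, cnt] : b) a[key] += cnt;
    b.clear();  // all of b's counts now live in a
}
```

Merging `a = {1: 2}` with `b = {1: 1, 5: 3}` leaves `{1: 3, 5: 3}` in `a` and an empty `b`. Here "size" means the number of distinct keys, which is what the small-to-large argument counts.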
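<spoiler title="Solution">

```cpp
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;

// Indexed set (GCC policy-based tree) supporting order_of_key.
template<class T> using Tree = tree<T,null_type,less<T>,rb_tree_tag,tree_order_statistics_node_update>;
const int MX = 1e5+5;
#define sz(x) (int)(x).size()

int N, a[MX], ans[MX];
vector<int> child[MX];
Tree<int> d[MX]; // d[x] ends up holding every value in x's subtree

// Merge d[y] into d[x], small to large.
void comb(int x, int y) {
    // Member swap runs in O(1), unlike std::swap on indexed sets.
    if (sz(d[x]) < sz(d[y])) d[x].swap(d[y]);
    for (int v : d[y]) d[x].insert(v);
}

void dfs(int x) {
    for (int i : child[x]) {
        dfs(i);
        comb(x, i);
    }
    // d[x] now holds the values of x's proper descendants; count those >= a[x]
    // (values are assumed distinct, so this equals the count of values > a[x]).
    ans[x] = sz(d[x]) - d[x].order_of_key(a[x]);
    d[x].insert(a[x]);
}

int main() {
    freopen("promote.in","r",stdin);
    freopen("promote.out","w",stdout);
    cin >> N; for (int i = 1; i <= N; ++i) cin >> a[i];
    for (int i = 2; i <= N; ++i) {
        int p; cin >> p;
        child[p].push_back(i);
    }
    dfs(1);
    for (int i = 1; i <= N; ++i) cout << ans[i] << "\n";
}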
```
</spoiler>
## Faster Merging (Optional)
It's easy to merge two sets of sizes $n\ge m$ in $O(n+m)$ or $O(m\log n)$ time, but sometimes $O\left(m\log \frac{n}{m}\right)$ can be significantly better than both of these.
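To see why, compare the three bounds for a few choices of $m$. This is a worked comparison added for illustration; logs are base 2, and $\log\frac{n}{m}$ is read as $\log\left(\frac{n}{m}+1\right)$ so that it stays at least $1$ when $m=n$. Each entry is the order of growth of the bound:

$$
\begin{array}{c|ccc}
 & O(n+m) & O(m\log n) & O\left(m\log\frac{n}{m}\right) \\ \hline
m = 1 & n & \log n & \log n \\
m = \dfrac{n}{\log n} & n & n & \dfrac{n\log\log n}{\log n} \\
m = n & n & n\log n & n
\end{array}
$$

At the extremes, $m\log\frac{n}{m}$ matches the better of the other two bounds, and in between (for example $m=n/\log n$, where $\frac{n}{m}=\log n$) it is asymptotically smaller than both.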