---
id: bitsets
title: "Bitsets"
author: Benjamin Qi
description: "Several examples of how bitsets give some unintended solutions on recent USACO problems."
prerequisites:
 - Errichto - Bitwise Operations Pt 1
frequency: 2
---

import { Problem } from "../models";

export const metadata = {
  problems: {
    school: [
      new Problem("CSES", "School Excursion", "1706", "Easy", false, ["Knapsack", "Bitset"], ""),
    ],
    cow: [
      new Problem("Gold", "Cowpatibility", "862", "Normal", false, ["PIE", "Bitset"], ""),
    ],
    lots: [
      new Problem("Plat", "Lots of Triangles", "672", "Normal", false, ["Geometry", "Bitset"], ""),
    ],
    bfs: [
      new Problem("CSA", "Substring Restrictions", "substring-restrictions", "Hard", false, ["DSU"], "")
    ],
    ad: [
      new Problem("Plat", "Equilateral Triangles", "1021", "Normal", false, ["Bitset", "Sliding Window"], "Again, the intended solution runs in $O(N^3)$. Of course, it is still possible to pass $O(N^4)$ solutions with bitset! See the analysis [here](http://www.usaco.org/current/data/sol_triangles_platinum_feb20.html)."),
      new Problem("CSES", "BOI Nautilus", "https://cses.fi/247/submit/B", "Normal", false, ["Bitset"], ""),
    ],
  }
};

## Tutorial

tl;dr: some operations are 32-64x faster on a bitset than on a boolean array, since they process a whole machine word of bits at once. See the [C++ Reference](http://www.cplusplus.com/reference/bitset/bitset/) for the operations you can perform.

<resources>
  <resource source="CF" title="Errichto - Bitwise Operations Pt 2" url="blog/entry/73558"></resource>
</resources>
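
As a minimal sketch of where the speedup comes from (the names `SZ`, `countCommonSlow`, and `countCommonFast` are made up for illustration): the boolean-array loop touches one element per iteration, whereas `&` and `count()` on a bitset handle a machine word of bits per step.

```cpp
#include <bits/stdc++.h>
using namespace std;

const int SZ = 1000; // illustrative size, not taken from any problem

// Count the positions set in both a and b, one element at a time.
int countCommonSlow(const vector<bool>& a, const vector<bool>& b) {
	int res = 0;
	for (int i = 0; i < SZ; ++i) if (a[i] && b[i]) ++res;
	return res;
}

// Same result, but & and count() process a machine word of bits per step.
int countCommonFast(const bitset<SZ>& a, const bitset<SZ>& b) {
	return (a&b).count();
}
```

Both functions return the same value on the same input; only the constant factor differs.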

## Knapsack

<problems-list problems={metadata.problems.school} />

Of course, the first step is to generate the sizes of each connected component.

<spoiler title="Input">

```cpp
#include <bits/stdc++.h>
using namespace std;

struct DSU {
	vector<int> e; void init(int N) { e = vector<int>(N,-1); }
	int get(int x) { return e[x] < 0 ? x : e[x] = get(e[x]); }
	bool sameSet(int a, int b) { return get(a) == get(b); }
	int size(int x) { return -e[get(x)]; }
	bool unite(int x, int y) { // union by size
		x = get(x), y = get(y); if (x == y) return 0;
		if (e[x] > e[y]) swap(x,y);
		e[x] += e[y]; e[y] = x; return 1;
	}
};

DSU D;
int n,m;
vector<int> comps;

void init() {
	cin >> n >> m; D.init(n);
	for (int i = 0; i < m; ++i) {
		int a,b; cin >> a >> b;
		D.unite(a-1,b-1);
	}
	for (int i = 0; i < n; ++i) if (D.get(i) == i)
		comps.push_back(D.size(i));
}
```

</spoiler>

A naive knapsack solution would be as follows. For each $0\le i\le \texttt{comps.size()}$, let $\texttt{dp}[i][j]=1$ if there exists a subset of the first $i$ components whose sizes sum to $j$. Then the answer will be stored in $\texttt{dp}[\texttt{comps.size()}]$. This runs in $O(N^2)$ time, which is too slow if implemented naively, but we can use bitset to speed it up!

Note: you can't store all $N$ bitsets in memory at the same time (more on that below). Since $\texttt{dp}[i]$ depends only on $\texttt{dp}[i-1]$, a single bitset updated in place suffices.
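
The sketch below compares the one-row boolean DP with the bitset update `posi |= posi<<t`; both compute the same set of reachable sums. The helper names `reachableSlow` and `reachableFast` and the small bound `MX` are made up for illustration and are not part of the actual solution.

```cpp
#include <bits/stdc++.h>
using namespace std;

const int MX = 101; // illustrative bound on the total size, not the real limit

// One-row boolean DP: iterate j downwards so each component is used at most once.
vector<bool> reachableSlow(const vector<int>& comps) {
	vector<bool> dp(MX); dp[0] = 1;
	for (int t: comps)
		for (int j = MX-1; j >= t; --j)
			if (dp[j-t]) dp[j] = 1;
	return dp;
}

// The same transition, applied to every j at once by a shift and an OR.
bitset<MX> reachableFast(const vector<int>& comps) {
	bitset<MX> posi; posi[0] = 1;
	for (int t: comps) posi |= posi<<t;
	return posi;
}
```

For example, with component sizes $\{2,3,7\}$ both versions mark exactly the sums $0,2,3,5,7,9,10,12$.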

<spoiler title="Full Solution">

```cpp
int main() {
	init();
	bitset<100001> posi; posi[0] = 1;
	for (int t: comps) posi |= posi<<t;
	for (int i = 1; i <= n; ++i) cout << posi[i];
	cout << "\n";
}
```

</spoiler>

**Challenge**: This solution runs in $\approx 0.3\text{s}$ when $N=10^5$ and there are no edges. Find a faster solution which can also be sped up with bitset (my solution runs in 0.03s).

## Cowpatibility (Gold)

<problems-list problems={metadata.problems.cow} />

Label the cows from $0\ldots N-1$. For two cows $x$ and $y$ set `adj[x][y]=1` if they share a common flavor (in particular, `adj[x][x]=1`). Then the sum of `adj[x].count()` over all $x$ counts each compatible pair of distinct cows twice and each cow once with itself, so the number of pairs that are not compatible is $N^2$ minus this sum, divided by two. It remains to compute `adj[x]` for all $x$.

Unfortunately, storing $N$ bitsets each with $N$ bits takes up $\frac{50000^2}{32}\cdot 4=312.5\cdot 10^6$ bytes of memory, which is greater than USACO's $256$ megabyte limit. We can reduce the memory usage by half in exchange for a slight increase in time by first computing the adjacency bitsets for all $x\in [0,N/2)$, and then for all $x\in [N/2,N)$ afterwards.
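
The arithmetic can be checked with a quick sketch. The helper name `bytesNeeded` is made up here, and the exact value of `sizeof` depends on the implementation: a bitset is padded to a whole number of machine words, so the result may come out slightly above the estimate.

```cpp
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef bitset<50000> B;

// Bytes needed for an array of `rows` bitsets of 50000 bits each.
ll bytesNeeded(ll rows) { return rows*(ll)sizeof(B); }
```

`bytesNeeded(50000)` should be at least $312.5\cdot 10^6$, which is over the limit, while `bytesNeeded(25000)` is about half of that.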

First, we read in all of the flavors.

<spoiler title="Input">

```cpp
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef bitset<50000> B;
const int HALF = 25000;

int N;
B adj[HALF];
vector<int> flav[1000001];
ll ans;

void input() {
	ios_base::sync_with_stdio(0); cin.tie(0);
	freopen("cowpatibility.in","r",stdin);
	freopen("cowpatibility.out","w",stdout);
	cin >> N;
	for (int i = 0; i < N; ++i)
		for (int j = 0; j < 5; ++j) {
			int x; cin >> x;
			flav[x].push_back(i);
		}
}
```

</spoiler>

Then for each flavor, we can look at all pairs of cows that share that flavor and update the adjacency bitsets for those $x\in [0,\texttt{HALF})$.

```cpp
int main() {
	input();
	for (int i = 1; i <= 1000000; ++i)
		for (int x: flav[i]) if (x < HALF) for (int y: flav[i]) adj[x][y] = 1;
	for (int i = 0; i < HALF; ++i) ans += adj[i].count();
}
```

`adj[i].count()` runs quickly enough since its runtime is divided by the bitset constant. However, looping over all pairs of cows in `flav[i]` is too slow if, say, `flav[i]` contains all cows. Then the nested loop could take $\Theta(N^2)$ time! Of course, we can instead write the nested loop in a way that takes advantage of fast bitset operations once again.

```cpp
for (int i = 1; i <= 1000000; ++i) if (flav[i].size() > 0) {
	B b; for (int x: flav[i]) b[x] = 1;
	for (int x: flav[i]) if (x < HALF) adj[x] |= b;
}
```

The full main function is as follows:

<spoiler title="Full Solution">

```cpp
int main() {
	input();
	for (int i = 1; i <= 1000000; ++i) if (flav[i].size() > 0) {
		B b; for (int x: flav[i]) b[x] = 1;
		for (int x: flav[i]) if (x < HALF) adj[x] |= b;
	}
	for (int i = 0; i < HALF; ++i) ans += adj[i].count();
	for (int i = 0; i < HALF; ++i) adj[i].reset();
	for (int i = 1; i <= 1000000; ++i) if (flav[i].size() > 0) {
		B b; for (int x: flav[i]) b[x] = 1;
		for (int x: flav[i]) if (x >= HALF) adj[x-HALF] |= b;
	}
	for (int i = 0; i < HALF; ++i) ans += adj[i].count();
	cout << ((ll)N*N-ans)/2 << "\n";
}
```

</spoiler>

Apparently no test case contains more than $25000$ distinct flavors, so we don't actually need to split the calculation into two halves.

## Lots of Triangles

<problems-list problems={metadata.problems.lots} />

First, we read in the input data. `cross(a,b,c)` is positive iff `c` lies to the left of the line from `a` to `b`.

<spoiler title="Input">

```cpp
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef pair<ll,ll> P;

#define f first
#define s second

ll cross(P a, P b, P c) {
	b.f -= a.f, b.s -= a.s;
	c.f -= a.f, c.s -= a.s;
	return b.f*c.s-b.s*c.f;
}

vector<P> v;
int N;

void input() {
	ios_base::sync_with_stdio(0); cin.tie(0);
	freopen("triangles.in","r",stdin);
	freopen("triangles.out","w",stdout);
	cin >> N; v.resize(N);
	for (P& p: v) cin >> p.f >> p.s;
}
```

</spoiler>
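
As a quick check of the sign convention, here is a small sketch. It repeats `cross` from the input code so that the snippet stands alone; the helper `leftOf` is made up for illustration.

```cpp
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef pair<ll,ll> P;

// Same cross product as in the input code: (b-a) x (c-a).
ll cross(P a, P b, P c) {
	b.first -= a.first, b.second -= a.second;
	c.first -= a.first, c.second -= a.second;
	return b.first*c.second-b.second*c.first;
}

// True iff c is strictly to the left of the directed line a -> b.
bool leftOf(P a, P b, P c) { return cross(a,b,c) > 0; }
```

Walking from $(0,0)$ towards $(1,0)$, the point $(0,1)$ is on the left and $(0,-1)$ is on the right, so `leftOf` returns true for the former and false for the latter. A collinear point such as $(2,0)$ gives a cross product of $0$.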

There are $O(N^3)$ possible lots. Trying all possible lots and counting the number of trees that lie within each in $O(N)$ for a total time complexity of $O(N^4)$ should solve somewhere between 2 and 5 test cases. Given a triangle `t[0], t[1], t[2]` with positive area, tree `x` lies within it iff `x` is to the left of each of sides `(t[0],t[1])`, `(t[1],t[2])`, and `(t[2],t[0])`.

<spoiler title="Slow Solution">

```cpp
int main() {
	input();
	vector<int> res(N-2);
	for (int i = 0; i < N; ++i)
		for (int j = i+1; j < N; ++j)
			for (int k = j+1; k < N; ++k) {
				vector<int> t = {i,j,k};
				if (cross(v[t[0]],v[t[1]],v[t[2]]) < 0) swap(t[1],t[2]);
				int cnt = 0;
				for (int x = 0; x < N; ++x) {
					if (cross(v[t[0]],v[t[1]],v[x]) <= 0) continue;
					if (cross(v[t[1]],v[t[2]],v[x]) <= 0) continue;
					if (cross(v[t[2]],v[t[0]],v[x]) <= 0) continue;
					cnt ++;
				}
				res[cnt] ++;
			}
	for (int i = 0; i < N-2; ++i) cout << res[i] << "\n";
}
```

</spoiler>

The analysis describes how to count the number of trees within a lot in $O(1)$, which is sufficient to solve the problem. However, $O(N)$ is actually sufficient as long as we divide by the bitset constant. Let `b[i][j][k]=1` if `k` lies to the left of side `(i,j)`. Then `x` lies within triangle `(t[0],t[1],t[2])` as long as `b[t[0]][t[1]][x]=b[t[1]][t[2]][x]=b[t[2]][t[0]][x]=1`. We can count the number of `x` such that this holds true by taking the bitwise AND of the bitsets for all three sides and then counting the number of bits in the result.

<spoiler title="Fast Solution">

```cpp
bitset<300> b[300][300];

int main() {
	input();
	for (int i = 0; i < N; ++i)
		for (int j = 0; j < N; ++j) if (j != i)
			for (int k = 0; k < N; ++k) if (cross(v[i],v[j],v[k]) > 0)
				b[i][j][k] = 1;
	vector<int> res(N-2);
	for (int i = 0; i < N; ++i)
		for (int j = i+1; j < N; ++j)
			for (int k = j+1; k < N; ++k) {
				vector<int> t = {i,j,k};
				if (cross(v[t[0]],v[t[1]],v[t[2]]) < 0) swap(t[1],t[2]);
				auto z = b[t[0]][t[1]]&b[t[1]][t[2]]&b[t[2]][t[0]];
				res[z.count()] ++;
			}
	for (int i = 0; i < N-2; ++i) cout << res[i] << "\n";
}
```

</spoiler>

## Knapsack Again (GP of Bytedance 2020 F)

> Given $n$ ($n\le 2\cdot 10^4$) positive integers $a_1,\ldots,a_n$ ($a_i\le 2\cdot 10^4$), find the max possible sum of a subset of $a_1,\ldots,a_n$ whose sum does not exceed $c$.

Consider the case when $\sum a_i\ge c$. The intended solution runs in $O(n\cdot \max(a_i))$; see [here](https://github.com/bqi343/USACO/blob/master/Implementations/content/various/Knapsack.h) for more information. However, we'll solve it with bitset instead.

As with the first problem in this module, let $\texttt{dp}[i][j]=1$ if there exists a subset of the first $i$ numbers that sums to $j$. This solution runs in $O(n\cdot \sum a_i)$ time, which is too slow even if we use bitset.

Taking inspiration from [this](https://codeforces.com/blog/entry/67664) CF blog post, we'll first shuffle the integers randomly and perform the DP with the following modification:

 - If $\left|\frac{ci}{n}-j\right| \ge X$ for some $X$ that we choose, then set $\texttt{dp}[i][j]=0$.

Since we only need to keep track of $2X+1$ values for each $i$, this solution runs in $O(nX)$ time, which is fast enough with $X=5\cdot 10^5$ using bitset.

Intuitively, the random shuffle turns the prefix sums of the optimal subset into a random walk around the line $\frac{ci}{n}$, whose deviation should be on the order of $\max a_i\cdot \sqrt n$, so it suffices to take $X\approx \max a_i\cdot \sqrt n$. (Though I'm not completely convinced that this works. Does anyone know how to bound the failure probability of this algorithm precisely?)

<spoiler title="Solution">

```cpp
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;

int n,c;
const int Z = 1000000;
mt19937 rng;

int solve() {
	cin >> n >> c;
	vector<int> a(n); int sum = 0;
	for (int& x: a) {
		cin >> x;
		sum += x;
	}
	if (sum <= c) return sum;
	shuffle(begin(a),end(a),rng);
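	// B[Z/2+d] = 1 if some subset of the numbers so far has sum (target so far)+d,
	// where the target after i numbers is floor(i*c/n); sums outside the window are dropped
	bitset<Z> B; B[Z/2] = 1;
	ll lst = 0;
	for (int i = 0; i < n; ++i) {
		ll cur = (ll)(i+1)*c/n; // new target
		int dif = cur-lst; lst = cur;
		auto tmp = B>>dif; // skip a[i]: offsets decrease by dif
		ll wut = a[i]-dif; // take a[i]: offsets change by a[i]-dif
		if (wut >= 0) B = tmp|(B<<wut);
		else B = tmp|(B>>(-wut));
	}
	// the final target is c, so look for the largest offset <= 0
	for (int i = Z/2; i >= 0; --i) if (B[i] == 1) return c-(Z/2-i);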
	return 0;
}

int main() {
	int T; cin >> T;
	for (int i = 0; i < T; ++i) cout << solve() << "\n";
}
```

</spoiler>

## Other Applications

Bitsets can also be used to speed up the following:

 - Gaussian Elimination in $O(N^3)$
 - Bipartite matching in $O(N^3)$
 - BFS in $O(N^2)$
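
For the first item, here is a minimal sketch of Gaussian elimination over GF(2). This rests on an assumption: the speedup applies when the entries are bits, so that adding one row to another is a single XOR of bitsets. The names `MX`, `Row`, and `rankMod2` are made up for illustration.

```cpp
#include <bits/stdc++.h>
using namespace std;

const int MX = 500; // illustrative bound on the number of columns
typedef bitset<MX> Row;

// Rank of a 0/1 matrix over GF(2).
// Each row XOR processes a machine word of columns at a time.
int rankMod2(vector<Row> mat, int cols) {
	int rnk = 0;
	for (int c = 0; c < cols && rnk < (int)mat.size(); ++c) {
		int piv = -1; // find a row at or below rnk with a 1 in column c
		for (int r = rnk; r < (int)mat.size(); ++r) if (mat[r][c]) { piv = r; break; }
		if (piv == -1) continue;
		swap(mat[rnk],mat[piv]);
		for (int r = 0; r < (int)mat.size(); ++r) // clear column c in all other rows
			if (r != rnk && mat[r][c]) mat[r] ^= mat[rnk];
		++rnk;
	}
	return rnk;
}
```

For instance, the rows `110`, `011`, `101` have rank 2 over GF(2), since the third row is the XOR of the first two.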

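Operations such as `_Find_first()` and `_Find_next()` mentioned in Errichto's blog are helpful. They are GCC (libstdc++) extensions rather than part of the C++ standard, so they may not be available with other compilers.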
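
As a sketch of the BFS speedup on a dense graph stored as adjacency bitsets: keep a bitset of unvisited vertices, AND it with the adjacency row of the current vertex, and iterate over the set bits of the result. The names `MX`, `adj`, `unvis`, and `bfs` are illustrative, and the code assumes GCC, since `_Find_first()` and `_Find_next()` are libstdc++ extensions.

```cpp
#include <bits/stdc++.h>
using namespace std;

const int MX = 2000; // illustrative bound on the number of vertices
bitset<MX> adj[MX];  // adj[u][v] = 1 if there is an edge u -> v

// BFS distances from src on a graph with n <= MX vertices, -1 if unreachable.
// Requires GCC: _Find_first / _Find_next are libstdc++ extensions.
vector<int> bfs(int n, int src) {
	vector<int> dist(n,-1);
	bitset<MX> unvis; // bits 0..n-1 mark vertices not yet visited
	for (int i = 0; i < n; ++i) unvis[i] = 1;
	queue<int> q; q.push(src); dist[src] = 0; unvis[src] = 0;
	while (!q.empty()) {
		int u = q.front(); q.pop();
		bitset<MX> nxt = adj[u]&unvis; // unvisited neighbors of u
		for (int v = nxt._Find_first(); v < MX; v = nxt._Find_next(v)) {
			dist[v] = dist[u]+1; unvis[v] = 0; q.push(v);
		}
	}
	return dist;
}
```

Each vertex is popped once and costs one AND over `MX` bits, plus work proportional to the number of newly visited vertices, which is where the division by the bitset constant comes from.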

Regarding the last application:

<problems-list problems={metadata.problems.bfs} />

In USACO Camp, this problem appeared with $N\le 10^5$ and a large time limit ...

## Additional Problems

<problems-list problems={metadata.problems.ad} />