Anthony Wang 39f89c3734
Add failed to commit transaction error
2022-01-30 22:17:54 -06:00


title: Installing Every Arch Package
date: 2022-01-26T21:52:58-06:00
description: Using algorithms and Julia to install as many packages as possible from the Arch Linux official repositories
type: post
tags: linux, fun, algorithms, computer-science

A stupid idea on Matrix

Challenge accepted. Let's do it!

First things first, let's generate a list of all official Arch Linux packages. Fortunately, pacman, the best pragmatic package manager in existence, makes this a breeze.

pacman -Sql

Great, now let's install it all!

pacman -Sql | xargs sudo pacman -S

10 seconds later, you'll find yourself with... unresolvable package conflicts detected?

OK, fine, let's disable dependency checking then:

pacman -Sql | xargs sudo pacman -Sdd

Nope, didn't work. We have to do something about the conflicting packages!

We could resolve all the conflicts manually with an hour of work... or we could write a program!

Automation

Time for some algorithms!

Let's put our algorithms knowledge to good use. We can think of each package as a node in a graph, with an edge between every pair of conflicting packages. Since we're skipping dependency checks anyway (which will most likely leave us with a broken system), we don't need any other edges in the graph.

For each edge, we can pick at most one of its two packages. Picking as many packages as possible under that rule is exactly the maximum independent set problem!
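To make the definition concrete, here's a tiny made-up conflict graph (the package names are hypothetical) and a check that a chosen set of packages is independent, i.e. contains no conflicting pair:

```julia
# Made-up conflict graph: every pair is an edge between two conflicting packages
edges = [("foo", "foo-git"), ("foo-git", "foo-lite"), ("bar", "bar-git")]

# A set of packages is independent if no edge has both of its endpoints in the set
is_independent(pkgs::Set{String}) = all(!(a in pkgs && b in pkgs) for (a, b) in edges)

println(is_independent(Set(["foo", "foo-lite", "bar"])))  # true
println(is_independent(Set(["foo", "foo-git"])))          # false
```

In this toy graph the maximum independent set has three of the five packages, e.g. foo, foo-lite and bar: dropping foo-git alone resolves two conflicts.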

Wait... isn't that NP-hard? And we have about 12000 nodes, so we'll never find the answer before the heat death of the universe, right?

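Well, are all 12000 nodes connected to each other? No: the largest connected component probably has only a few nodes, since we aren't going to have hundreds or thousands of packages all conflicting with each other. Conflicts never cross between components, so we can solve each component on its own, and brute force over a component of M nodes only has to try 2^M subsets.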
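Here's why splitting into components pays off: the total brute-force work becomes a sum of 2^M over the components instead of 2^N for the whole graph. A minimal sketch on a made-up 6-node graph, using an iterative depth-first search to measure the component sizes (any graph traversal would do):

```julia
# Made-up adjacency lists for 6 packages: node i conflicts with every node in toyG[i]
toyG = [[2], [1, 3], [2], [5], [4], Int[]]

# Sizes of the connected components, found with an iterative depth-first search
function component_sizes(adj)
    seen = falses(length(adj))
    sizes = Int[]
    for s in eachindex(adj)
        seen[s] && continue
        seen[s] = true
        stack, n = [s], 0
        while !isempty(stack)
            u = pop!(stack)
            n += 1
            for v in adj[u]
                if !seen[v]
                    seen[v] = true
                    push!(stack, v)
                end
            end
        end
        push!(sizes, n)
    end
    return sizes
end

compsizes = component_sizes(toyG)
println(compsizes)                                     # [3, 2, 1]
println(sum(2 .^ compsizes), " vs ", 2^length(toyG))   # 14 vs 64
```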

Implementing this in Julia

We'll implement this algorithm in Julia, since Julia is Python but better. First, we need a list of all packages:

pkgname = split(read(`pacman -Sql`, String))

N = length(pkgname)

Now we'll get info about each package, using multithreading to speed things up (start Julia with several threads, e.g. `julia --threads 8`, since by default it runs on just one):

struct Package
    provides::Vector{String}
    conflicts::Vector{String}
    size::Float64
end

pkginfo = Vector{Package}(undef, N)

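Threads.@threads for i = 1:N
    pkg = pkgname[i]
    # Splitting `pacman -Si` output on " : " puts the value of field k at index k+1:
    # info[10] = Provides, info[13] = Conflicts With, info[16] = Installed Size
    info = map(x -> split(replace(split(x, "\n")[1], "None" => "")), split(read(`pacman -Si $pkg`, String), " : "))
    push!(info[10], pkg)  # every package provides its own name
    # Installed Size looks like "1.50 MiB"; normalize every unit to MiB
    scale = Dict("B" => 1 / 1024^2, "KiB" => 1 / 1024, "MiB" => 1.0, "GiB" => 1024.0)
    pkginfo[i] = Package(info[10], info[13], parse(Float64, info[16][1]) * scale[info[16][2]])
end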
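The hard-coded indices 10, 13 and 16 rely on the fixed field order of `pacman -Si`: after splitting on " : ", the value of the k-th field (Repository, Name, Version, ...) ends up at index k + 1, so Provides (field 9), Conflicts With (field 12) and Installed Size (field 15) land at 10, 13 and 16. Here's a self-contained check of the same parsing expression against an abbreviated, made-up sample in that layout. The package `foo` and all its fields are hypothetical, and this assumes pacman's English output:

```julia
# Abbreviated, made-up sample in the layout of `pacman -Si` output
sample = """
Repository      : extra
Name            : foo
Version         : 1.0-1
Description     : A made-up package
Architecture    : x86_64
URL             : https://example.com
Licenses        : MIT
Groups          : None
Provides        : foo-virtual=1.0
Depends On      : glibc
Optional Deps   : None
Conflicts With  : foo-git  foo-lite
Replaces        : None
Download Size   : 100.00 KiB
Installed Size  : 1.50 MiB
"""

# Same expression as in the loop: keep the first line of each chunk (the value),
# blank out "None", then split the value into words
fields = map(x -> split(replace(split(x, "\n")[1], "None" => "")), split(sample, " : "))

println(fields[10])  # Provides
println(fields[13])  # Conflicts With
println(fields[16])  # Installed Size
```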

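We need special handling for virtual packages: a conflict can name something that other packages merely provide, so we first map every provided name (with its version stripped) to the packages that provide it: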

providedby = Dict{String, Vector{Int}}()

for i = 1:N
	for p in pkginfo[i].provides
		p = split(p, "=")[1]
		if !(p in keys(providedby))
			providedby[p] = Vector{Int}()
		end
		push!(providedby[p], i)
	end
end
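Both Provides and Conflicts entries can carry version information: provides use `=` (`foo=1.0`), while conflicts may use any of `=`, `<`, `>`, `<=`, `>=` (`foo<2.0`). A name with a constraint attached will never match a key in `providedby`, so here's a sketch of reducing either kind of entry to a bare name. The entries are made up, and `depname` is a hypothetical helper:

```julia
# Made-up Provides/Conflicts entries, some with version constraints attached
entries = ["foo-virtual=1.0", "libbar.so=2-64", "baz>=2.0", "qux<3", "plain"]

# Keep only the package name: everything before the first comparison character
depname(dep) = String(split(dep, r"[<>=]")[1])

println(map(depname, entries))  # ["foo-virtual", "libbar.so", "baz", "qux", "plain"]
```

Stripping the constraint over-approximates (a conflict with `foo<2.0` is treated as a conflict with any `foo`), but an extra edge can only cost us a package, never produce an uninstallable combination.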

We can use this to construct the graph:

G = [Set{Int}() for i = 1:N]

for i = 1:N
	for p in pkginfo[i].conflicts
		p = split(p, r"[<>=]")[1]  # drop any version constraint, e.g. "foo<2.0" -> "foo"
		if p in keys(providedby)
			for j in providedby[p]
				if j != i
					push!(G[i], j)
					push!(G[j], i)
				end
			end
		end
	end
end
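To see why the provider map and the `j != i` check matter, here's the same construction on a tiny hypothetical repository: `foo-git` provides `foo` while also conflicting with it (the usual pattern for a drop-in replacement), and two packages both provide and conflict with a virtual `libbar`. All names are made up, and the variables are renamed so they don't clobber the real `providedby` and `G`:

```julia
# Hypothetical toy repository: (name, provides, conflicts)
toy = [
    ("foo",     ["foo"],              String[]),
    ("foo-git", ["foo-git", "foo"],   ["foo"]),
    ("bar",     ["bar", "libbar"],    String[]),
    ("bar-ng",  ["bar-ng", "libbar"], ["libbar"]),
]
n = length(toy)

# Map every provided name (including virtual ones) to the packages providing it
providers = Dict{String, Vector{Int}}()
for (i, (_, provides, _)) in enumerate(toy)
    for p in provides
        push!(get!(providers, p, Int[]), i)
    end
end

# Conflicting with a name means conflicting with every provider except yourself
g = [Set{Int}() for _ = 1:n]
for (i, (_, _, conflicts)) in enumerate(toy)
    for c in conflicts, j in get(providers, c, Int[])
        if j != i
            push!(g[i], j)
            push!(g[j], i)
        end
    end
end

for i = 1:n
    println(toy[i][1], " conflicts with ", sort([toy[j][1] for j in g[i]]))
end
```

Without the `j != i` check, `foo-git` would end up conflicting with itself, since it provides the very name it conflicts with.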

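Now we can find each connected component using BFS, and brute-force its maximum independent set by trying every subset of the nodes in that component. Among the valid subsets, we prefer the one with the most packages and break ties by the larger total installed size. It's implemented here using some bit manipulation trickery: bit j - 1 of the integer m says whether component[j] is in the subset.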

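ans = BitSet(1:N)  # packages to install; the losers of each conflict get removed
used = BitSet()    # nodes already assigned to a component

for i = 1:N
	if !(i in used)
		# BFS to collect the connected component containing i
		push!(used, i)
		component = Vector{Int}()
		queue = Vector{Int}([i])
		while !isempty(queue)
			u = popfirst!(queue)
			push!(component, u)
			for v in G[u]
				if !(v in used)
					push!(used, v)
					push!(queue, v)
				end
			end
		end

		# Try all 2^M subsets; bit j-1 of m says whether component[j] is picked
		M = length(component)
		# Brute force is exponential, so bail out loudly if a component is unexpectedly big
		M <= 20 || error("component of $M packages is too big to brute-force")
		best = (0, 0.0, 0)  # (package count, total installed size, mask)
		for m = 1:(1<<M)-1
			good = true
			for j = 1:M, k = j+1:M
				# Reject m if it picks both packages of a conflicting pair
				if (m>>(j-1))&1 == 1 && (m>>(k-1))&1 == 1 && component[j] in G[component[k]]
					good = false
					break  # exits both the j and k loops
				end
			end
			if !good
				continue
			end

			picked = [j for j = 1:M if (m>>(j-1))&1 == 1]
			total = sum(pkginfo[component[j]].size for j in picked)
			# Tuples compare lexicographically: more packages wins, then larger total size
			best = max((length(picked), total, m), best)
		end

		# Remove every package in this component that the best subset leaves out
		for j = 1:M
			if (best[3]>>(j-1))&1 != 1
				delete!(ans, component[j])
			end
		end
	end
end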
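As a worked example of the bitmask trick, here's the same brute force written as a standalone function (with the per-mask check condensed into one `any`) and run on a hypothetical three-package component a - b - c, where b conflicts with both others. The sizes are made up:

```julia
# Return (count, total size, mask) of the best independent subset of one component:
# adj[j] holds the neighbours of node j, sizes[j] its installed size
function best_subset(adj::Vector{Set{Int}}, sizes::Vector{Float64})
    M = length(adj)
    best = (0, 0.0, 0)
    for m = 1:(1 << M) - 1
        # Bit j-1 of m set means node j is picked
        picked = [j for j = 1:M if (m >> (j - 1)) & 1 == 1]
        # Reject masks that pick both endpoints of some edge
        any((k in adj[j]) for j in picked for k in picked) && continue
        best = max((length(picked), sum(sizes[picked]), m), best)
    end
    return best
end

adj = [Set([2]), Set([1, 3]), Set([2])]   # path a - b - c
sizes = [1.0, 5.0, 2.0]                   # made-up installed sizes in MiB
println(best_subset(adj, sizes))          # (2, 3.0, 5): picks a and c (mask 0b101)
```

Even though b alone is the biggest package, {a, c} wins because the package count is compared first and size only breaks ties.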

Let's save it to a file:

open("out", "w") do f
	for i in ans
		println(f, pkgname[i])
	end
end

Alright, time to install everything! This takes about 60 minutes, depending on your internet connection. Make sure you have the multilib repository enabled, and you may need to manually install iptables-nft before running this command.

cat out | xargs sudo pacman -Sdd --noconfirm

At the time of writing, I haven't finished installing everything yet, but I'll update this post when I'm done.

Update: I got an error!

error: failed to commit transaction (conflicting files)
/usr/lib/python3.10/site-packages/tests/__init__.py exists in both 'python-pybtex' and 'python-wiktionaryparser'
/usr/lib/python3.10/site-packages/tests/__pycache__/__init__.cpython-310.opt-1.pyc exists in both 'python-pybtex' and 'python-wiktionaryparser'
/usr/lib/python3.10/site-packages/tests/__pycache__/__init__.cpython-310.pyc exists in both 'python-pybtex' and 'python-wiktionaryparser'
/usr/bin/sl exists in both 'python-softlayer' and 'sl'
/usr/bin/singularity exists in both 'singularity' and 'singularity-container'
/usr/lib/SoapySDR/modules0.8/libairspySupport.so exists in both 'soapyairspy' and 'soapyosmo'
/usr/share/tessdata/osd.traineddata exists in both 'tesseract' and 'tesseract-data-osd'
Errors occurred, no packages were upgraded.

I'll fix it later. Stay tuned.
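These are file conflicts rather than declared conflicts, so the graph above never saw them. One possible approach, offered only as an untested sketch, is to scrape the conflicting pairs out of pacman's error output and feed them back in as extra edges before rerunning the search. A minimal parsing sketch, using lines from the error above; `file_conflicts` is a hypothetical helper:

```julia
# A few lines copied from the pacman error above
errlog = """
/usr/lib/python3.10/site-packages/tests/__init__.py exists in both 'python-pybtex' and 'python-wiktionaryparser'
/usr/lib/python3.10/site-packages/tests/__pycache__/__init__.cpython-310.pyc exists in both 'python-pybtex' and 'python-wiktionaryparser'
/usr/bin/sl exists in both 'python-softlayer' and 'sl'
"""

# Collect each distinct pair of packages that pacman reports as owning the same file
function file_conflicts(text::AbstractString)
    found = Set{Tuple{String, String}}()
    for m in eachmatch(r"exists in both '([^']+)' and '([^']+)'", text)
        push!(found, (String(m.captures[1]), String(m.captures[2])))
    end
    return found
end

for (a, b) in file_conflicts(errlog)
    println(a, " <-> ", b)
end
```

Each pair could then be mapped to node indices (for example through a name-to-index Dict built from pkgname) and added to G in both directions, exactly like a declared conflict.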