title | date | draft | description | type | tags
---|---|---|---|---|---
All of Unicode on One Poster | 2023-09-13T00:18:52Z | true | The surprisingly long journey of making a large poster with every Unicode character | post |
Hello everyone, and welcome back to this episode of The World's Worst Ideas! Maybe you remember when I installed every Arch package. Well, today we're going to continue the theme of collecting 'em all, this time with Unicode characters.
So a while back, I wrote a website with every Unicode character (warning: may crash your web browser). Now that was fun and all, but you can't really see all the characters at the same time, and also this page kills web browsers. Today, I stumbled across a 1400-page book with every Unicode 4.0 character. (There's also an updated Unicode 5.0 version, but they wisely decided to leave out CJK characters and Hangul syllables, so that book is half the size.) But there aren't that many Unicode characters, fewer than 150,000 as of Unicode 15.0, so what if I just printed them all out on a large enough poster?
Fortunately, I happen to have a large canvas, roughly 85 by 130 centimeters. That's around 2.7 by 2.7 millimeters (sqrt(85 cm * 130 cm / 150,000)) for each of the 150,000 characters, which is still large enough to see. The canvas fits 18 letter-sized pieces of paper, 6 across and 3 down. So, my plan is to create 18 images using a Python script, print them out, and tape them to the canvas to stitch together the ultimate Unicode poster.
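As a quick sanity check on those numbers, here's a small sketch of the arithmetic. The letter paper dimensions (21.59 by 27.94 cm) and the portrait orientation are my assumptions; the canvas size and the rounded character count come from the paragraph above.

```python
import math

CANVAS_W_CM, CANVAS_H_CM = 130, 85        # canvas, laid out landscape
N_CHARS = 150_000                         # rounded-up character count
LETTER_W_CM, LETTER_H_CM = 21.59, 27.94   # assumption: US letter, portrait

# Side length of one square cell if the canvas is divided evenly.
cell_mm = math.sqrt(CANVAS_W_CM * CANVAS_H_CM / N_CHARS) * 10

# Area covered by a grid of pages, 6 across and 3 down.
grid_w_cm = 6 * LETTER_W_CM
grid_h_cm = 3 * LETTER_H_CM

print(f"cell side: {cell_mm:.2f} mm")
print(f"page grid: {grid_w_cm:.1f} x {grid_h_cm:.1f} cm")
```

Under those assumptions the page grid comes out just inside the canvas in both directions, so there's very little margin to play with.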
Python conveniently has a very obscure module called unicodedata in its standard library, which can tell you whether a codepoint is a valid Unicode character and what the character's category is. Unfortunately, the version of that module in the latest Python release, 3.11, only supports Unicode 14.0, not the latest and greatest 15.0, so there's a handful of characters and weird emojis that won't be on the poster! No!
Python 3.12.0rc2 does support Unicode 15.0, though, so I built and installed it from the AUR. This went surprisingly smoothly. Alright, let's make a list of all valid Unicode characters and their categories.
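import pickle
import sys
import unicodedata

print(unicodedata.unidata_version)  # should be 15.0.0

cnt = {}  # category -> number of code points
lc = []   # (code point, category) for every character we keep
# Walk every code point (U+0000..U+10FFFF), not just the first four planes,
# so the variation selectors in plane 14 aren't missed.
for i in range(sys.maxunicode + 1):
    c = unicodedata.category(chr(i))
    cnt[c] = cnt.get(c, 0) + 1
    # Skip the "C" categories: control, format, surrogate, private use, unassigned.
    if c[0] != 'C':
        lc.append((i, c))

print(cnt)
print('Total characters:', len(lc))
with open('lc', 'wb') as f:
    pickle.dump(lc, f)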
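To make that category filter a bit more concrete, here's a small sketch of what unicodedata.category returns for a few code points. The sample code points are my own picks, and the helper name is_kept is made up for this example; only the "first letter is not C" test comes from the script above.

```python
import unicodedata

def is_kept(cp):
    """Hypothetical helper: True if the script's filter would keep this code point."""
    return unicodedata.category(chr(cp))[0] != 'C'

# Hand-picked examples (not from the original script).
samples = {
    'capital A': 0x41,
    'euro sign': 0x20AC,
    'newline': 0x0A,
    'surrogate': 0xD800,
    'private use': 0xE000,
    'noncharacter': 0xFFFF,
}

for name, cp in samples.items():
    cat = unicodedata.category(chr(cp))
    verdict = 'keep' if is_kept(cp) else 'skip'
    print(f'U+{cp:04X} {name}: {cat} -> {verdict}')
```

The letter and the currency symbol are kept, while the control character, surrogate, private-use codepoint, and noncharacter all land in a C category and get skipped.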
One problem down, what next? So now I need to create those 18 images, each with 8,320 characters (80 columns by 104 rows). I initially tried using Pillow, but it doesn't support font fallback, so most of the characters show up as empty rectangles. Also, Pillow is kind of slow. That's no good.
To make a long story short, fonts are a huge pain. I finally found a library with font fallback called imagetext-py, but there wasn't a prebuilt binary for Python 3.12. I built one myself, but it didn't work very well, so let's just go back to using Python 3.11.
OK, this is getting uglier and uglier, I give up (for now). Here's the code so far for your viewing enjoyment.
import pickle
import imagetext_py
lc = pickle.load(open('lc', 'rb'))
imagetext_py.FontDB.LoadSystemFonts()
imagetext_py.FontDB.LoadFromDir('.')
# ls /usr/share/fonts/noto/ | grep Sans | grep Regular | sed 's/.ttf//g' | xargs echo | fish_clipboard_copy
font = imagetext_py.FontDB.Query('NotoSansAdlam-Regular NotoSansAdlamUnjoined-Regular NotoSansAnatolianHieroglyphs-Regular NotoSansArabic-Regular NotoSansArmenian-Regular NotoSansAvestan-Regular NotoSansBalinese-Regular NotoSansBamum-Regular NotoSansBassaVah-Regular NotoSansBatak-Regular NotoSansBengali-Regular NotoSansBengaliUI-Regular NotoSansBhaiksuki-Regular NotoSansBrahmi-Regular NotoSansBuginese-Regular NotoSansBuhid-Regular NotoSansCanadianAboriginal-Regular NotoSansCarian-Regular NotoSansCaucasianAlbanian-Regular NotoSansChakma-Regular NotoSansCham-Regular NotoSansCherokee-Regular NotoSansChorasmian-Regular NotoSansCoptic-Regular NotoSansCuneiform-Regular NotoSansCypriot-Regular NotoSansCyproMinoan-Regular NotoSansDeseret-Regular NotoSansDevanagari-Regular NotoSansDevanagariUI-Regular NotoSansDuployan-Regular NotoSansEgyptianHieroglyphs-Regular NotoSansElbasan-Regular NotoSansElymaic-Regular NotoSansEthiopic-Regular NotoSansGeorgian-Regular NotoSansGlagolitic-Regular NotoSansGothic-Regular NotoSansGrantha-Regular NotoSansGujarati-Regular NotoSansGujaratiUI-Regular NotoSansGunjalaGondi-Regular NotoSansGurmukhi-Regular NotoSansGurmukhiUI-Regular NotoSansHanifiRohingya-Regular NotoSansHanunoo-Regular NotoSansHatran-Regular NotoSansHebrew-Regular NotoSansImperialAramaic-Regular NotoSansIndicSiyaqNumbers-Regular NotoSansInscriptionalPahlavi-Regular NotoSansInscriptionalParthian-Regular NotoSansJavanese-Regular NotoSansKaithi-Regular NotoSansKannada-Regular NotoSansKannadaUI-Regular NotoSansKawi-Regular NotoSansKayahLi-Regular NotoSansKharoshthi-Regular NotoSansKhmer-Regular NotoSansKhojki-Regular NotoSansKhudawadi-Regular NotoSansLaoLooped-Regular NotoSansLao-Regular NotoSansLepcha-Regular NotoSansLimbu-Regular NotoSansLinearA-Regular NotoSansLinearB-Regular NotoSansLisu-Regular NotoSansLycian-Regular NotoSansLydian-Regular NotoSansMahajani-Regular NotoSansMalayalam-Regular NotoSansMalayalamUI-Regular NotoSansMandaic-Regular NotoSansManichaean-Regular 
NotoSansMarchen-Regular NotoSansMasaramGondi-Regular NotoSansMath-Regular NotoSansMayanNumerals-Regular NotoSansMedefaidrin-Regular NotoSansMeeteiMayek-Regular NotoSansMendeKikakui-Regular NotoSansMeroitic-Regular NotoSansMiao-Regular NotoSansModi-Regular NotoSansMongolian-Regular NotoSansMono-Regular NotoSansMro-Regular NotoSansMultani-Regular NotoSansMyanmar-Regular NotoSansNabataean-Regular NotoSansNagMundari-Regular NotoSansNandinagari-Regular NotoSansNewa-Regular NotoSansNewTaiLue-Regular NotoSansNKo-Regular NotoSansNKoUnjoined-Regular NotoSansNushu-Regular NotoSansOgham-Regular NotoSansOlChiki-Regular NotoSansOldHungarian-Regular NotoSansOldItalic-Regular NotoSansOldNorthArabian-Regular NotoSansOldPermic-Regular NotoSansOldPersian-Regular NotoSansOldSogdian-Regular NotoSansOldSouthArabian-Regular NotoSansOldTurkic-Regular NotoSansOriya-Regular NotoSansOsage-Regular NotoSansOsmanya-Regular NotoSansPahawhHmong-Regular NotoSansPalmyrene-Regular NotoSansPauCinHau-Regular NotoSansPhags-Pa-Regular NotoSansPhoenician-Regular NotoSansPsalterPahlavi-Regular NotoSans-Regular NotoSansRejang-Regular NotoSansRunic-Regular NotoSansSamaritan-Regular NotoSansSaurashtra-Regular NotoSansSharada-Regular NotoSansShavian-Regular NotoSansSiddham-Regular NotoSansSignWriting-Regular NotoSansSinhala-Regular NotoSansSinhalaUI-Regular NotoSansSogdian-Regular NotoSansSoraSompeng-Regular NotoSansSoyombo-Regular NotoSansSundanese-Regular NotoSansSylotiNagri-Regular NotoSansSymbols2-Regular NotoSansSymbols-Regular NotoSansSyriacEastern-Regular NotoSansSyriac-Regular NotoSansSyriacWestern-Regular NotoSansTagalog-Regular NotoSansTagbanwa-Regular NotoSansTaiLe-Regular NotoSansTaiTham-Regular NotoSansTaiViet-Regular NotoSansTakri-Regular NotoSansTamil-Regular NotoSansTamilSupplement-Regular NotoSansTamilUI-Regular NotoSansTangsa-Regular NotoSansTelugu-Regular NotoSansTeluguUI-Regular NotoSansTest-Regular NotoSansThaana-Regular NotoSansThaiLooped-Regular NotoSansThai-Regular 
NotoSansTifinaghAdrar-Regular NotoSansTifinaghAgrawImazighen-Regular NotoSansTifinaghAhaggar-Regular NotoSansTifinaghAir-Regular NotoSansTifinaghAPT-Regular NotoSansTifinaghAzawagh-Regular NotoSansTifinaghGhat-Regular NotoSansTifinaghHawad-Regular NotoSansTifinagh-Regular NotoSansTifinaghRhissaIxa-Regular NotoSansTifinaghSIL-Regular NotoSansTifinaghTawellemmet-Regular NotoSansTirhuta-Regular NotoSansUgaritic-Regular NotoSansVai-Regular NotoSansVithkuqi-Regular NotoSansWancho-Regular NotoSansWarangCiti-Regular NotoSansYi-Regular NotoSansZanabazarSquare-Regular SourceHanSansCN-Regular PlangothicP2-Regular') # Rip
black = imagetext_py.Paint.Color((0, 0, 0, 255))
for j in range(3):
    for i in range(6):
        cv = imagetext_py.Canvas(2560, 3328, (255, 255, 255, 255))
        for k in range(80 * 104):
            x = k % 80
            y = k // 80
            p = 80 * i + x + (104 * j + y) * 80 * 6 # this works, trust me
            if p < len(lc):
                imagetext_py.draw_text(canvas=cv,
                                       text=chr(lc[p][0]),
                                       x=32*x, y=32*y,
                                       size=48,
                                       fill=black,
                                       font=font,
                                       draw_emojis=True)
        cv.save(str(6 * j + i) + '.png')
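If you'd rather not take that index formula on trust, here's a sketch that checks it. The helper name poster_index is made up for this example; the constants are the ones from the script. The idea is to treat the whole poster as one grid 480 columns wide, so cell (x, y) on page (i, j) sits at global column 80 * i + x and global row 104 * j + y.

```python
COLS, ROWS = 80, 104      # cells per page
PAGES_X, PAGES_Y = 6, 3   # pages across and down

def poster_index(i, j, x, y):
    """Hypothetical helper: index into lc for cell (x, y) on page column i, page row j."""
    return COLS * i + x + (ROWS * j + y) * COLS * PAGES_X

# Collect the index produced by every cell on every page.
seen = set()
for j in range(PAGES_Y):
    for i in range(PAGES_X):
        for y in range(ROWS):
            for x in range(COLS):
                seen.add(poster_index(i, j, x, y))

total = COLS * ROWS * PAGES_X * PAGES_Y
print(total, seen == set(range(total)))
```

Since there are exactly as many cells as distinct indices, every position from 0 up to total - 1 is drawn exactly once, with no gaps or repeats across the 18 pages.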
Maybe I'll try just printing every emoji or something.