It’s all Greek to me

13 Aug 2024
Computer Stuff

A little update to the bar receipt encoding mystery: I was looking at the wrong code page! Although I studied Ancient Greek at school, we didn’t learn about ancient Greek 8-bit encodings – thanks for nothing, German education system!

It turns out that code page 737, not code page 869, is the common Greek 8-bit encoding. With that, we can better reconstruct what happened to the receipts full of question marks.

Example

Let’s compare a good and a messed-up receipt: Φ.Π.Α. turns into ”.?.€.

When we look at the code pages involved:

Code page 737
80	Α	Β	Γ	Δ	Ε	Ζ	Η	Θ	Ι	Κ	Λ	Μ	Ν	Ξ	Ο	Π	8F
90	Ρ	Σ	Τ	Υ	Φ	Χ	Ψ	Ω	α	β	γ	δ	ε	ζ	η	θ	9F
Code page 1252
80	€		‚	ƒ	„	…	†	‡	ˆ	‰	Š	‹	Œ		Ž		8F
90		‘	’	“	”	•	–	—	˜	™	š	›	œ		ž	Ÿ	9F

…we can trace the conversions of each character.

Letters Φ and Α are encoded as 94 and 80 (hexadecimal) in code page 737
When bytes 94 and 80 get parsed as 1252 data, they map to ” and €
The dot . is at 2E in both code pages and stays intact
Letter Π is 8F in 737
But 8F is not assigned in code page 1252 (the empty cell in the table above)
It gets replaced with a ?
Result: ”.?.€.
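
The trace above can be replayed in a few lines of Python. This is only a sketch: it assumes that Python's built-in `cp737` and `cp1252` codecs behave like whatever software produced the receipts, and the helper name `misread` is made up for illustration. Python marks unassigned bytes with U+FFFD, so the sketch swaps that for a plain `?` to match what the receipt shows.

```python
def misread(text: str, source: str = "cp737", target: str = "cp1252") -> str:
    """Encode text with `source`, then wrongly decode the bytes as `target`."""
    raw = text.encode(source)
    # Bytes that `target` leaves unassigned decode to U+FFFD with errors="replace".
    wrong = raw.decode(target, errors="replace")
    # Show them as "?", the way they appear on the receipt.
    return wrong.replace("\ufffd", "?")


vat = "\u03a6.\u03a0.\u0391."        # Φ.Π.Α.
print(vat.encode("cp737").hex(" "))  # 94 2e 8f 2e 80 2e
print(misread(vat))                  # ”.?.€.
```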

Something is still missing

Or rather: too much is missing! In other examples of good vs. mixed-up texts, there are more question marks than we would expect.

Original	`Ξ Ε Ν Ο Δ Ο Χ Ε Ι Α Κ Ω` (`Ν`)	`Σ Υ Ν Ο Λ Ο`
Expected	`? „ Œ Ž ƒ Ž • „ ˆ € ‰ —`	`‘ “ Œ Ž Š Ž`
Actual	`? „ ? ? ƒ ? ? „ ? € ? ?`	`? “ ? ? ? ?`

So the “target” code page cannot be the 1252 encoding we know today. It must be a variant with more gaps, i.e. unassigned byte positions, leading to more question marks in the output.
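
To see exactly which characters go missing, the Expected and Actual rows can be compared position by position. A small sketch, with the strings copied from the table above (spaces removed) and the variable names `lost` and `survived` chosen just for this example:

```python
# Expected = 737 bytes read as today's 1252; actual = what the receipt shows.
samples = [
    ("?„ŒŽƒŽ•„ˆ€‰—", "?„??ƒ??„?€??"),
    ("‘“ŒŽŠŽ", "?“????"),
]

lost = set()      # assigned in today's 1252, yet printed as "?"
survived = set()  # came through as expected
for expected, actual in samples:
    for want, got in zip(expected, actual):
        if want == got:
            if want != "?":
                survived.add(want)
        elif got == "?":
            lost.add(want)

print("lost:    ", " ".join(sorted(lost)))
print("survived:", " ".join(sorted(survived)))
```

Every character in `lost` is a position that the mystery code page apparently leaves unassigned, although 1252 as we know it fills it.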

Uppercase Greek letters	`ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ`
737 interpreted as 1252	`€?‚ƒ„…†‡ˆ‰Š‹Œ?Ž??‘’“”•–—`
Observed in the examples (incomplete)	`€  ƒ„  ‡???????????“”? ?`
737 interpreted as 1253 (wild guess)	`€?‚ƒ„…†‡?‰?‹?????‘’“”•–—`

While code page 1253 (the Greek counterpart of 1252) matches somewhat better, it’s not a full match. For example, capital Kappa Κ maps to ‰ there, but the receipts show a ?.
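
The 1252 and 1253 rows of the table can be regenerated with Python's built-in codecs. Again just a sketch: it assumes those codecs mirror the real code pages, and `misread` is a made-up helper name. Unassigned bytes come back as U+FFFD and are shown as `?`.

```python
# The 24 uppercase Greek letters, U+0391..U+03A9 (U+03A2 is unassigned).
UPPER = "".join(chr(cp) for cp in range(0x0391, 0x03AA) if cp != 0x03A2)


def misread(text: str, target: str, source: str = "cp737") -> str:
    """Encode as `source`, wrongly decode as `target`, show gaps as '?'."""
    wrong = text.encode(source).decode(target, errors="replace")
    return wrong.replace("\ufffd", "?")


print(UPPER)
for target in ("cp1252", "cp1253"):
    print(misread(UPPER, target), "<-", target)
```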

Phew! So we’re searching for a 1252-like encoding…

that contains € ƒ „ ‡ “ ”
but not Œ Š ‹ Ž ˆ ‰ • — ‘ ’
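
Those constraints can be turned into a quick filter over candidate encodings. The sketch below only checks the Windows-125x codecs that ship with Python, which is an arbitrary starting set of mine; the names `MUST_HAVE`, `MUST_BE_GAP` and `fits` are made up for the example. It also includes „ and ˆ, which the ΞΕΝΟΔΟΧΕΙΑΚΩ receipt pins down as present and missing respectively.

```python
# Byte positions (1252 layout) and the character the receipts show there.
MUST_HAVE = {
    0x80: "\u20ac",  # €
    0x83: "\u0192",  # ƒ
    0x84: "\u201e",  # „
    0x87: "\u2021",  # ‡
    0x93: "\u201c",  # “
    0x94: "\u201d",  # ”
}
# Positions of ˆ ‰ Š ‹ Œ Ž ‘ ’ • — in 1252, all printed as "?" on the receipts.
MUST_BE_GAP = [0x88, 0x89, 0x8A, 0x8B, 0x8C, 0x8E, 0x91, 0x92, 0x95, 0x97]


def fits(codec: str) -> bool:
    """True if `codec` has the required characters and the required gaps."""
    def at(byte: int) -> str:
        # Unassigned bytes decode to U+FFFD with errors="replace".
        return bytes([byte]).decode(codec, errors="replace")

    return (all(at(b) == ch for b, ch in MUST_HAVE.items())
            and all(at(b) == "\ufffd" for b in MUST_BE_GAP))


candidates = [f"cp{n}" for n in range(1250, 1259)]
print([name for name in candidates if fits(name)])
```

Adding more codec names to `candidates` widens the search without touching the rest.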

I think I’ll start looking at the beach, with a drink that has ‰! :)
