
heckmeck!

Nerd content and
cringe since 1999
Alexander Grupe
Losso/AttentionWhore

A little update to the bar receipt encoding mystery: I was looking at the wrong code page! While I studied Ancient Greek at school, we never learned about ancient Greek 8-bit encodings. Thanks for nothing, German education system!

It turns out that code page 737, not code page 869, is the common Greek 8-bit encoding. Using that, we can better reconstruct what happened to the receipts full of question marks.

Example

Let’s compare a good and a messed-up receipt: Φ.Π.Α. turns into ”.?.€.

When we look at the code pages involved:

Code page 737
80 Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π 8F
90 Ρ Σ Τ Υ Φ Χ Ψ Ω α β γ δ ε ζ η θ 9F
Code page 1252
80 €   ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ   Ž   8F
90   ‘ ’ “ ” • – — ˜ ™ š › œ   ž Ÿ 9F

…we can trace the conversions of each character.

  • Letters Φ and Α are encoded as 94 and 80 (hexadecimal) in code page 737
  • When bytes 94 and 80 are parsed as 1252 data, they map to ” and €
  • The dot . is at 2E in both code pages and stays intact
  • Letter Π is 8F in 737
  • But 8F is not assigned in code page 1252 (the gap in the table above)
  • It gets replaced with a ?
  • Result: ”.?.€.
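The trace above can be reproduced with Python's built-in `cp737` and `cp1252` codecs. This is a minimal sketch, assuming the faulty conversion simply replaced every byte that is unassigned in the target code page with a ?; the helper name `garble` is made up for illustration.

```python
def garble(text: str, written_as: str = "cp737", read_as: str = "cp1252") -> str:
    """Hypothetical helper: encode text in one code page, decode the bytes as another.

    Assumption: bytes that are unassigned in the target code page come out as '?'.
    """
    raw = text.encode(written_as)
    # errors="replace" yields U+FFFD for each undecodable byte; swap it for '?'
    return raw.decode(read_as, errors="replace").replace("\ufffd", "?")


print(garble("Φ.Π.Α."))  # ”.?.€.
```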

Something is still missing

Or rather: too much is missing! In other examples of good vs. messed-up texts, there are more question marks than we would expect.

Original Ξ Ε Ν Ο Δ Ο Χ Ε Ι Α Κ Ω (Ν) Σ Υ Ν Ο Λ Ο
Expected ? „ Œ Ž ƒ Ž • „ ˆ € ‰ —     ‘ “ Œ Ž Š Ž
Actual   ? „ ? ? ƒ ? ? „ ? € ? ?     ? “ ? ? ? ?

So the “target” code page cannot be the 1252 encoding we know today. It must be a variant with more gaps, i.e. unassigned byte positions, leading to more question marks in the output.
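Diffing the expected and actual rows shows which byte positions the mystery code page must leave unassigned. A rough sketch, again assuming unassigned bytes turn into ?; the helper `garble` is hypothetical, and the bracketed (Ν), which is absent from the bad receipt, is left out of the input.

```python
def garble(text: str, written_as: str = "cp737", read_as: str = "cp1252") -> str:
    # Hypothetical helper; assumes bytes unassigned in the target become '?'
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace").replace("\ufffd", "?")


original = "ΞΕΝΟΔΟΧΕΙΑΚΩΣΥΝΟΛΟ"  # Greek letters; the bracketed (Ν) is left out
actual = "?„??ƒ??„?€???“????"     # as printed on the messed-up receipt
expected = garble(original)

# Byte positions where 1252 yields a character but the receipt shows '?'
extra_gaps = {}
for letter, exp, act in zip(original, expected, actual):
    if exp != act:
        extra_gaps[letter.encode("cp737")[0]] = exp

for byte, char in sorted(extra_gaps.items()):
    print(f"{byte:02X} {char}")
```

With the receipt data above, this lists eight positions (88, 89, 8A, 8C, 8E, 91, 95, 97) that 1252 fills but the target code page apparently does not.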

Uppercase Greek letters               Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
737 interpreted as 1252               € ? ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ ? Ž ? ? ‘ ’ “ ” • – —
Observed in the examples (incomplete) €     ƒ „     ‡ ? ? ? ? ? ? ? ? ? ? ? “ ” ?   ?
737 interpreted as 1253 (wild guess)  € ? ‚ ƒ „ … † ‡ ? ‰ ? ‹ ? ? ? ? ? ‘ ’ “ ” • – —

While code page 1253 (the Greek variant of 1252) matches somewhat better, it's not a full match: capital Kappa Κ maps to ‰, but it should be a ?, and so on.
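Checking a candidate code page against the observed row is easy to automate. The sketch below takes the observed row from the table (a space means "no example on the receipts") and lists, per code page, the letters whose predicted output contradicts the receipts. `cp1252` and `cp1253` are real Python codecs; `read_as` and `mismatches` are hypothetical helpers, and the same "unassigned byte becomes ?" assumption applies.

```python
# Uppercase Greek Α..Ω (U+03A2 is an unassigned code point, so skip it)
LETTERS = "".join(chr(c) for c in range(0x391, 0x3AA) if c != 0x3A2)
# Observed row from the table; a space means "no example on the receipts"
SEEN = "€  ƒ„  ‡" + "?" * 11 + "“”? ?"


def read_as(letter: str, codepage: str) -> str:
    """Encode a letter in cp737, decode that byte as `codepage` ('?' if unassigned)."""
    raw = letter.encode("cp737")
    return raw.decode(codepage, errors="replace").replace("\ufffd", "?")


def mismatches(codepage: str) -> list:
    """Letters whose predicted output contradicts what the receipts show."""
    return [
        letter
        for letter, seen in zip(LETTERS, SEEN)
        if seen != " " and read_as(letter, codepage) != seen
    ]


for cp in ("cp1252", "cp1253"):
    print(cp, " ".join(f"{l}→{read_as(l, cp)}" for l in mismatches(cp)))
```

For 1253, Κ gets flagged together with Μ, Σ, Τ, Χ and Ω, which spells out the "and so on"; 1252 fails on ten letters.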

Phew! So we’re searching for a 1252-like encoding…

  • that contains € ƒ „ ‡ “ ”
  • but not Œ Š ‹ Ž ‰ ˆ • — ‘ ’
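The search itself can be phrased as a filter over candidate code pages. A minimal sketch, assuming a code page "contains" a character if some single byte from 80 to FF decodes to it; the candidate list is limited to the Windows code pages that ship as Python codecs, and the two sets are read off the observed row above. The names `CANDIDATES`, `assigned_chars` and `fits` are hypothetical.

```python
# Candidate code pages: the Windows ones that ship as Python codecs
CANDIDATES = ["cp874"] + [f"cp{n}" for n in range(1250, 1259)]

# Read off the observed row: characters that did / did not survive
MUST_HAVE = set("€ƒ„‡“”")
MUST_LACK = set("ˆ‰Š‹ŒŽ‘’•—")


def assigned_chars(codepage: str) -> set:
    """Characters that some byte 0x80..0xFF decodes to in this code page."""
    chars = set()
    for b in range(0x80, 0x100):
        try:
            chars.add(bytes([b]).decode(codepage))
        except UnicodeDecodeError:
            pass  # byte is unassigned in this code page
    return chars


def fits(codepage: str) -> bool:
    """True if the code page has every MUST_HAVE and no MUST_LACK character."""
    chars = assigned_chars(codepage)
    return MUST_HAVE <= chars and not (MUST_LACK & chars)


matches = [cp for cp in CANDIDATES if fits(cp)]
print(matches)
```

Every candidate here maps byte 91 to ‘, so the list comes back empty and the search has to continue beyond Python's codec collection.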

I think I’ll start looking at the beach, with a drink that has ‰! :)
