Tuesday 21 October 2008

The problem with С in programming

The problem with С in programming is that this Delphi example actually compiles and executes without problems:

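procedure TForm3.Button4Click(Sender: TObject);
var c, с: Integer; // Latin c (U+0063), Cyrillic с (U+0441): two distinct variables
begin
  c := 1; с := 2; Assert(c <> с); end; // values assumed for illustration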

The first c is the Latin letter (U+0063), and the second с is the Cyrillic letter (U+0441). Both sit on the same key in the U.S. and Russian keyboard layouts, and I have seen the two being mixed up several times.
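The two letters look the same in most fonts, but they are different code points. Here is a minimal Python sketch (not part of the Delphi example above; the helper name non_ascii_chars is made up for illustration) that prints both code points and flags every non-ASCII character in a line of source:

```python
import unicodedata

def non_ascii_chars(line):
    """Return (column, char, Unicode name) for every non-ASCII character in line."""
    return [
        (col, ch, unicodedata.name(ch, "UNKNOWN"))
        for col, ch in enumerate(line, start=1)
        if ord(ch) > 127
    ]

latin_c = "c"            # U+0063
cyrillic_es = "\u0441"   # U+0441, looks like a Latin c

print(f"U+{ord(latin_c):04X}", unicodedata.name(latin_c))
print(f"U+{ord(cyrillic_es):04X}", unicodedata.name(cyrillic_es))

# The Assert line from the example, with the Cyrillic letter written as an escape
source_line = "Assert (c<>" + cyrillic_es + ");"
for col, ch, name in non_ascii_chars(source_line):
    print(f"column {col}: {ch!r} is {name}")
```

A check like this could run over a whole source file before a build, if the project has decided that identifiers should be pure ASCII.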


Anonymous said...

That's why I'm a fan of ASCII identifiers.

Anonymous said...

That's why it was also a big security problem to allow Unicode characters in domain names. You could easily register a web site with a name that is indistinguishable from an existing web site and draw people to it with a phony link.
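A rough Python sketch of how such a lookalike name might be spotted: it guesses the script of each letter from the first word of its Unicode character name and flags labels that mix scripts. The function names and the example labels are assumptions made for illustration, and this is a heuristic, not what any registry or browser actually does.

```python
import unicodedata

def scripts_in_label(label):
    """Rough script guess per letter: the first word of its Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

def looks_mixed(label):
    """True if the letters of the label come from more than one script."""
    return len(scripts_in_label(label)) > 1

honest = "example"
spoofed = "exam\u0440le"  # U+0440 CYRILLIC SMALL LETTER ER in place of Latin p

print(honest == spoofed)     # the strings differ even though they look alike
print(looks_mixed(honest))
print(looks_mixed(spoofed))
```

A label written entirely in lookalike letters from a single script would pass this check, so it only catches the mixed case.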

Lars D said...

I don't consider that to be a huge problem, because most people don't check the domain name, anyway. Getting the right URL is a problem that should be solved in a different way than by the choice of character set.

It may be an inconvenience to you, but Unicode domain names are very convenient for those who have no idea how to transliterate into Latin letters.

Anonymous said...

Even Russian programmers do not use Cyrillic letters for naming variables. So it should not be a big problem for professionals.

Anonymous said...

Unicode domain names? What a joke!

The primary argument used to advocate Unicode was no longer seeing nonsensical characters when you go to non-Latin web pages (useful for those with the linguistic skill to read the foreign language).

Now we discover we probably cannot access them (easily) in any case, because those characters are not on standard Latin keyboards! The internet world will still remain divided into linguistic regions.

We should have stayed with codepages. The minimal advantages of Unicode are not worth the pain.

Lars D said...

In Denmark, it works this way: Let's imagine you want to make a beer commercial and point people to the Danish version of BEER.COM. That becomes ØL.DK.

The browser will translate this domain name to XN--L-4GA.dk and open that webpage for you.
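The translation is Punycode with an "xn--" prefix. A small sketch using Python's built-in punycode and idna codecs reproduces it. The input is written in lowercase here because domain names are case-insensitive; Python's idna codec follows the older IDNA 2003 rules, which is an implementation detail of this sketch and not something the browser behaviour above depends on.

```python
label = "\u00f8l"  # "øl", Danish for beer

# Raw Punycode for the label alone
ace_label = label.encode("punycode").decode("ascii")
print(ace_label)

# Full domain name: each non-ASCII label gets the "xn--" prefix
ace_domain = (label + ".dk").encode("idna").decode("ascii")
print(ace_domain)

# And back again, as a browser would do for display
print(ace_domain.encode("ascii").decode("idna"))
```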

What is the alternative? The recommended method is to use OEL.DK, but almost nobody does that. Børsen chose BORSEN.DK, and they have a lot of trouble explaining that in their commercials.

In Denmark, we only have 3 non-ASCII letters (ÆØÅ), but what about Russian? If your company name is ТЯЖЁЛЫЕ, what domain name would you buy? I would transliterate it to TJADSJOLIGE.RU. Others might transliterate it to TYAZHYOLIYE.RU or TAZHELOJE.RU. Try to explain which one to use in a commercial, to a population that doesn't use Latin letters a lot. I guess ТЯЖЁЛЫЕ.RU makes more sense, and ТЯЖЁЛЫЕ.РУ even more so. The last solution means that you don't have to switch keyboard layout to type the address.

The dream of having the world communicate in one language and only the ASCII character set is cute, but unrealistic. :-)

As a personal note, I can tell you that people in my country are reminded of the lack of non-ASCII domain name support in MSIE several times a day in commercials and newspapers. It's annoying.

Anonymous said...

The Russian companies I know have generally accepted Latin-character versions of their names.

In the English speaking programming world, I always write the "ö" in my name as just "o".

By the way, long ago I inherited a solid green wooden box from my father. On the side it says, in red letters, "OL" (where O is that Danish O/ thing you wrote). Well, for the first time in my life, I know what was in there! And the idea of calling beer "oil" (German: Oel) tickles my funnybone!

Lars D said...

When you need to type Russian and English on the same keyboard, you need to switch keyboard layout a lot. To make that easier, you usually choose an easy key combination. Some choose Shift+Control, and some just choose Control. If you accidentally hit that combination at the wrong time and then type C, you don't notice that it's the wrong kind of C.

As I said, I have encountered this problem multiple times, because people actually make this error, even when they try to write pure ASCII.

Lars D said...

Is this the kind of wooden box you have?


Anonymous said...

Seems Russians have long used the K-8 or KOI-8 codepages, something like that. IIRC, the standard 128 ASCII chars were where they always are, with the Cyrillic characters in the remaining positions. Drivers made switching easy.
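That layout can be checked with a few lines of Python. This sketch assumes KOI8-R is the codepage meant here (Python's codec name for it is koi8_r); the cp1251 line at the end is only there to show what the same bytes look like through a different Cyrillic codepage.

```python
text = "c\u0441"  # Latin c followed by Cyrillic с (U+0441)

koi8 = text.encode("koi8_r")
print(koi8.hex())

# ASCII characters keep their usual byte values below 0x80 ...
print(koi8[0] == ord("c"))
# ... and Cyrillic letters live in the upper half of the byte range.
print(koi8[1] >= 0x80)

# The same two bytes read as Windows-1251 give a different Cyrillic letter
print(koi8.decode("cp1251"))
```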

I think that's the box, yes, but mine's in much better condition. Quite small for a beer case. Surprising, considering what my experience with Danes and Alcohol has been :)

Lars D said...

The wooden beer cases went out of the market around 1970. Today, a good, original beer case like that is considered a collector's object, so take good care of it :-)