Thursday 11 September 2008

Why you need Delphi 2009

Many developers don't consider Unicode necessary. After all, the ansistring type takes half the amount of memory and is often guilty of making Delphi apps much faster than .net and java apps, and you don't think that you need those additional characters anyway.

However, there are a few characters that may already have caused you problems. For instance, the euro symbol (€). It does not exist in the ISO-8859-1 character set! In Windows-1252, it is encoded as #$80, and in ISO 8859-15 it is encoded as #$A4. You may even use an utf-8 converter, that basically converts the ansi bits into utf-8 bits, without repositioning the euro symbol - and that's also wrong, because in utf-8, you need to use the Unicode code point #$20AC. If you're exchanging CSV or ansi files, but want to include the euro symbol, you may experience problems with other character sets. For instance, in cyrillic character sets, euro has position #$88. All this gets much easier with Delphi 2009.

Does it get really, really easy? That depends on your ambitions. Many believe that Unicode is about a 16-bit character encoding, but that's wrong. That would only cover 16 bits, and Unicode has more than that. It is possible to have special symbols with more than 16-bit, and in order to handle that, Windows (and .net and Java) uses UTF-16 encoding, where one symbol may use 2 or 4 bytes. Most programmers will ignore that fact, and your program will still work nicely with Unicode. CodeGear even implemented a widechar which is only 16-bit, and uses that in Delphi 2009 VCL. The Unicode standard ensures that all normal characters are encoded using 16 bit, and will usually do. In almost all normal systems, one character is 16 bit, and that's really, really easy.

One example of a group of characters that is not 16-bit could be musical symbols.

No comments: