Compas Pascal: October 2008

Saturday, 25 October 2008

Is software creating financial bubbles?

Alan Greenspan has long praised computer technology as a tool that can be used to limit risks in financial markets, but recently he acknowledged that the data fed into financial systems was often a case of garbage in, garbage out, indicating that this has led to huge trouble. Have bad IT systems been deployed elsewhere on this planet? Yes. Will the world continue to do so? Yes. Why?

If you look at a number of ideas for software systems, some will definitely not make sense, some will make a lot of sense, and then there is a huge group in between. The systems in this middle group are difficult to evaluate, sometimes even after the software has been deployed. A famous person once said that it is often easy to measure things, but difficult to understand what is being measured, and this applies very well to software.

The dot-com boom was based on the assumption that the productivity gains from software are so huge that the value of many things would go up a lot. Expectations were too high. Why? Because software doesn't deliver that kind of result that fast. Resources are allocated to the wrong software projects, and many projects are doing something wrong.

Why can't we just do it the right way? There are many reasons, but the single most important one is that there is no single right way that fits all purposes, and it is therefore impossible to make one recommendation for all. The best "single right way" that I have seen so far is real "Agile", meaning that you need to adapt all the time. In other words, a very difficult concept to teach. And now we're at the core of why not all software projects are a huge success:

It's difficult.

Humankind is unable to do everything right. We will never get rid of that middle group of software systems, where we don't really know if they were a success or not.

Software is not much different from other technologies, like chemistry or electronics: some people are making huge progress, others are not, and when the good ideas get deployed, world productivity improves. Some software is great, but software as such is not a silver bullet.

I think we should start to try to identify the biggest successes, in order to learn from these. Maybe we should have a Nobel prize for Software?

Wednesday, 22 October 2008

Funny comment in system.pas

In Delphi 2009's system.pas, line 1475, I found this comment:

TTextBuf = array[0..127] of AnsiChar; // TODO: change to WideChar

I wonder why they didn't check their todo items before shipping.

Tuesday, 21 October 2008

The problem with С in programming

The problem with С in programming is that this Delphi example actually compiles and executes without problems:

procedure TForm3.Button4Click(Sender: TObject);
var
  c:integer;
  с:integer;
begin
  c:=2;
  с:=3;
  Assert (c<>с);
end;

The first c is the Latin letter, and the second с is the Cyrillic letter. Both are in the same position in the U.S. and Russian keyboard layouts, and I have seen these two mixed up several times.
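One way to catch this kind of mix-up is to scan identifiers for characters outside the ASCII range. The sketch below assumes Delphi 2009 or later, where string is UnicodeString. FirstNonAsciiPos is a hypothetical helper written for this illustration, not an RTL function.

```pascal
program HomoglyphCheck;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the 1-based index of the first character
// outside the ASCII range, or 0 if every character is plain ASCII.
function FirstNonAsciiPos(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if Ord(S[i]) > 127 then
    begin
      Result := i;
      Exit;
    end;
end;

begin
  // #$0063 is the Latin letter c, #$0441 is the Cyrillic letter that
  // looks the same on screen. Character codes are used here so that the
  // example does not depend on the encoding of the source file.
  Assert(FirstNonAsciiPos(#$0063) = 0);
  Assert(FirstNonAsciiPos(#$0441) = 1);
  Assert(FirstNonAsciiPos('abc'#$0441) = 4);
  Writeln('Cyrillic letter detected by its character code');
end.
```

A check like this could be run over the identifiers of a unit as part of a build step, if the project's naming rules only allow ASCII identifiers.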

Thursday, 16 October 2008

Widestring 4545 times slower than unicodestring

I noticed that several people, in comments and on other blogs, compared the number of seconds that were spent on each benchmark in my previous post. I presented the time spent, the number of iterations, and the number of iterations per second, and it is only the last number that is interesting, because the loops do not all run the same number of iterations. To fix that, I have now removed the time measurements from that post.

For that same reason, several people wondered why I do not like widestring. The main reason why I recommend against using widestring is this one:

// approx. 25 million iterations per second
u:='';
for i:=0 to 100000000 do begin
  u:=u+' ';
end;

// approx. 0.0055 million iterations per second
w:='';
for i:=0 to 100000 do begin
  w:=w+' ';
end;

Note how extremely slow widestring is in this specific test: at roughly 5,500 appends per second, each append takes about 0.18 ms. This is the kind of thing that can make a well-made application perform really badly. A TCP/IP ping request between two servers on a good network takes less time than adding a space to a widestring on my reasonably fast laptop.
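If a WideString is really required, for example at a COM boundary, one possible workaround is to do the repeated appends in a UnicodeString and convert once at the end. This is only a sketch, assuming Delphi 2009 or later. BuildSpaces is a hypothetical name, and the explanation in the comment (WideString being a BSTR that is reallocated through the Windows COM allocator, without reference counting) is the usual one, not something measured in this post.

```pascal
program WideStringAppend;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: builds the text in a UnicodeString, which is
// reference counted and cheap to extend, and converts to WideString
// (a BSTR on Windows) only once, when the result is returned.
function BuildSpaces(Count: Integer): WideString;
var
  Buffer: UnicodeString;
  i: Integer;
begin
  Buffer := '';
  for i := 1 to Count do
    Buffer := Buffer + ' ';
  Result := Buffer; // the single UnicodeString to WideString copy
end;

var
  w: WideString;
begin
  w := BuildSpaces(100000);
  Assert(Length(w) = 100000);
  Writeln(Length(w));
end.
```

The loop is kept on purpose so that it mirrors the benchmark; the point is only that the variable being appended to is not a WideString.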

Tuesday, 14 October 2008

Delphi 2009 string type performance benchmark

This code was run on an Intel Core 2 laptop and shows the difference in performance very well. Compiler options used: code optimization disabled, all checks on.

procedure TForm3.Button3Click(Sender: TObject);
var
  a:ansistring;
  r:rawbytestring;
  u:string;
  w:widestring;
  i:integer;
  s:shortstring;
  c:char; // widechar
  ac:ansichar;
begin
  screen.Cursor:=crHourGlass;
  try
    // approx. 222 million iterations per second
    s:='This is a test';
    for i:=0 to 1000000000 do begin
      ac:=s[4];
      s[4]:=s[5];
      s[5]:=ac;
    end;

    // approx. 43 million iterations per second
    a:='This is a test';
    for i:=0 to 1000000000 do begin
      ac:=a[4];
      a[4]:=a[5];
      a[5]:=ac;
    end;

    // approx. 40 million iterations per second
    u:='This is a test';
    for i:=0 to 1000000000 do begin
      c:=u[4];
      u[4]:=u[5];
      u[5]:=c;
    end;

    // approx. 71 million iterations per second
    w:='This is a test';
    for i:=0 to 1000000000 do begin
      c:=w[4];
      w[4]:=w[5];
      w[5]:=c;
    end;

    // ****************************

    // approx. 40 million iterations per second
    for i:=0 to 100000000 do begin
      u:='This is € test';
    end;

    // approx. 5.5 million iterations per second
    for i:=0 to 100000000 do begin
      a:='This is € test';
      u:=a;
    end;

    // approx. 5.5 million iterations per second
    for i:=0 to 100000000 do begin
      u:='This is € test';
      a:=u;
    end;

    // approx. 3.7 million iterations per second
    for i:=0 to 100000000 do begin
      u:='This is € test';
      w:=u;
    end;

    // ****************************

    // approx. 3.7 million iterations per second
    s:='';
    for i:=0 to 100000000 do begin
      s:=copy(s+' ',1,50);
    end;

    // approx. 4.2 million iterations per second
    a:='';
    for i:=0 to 100000000 do begin
      a:=copy(a+' ',1,50);
    end;

    // approx. 2.5 million iterations per second
    u:='';
    for i:=0 to 100000000 do begin
      u:=copy(u+' ',1,50);
    end;

    // approx. 1.6 million iterations per second
    w:='';
    for i:=0 to 10000000 do begin
      w:=copy(w+' ',1,50);
    end;

    // ****************************

    // approx. 25 million iterations per second
    r:='';
    for i:=0 to 100000000 do begin
      r:=r+' ';
    end;

    // approx. 25 million iterations per second
    a:='';
    for i:=0 to 100000000 do begin
      a:=a+' ';
    end;

    // approx. 25 million iterations per second
    u:='';
    for i:=0 to 100000000 do begin
      u:=u+' ';
    end;

    // approx. 0.0055 million iterations per second
    w:='';
    for i:=0 to 100000 do begin
      w:=w+' ';
    end;
  finally
    screen.Cursor:=crDefault;
  end;
end;

Conclusion:
* Avoid widestring and shortstring.
* UnicodeString is a huge improvement over WideString.
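The post does not show how the iterations-per-second figures were measured. One plausible way, assuming a Windows target and the millisecond resolution of GetTickCount, is sketched below. MillionsPerSecond is a hypothetical helper, and the loop body and iteration count are arbitrary choices for illustration.

```pascal
program IterationsPerSecond;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: converts an iteration count and an elapsed time
// in milliseconds into millions of iterations per second.
function MillionsPerSecond(Iterations, ElapsedMs: Cardinal): Double;
begin
  Result := Iterations / 1000 / ElapsedMs;
end;

const
  Iterations = 10000000;
var
  u: string;
  i: Integer;
  StartTick, ElapsedMs: Cardinal;
begin
  u := '';
  StartTick := GetTickCount;
  for i := 1 to Iterations do
    u := Copy(u + ' ', 1, 50);
  ElapsedMs := GetTickCount - StartTick;

  if ElapsedMs > 0 then
    Writeln(Format('approx. %.1f million iterations per second',
      [MillionsPerSecond(Iterations, ElapsedMs)]))
  else
    Writeln('Too fast to measure, increase Iterations');
end.
```

Reporting the rate instead of the elapsed time is what makes loops with different iteration counts comparable.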

Raw binary data in Delphi 2009 strings, by example

This code snippet explains by example how you can use binary data in strings in Delphi 2009:

const
  AllByteValues=
    #$00#$01#$02#$03#$04#$05#$06#$07#$08#$09#$0a#$0b#$0c#$0d#$0e#$0f+
    #$10#$11#$12#$13#$14#$15#$16#$17#$18#$19#$1a#$1b#$1c#$1d#$1e#$1f+
    #$20#$21#$22#$23#$24#$25#$26#$27#$28#$29#$2a#$2b#$2c#$2d#$2e#$2f+
    #$30#$31#$32#$33#$34#$35#$36#$37#$38#$39#$3a#$3b#$3c#$3d#$3e#$3f+
    #$40#$41#$42#$43#$44#$45#$46#$47#$48#$49#$4a#$4b#$4c#$4d#$4e#$4f+
    #$50#$51#$52#$53#$54#$55#$56#$57#$58#$59#$5a#$5b#$5c#$5d#$5e#$5f+
    #$60#$61#$62#$63#$64#$65#$66#$67#$68#$69#$6a#$6b#$6c#$6d#$6e#$6f+
    #$70#$71#$72#$73#$74#$75#$76#$77#$78#$79#$7a#$7b#$7c#$7d#$7e#$7f+
    #$80#$81#$82#$83#$84#$85#$86#$87#$88#$89#$8a#$8b#$8c#$8d#$8e#$8f+
    #$90#$91#$92#$93#$94#$95#$96#$97#$98#$99#$9a#$9b#$9c#$9d#$9e#$9f+
    #$a0#$a1#$a2#$a3#$a4#$a5#$a6#$a7#$a8#$a9#$aa#$ab#$ac#$ad#$ae#$af+
    #$b0#$b1#$b2#$b3#$b4#$b5#$b6#$b7#$b8#$b9#$ba#$bb#$bc#$bd#$be#$bf+
    #$c0#$c1#$c2#$c3#$c4#$c5#$c6#$c7#$c8#$c9#$ca#$cb#$cc#$cd#$ce#$cf+
    #$d0#$d1#$d2#$d3#$d4#$d5#$d6#$d7#$d8#$d9#$da#$db#$dc#$dd#$de#$df+
    #$e0#$e1#$e2#$e3#$e4#$e5#$e6#$e7#$e8#$e9#$ea#$eb#$ec#$ed#$ee#$ef+
    #$f0#$f1#$f2#$f3#$f4#$f5#$f6#$f7#$f8#$f9#$fa#$fb#$fc#$fd#$fe#$ff;
  RawByteTest=
    RawByteString(AllByteValues);
  GreekTest=
    GreekString(AllByteValues);
  AnsiTest=
    ansistring(AllByteValues);

procedure TForm3.Button2Click(Sender: TObject);
var
  i:0..255;
  ErrorList:string;
  c:char;
  ac:ansichar;
  utf16:string;
begin
  Assert (length(AllByteValues)=256,'The number of characters is just like in Delphi 2006');
  Assert (sizeof(AllByteValues)=4,'This is a pointer');
  Assert (sizeof(AllByteValues[1])=2,'But each character is now 2 bytes');
  Assert (AllByteValues[1]=#0);
  Assert (length(RawByteTest)=256);
  Assert (sizeof(RawByteTest)=4,'This is a pointer');
  Assert (sizeof(RawByteTest[1])=1,'Using RawByteString in a const the bytes stay as they are');
  Assert (RawByteTest[1]=#0);
  Assert (RawByteTest[1]=char(0));
  Assert (RawByteTest[1]=chr(0));
  ac:=#0;
  Assert (RawByteTest[1]=ac);
  c:=#0;
  // Assert (RawByteTest[1]=c);    // This line does not compile! - AnsiChar and Char are absolutely not compatible in any way.
  Assert (ord(RawByteTest[1])=ord(c));    // This compiles nicely

  // Demonstrate how most of #128..#159 are converted to other Unicode code points (via Windows-1252) and therefore cause big trouble!
  ErrorList:='';
  for i:=0 to 255 do begin
    if ord(AllByteValues[i+1])<>i then
      ErrorList:=ErrorList+IntToStr(i)+' ';
  end;
  Assert (ErrorList='128 130 131 132 133 134 135 136 137 138 139 '+
    '140 142 145 146 147 148 149 150 151 152 153 154 155 156 158 159 ',
    'These values are not saved in a string in the way you would expect!!');

  // GreekString also destroys constants with binary data
  ErrorList:='';
  for i:=0 to 255 do begin
    if ord(GreekTest[i+1])<>i then
      ErrorList:=ErrorList+IntToStr(i)+' ';
  end;
  Assert (ErrorList='136 138 140 142 152 154 156 158 159 161 162 170 175 '+
    '180 184 185 186 188 190 191 192 193 194 195 196 197 198 199 200 201 '+
    '202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 '+
    '219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 '+
    '236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 '+
    '253 254 255 ',
    'These values are not saved in a string in the way you would expect!!');

  // RawByteString stores all bytes correctly
  for i:=0 to 255 do begin
    Assert (ord(RawByteTest[i+1])=i);
  end;

  // Ansistring also stores all bytes correctly (tested on a Windows-1252 machine)
  for i:=0 to 255 do begin
    Assert (ord(AnsiTest[i+1])=i);
  end;

  // Most common ansistring stuff works as expected
  Assert (ord(AnsiTest[129])=128);
  Assert (AnsiTest[129]=#128);
  Assert (copy(AnsiTest,129,1)=#128);
  Assert (MidStr(AnsiTest,129,1)=#128);
  Assert (pos(#128,AnsiTest)=129);

  // The same functions using UnicodeString
  utf16:=AllByteValues;
  Assert (ord(utf16[129])=8364);
  Assert (utf16[129]=#8364);
  Assert (copy(utf16,129,1)=#8364);
  Assert (MidStr(utf16,129,1)=#8364);
  Assert (pos(#128,utf16)=129);   // #128 is converted to #8364 before calling the widestring version of pos()
  Assert (pos(#8364,utf16)=129);

  // Don't copy raw binary data into a UTF-16 string type!
  utf16:=RawByteTest;
  ErrorList:='';
  for i:=0 to 255 do begin
    if ord(utf16[i+1])<>i then
      ErrorList:=ErrorList+IntToStr(i)+' ';
  end;
  Assert (ErrorList='128 130 131 132 133 134 135 136 137 138 139 140 142 145 '+
    '146 147 148 149 150 151 152 153 154 155 156 158 159 ',
    'These values are not saved in a string in the way you would expect!!');

  // Windows automatically handles unsupported byte values in strange ways.
  c:=#128;
  Assert (ord(c)<>128);
  Assert (ord(c)=8364);
  Assert (c='€');

  ac:=#128;
  Assert (ord(ac)=128);
  Assert (ord(ac)<>8364);
  Assert (ac='€','Here, ac is converted to a utf-16 string type using local character set');

  // Don't use inc() or dec() with utf-16. It works, but it's not good
  utf16:=#127;
  Assert (ord(utf16[1])=127);
  inc (utf16[1]);
  Assert (ord(utf16[1])=128);
  Assert (utf16[1]<>#128); // Because #128 becomes #8364
  Assert (#128=#8364);     // as you can see here
end;

Conclusion: Always use RawByteString or AnsiString for binary data, and never store binary data in other string types.
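As a small illustration of that conclusion, the sketch below reads bytes from a stream straight into a RawByteString, so that no character set conversion can take place. It assumes Delphi 2009 or later. ReadAllBytes is a hypothetical helper, and a TMemoryStream stands in for whatever file or socket stream the data would really come from.

```pascal
program RawByteStreamDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: reads everything from the current stream position
// to the end of the stream into a RawByteString, byte for byte.
function ReadAllBytes(Stream: TStream): RawByteString;
begin
  SetLength(Result, Integer(Stream.Size - Stream.Position));
  if Length(Result) > 0 then
    Stream.ReadBuffer(Result[1], Length(Result));
end;

var
  Stream: TMemoryStream;
  Data: RawByteString;
  b: Byte;
  i: Integer;
begin
  Stream := TMemoryStream.Create;
  try
    // Write every possible byte value once
    for i := 0 to 255 do
    begin
      b := i;
      Stream.WriteBuffer(b, 1);
    end;
    Stream.Position := 0;
    Data := ReadAllBytes(Stream);
  finally
    Stream.Free;
  end;

  // Every byte value comes back unchanged, including 128..159
  Assert(Length(Data) = 256);
  for i := 0 to 255 do
    Assert(Ord(Data[i + 1]) = i);
  Writeln('All 256 byte values survived the round trip');
end.
```

The data stays intact only as long as it is never assigned to a string or UnicodeString variable along the way.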

Wednesday, 8 October 2008

Delphi 2009 strings explained by example

This code snippet explains by example how the new string types work:

type
  OemCp437=type ansistring(437);
  CyrillicString=type ansistring(1251);
  DanishString=type ansistring(1252);
  GreekString=type ansistring(1253);
  usascii=type ansistring(20127);
  Iso88591String=type ansistring(28591);
  Iso885915String=type ansistring(28605);
  utf7string=type ansistring(65000);

  // These will not work, but will compile
  utf16le_string=type ansistring(1200);
  utf16be_string=type ansistring(1201);
  utf32_string=type ansistring(12000);
  utf32be_string=type ansistring(12001);

procedure TForm3.Button1Click(Sender: TObject);
var
  utf16:string;
  local:ansistring;
  raw:rawbytestring;
  utf8:utf8string;
  utf7:utf7string;
  cyrillic:CyrillicString;
  danish:DanishString;
  greek:GreekString;
  iso88591:Iso88591String;
  iso885915:Iso885915String;
  Cp437:OemCp437;
  ascii:usascii;
  utf32:utf32_string;
begin
  // Ansistring cannot be used for utf16 and utf32
  utf32:='asdf';
  Assert (utf32='');

  // Demonstrating what UTF-16 is
  utf16:=#$1D160;            // This is a musical note (000011101000101100000), see http://unicode.org/charts/PDF/U1D100.pdf
  Assert (length(utf16)=2);  // This character occupies 2 positions in UTF-16
  Assert (utf16[1]=#$D834);  // 110110 0000110100 First half of the symbol
  Assert (utf16[2]=#$DD60);  // 110111 0101100000 Second half of the symbol
  utf8:=utf16;
  Assert (length(utf8)=4);
  Assert (utf8[1]=#$F0);   // 11110 000
  Assert (utf8[2]=#$9D);   // 10 011101
  Assert (utf8[3]=#$85);   // 10 000101
  Assert (utf8[4]=#$A0);   // 10 100000
  danish:=utf16;
  Assert (danish='??');    // Note how Windows incorrectly converts to two letters!
  Assert (length(danish)=2);
  danish:=utf8;
  Assert (danish='??');    // Note how Windows incorrectly converts to two letters!
  Assert (length(danish)=2);

  // Demonstrating the euro character
  utf16:='€';
  danish:=utf16;
  cyrillic:=utf16;
  greek:=utf16;
  iso88591:=utf16;
  iso885915:=utf16;
  Cp437:=utf16;
  ascii:=utf16;
  utf8:=utf16;
  utf7:=utf16;
  Assert (length(utf16)=1);
  Assert (length(danish)=1);
  Assert (length(cyrillic)=1);
  Assert (length(greek)=1);
  Assert (length(iso88591)=1);
  Assert (length(iso885915)=1);
  Assert (length(Cp437)=1);
  Assert (length(ascii)=1);
  Assert (length(utf7)=5);
  Assert (length(utf8)=3);
  Assert (ord(utf16[1])=8364);
  Assert (ord(danish[1])=128);
  Assert (ord(cyrillic[1])=136);
  Assert (ord(greek[1])=128);
  Assert (ord(iso885915[1])=164);
  Assert (iso88591='?');
  Assert (ascii='?');
  Assert (Cp437='?');
  Assert (greek=utf16);
  Assert (danish=utf16);
  Assert (cyrillic=utf16);
  Assert (utf7=utf16);
  Assert (utf7=utf8);
  Assert (iso885915=utf16);
  Assert (iso88591<>utf16);
  Assert (Cp437<>utf16);
  Assert (ascii<>utf16);
  Assert (cyrillic=danish);

  // Convert from Unicode to special character sets
  utf16:='abc ÆØÅ рыба'; // utf16 uses UTF-16
  local:=utf16;  // Converts to local 8-bit character set
  raw:=utf16;    // Converts to local 8-bit character set
  utf8:=utf16;   // Converts to utf-8
  cyrillic:=utf16;
  danish:=utf16;
  greek:=utf16;
  Cp437:=utf16;
  ascii:=utf16;
  utf7:=utf16;
  Assert (cyrillic='abc ?OA рыба');
  Assert (danish='abc ÆØÅ ????');
  Assert (greek='abc ?OA ????');   // Æ => ?
  Assert (Cp437='abc ÆOÅ ????');   // Ø does not exist
  Assert (ascii='abc AOA ????');   // Æ => A
  Assert (length(utf16)=12);
  Assert (length(local)=12);
  Assert (length(raw)=12);
  Assert (length(utf8)=19);
  Assert (length(utf7)=28);
  Assert (length(Cp437)=12);
  Assert (length(cyrillic)=12);
  Assert (length(danish)=12);
  Assert (length(greek)=12);
  Assert (length(ascii)=12);

  // Converts to Unicode
  utf16:=danish;
  Assert (utf16='abc ÆØÅ ????');
  Assert (length(utf16)=12);
  utf16:=cyrillic;
  Assert (utf16='abc ?OA рыба');
  Assert (length(utf16)=12);
  utf16:=utf8;
  Assert (utf16='abc ÆØÅ рыба');
  Assert (length(utf16)=12);

  // The following lines only work correctly if your local character set
  // is Windows-1252!
  utf16:=raw;
  Assert (utf16='abc ÆØÅ ????');
  Assert (length(utf16)=12);

  raw:=cyrillic;
  local:=cyrillic;
  Assert (local='abc ?OA ????');
  Assert (raw<>local);   // raw preserves cyrillic letters and the character set
  Assert (length(raw)=12);

  raw:=danish;
  local:=danish;
  Assert (raw=local);
  Assert (raw='abc ÆØÅ ????');
  Assert (local='abc ÆØÅ ????');
  Assert (length(raw)=12);

  raw:=greek;
  local:=greek;
  Assert (raw='abc ?OA ????');
  Assert (local='abc ?OA ????');
  Assert (raw=local); // This is only true because the string doesn't contain greek letters
  Assert (length(raw)=12);
end;
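The two halves of the musical note in the UTF-16 part of the example follow from the standard surrogate pair arithmetic: subtract $10000 from the code point, put the upper 10 bits of the remainder into the first half and the lower 10 bits into the second. The sketch below works this out for $1D160. ToSurrogates is a hypothetical helper written for this illustration, not an RTL function.

```pascal
program SurrogateDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: splits a code point above $FFFF into its
// UTF-16 lead (first) and trail (second) surrogate.
procedure ToSurrogates(CodePoint: Cardinal; out Lead, Trail: Word);
var
  Offset: Cardinal;
begin
  Assert((CodePoint >= $10000) and (CodePoint <= $10FFFF));
  Offset := CodePoint - $10000;               // 20 significant bits remain
  Lead := Word($D800 + (Offset shr 10));      // 110110 + upper 10 bits
  Trail := Word($DC00 + (Offset and $3FF));   // 110111 + lower 10 bits
end;

var
  Lead, Trail: Word;
begin
  // $1D160 - $10000 = $D160; $D160 shr 10 = $34; $D160 and $3FF = $160
  ToSurrogates($1D160, Lead, Trail);
  Assert(Lead = $D834);
  Assert(Trail = $DD60);
  Writeln(IntToHex(Lead, 4), ' ', IntToHex(Trail, 4)); // prints D834 DD60
end.
```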

If you are in doubt about how to use ansistring and RawByteString, use this guideline:

* Use the normal (unicode) string type as much as you can.
* Use ansistring for texts in local 8-bit character sets. Usually it is only used for I/O.
* Use RawByteString for parameters to functions that have to work on all kinds of ansistrings, without triggering character set conversions, like I/O functions. This is really only necessary if you mix various character sets, which is rarely the case. Most programmers will only very rarely use RawByteString.
* Use RawByteString for storing binary data - but ansistring also works. Make sure that you don't assign binary data to/from UnicodeString=string. Note that most string manipulation functions now expect the unicode string type, so you may need to implement some things yourself.

If you want your code to work with both Delphi 2009 and earlier versions, you can insert this into your source:

{$ifndef UNICODE}
type UnicodeString=widestring;
type RawByteString=ansistring;
{$endif}

Use UnicodeString wherever you used widestring before, unless it's really widestring that you want to use (for BSTR compatibility). Program the rest using string wherever you can, and ansistring in some I/O operations. Most of the VCL already defaults to ansistring for non-Unicode I/O, making things very backwards compatible.
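A minimal sketch of how the compatibility declarations can be used follows. ByteSum is a hypothetical function that takes a RawByteString parameter, so that in Delphi 2009 it accepts any kind of ansistring without a character set conversion, while in earlier versions the same source compiles with the parameter being a plain ansistring.

```pascal
program CompatShim;
{$APPTYPE CONSOLE}

uses
  SysUtils;

{$ifndef UNICODE}
type
  UnicodeString = WideString;
  RawByteString = AnsiString;
{$endif}

// Hypothetical example function: adds up the byte values of the data.
// It only looks at bytes, so the character set of the caller's string
// does not matter.
function ByteSum(const Data: RawByteString): Cardinal;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(Data) do
    Inc(Result, Ord(Data[i]));
end;

begin
  // 'abc' is 97 + 98 + 99
  Assert(ByteSum('abc') = 294);
  Assert(ByteSum('') = 0);
  Writeln(ByteSum('abc'));
end.
```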

Monday, 6 October 2008

Menus or Office 2007 toolbars?

I notice that Google just changed their spreadsheet user interface from the Office 2007 toolbar style to the good old TMainMenu-like user interface. Nice. I guess I made the right choice when I chose not to install the specially licensed Office 2007 components with my Delphi 2009.

Saturday, 4 October 2008

High performance apps in Delphi

Poul-Henning Kamp just gave a very good presentation on how he developed Varnish, an HTTP accelerator that is much faster than using Squid in front of a slow CMS.

Most of the methods that PHK describes are very easy to implement in Delphi, so it's worth having a look. Unfortunately, I only found this Danish-language presentation, which can hopefully be understood by most Scandinavians. I know that he has presented it in other languages, too, so if somebody has a link to an English version, please provide it.