Compas Pascal: 2008

Saturday 27 December 2008

Derivatives in IT: SaaS

Derivatives are one of the major causes of financial bubbles. In theory, they're a good thing, but even though they seem very easy to understand, they're sometimes too complicated when things go wrong, and they can bring down entire companies.

We have something similar in IT: Software as a Service. In theory, it's a good thing. You can build entire companies on top of it, but even though it seems very easy to understand, it's sometimes too complicated when things go wrong, and it can bring down entire companies.

Many things can go wrong:

* Your provider may go bust. Needhost.dk just went bankrupt, and customers have no access to their data any more. There was no warning.

* Network connections may deteriorate to a level where the software no longer responds well enough to be usable. Almost no company provides enough metrics to specify a minimum service level that covers every conceivable problem.

* Network connection bandwidth may deteriorate to a level where your data can no longer be transferred quickly enough to make backups or to move to another provider. And who knows, maybe your data is physically stored in Siberia?

* A power outage may hit one of the many connection points between server and client.

* The provider may not be able to allocate enough employees to help you out, especially if a problem hits all the provider's customers.

Usually, SaaS contracts are based on historical performance, like "our network has 99.9% uptime". They should write "had", because if uptime drops, the customer has a problem, no matter what the contract says.

Businesses usually rely on there being a workaround when something goes wrong. However, broken cables in the Mediterranean Sea or censorship can block entire ranges of IP addresses, and court decisions may block DNS names. If your country loses its connection to your SaaS provider, can your company continue without it?

The solution to all this is easy: Make sure that your company can continue if your SaaS provider stops its operations without warning.

Wednesday 24 December 2008

Top down programming using refinements

In 1984, I enjoyed learning the Elan programming language. It had a nice syntactic element: Refinements. These are used for top-down programming, which means that you first decide what your function should do, and then write the code later.

Refinements couldn't do anything that procedures and functions cannot do, but it was a nice feature of the language that its syntax actually focused on the principle of top-down programming.

It works like this in Delphi: Let's write a procedure that sorts a file, line by line. Here is our first code:

procedure SortFile (filename:string);
begin
  // Read file into memory
  // Sort the lines
  // Write file out of memory
end;

In Elan, the comments would be refinements, and you would write the actual implementation elsewhere. Delphi doesn't have refinements, but you could write procedure calls instead.
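Here is a minimal sketch of that idea, assuming nested procedures as the closest Delphi stand-in for Elan refinements. The names ReadFileIntoMemory, SortTheLines and WriteFileOutOfMemory, the program wrapper and the scratch file refinement_demo.txt are hypothetical, chosen only for illustration:

```pascal
program RefinementSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils;

const
  DemoFile = 'refinement_demo.txt'; // hypothetical scratch file

procedure SortFile(const filename: string);
var
  sl: TStringList;

  // Each nested procedure stands in for one Elan refinement.
  procedure ReadFileIntoMemory;
  begin
    sl.LoadFromFile(filename);
  end;

  procedure SortTheLines;
  begin
    sl.CaseSensitive := False;
    sl.Sort;
  end;

  procedure WriteFileOutOfMemory;
  begin
    sl.SaveToFile(filename);
  end;

begin
  sl := TStringList.Create;
  try
    // The body reads exactly like the first top-down draft.
    ReadFileIntoMemory;
    SortTheLines;
    WriteFileOutOfMemory;
  finally
    FreeAndNil(sl);
  end;
end;

var
  lines: TStringList;
  SortedText: string;
begin
  lines := TStringList.Create;
  try
    lines.CommaText := 'pear,Apple,banana';
    lines.SaveToFile(DemoFile);
    SortFile(DemoFile);
    lines.LoadFromFile(DemoFile);
    SortedText := lines.CommaText; // the lines after a case-insensitive sort
  finally
    lines.Free;
  end;
  DeleteFile(DemoFile);
  Writeln(SortedText);
end.
```

Because nested procedures can see the enclosing procedure's parameters and local variables, sl and filename do not have to be passed around, which keeps the top-level body as short as the first draft.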

Another method is to implement each of these comments by actually writing the code, letting the comments group the lines:

procedure SortFile (filename:string);
var
  sl:TStringList;
begin
  sl:=TStringList.Create;
  try
    // Read file into memory
    sl.LoadFromFile (filename);

    // Sort the lines
    sl.CaseSensitive:=False;
    sl.Sort;

    // Write file out of memory
    sl.SaveToFile (filename);
  finally
    FreeAndNil (sl);
  end;
end;

This approach usually leads to very readable procedures. It also tends to mean that complex algorithms are written as several small and simpler algorithms.

Sunday 21 December 2008

Touch typing leads to a different choice of tools

It is a disgrace that the number of people who are able to touch type is not increasing. Many user interfaces today optimize data input by providing a context-based list of choices, so that the user can easily pick what is needed. Mobile phones excel at this, but if you want to be really efficient with a computer, you need a higher bandwidth between you and the computer.

Without touch typing, a programmer often types about 20 words per minute. A touch-typing programmer easily approaches 80 words per minute or more. However, the really big difference is that the touch typist focuses on the task, not on the keyboard. Touch typists are usually able to take part in a discussion about something else while typing.

Touch typing also influences the choice of tools. Computer games are a good example. Everybody can sit down with a PlayStation and play some games. It takes much more to learn 25 different keys in a complicated PC game, but people who like PC games often dislike console games because of the lower bandwidth between the user and the console.

Touch typists also program differently and use applications differently. It's not just about speed, but also about how problems are solved. As you can see in the video, it takes less than 3 minutes to type this blog entry.

Saturday 6 December 2008

Delphi in the Cloud, Virtualization and other new inventions

Buzzwords change all the time, and the newest one is the Cloud. The cloud provides several benefits: deployment, load balancing and costs all improve a lot. Where does this leave Delphi?

What we see now is the first generation of cloud systems. Some clouds have specific programming languages, very specific APIs and even specific databases which are rather primitive. As the former technical manager of a web hosting company, I can certainly understand why they don't introduce full SQL support from the start. However, as time goes on, things become more feature-rich. We will see more support for SQL, more support for various programming languages, and maybe even support for virtual machines in clouds. It will be possible to create a virtual machine and put it into the cloud, replicated as many times as necessary.

Delphi/Win32 is very much about GUI client creation. It does not need a cloud, but it connects easily to a cloud. Even the Windows Azure client technologies are directly available in Win32.

Delphi Prism works with Microsoft cloud technologies, but it basically works with any Mono-compatible cloud technology.

The only missing piece is a good, super-scalable cloud database that handles transactions the way Firebird does. Delphi has a good history of handling many kinds of database semantics, and therefore works very well with almost any type of database in the cloud. However, the age-old problem of distributed transactions has not been solved by the cloud, and therefore we will still see many apps with centralized databases.

Many organizations are currently trying to consolidate and virtualize. The cloud is nowhere near reality in most organizations yet, and we will see a lot of improvement in cloud technology before it becomes mainstream in business.

Sunday 30 November 2008

Delphi for i386, PowerPC, s390, Arm, Mips, x86-64, Itanium...

The people at CodeGear have done something cool: They now officially support the Mono project, which supports a wide range of CPU architectures.

I was one of those who evaluated the Kylix product, and I actually used it for production code (non-GUI, server side), which still runs. Kylix's main problem was that it only supported i386, and nothing in the basic concept of Kylix indicated that this would change any time soon. It was really cool to use Kylix for non-GUI apps on Linux, because the apps were fast, the tool was productive, and everything worked with Unicode (utf-8). However, it was clear from the start that the concept was flawed, and I only used Kylix because there was no realistic alternative for that very special project.

Now, CodeGear delivers cross platform technology, supporting both Microsoft and Open Source platforms. This means that the king of GUI development now supports many kinds of server-side and embedded development, and that CodeGear has a cool platform for things to come. We can even make our source code run on a Mac or Nintendo Wii now.

This should not be considered a minor product. To exploit a business opportunity, you need to be in a position to do so, and Delphi Prism seems to provide a good position for Embarcadero.

Friday 28 November 2008

The smallest Hello, World in Delphi

This is the source code:

{$APPTYPE CONSOLE}
program p;
begin
  Writeln ('Hello, World');
end.

Program info (Delphi 2009):

* Code size 14284 bytes
* Data size 12988 bytes
* Initial stack size 16384 bytes
* File size 21504 bytes

Benchmark:

* Using a standard .cmd batch file, this application can be started 10,000 times in 77 seconds on a standard Core 2 laptop running Windows XP. That means 7.7 milliseconds per run.

A minimal GUI app:

program p;
uses
  Windows;
begin
  MessageBox (0,'Hello, World','Message',MB_OK); // owner window, text, caption, style
end.

Program info (Delphi 2009):

* Code size 11744 bytes
* Data size 12984 bytes
* Initial stack size 16384 bytes
* File size 18432 bytes

Both run on:

* Windows 95 and later
* Windows NT 3 and later
* Linux using Wine

To see how this compares with other languages, have a look at this article about Java (translation tools here).

Thursday 13 November 2008

In-band vs. out-of-band signaling in programming

In telecommunications, you need to transmit not only data, but also signals that carry information about connections, such as "create a connection" or "close a connection". This information can be transferred inside the data channel or outside it, which is usually called "in-band signaling" and "out-of-band signaling" respectively. DTMF is a good example of in-band signaling. You press a number on your phone, the phone generates a sound, and the sound can be heard at the other end. This is different from HTTP, where the protocol wraps the content, so that the content cannot change the HTTP information.

In programming, we have something similar. Out-of-band signaling looks like this:

procedure StringToInt (s:string;var i:integer; 
  var error:boolean);

Here, the error code is delivered separately from the return value. An example of in-band signaling is:

function StringToInt (s:string):integer;
// Returns 0 if s is not a valid integer

Here, a default value is returned if the string cannot be parsed. In other words, the return value can be either an error code or a value, and the error code could just as well be interpreted as a value: StringToInt('0') also returns 0.
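Both conventions can be sketched with the standard Val procedure. StringToIntOOB and StringToIntIB are hypothetical names, used here only so that both variants fit in one program:

```pascal
program SignalingSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Out-of-band: the error flag travels in its own parameter.
procedure StringToIntOOB(const s: string; var i: integer; var error: boolean);
var
  code: integer;
begin
  Val(s, i, code);      // code <> 0 means s is not a valid integer
  error := code <> 0;
  if error then
    i := 0;
end;

// In-band: the value 0 doubles as the error signal.
function StringToIntIB(const s: string): integer;
var
  code: integer;
begin
  Val(s, Result, code);
  if code <> 0 then
    Result := 0;
end;

var
  v: integer;
  err, oobZeroErr, oobJunkErr: boolean;
  ibZero, ibJunk: integer;
begin
  StringToIntOOB('0', v, err);
  oobZeroErr := err;
  StringToIntOOB('abc', v, err);
  oobJunkErr := err;
  ibZero := StringToIntIB('0');
  ibJunk := StringToIntIB('abc');
  Writeln('out-of-band: "0" error=', oobZeroErr, ', "abc" error=', oobJunkErr);
  Writeln('in-band:     "0" -> ', ibZero, ', "abc" -> ', ibJunk);
end.
```

With the out-of-band variant, the caller can tell '0' from 'abc'; with the in-band variant, both return 0, and the caller has no way to distinguish them.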

The basic properties of in-band signaling are:

* A sends at least 2 kinds of communication to B through system C
* System C cannot distinguish the different kinds of communication

A good example is the utf-8 character encoding system. Most ISO-8859-1 applications can handle utf-8, even though they were not designed for it. Kylix was not designed for utf-8, but you could write utf-8 source code and create fully Unicode-enabled utf-8 applications using Kylix. Also, many text-file command-line tools that were designed for ISO-8859-1 work perfectly with utf-8, unmodified.
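Why old byte-oriented tools survive utf-8 can be shown with a small sketch. The sample bytes and the CountLines and HighBitBytes helpers are hypothetical; the point is that every byte of a multi-byte utf-8 character is $80 or higher, so a tool that only looks for $0A never mistakes part of a character for a line break:

```pascal
program Utf8LinesSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  // "Æble", LF, "€", LF encoded as utf-8 bytes (hypothetical sample text).
  Utf8Sample: array[0..9] of Byte =
    ($C3, $86, $62, $6C, $65, $0A, $E2, $82, $AC, $0A);

// A byte-oriented tool written for ISO-8859-1: it only knows that $0A ends a line.
function CountLines(const data: array of Byte): integer;
var
  i: integer;
begin
  Result := 0;
  for i := Low(data) to High(data) do
    if data[i] = $0A then
      Inc(Result);
end;

// In utf-8, every byte of a multi-byte character has the high bit set,
// so none of them can be mistaken for an ASCII control code such as $0A.
function HighBitBytes(const data: array of Byte): integer;
var
  i: integer;
begin
  Result := 0;
  for i := Low(data) to High(data) do
    if data[i] >= $80 then
      Inc(Result);
end;

var
  LineCount, MultiByteCount: integer;
begin
  LineCount := CountLines(Utf8Sample);
  MultiByteCount := HighBitBytes(Utf8Sample);
  Writeln(LineCount, ' lines, ', MultiByteCount, ' bytes belong to non-ASCII characters');
end.
```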

XML is another good example. XML files can be transported, filtered, stored, reformatted and handled in many ways, without specifying what kinds of information the XML file actually contains. You can transmit different kinds of XML files through a generic XML communications channel, and only the sender and the receiver understand the difference.

One of the oldest and most well known examples is probably ASCII. The character A is a letter, with the code 65. Ringing the bell has code 7, and returning the carriage has code 13. All you need is to transmit a 7-bit code, and you can make the remote device make sounds, feed paper, print multiple characters on top of each other etc.

Does in-band signaling make sense? Yes. Many of the most successful data formats use in-band signaling. Does it always make sense? No. I could mention lots of reasons why, but the list is too long to include here. One is very important, though: In-band signaling is often mentioned as a security problem. Think of SQL injection.
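A minimal sketch of the SQL problem, assuming a hypothetical customer table: a quote inside the data is indistinguishable from the quote that ends the SQL literal. SysUtils.QuotedStr escapes it by doubling the quote; parameterized queries, which move the data out of band entirely, are the more robust fix:

```pascal
program InBandQuotingSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Counts single quotes, to show whether the SQL literal is balanced.
function CountQuotes(const s: string): integer;
var
  i: integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if s[i] = '''' then
      Inc(Result);
end;

var
  customerName, naive, quoted: string;
begin
  customerName := 'O''Brien'; // the data itself contains the in-band delimiter '
  // Naive concatenation: the quote inside the data ends the SQL literal early.
  naive := 'SELECT * FROM customer WHERE name=''' + customerName + '''';
  // QuotedStr doubles embedded quotes, so the data stays data.
  quoted := 'SELECT * FROM customer WHERE name=' + QuotedStr(customerName);
  Writeln(naive, '  (', CountQuotes(naive), ' quotes)');
  Writeln(quoted, '  (', CountQuotes(quoted), ' quotes)');
end.
```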

Try to imagine programming where a string could not contain multiple lines. How would programming have evolved? We wouldn't have TStrings.Text, and we wouldn't be able to save multiline texts in a string field. Maybe we would have solved this differently. Or not.
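TStrings.Text is itself in-band signaling: items are joined with line breaks, and assigning Text splits on them. A minimal sketch showing how a value that contains a line break does not survive a round trip:

```pascal
program LineBreakSignalingSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes;

var
  original, copyOfIt: TStringList;
  OriginalCount, RoundTripCount: integer;
begin
  original := TStringList.Create;
  copyOfIt := TStringList.Create;
  try
    // One item whose value happens to contain a line break.
    original.Add('first line'#13#10'second line');
    OriginalCount := original.Count;    // 1 item
    // Text joins items with line breaks, and assigning Text splits on them:
    // the separator and the data share the same channel.
    copyOfIt.Text := original.Text;
    RoundTripCount := copyOfIt.Count;   // 2 items
  finally
    copyOfIt.Free;
    original.Free;
  end;
  Writeln(OriginalCount, ' -> ', RoundTripCount);
end.
```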

Tuesday 11 November 2008

Corrections to "Working with Delphi 2009"

My article "Working with Delphi 2009" contained two serious errors that need correcting.

First, I wrote that #0 could not be stored in a unicodestring/string. That's not correct. I did have a problem related to #0, but I identified the wrong cause, and I haven't identified the correct cause yet. Thanks to Andreas Hausladen for reporting this.

Second, I wrote that insert() and delete() did not exist in ansistring versions. This is actually what Delphi reports inside the IDE, but it is not correct: If you use them with ansistring, they compile and work. Thanks to "PhiS" for reporting this one.

A search in system.pas reveals that insert() works with shortstring, ansistring, widestring and unicodestring, but not RawByteString. However, I have tried all kinds of combinations with utf8string, ansistring and other kinds of strings, and it seems to be backwards-compatible in all cases. A utf8string does not get converted to the local character set when used as an ansistring var-parameter to insert(). It's not logical, but it works.
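A quick check of the corrected claim, as a minimal sketch; it only exercises plain ansistring and says nothing about the RawByteString case:

```pascal
program AnsiInsertSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  a, piece: AnsiString;
  AfterInsert, AfterDelete: AnsiString;
begin
  a := 'Hello World';
  piece := ',';
  Insert(piece, a, 6);   // insert before position 6 (the space)
  AfterInsert := a;
  Delete(a, 6, 1);       // remove the comma again
  AfterDelete := a;
  Writeln(AfterInsert);
  Writeln(AfterDelete);
end.
```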

Sunday 9 November 2008

Anders Hejlsberg and Delphi's future

Anders Hejlsberg made a nice presentation during his latest visit to Denmark; see it here. He said a lot of things, but one of his points actually explains the power of Delphi: When you look at how much a new programmer has to learn, the tools and libraries are now much bigger than the language. Delphi has good tools and libraries, and that's important.

Anders Hejlsberg also mentioned that most languages will be both static and dynamic in the future. I totally disagree. The main reason why Python and PHP are so popular is that their learning curve is not steep. If Python became a .NET language, its tools and libraries would explode in size, and if static typing were applied, language complexity would increase, making it unsuitable for part of its audience. It seems that Anders Hejlsberg has forgotten that usability also applies to programming languages. There is a need for entry-level languages.

Anders also mentions that we're working towards an ever-increasing level of abstraction, and that we need to continuously invent new programming languages to test new methods. That's fine with me, as long as I don't need to invest in source code based on these obviously temporary languages. Adding more and more features to C# makes it bloated. In other words, Microsoft is facing the choice between bloat and many languages, and seems to pick both, just to be safe.

Anders's presentation is tainted by his employment at Microsoft. Therefore, we need to remember the background for his presentation. The world of general-purpose programming languages today largely consists of these groups (based on the TIOBE index):

* Open Source compilers: C, C++, Java
* Microsoft .net: C#, VB
* Delphi
* Low performance scripting languages

In order to really understand the difference between these groups, you need to look at the forces behind.

Open Source compilers will preserve backwards compatibility. They are cross-platform, which makes some things more complicated ("write once, test anywhere"). You can do anything you want with C and C++, but development costs are huge. Java is very strong and probably the most widely used language today. However, Java also has problems.

Microsoft compilers always face the threat that Microsoft can earn more money by not being loyal to its programmers. Microsoft has done that multiple times in the past, and every time it meant big expenses for developers.

Delphi is owned by Embarcadero, and Delphi is one of their main products. They need to make Delphi good and be loyal to developers. And yes, we can still compile 25 year old Turbo Pascal code.

Saturday 8 November 2008

Working with Delphi 2009

My main work is with Delphi 2006, but more and more of our source code now compiles with Delphi 2009, and I also created some tools with Delphi 2009.

Converting existing source code is extremely simple, as long as the source code is written nicely, and is about databases, user interfaces etc. However, direct Windows API calls need to be checked - especially where you have an "array of char" and pass the sizeof(array) as parameter and similar constructs... the char is now widechar and sizeof(array) is no longer the character count. Often, you will find this kind of code in 3rd party components or small code snippets that you get from other people. Sometimes you can fix things by replacing char with ansichar, sometimes you want to use the Unicode API and therefore need to replace sizeof(array) with length(array).
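The sizeof/length difference is easy to see with an explicit WideChar buffer. Windows API functions differ in which one they want: GetWindowsDirectoryW, for example, takes its buffer size in characters, while other functions take bytes. This sketch declares WideChar explicitly, so it behaves the same regardless of how Char is defined:

```pascal
program BufferSizeSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  // Explicit WideChar, which is what Char means in Delphi 2009.
  buf: array[0..255] of WideChar;
  ByteSize, CharCount: integer;
begin
  ByteSize := SizeOf(buf);   // bytes: for APIs that expect a byte count
  CharCount := Length(buf);  // elements: for APIs that expect a character count
  Writeln('SizeOf = ', ByteSize, ', Length = ', CharCount);
end.
```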

CodeGear has done a lot to make simple I/O simple. Many things just work, but some things don't. TStream.Write (str[1],length(str)) will write only half of the string, because the second parameter is a byte count, not a character count. Rewrite it to TStream.Write (str[1], sizeof(char)*length(str)) or make str an ansistring. In other words, you will need to fix Windows API calls and advanced I/O.
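A minimal sketch of the TStream.Write pitfall, using a TMemoryStream and an explicit UnicodeString (which is what string means in Delphi 2009):

```pascal
program StreamWriteSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes;

var
  ms: TMemoryStream;
  u: UnicodeString;
  HalfSize, FullSize: Int64;
begin
  u := 'Hello';
  ms := TMemoryStream.Create;
  try
    // Wrong: the second parameter is a byte count, not a character count.
    ms.Write(u[1], Length(u));
    HalfSize := ms.Size;   // only half of the 10-byte string
    ms.Clear;
    // Right: convert the character count into bytes.
    ms.Write(u[1], Length(u) * SizeOf(WideChar));
    FullSize := ms.Size;
  finally
    ms.Free;
  end;
  Writeln(HalfSize, ' vs ', FullSize);
end.
```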

If you have previously created an application that handled Unicode, you may have stored utf-8 encoded text in TStrings objects, and used various character sets in various parts of your program. You can still do that, but some functions now only work with string and not ansistring, and the easiest solution is often to make everything use Unicode, and convert to/from Unicode at I/O and APIs. When fixing all this, it feels really good: It's like cleaning up your desk, and the source code gets simpler.
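A minimal sketch of converting only at the I/O boundary, assuming the System unit's UTF8Encode and UTF8Decode; the sample text is hypothetical:

```pascal
program Utf8BoundarySketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  u, back: UnicodeString;
  bytes: RawByteString;
  Utf16Len, Utf8Len: integer;
  RoundTripOk: boolean;
begin
  // Inside the program: plain Unicode text.
  u := 'Price: ';
  u := u + WideChar($20AC);    // euro sign, U+20AC
  Utf16Len := Length(u);       // counted in UTF-16 code units
  // At the I/O boundary: encode to utf-8 bytes for the file or socket...
  bytes := UTF8Encode(u);
  Utf8Len := Length(bytes);    // counted in bytes; the euro sign takes 3
  // ...and decode again when reading back.
  back := UTF8Decode(bytes);
  RoundTripOk := back = u;
  Writeln(Utf16Len, ' chars, ', Utf8Len, ' bytes, round trip ok: ', RoundTripOk);
end.
```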

You may think: I can just replace all "string" with "ansistring". No, you cannot. Many functions like copy(), insert() etc. will no longer work with ansistring. Delphi will convert your ansistring to string first, assuming that your ansistring contains text in the local character set, and that assumption is not always correct.

Also, there is one special character that has stopped working in string types: #0

If you assign s:='Hello'#0'World', then s will contain 'Hello'. The reason is that string is now strictly for text purposes, and #0 is not text; it's a binary code. This was probably the trickiest problem that I have encountered, because I had to convert some code that did this. Fortunately, it only took about 5 minutes to fix. If you're looking for a replacement code, consider #12 (form feed). I don't think anybody uses that code today, and it gets converted nicely between character sets. However, note that TCharacter.IsWhitespace() will treat it as whitespace, and that #12 does not terminate a PChar string, in case you're doing complicated byte gymnastics.

It's not plug & play to use old source code in Delphi 2009, but conversion is fairly easy, and normally you will not need to be familiar with source code in order to convert it easily. This is good, because it means that you can easily take another person's source code, and convert it.

When writing new apps in Delphi 2009, it feels really, really good. The quick startup feels good, text and raw binary data are automatically separated into different datatypes, and mixing them up unintentionally gives nice warnings. It feels as if Delphi is helping you more now than before. If you need advanced ways to store data, the new Generics features can make the source code much more readable than in previous versions. Simply declare a specialized TList<T> from the Generics.Collections unit:

uses
  Generics.Collections;

type
  TPairTitleId=
    class
      ComboBoxTitle:string;
      DatabaseId:integer;
    end;
  TPairTitleIdList=TList<TPairTitleId>;

Now you can refer to things like:

procedure dummy (list:TPairTitleIdList);
var
  i:integer;
begin
  i:=list.Items[3].DatabaseId;
end;

It is a significant productivity enhancement if used wisely. There is no need to use the familiar "TStrings.Objects[i] as TMyClass" any more.
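A complete, minimal sketch of the same idea, assuming TObjectList<T> from Generics.Collections so that the list also frees its items. The constructor and the sample values are hypothetical:

```pascal
program GenericsSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Generics.Collections;

type
  TPairTitleId = class
    ComboBoxTitle: string;
    DatabaseId: integer;
    constructor Create(const aTitle: string; aId: integer);
  end;

  // TObjectList<T> frees its items when OwnsObjects is True (the default).
  TPairTitleIdList = TObjectList<TPairTitleId>;

constructor TPairTitleId.Create(const aTitle: string; aId: integer);
begin
  inherited Create;
  ComboBoxTitle := aTitle;
  DatabaseId := aId;
end;

var
  list: TPairTitleIdList;
  FoundId: integer;
begin
  list := TPairTitleIdList.Create;
  try
    list.Add(TPairTitleId.Create('Copenhagen', 10));
    list.Add(TPairTitleId.Create('Aarhus', 20));
    FoundId := list[1].DatabaseId; // typed access, no "as" cast needed
  finally
    list.Free; // also frees both TPairTitleId objects
  end;
  Writeln(FoundId);
end.
```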

Delphi 2009 is a significant step forward. It does everything your old Delphi does, but significantly better and easier.

Friday 7 November 2008

C# - C Doublecross

I wonder how many English-speaking people actually know what C# is called in other languages. In Danish, you could either adopt the Microsoft-English "C Sharp" or use the normal Danish word for # and get "C havelåge", which basically means "C garden gate".

I have no clue how you can make the symbol "#" become "Sharp" - to me, it looks more like a pillow. Based on various languages, it could also be:

* C garden fence (from German)
* C pig's fence
* C Carpet
* C Gnarl

But why not just call the symbol what it is? C doublecross.

Wednesday 5 November 2008

New PO file editor: Gorm

In case you are using PO files for internationalizing your applications, there is now a new Open Source editor around: Gorm. Features include:

* Labelling/tagging of translations, so that you can make a translator translate a subset instead of everything
* Automatic error discovery (like when format-string translations are not correct or when the translation has different spacing than the original)
* Spaces are shown using dots
* Display of secondary translation when writing texts for other translations. This is very useful if one person knows 3 languages, and wants to cross-check, or if the translation of one application should be a guide for the translation of another application.
* Integrates with Google Translate
* Text filtering of items
* Separate display of source code position and programmer comments
* No installation required for translators: the exe file runs directly when double-clicking it.
* Focus on usability for non-programmers.
* Tag-based statistics

The application is still under development, but ready for production use. Because it is still under development, you need to go to the GNU gettext for Delphi forum to get information about it.

Gorm was named after Gorm the Old, King of Denmark 900-940 A.D., father of Harold Bluetooth, whose name became famous when it was used for the Bluetooth technology in mobile telephones.

Gorm was made using Delphi 2009, which made Unicode easy.

Tuesday 4 November 2008

Layers in programming

I just got inspired by David Intersimone's latest blog posts about the history of Delphi:

Modern programming is like Onions. It stinks? Yes. No. They make you cry? Yes. NO. Layers. Onions have layers. Modern programming has layers. Onions have layers. You get it?

(Which reminds me that I need to buy the latest Shrek movie)

When adding layers, always remember that layers add latency.

Monday 3 November 2008

Delphi bigger than C#? The TComPort history

Just a few years back, Delphi's future was uncertain: Borland focused on other products and Delphi was on its way down. Now, the TIOBE index for November indicates that Delphi is number 8, just slightly below C#. Is that really correct?

The TIOBE index is basically a web search index, and some may argue that it basically reflects a very strong Delphi community. However, there are many other signs of a new strength in Delphi.

One of them is the comport project, a 3rd party RS232 component for Delphi. It was originally developed by Dejan Crnila, who eventually moved on in his life, abandoned the project and stopped paying for the hosting of its homepage. Like many other 3rd party components for Delphi, its source code was available, and I asked Dejan if I could move it to SourceForge and save it for those who were still using it.

I moved it, and everything was very quiet for a very long time. Then TurboPower was bought by another company, which wanted the developers and abandoned the products. TurboPower made really good 3rd party components, and this seemed like a real problem for many, but the good and clever guys at TurboPower managed to Open Source the components. TurboPower Async Pro was one of them, and it did almost the same as the comport component. I started to recommend Async Pro, in order to avoid spending more time on this, but I kept the SourceForge project page up in order to help existing projects.

Then something happened. CodeGear was split off and put up for sale. Focus on Delphi re-emerged. I started to receive e-mails about the comport component. Embarcadero bought CodeGear, and Embarcadero is a company that keeps focusing on Delphi as a product. The number of e-mails about the comport component keeps going up. Why would anybody use an old, abandoned component whose maintainer recommends another product? The reasons seem to be: It's simple, it works, it's free. It solves real-life problems. And even more important: Programmers are starting new projects based on Delphi.

We now have a new maintainer on the comport component, and a 4.0 beta release.

Saturday 25 October 2008

Is software creating financial bubbles?

Alan Greenspan has long praised computer technology as a tool that can be used to limit risks in financial markets, but recently he acknowledged that the data fed into financial systems was often a case of garbage in, garbage out, indicating that this has led to huge trouble. Have bad IT systems been deployed elsewhere on this planet? Yes. Will the world continue to do so? Yes. Why?

If you look at a number of ideas for software systems, then some of these will definitely not make sense, some of them will make a lot of sense, and then there is a huge group in between. This middle group is difficult to evaluate, sometimes even after the software has been deployed. A famous person once said that it is often easy to measure things, but difficult to understand what is being measured, and this applies very well to software.

The dot-com boom was based on the assumption that the productivity gains from software were so huge that the value of many things would go up, a lot. Expectations were too high. Why? Because software doesn't deliver that kind of results that fast. Resources are allocated to the wrong software projects, and many projects are doing something wrong.

Why can't we just do it the right way? There are many reasons, but the single most important reason is that there is no single right way that fits all purposes, and it is therefore impossible to make one recommendation for all. The best "single right way" that I have seen so far is real "Agile", meaning that you need to adapt all the time. In other words, a very difficult concept to teach. And now we're at the core of why not all software projects are a huge success:

It's difficult.

Humankind is unable to do everything right. We will never get rid of that middle group of software systems, where we don't really know if they were a success or not.

Software is not much different from other technologies, like chemistry or electronics: Some people are making huge progress, others are not, and when the good ideas get deployed, world productivity improves. Some software is great, but software is not a silver bullet by itself.

I think we should start to try to identify the biggest successes, in order to learn from these. Maybe we should have a Nobel prize for Software?

Wednesday 22 October 2008

Funny comment in system.pas

In Delphi 2009's system.pas, line 1475, I found this comment:

TTextBuf = array[0..127] of AnsiChar; // TODO: change to WideChar

I wonder why they didn't check their todo items before shipping.

Tuesday 21 October 2008

The problem with С in programming

The problem with С in programming is, that this Delphi example actually compiles and executes without problems:

procedure TForm3.Button4Click(Sender: TObject);
var
  c:integer;
  с:integer;
begin
  c:=2;
  с:=3;
  Assert (c<>с);
end;

The first c is the Latin letter, and the second с is the Cyrillic letter. Both are in the same position on the U.S. and Russian keyboard layouts, and I have seen the two mixed up several times.
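A crude way to catch such look-alikes is to flag any identifier character outside 7-bit ASCII: the Latin c is U+0063, while the Cyrillic с is U+0441. This sketch (ContainsNonAscii is a hypothetical helper) would also flag legitimate non-ASCII identifiers, so treat it as a warning rather than a rule:

```pascal
program HomoglyphSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Flags any character outside 7-bit ASCII, such as Cyrillic letters
// that look exactly like Latin ones.
function ContainsNonAscii(const identifier: UnicodeString): boolean;
var
  i: integer;
begin
  Result := False;
  for i := 1 to Length(identifier) do
    if Ord(identifier[i]) > 127 then
    begin
      Result := True;
      Exit;
    end;
end;

var
  latinC, cyrillicEs: UnicodeString;
  LatinFlag, CyrillicFlag: boolean;
begin
  latinC := WideChar($0063);      // Latin small letter c
  cyrillicEs := WideChar($0441);  // Cyrillic small letter es, looks like c
  LatinFlag := ContainsNonAscii(latinC);
  CyrillicFlag := ContainsNonAscii(cyrillicEs);
  Writeln('Latin c flagged: ', LatinFlag, ', Cyrillic es flagged: ', CyrillicFlag);
end.
```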

Thursday 16 October 2008

Widestring 4545 times slower than unicodestring

I noticed that several people, in comments and in other blogs, compared the number of seconds spent on each benchmark in my previous post. I presented the time spent, the number of iterations and the number of iterations per second, and it is the last number that is interesting. To avoid further confusion, I have now removed the time measurements from that post.

For the same reason, several people wondered why I did not like widestring. The main reason why I recommend not using widestring is this one:

// approx. 25 million iterations per second
u:='';
for i:=0 to 100000000 do begin
  u:=u+' ';
end;

// approx. 0.0055 million iterations per second
w:='';
for i:=0 to 100000 do begin
  w:=w+' ';
end;

Note how extremely slow widestring is in this specific test. This is the kind of thing that can make a well-made application perform really badly. A TCP/IP ping request between two servers on a good network takes less time than adding a space to a widestring on my reasonably fast laptop.
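The factor in the title follows directly from the two measured rates, and the per-append cost is what the ping comparison refers to:

```latex
\frac{25 \times 10^{6}\ \text{iterations/s}}{0.0055 \times 10^{6}\ \text{iterations/s}} \approx 4545,
\qquad
\frac{1}{5500\ \text{iterations/s}} \approx 0.18\ \text{ms per append}
```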

Tuesday 14 October 2008

Delphi 2009 string type performance benchmark

This code was run on an Intel Core 2 laptop and shows the difference in performance very well. Compiler options used: Code optimization disabled, all checks on.

procedure TForm3.Button3Click(Sender: TObject);
var
  a:ansistring;
  r:rawbytestring;
  u:string;
  w:widestring;
  i:integer;
  s:shortstring;
  c:char; // widechar
  ac:ansichar;
begin
  screen.Cursor:=crHourGlass;
  try
    // approx. 222 million iterations per second
    s:='This is a test';
    for i:=0 to 1000000000 do begin
      ac:=s[4];
      s[4]:=s[5];
      s[5]:=ac;
    end;

    // approx. 43 million iterations per second
    a:='This is a test';
    for i:=0 to 1000000000 do begin
      ac:=a[4];
      a[4]:=a[5];
      a[5]:=ac;
    end;

    // approx. 40 million iterations per second
    u:='This is a test';
    for i:=0 to 1000000000 do begin
      c:=u[4];
      u[4]:=u[5];
      u[5]:=c;
    end;

    // approx. 71 million iterations per second
    w:='This is a test';
    for i:=0 to 1000000000 do begin
      c:=w[4];
      w[4]:=w[5];
      w[5]:=c;
    end;

    // ****************************

    // approx. 40 million iterations per second
    for i:=0 to 100000000 do begin
      u:='This is € test';
    end;

    // approx. 5.5 million iterations per second
    for i:=0 to 100000000 do begin
      a:='This is € test';
      u:=a;
    end;

    // approx. 5.5 million iterations per second
    for i:=0 to 100000000 do begin
      u:='This is € test';
      a:=u;
    end;

    // approx. 3.7 million iterations per second
    for i:=0 to 100000000 do begin
      u:='This is € test';
      w:=u;
    end;

    // ****************************

    // approx. 3.7 million iterations per second
    s:='';
    for i:=0 to 100000000 do begin
      s:=copy(s+' ',1,50);
    end;

    // approx. 4.2 million iterations per second
    a:='';
    for i:=0 to 100000000 do begin
      a:=copy(a+' ',1,50);
    end;

    // approx. 2.5 million iterations per second
    u:='';
    for i:=0 to 100000000 do begin
      u:=copy(u+' ',1,50);
    end;

    // approx. 1.6 million iterations per second
    w:='';
    for i:=0 to 10000000 do begin
      w:=copy(w+' ',1,50);
    end;

    // ****************************

    // approx. 25 million iterations per second
    r:='';
    for i:=0 to 100000000 do begin
      r:=r+' ';
    end;

    // approx. 25 million iterations per second
    a:='';
    for i:=0 to 100000000 do begin
      a:=a+' ';
    end;

    // approx. 25 million iterations per second
    u:='';
    for i:=0 to 100000000 do begin
      u:=u+' ';
    end;

    // approx. 0.0055 million iterations per second
    w:='';
    for i:=0 to 100000 do begin
      w:=w+' ';
    end;
  finally
    screen.Cursor:=crDefault;
  end;
end;

Conclusion:
* Avoid widestring and shortstring.
* UnicodeString is a huge improvement over WideString.

Raw binary data in Delphi 2009 strings, by example

This code snippet explains by example how you can use binary data in strings in Delphi 2009:

const
  AllByteValues=
    #$00#$01#$02#$03#$04#$05#$06#$07#$08#$09#$0a#$0b#$0c#$0d#$0e#$0f+
    #$10#$11#$12#$13#$14#$15#$16#$17#$18#$19#$1a#$1b#$1c#$1d#$1e#$1f+
    #$20#$21#$22#$23#$24#$25#$26#$27#$28#$29#$2a#$2b#$2c#$2d#$2e#$2f+
    #$30#$31#$32#$33#$34#$35#$36#$37#$38#$39#$3a#$3b#$3c#$3d#$3e#$3f+
    #$40#$41#$42#$43#$44#$45#$46#$47#$48#$49#$4a#$4b#$4c#$4d#$4e#$4f+
    #$50#$51#$52#$53#$54#$55#$56#$57#$58#$59#$5a#$5b#$5c#$5d#$5e#$5f+
    #$60#$61#$62#$63#$64#$65#$66#$67#$68#$69#$6a#$6b#$6c#$6d#$6e#$6f+
    #$70#$71#$72#$73#$74#$75#$76#$77#$78#$79#$7a#$7b#$7c#$7d#$7e#$7f+
    #$80#$81#$82#$83#$84#$85#$86#$87#$88#$89#$8a#$8b#$8c#$8d#$8e#$8f+
    #$90#$91#$92#$93#$94#$95#$96#$97#$98#$99#$9a#$9b#$9c#$9d#$9e#$9f+
    #$a0#$a1#$a2#$a3#$a4#$a5#$a6#$a7#$a8#$a9#$aa#$ab#$ac#$ad#$ae#$af+
    #$b0#$b1#$b2#$b3#$b4#$b5#$b6#$b7#$b8#$b9#$ba#$bb#$bc#$bd#$be#$bf+
    #$c0#$c1#$c2#$c3#$c4#$c5#$c6#$c7#$c8#$c9#$ca#$cb#$cc#$cd#$ce#$cf+
    #$d0#$d1#$d2#$d3#$d4#$d5#$d6#$d7#$d8#$d9#$da#$db#$dc#$dd#$de#$df+
    #$e0#$e1#$e2#$e3#$e4#$e5#$e6#$e7#$e8#$e9#$ea#$eb#$ec#$ed#$ee#$ef+
    #$f0#$f1#$f2#$f3#$f4#$f5#$f6#$f7#$f8#$f9#$fa#$fb#$fc#$fd#$fe#$ff;
  RawByteTest=
    RawByteString(AllByteValues);
  GreekTest=
    GreekString(AllByteValues);
  AnsiTest=
    ansistring(AllByteValues);

procedure TForm3.Button2Click(Sender: TObject);
var
  i:0..255;
  ErrorList:string;
  c:char;
  ac:ansichar;
  utf16:string;
begin
  Assert (length(AllByteValues)=256,'The number of characters is just like in Delphi 2006');
  Assert (sizeof(AllByteValues)=4,'This is a pointer');
  Assert (sizeof(AllByteValues[1])=2,'But each character is now 2 bytes');
  Assert (AllByteValues[1]=#0);
  Assert (length(RawByteTest)=256);
  Assert (sizeof(RawByteTest)=4,'This is a pointer');
  Assert (sizeof(RawByteTest[1])=1,'Using RawByteString in a const the bytes stay as they are');
  Assert (RawByteTest[1]=#0);
  Assert (RawByteTest[1]=char(0));
  Assert (RawByteTest[1]=chr(0));
  ac:=#0;
  Assert (RawByteTest[1]=ac);
  c:=#0;
  // Assert (RawByteTest[1]=c);    // This line does not compile! - AnsiChar and Char are absolutely not compatible in any way.
  Assert (ord(RawByteTest[1])=ord(c));    // This compiles nicely

  // Demonstrate how #128..#159 in literals are translated through Windows-1252 instead of staying U+0080..U+009F, which causes big trouble!
  ErrorList:='';
  for i:=0 to 255 do begin
    if ord(AllByteValues[i+1])<>i then
      ErrorList:=ErrorList+IntToStr(i)+' ';
  end;
  Assert (ErrorList='128 130 131 132 133 134 135 136 137 138 139 '+
    '140 142 145 146 147 148 149 150 151 152 153 154 155 156 158 159 ',
    'These values are not saved in a string in the way you would expect!!');

  // GreekString also destroys constants with binary data
  ErrorList:='';
  for i:=0 to 255 do begin
    if ord(GreekTest[i+1])<>i then
      ErrorList:=ErrorList+IntToStr(i)+' ';
  end;
  Assert (ErrorList='136 138 140 142 152 154 156 158 159 161 162 170 175 '+
    '180 184 185 186 188 190 191 192 193 194 195 196 197 198 199 200 201 '+
    '202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 '+
    '219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 '+
    '236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 '+
    '253 254 255 ',
    'These values are not saved in a string in the way you would expect!!');

  // RawByteString stores all bytes correctly
  for i:=0 to 255 do begin
    Assert (ord(RawByteTest[i+1])=i);
  end;

  // Ansistring also stores all bytes correctly (tested on a Windows-1252 machine)
  for i:=0 to 255 do begin
    Assert (ord(AnsiTest[i+1])=i);
  end;

  // Most common ansistring stuff works as expected
  Assert (ord(AnsiTest[129])=128);
  Assert (AnsiTest[129]=#128);
  Assert (copy(AnsiTest,129,1)=#128);
  Assert (MidStr(AnsiTest,129,1)=#128);
  Assert (pos(#128,AnsiTest)=129);

  // The same functions using UnicodeString
  utf16:=AllByteValues;
  Assert (ord(utf16[129])=8364);
  Assert (utf16[129]=#8364);
  Assert (copy(utf16,129,1)=#8364);
  Assert (MidStr(utf16,129,1)=#8364);
  Assert (pos(#128,utf16)=129);   // #128 is converted to #8364 before calling the widestring version of pos()
  Assert (pos(#8364,utf16)=129);

  // Don't copy raw binary data into a UTF-16 string type!
  utf16:=RawByteTest;
  ErrorList:='';
  for i:=0 to 255 do begin
    if ord(utf16[i+1])<>i then
      ErrorList:=ErrorList+IntToStr(i)+' ';
  end;
  Assert (ErrorList='128 130 131 132 133 134 135 136 137 138 139 140 142 145 '+
    '146 147 148 149 150 151 152 153 154 155 156 158 159 ',
    'These values are not saved in a string in the way you would expect!!');

  // Windows automatically handles unsupported byte values in strange ways.
  c:=#128;
  Assert (ord(c)<>128);
  Assert (ord(c)=8364);
  Assert (c='€');

  ac:=#128;
  Assert (ord(ac)=128);
  Assert (ord(ac)<>8364);
  Assert (ac='€','Here, ac is converted to a utf-16 string type using local character set');

  // Don't use inc() or dec() with utf-16. It works, but it's not good
  utf16:=#127;
  Assert (ord(utf16[1])=127);
  inc (utf16[1]);
  Assert (ord(utf16[1])=128);
  Assert (utf16[1]<>#128); // Because #128 becomes #8364
  Assert (#128=#8364);     // as you can see here
end;

Conclusion: Always use RawByteString or AnsiString for binary data, and never store binary data in other string types.
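The byte values that break in the tests above all come from character literals. A minimal sketch of one way around that, assuming Delphi 2009 or Free Pascal (both provide RawByteString): build the binary data at runtime with AnsiChar() instead of from a #$.. constant, so no code page translation is involved. The program and function names are hypothetical.

```pascal
program RawBytesDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Builds every byte value 0..255 at runtime, so no character literal
// (and no code page translation of #128..#159) is involved
function MakeAllByteValues: RawByteString;
var
  i: Integer;
begin
  SetLength(Result, 256);
  for i := 0 to 255 do
    Result[i + 1] := AnsiChar(i);
end;

// Counts positions whose byte value differs from its zero-based index
function CountMismatches(const Data: RawByteString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(Data) do
    if Ord(Data[i]) <> i - 1 then
      Inc(Result);
end;

var
  Raw: RawByteString;
begin
  Raw := MakeAllByteValues;
  WriteLn('Length: ', Length(Raw));                // Length: 256
  WriteLn('Mismatches: ', CountMismatches(Raw));   // Mismatches: 0
end.
```

As long as Raw is only passed around as RawByteString (or AnsiString) and never assigned to string, the bytes stay exactly as built.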

Wednesday 8 October 2008

Delphi 2009 strings explained by example

This code snippet explains by example how the new string types work:

type
  OemCp437=type ansistring(437);
  CyrillicString=type ansistring(1251);
  DanishString=type ansistring(1252);
  GreekString=type ansistring(1253);
  usascii=type ansistring(20127);
  Iso88591String=type ansistring(28591);
  Iso885915String=type ansistring(28605);
  utf7string=type ansistring(65000);

  // These will not work, but will compile
  utf16le_string=type ansistring(1200);
  utf16be_string=type ansistring(1201);
  utf32_string=type ansistring(12000);
  utf32be_string=type ansistring(12001);

procedure TForm3.Button1Click(Sender: TObject);
var
  utf16:string;
  local:ansistring;
  raw:rawbytestring;
  utf8:utf8string;
  utf7:utf7string;
  cyrillic:CyrillicString;
  danish:DanishString;
  greek:GreekString;
  iso88591:Iso88591String;
  iso885915:Iso885915String;
  Cp437:OemCp437;
  ascii:usascii;
  utf32:utf32_string;
begin
  // Ansistring cannot be used for utf16 and utf32
  utf32:='asdf';
  Assert (utf32='');

  // Demonstrating what UTF-16 is
  utf16:=#$1D160;            // This is a musical note (000011101000101100000), see http://unicode.org/charts/PDF/U1D100.pdf
  Assert (length(utf16)=2);  // This character occupies 2 positions in UTF-16
  Assert (utf16[1]=#$D834);  // 110110 0000110100 First half of the symbol
  Assert (utf16[2]=#$DD60);  // 110111 0101100000 Second half of the symbol
  utf8:=utf16;
  Assert (length(utf8)=4);
  Assert (utf8[1]=#$F0);   // 11110 000
  Assert (utf8[2]=#$9D);   // 10 011101
  Assert (utf8[3]=#$85);   // 10 000101
  Assert (utf8[4]=#$A0);   // 10 100000
  danish:=utf16;
  Assert (danish='??');    // Note how Windows incorrectly converts to two letters!
  Assert (length(danish)=2);
  danish:=utf8;
  Assert (danish='??');    // Note how Windows incorrectly converts to two letters!
  Assert (length(danish)=2);

  // Demonstrating the euro character
  utf16:='€';
  danish:=utf16;
  cyrillic:=utf16;
  greek:=utf16;
  iso88591:=utf16;
  iso885915:=utf16;
  Cp437:=utf16;
  ascii:=utf16;
  utf8:=utf16;
  utf7:=utf16;
  Assert (length(utf16)=1);
  Assert (length(danish)=1);
  Assert (length(cyrillic)=1);
  Assert (length(greek)=1);
  Assert (length(iso88591)=1);
  Assert (length(iso885915)=1);
  Assert (length(Cp437)=1);
  Assert (length(ascii)=1);
  Assert (length(utf7)=5);
  Assert (length(utf8)=3);
  Assert (ord(utf16[1])=8364);
  Assert (ord(danish[1])=128);
  Assert (ord(cyrillic[1])=136);
  Assert (ord(greek[1])=128);
  Assert (ord(iso885915[1])=164);
  Assert (iso88591='?');
  Assert (ascii='?');
  Assert (Cp437='?');
  Assert (greek=utf16);
  Assert (danish=utf16);
  Assert (cyrillic=utf16);
  Assert (utf7=utf16);
  Assert (utf7=utf8);
  Assert (iso885915=utf16);
  Assert (iso88591<>utf16);
  Assert (Cp437<>utf16);
  Assert (ascii<>utf16);
  Assert (cyrillic=danish);

  // Convert from Unicode to special character sets
  utf16:='abc ÆØÅ рыба'; // utf16 holds UTF-16
  local:=utf16;  // Converts to local 8-bit character set
  raw:=utf16;    // Converts to local 8-bit character set
  utf8:=utf16;   // Converts to utf-8
  cyrillic:=utf16;
  danish:=utf16;
  greek:=utf16;
  Cp437:=utf16;
  ascii:=utf16;
  utf7:=utf16;
  Assert (cyrillic='abc ?OA рыба');
  Assert (danish='abc ÆØÅ ????');
  Assert (greek='abc ?OA ????');   // Æ => ?
  Assert (Cp437='abc ÆOÅ ????');   // Ø does not exist
  Assert (ascii='abc AOA ????');   // Æ => A
  Assert (length(utf16)=12);
  Assert (length(local)=12);
  Assert (length(raw)=12);
  Assert (length(utf8)=19);
  Assert (length(utf7)=28);
  Assert (length(Cp437)=12);
  Assert (length(cyrillic)=12);
  Assert (length(danish)=12);
  Assert (length(greek)=12);
  Assert (length(ascii)=12);

  // Converts to Unicode
  utf16:=danish;
  Assert (utf16='abc ÆØÅ ????');
  Assert (length(utf16)=12);
  utf16:=cyrillic;
  Assert (utf16='abc ?OA рыба');
  Assert (length(utf16)=12);
  utf16:=utf8;
  Assert (utf16='abc ÆØÅ рыба');
  Assert (length(utf16)=12);

  // The following lines only work correctly if your local character set
  // is Windows-1252!
  utf16:=raw;
  Assert (utf16='abc ÆØÅ ????');
  Assert (length(utf16)=12);

  raw:=cyrillic;
  local:=cyrillic;
  Assert (local='abc ?OA ????');
  Assert (raw<>local);   // raw preserves cyrillic letters and the character set
  Assert (length(raw)=12);

  raw:=danish;
  local:=danish;
  Assert (raw=local);
  Assert (raw='abc ÆØÅ ????');
  Assert (local='abc ÆØÅ ????');
  Assert (length(raw)=12);

  raw:=greek;
  local:=greek;
  Assert (raw='abc ?OA ????');
  Assert (local='abc ?OA ????');
  Assert (raw=local); // This is only true because the string doesn't contain greek letters
  Assert (length(raw)=12);
end;

If you are in doubt about how to use ansistring and RawByteString, use this guideline:

* Use the normal (unicode) string type as much as you can.
* Use ansistring for texts in local 8-bit character sets. Usually it is only used for I/O.
* Use RawByteString for parameters to functions that have to work on all kinds of ansistrings, without triggering character set conversions, like I/O functions. This is really only necessary if you mix various character sets, which is rarely the case. Most programmers will only very rarely use RawByteString.
* Use RawByteString for storing binary data - but ansistring also works. Make sure that you don't assign binary data to/from UnicodeString=string. Note that most string manipulation functions now expect the unicode string type, so you may need to implement some things yourself.

If you want to make code work with both Delphi 2009 and previous versions, you can insert this into your source:

{$ifndef UNICODE}
type UnicodeString=widestring;
type RawByteString=ansistring;
{$endif}

Use UnicodeString wherever you used widestring before, unless it's really widestring that you want to use (for BSTR compatibility). Program the rest using string wherever you can, and ansistring in some I/O operations. Most of the VCL already defaults to ansistring for non-Unicode I/O, making things very backwards compatible.

Monday 6 October 2008

Menus or Office 2007 toolbars?

I notice that Google just changed their spreadsheet user interface from the Office 2007 toolbar style to the good old TMainMenu-like user interface. Nice. I guess I made the right choice when I chose not to install the specially licensed Office 2007 components with my Delphi 2009.

Saturday 4 October 2008

High performance apps in Delphi

Poul-Henning Kamp just made a very good presentation on how he developed Varnish, an HTTP accelerator that is much faster than using Squid in front of a slow CMS system.

Most of the methods that PHK describes are very easy to implement in Delphi, so it's worth having a look at them. Unfortunately, I only found this Danish-language presentation, which hopefully can be understood by most Scandinavians - but I know that he has presented it in other languages, too - so if somebody has a link to an English version, please provide the link.

Monday 29 September 2008

GNU gettext for Delphi 2009

I just spent a little time testing dxgettext with Delphi 2009, and it was easy to make it work with the new string types, even though this code is really exploiting a lot of the things you can do with character sets. This also means that there is new stuff on the homepage. It is nice to see that CodeGear is back in business, providing seriously useful enhancements while keeping backwards compatibility.

Friday 26 September 2008

Why I like Google Chrome as a software developer

Chrome has a number of caveats, which have been described elsewhere very well. However, it also has a lot of very good features. First, the obvious, which many other reviews tell about, too:

* It starts very, very fast, faster than MSIE and Firefox on my PC.
* When a web page loads the Java runtime for the first time, it doesn't slow down other tabs.
* When something crashes, only one tab crashes. I only tried this once, but was very thankful that I didn't lose much.
* Javascript websites are fast. Really fast. If you ever use Google Docs or similar, you will never go back to MSIE or Firefox (unless they improve, of course).
* It autoupdates, just like other Google apps. This is very convenient for your old grandmother who could be scared by upgrade notifications.
* Reopening a tab that was accidentally closed is easy.
* Start page includes most visited sites, with screenshots.

Now the less obvious:

* It can create a "program shortcut", which starts Chrome in app mode. This is interesting for software developers, because it makes a web page behave more like an application. No navigation buttons, no dragging into other Chrome windows.
* Incognito mode is very useful for testing websites, because it allows logins and cookies, but doesn't use the same logins and cookies as the normal part of the browser.
* Incognito mode can be very useful for presentations, if you don't want the audience to know what websites you have recently been using.
* When searching inside a web page (Control-F), the scrollbar indicates where on the web page the search results are.
* I have two screens, and if I have a Chrome window on each, I can drag a tab from one screen to the other. Extremely nice. This is how most programs should work.
* It is extremely easy to add a custom search. For instance, I want easy access to our Mantis Bug Tracker, and it is simple to configure Chrome so that I can just type "mantis 123" in the URL, and it will show me issue number 123 in my Mantis system. Even the start page automatically gets a Mantis search box. In similar ways, you can easily add search engines for your CRM system, your intranet wiki etc. Type "wiki phone" and you have your phone list, or "crm smith" and you have a list of the customers named Smith.
* It automatically creates custom search engines for you. If you have visited dk.php.net and made a search, you can just type "dk.php.net fopen" in the url, and it will activate a search on dk.php.net for the search term "fopen". I renamed that search engine to php, so that I just need to type "php fopen" to get the fopen page.

Important tips when using Chrome:

* Use Alt-Home to switch a page to the start page.
* Use Alt-Left and Alt-Right to go to previous and next page (just like inside Delphi)
* Use Control-T to open a new tab
* Use Control-F to search inside a web page
* Use Control-L to type a new URL
* In order to add a custom search, right click on the URL and edit search engines.

Thursday 25 September 2008

The next big thing: local apps

Imagine an application that is faster, snappier, more productive, prettier and much cheaper to develop than web 2.0 apps. It integrates perfectly with other applications that you use, and it even works offline. Sounds impressive? Well, it's here. It's called a GUI app. It has actually been around for many, many years, even before the World Wide Web was invented.

Why did I write this? Sometimes it seems that people don't know this.

Thursday 11 September 2008

Why you need Delphi 2009

Many developers don't consider Unicode necessary. After all, the ansistring type takes half the amount of memory and is often one reason why Delphi apps are much faster than .net and Java apps, and you may not think that you need those additional characters anyway.

However, there are a few characters that may already have caused you problems. For instance, the euro symbol (€). It does not exist in the ISO-8859-1 character set! In Windows-1252, it is encoded as #$80, and in ISO 8859-15 it is encoded as #$A4. You may even use a UTF-8 converter that basically converts the ansi bits into UTF-8 bits without repositioning the euro symbol - and that's also wrong, because in UTF-8 you need to encode the Unicode code point #$20AC. If you're exchanging CSV or ansi files but want to include the euro symbol, you may experience problems with other character sets. For instance, in the Windows-1251 Cyrillic character set, the euro has position #$88. All this gets much easier with Delphi 2009.
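To make the "repositioning" point concrete, here is a minimal sketch of how a code point becomes UTF-8 bytes. It is not a Delphi RTL routine: Utf8Bytes and HexBytes are hypothetical helpers, and the code assumes a valid code point (at most $10FFFF, not a surrogate) and a compiler with RawByteString (Delphi 2009 or Free Pascal). The euro's code point $20AC comes out as three bytes, which is why copying the Windows-1252 byte #$80 into a UTF-8 file cannot be right.

```pascal
program Utf8Demo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  SysUtils;

// Encodes one code point (assumed valid: at most $10FFFF, no surrogates)
// as UTF-8 bytes in a RawByteString, so no code page conversion happens
function Utf8Bytes(CodePoint: Cardinal): RawByteString;
var
  Count, i: Integer;
begin
  if CodePoint < $80 then
    Count := 1
  else if CodePoint < $800 then
    Count := 2
  else if CodePoint < $10000 then
    Count := 3
  else
    Count := 4;
  SetLength(Result, Count);
  // Continuation bytes 10xxxxxx carry 6 bits each, filled from the end
  for i := Count downto 2 do
  begin
    Result[i] := AnsiChar($80 or (CodePoint and $3F));
    CodePoint := CodePoint shr 6;
  end;
  // The lead byte marks the length: 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx
  case Count of
    1: Result[1] := AnsiChar(CodePoint);
    2: Result[1] := AnsiChar($C0 or CodePoint);
    3: Result[1] := AnsiChar($E0 or CodePoint);
  else
    Result[1] := AnsiChar($F0 or CodePoint);
  end;
end;

// Formats bytes as space-separated hex, e.g. 'E2 82 AC'
function HexBytes(const S: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

begin
  WriteLn('Euro U+20AC:  ', HexBytes(Utf8Bytes($20AC)));   // E2 82 AC
  WriteLn('Note U+1D160: ', HexBytes(Utf8Bytes($1D160)));  // F0 9D 85 A0
end.
```

The second line reproduces the four UTF-8 bytes that the Delphi 2009 string test asserts for the musical note.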

Does it get really, really easy? That depends on your ambitions. Many believe that Unicode is a 16-bit character encoding, but that's wrong. 16 bits only cover 65,536 code points, and Unicode defines code points up to U+10FFFF. In order to handle symbols beyond 16 bits, Windows (and .net and Java) uses the UTF-16 encoding, where one symbol may use 2 or 4 bytes. Most programmers will ignore that fact, and their programs will still work nicely with Unicode. CodeGear's Char type (WideChar) is only 16 bits, and that is what the Delphi 2009 VCL uses. The Unicode standard places the commonly used characters within the first 65,536 code points, so they are encoded using a single 16-bit unit. In almost all normal systems, one character is 16 bits, and that's really, really easy.

One example of a group of characters beyond 16 bits is the musical symbols (U+1D100..U+1D1FF).
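How a symbol beyond 16 bits ends up as two 16-bit units can be shown with a little bit arithmetic. This is a minimal sketch with hypothetical helper names (ToSurrogatePair, FromSurrogatePair), assuming the input is a valid code point in $10000..$10FFFF: subtract $10000, then split the remaining 20 bits into two 10-bit halves prefixed with 110110 and 110111.

```pascal
program SurrogateDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  SysUtils;

// Splits a code point in $10000..$10FFFF into a UTF-16 surrogate pair
procedure ToSurrogatePair(CodePoint: Cardinal; out HighSurrogate, LowSurrogate: Word);
begin
  CodePoint := CodePoint - $10000;                      // 20 bits remain
  HighSurrogate := Word($D800 or (CodePoint shr 10));   // 110110 + top 10 bits
  LowSurrogate := Word($DC00 or (CodePoint and $3FF));  // 110111 + bottom 10 bits
end;

// Joins a surrogate pair back into a single code point
function FromSurrogatePair(HighSurrogate, LowSurrogate: Word): Cardinal;
begin
  Result := $10000 + (Cardinal(HighSurrogate - $D800) shl 10)
    + Cardinal(LowSurrogate - $DC00);
end;

var
  HighPart, LowPart: Word;
begin
  ToSurrogatePair($1D160, HighPart, LowPart);  // musical symbol U+1D160
  WriteLn(IntToHex(Integer(HighPart), 4), ' ', IntToHex(Integer(LowPart), 4));  // D834 DD60
  WriteLn(IntToHex(Integer(FromSurrogatePair(HighPart, LowPart)), 5));          // 1D160
end.
```

This is the same pair (#$D834 #$DD60) that length() counts as 2 in a Delphi 2009 string.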

Thursday 28 August 2008

Goodbye Quicksort, hello Mergesort

Sorting is not the bottleneck in most modern applications, but we use it a lot, and sometimes you need to sort data where comparison keys cannot fit in RAM. Therefore, it makes sense to pick the best sorting algorithm out there.

The typical choice is Quicksort, because it is perceived as being the fastest out there. It is easily implemented in a way that performs very well on most CPUs and, with a good pivot choice, performs extremely fast on almost sorted data, but in its common implementations it does have an O(n*n) worst case performance.

Mergesort, on the other hand, has O(n*log(n)) as worst case, and a good implementation has O(n) as best case. The extra memory used is small compared to modern hardware, and using the right swap algorithm, it requires between O(1) and O(n) swaps after finding the right order. Mergesort is a stable algorithm, and it parallelizes well if needed. The amount of memory needed can be determined at the beginning of the algorithm, and it would typically be 2*n*log2(n) bits, where n is the number of items to sort.
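A minimal sketch of a top-down Mergesort on integers, to show where the stability and the O(n) best case come from. It is an illustration, not a tuned implementation: it uses one scratch array of n items rather than the index-based layout behind the bit estimate above, and the names MergeSort, SortedCopy and JoinInts are hypothetical.

```pascal
program MergeSortDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  SysUtils;

type
  TIntArray = array of Integer;

// Sorts A[Left..Right-1] (half-open range), using Tmp as scratch space
procedure MergeSortRange(var A, Tmp: TIntArray; Left, Right: Integer);
var
  Mid, i, j, k: Integer;
begin
  if Right - Left < 2 then
    Exit;
  Mid := (Left + Right) div 2;
  MergeSortRange(A, Tmp, Left, Mid);
  MergeSortRange(A, Tmp, Mid, Right);
  // Halves already in order: skipping the merge makes sorted input O(n)
  if A[Mid - 1] <= A[Mid] then
    Exit;
  i := Left;
  j := Mid;
  k := Left;
  while (i < Mid) and (j < Right) do
  begin
    // "<=" takes equal items from the left half first, keeping the sort stable
    if A[i] <= A[j] then
    begin
      Tmp[k] := A[i];
      Inc(i);
    end
    else
    begin
      Tmp[k] := A[j];
      Inc(j);
    end;
    Inc(k);
  end;
  // Copy whichever half still has items left
  while i < Mid do begin Tmp[k] := A[i]; Inc(i); Inc(k); end;
  while j < Right do begin Tmp[k] := A[j]; Inc(j); Inc(k); end;
  for k := Left to Right - 1 do
    A[k] := Tmp[k];
end;

procedure MergeSort(var A: TIntArray);
var
  Tmp: TIntArray;
begin
  SetLength(Tmp, Length(A));  // all extra memory is known and allocated up front
  MergeSortRange(A, Tmp, 0, Length(A));
end;

// Returns a sorted copy of the given values
function SortedCopy(const Values: array of Integer): TIntArray;
var
  i: Integer;
begin
  SetLength(Result, Length(Values));
  for i := 0 to High(Values) do
    Result[i] := Values[i];
  MergeSort(Result);
end;

// Formats the array as space-separated numbers
function JoinInts(const A: TIntArray): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(A) do
  begin
    if i > 0 then
      Result := Result + ' ';
    Result := Result + IntToStr(A[i]);
  end;
end;

begin
  WriteLn(JoinInts(SortedCopy([5, 3, 8, 1, 9, 2])));  // 1 2 3 5 8 9
end.
```

In a real application the comparison would typically be a callback on records or keys, which is where stability and a low comparison count matter most.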

As RAM amounts increase and we get multiple CPUs, I believe that Mergesort will start to replace Quicksort, especially in cases where comparisons are expensive.

Tuesday 5 August 2008

Delphi Workshop in Nyborg

I have been asked to do a presentation about refactoring and source code cleanup at the Delphi Workshop in Nyborg, Denmark, on September 10th. I look forward to seeing some of you there!

Tuesday 29 July 2008

Google knol - perfect for programmers

Many sites are reporting about Google knol as a competitor to Wikipedia - I see it as something completely different: An easy way to write down some knowledge that you may have gained, in order to share it with others. My first knol is a copy of one of my blog posts: Win32 thread names in Delphi. This is a kind of information that doesn't belong in Wikipedia, but can be helpful to others.

I can only recommend that all programmers have a look at the knol principle, because it's an easy way to offload information in a way that makes you able to find it again easily, in case you don't have anywhere else to put it. And maybe somebody will provide extra information that makes you learn more about the topic that you're working on.

Monday 28 July 2008

Language barriers when outsourcing programming

In case you're working with outsourcing, you may have experienced communication problems. English is my 3rd language, and when I communicate with people from other parts of the world, who have English as their 2nd or 3rd language, the message is not always understood as intended.

In order to communicate well, you need to know the most frequent causes of problems. This post describes some of these - when communicating in English with people that have Russian or a related language as their primary language.

Russian is a different mental model, so it requires some effort to communicate properly in English, and if the sum of language effort and programming effort has a maximum, it means that very complex programming issues may lead to more misunderstandings in language.

A very frequent problem is the lack of "a" and "the" in Russian. Example: "Make the program create a file for output. Export graphics into the file." Disregarding "the" means that you may end up having two files instead of one. A solution could be to replace "the" with "that" when you are referring to a specific file.

The next problem is about tenses. Russian basically has past, present and future, and completed and uncompleted verbs. It does not have the same tenses as English, and some English constructs are not made using tenses in Russian. Tenses are important when describing work processes. Example: "Yesterday at the meeting, I had given you instructions..." (the instructions were given before the meeting) If your programmer uses automated translation tools to translate this to Russian, the result is "Yesterday at the meeting I gave you instructions..." - which is totally incorrect and causes confusion.

Basic construction of sentences is also a problem. It can be very difficult to understand complex English sentences like "Moving the button up could cause some problems with the layout that was requested at the meeting.". It takes quite some analytical work to decompose this sentence into its basic building blocks if your primary mental language model is not close to English.

Vocabulary is obviously a problem - if you don't know a word, you need to look it up. But even if you know a dictionary's translation of a word, you can get into trouble. Example: "Gossip was the main business of the evening". The word "Business" can be translated in many ways to Russian, and in this sentence, it makes a big difference if you translate to "делом" (what to do) or "бизнес" (biznes=commerce).

Correct English. If you don't write correct English, it can be very hard to understand it, and it can be impossible for automatic translation tools to help out the reader. For instance, "The discusion was about maintanance." (2 spelling mistakes!) auto-translates to something like "xyz was about hobbies", whereas "The discussion was about maintenance" is autotranslated correctly, using my favorite translation tool.

Special thanks to Rikke, Anatoly and Alexander for their contributions to this post.

Thursday 10 July 2008

Win32 thread names in the Delphi IDE

The Delphi help only mentions how to do this using C++, and Google doesn't provide the solution in Pascal easily, so I thought that I'd better publish the solution here. In order to see names for your threads in the Delphi IDE while debugging your Win32 application, call SetCurrentThreadName() in your TThread.Execute method:


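procedure SetCurrentThreadName(const Name: string);
type
  TThreadNameInfo =
    record
      RecType: LongWord;   // must be $1000
      Name: PAnsiChar;     // the debugger expects an 8-bit (ANSI) string
      ThreadID: LongWord;  // $FFFFFFFF means the calling thread
      Flags: LongWord;     // reserved, must be 0
    end;
var
  info:TThreadNameInfo;
  AnsiName:AnsiString;
begin
  // This code is extremely strange, but it's the documented way of doing it!
  // RaiseException is declared in the Windows unit.
  AnsiName:=AnsiString(Name);  // PAnsiChar also works when string=UnicodeString (Delphi 2009)
  info.RecType:=$1000;
  info.Name:=PAnsiChar(AnsiName);
  info.ThreadID:=$FFFFFFFF;
  info.Flags:=0;
  try
    RaiseException($406D1388, 0,
      SizeOf(info) div SizeOf(LongWord), PDWord(@info));
  except
    // Only the debugger cares about this exception, so swallow it
  end;
end;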

Let's hope it gets easier to find on Google now.

Tuesday 8 July 2008

Code style of old age programmers

I seem to be the same age as Jeff Atwood, but somehow not, when I read his latest post. I feel "been there, done that", because I would almost have agreed with him 10 years ago, while still being a freelance programmer, helping out in various programming teams. However, today I'd definitely say that maintainability is much more important than most programmers want to acknowledge.

Most programmers spend most of their time maintaining code, not writing new code. They may spend time on maintaining their own code - but it's still about maintenance. And actually, they spend much more time per SLOC when doing maintenance than when writing new code. In other words, in order to be really productive, it's the maintenance part that needs to be optimized, unless you're doing a quick and dirty application that nobody is going to use (yeah, right!). Perfect source code works perfectly AND could not be easier to maintain. This includes simplicity, of course, but it also includes comments, understandable variable names, well defined context and well defined invariants.

Tuesday 1 July 2008

When to hide or disable menu items and buttons

I often disagree with Joel, also in his latest post about menu items. Hiding and disabling buttons and menu items is usually done in order to prevent the user from triggering an event, and it's perfectly ok to do in some cases:

* Hide buttons that the user wouldn't search for. For instance, a non-administrative user doesn't need to see the Admin button.
* Disable buttons where it is obvious to the user why they aren't clickable. For instance, a well implemented undo functionality doesn't need to have its menu item enabled before the user starts editing. That would actually confuse some users.
* In all other cases, make the button visible and enabled, but show a message that explains to the user why this functionality is currently not available, and what the user can do to make it available.

Saturday 28 June 2008

Intentionally add development costs

It is a common misunderstanding that you can make a process optimal by optimizing its subprocesses.

A good analogy is to find the highest hill by walking around in an environment with 100 meter visibility. If you always try to walk upwards, you may find a local hill. If you go downwards a bit, you may find a bigger hill. There are many good mathematical books on this topic.
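The hill analogy is the classic local-optimum problem of greedy hill climbing. A minimal sketch with a hypothetical one-dimensional landscape: the climber only ever steps to a higher neighbour, so which hill it ends on depends entirely on where it starts.

```pascal
program HillClimbDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

const
  // Hypothetical landscape: a local hill (height 5) and a higher hill (height 9)
  Heights: array[0..9] of Integer = (1, 3, 5, 4, 2, 3, 6, 9, 7, 2);

// Greedy hill climbing: step to a higher neighbour until none is higher,
// then return the height reached
function ClimbFrom(Start: Integer): Integer;
var
  Here: Integer;
  Moved: Boolean;
begin
  Here := Start;
  repeat
    Moved := False;
    if (Here > Low(Heights)) and (Heights[Here - 1] > Heights[Here]) then
    begin
      Dec(Here);
      Moved := True;
    end
    else if (Here < High(Heights)) and (Heights[Here + 1] > Heights[Here]) then
    begin
      Inc(Here);
      Moved := True;
    end;
  until not Moved;
  Result := Heights[Here];
end;

begin
  WriteLn('Start at 0: height ', ClimbFrom(0));  // 5, stuck on the local hill
  WriteLn('Start at 5: height ', ClimbFrom(5));  // 9, the highest hill
end.
```

Getting from the first hill to the second requires walking downwards for a while, which is the extra cost the post argues for.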

When dealing with programmers, it's a bit more complex. However, every time your organization reorganizes itself, you probably experience exactly this: Initially higher costs, but in the long run costs are significantly lower. One reason is that information is distributed.

In programming, the environment constantly changes. Your organization changes, your customers change, the technology changes. It's like having the hills change height all the time. This is where it is important to pick a hill to stand on. Pick a hill that is one of the highest, but not too far from the other hills. You will find one easily if you change hills frequently.

This is what a continuous improvement process is all about. It adds extra costs, but it ensures long term competitiveness.

Using Delphi can be described as being on a hill in a "Delphi" group of hills. Using Java and .net can be described in the same way. Erlang is a very small group of hills. One of the good things about Delphi is that every time Borland/CodeGear adds a new hill, it's close to the previous hills, and not too far from the Microsoft hills. This keeps the costs of continuous improvement down.

Wednesday 25 June 2008

Mary Poppendieck: 100% Scrum is not 100% Agile

Yesterday, we had a workshop with Mary at the IT University in Copenhagen. In case you don't know Mary, she is one of the most important people in Agile and Lean software development today. The workshop was arranged by BestBrains.

I can strongly recommend following Mary's advice. She is experienced with real life software development and with managing, and has many good points. And it's nice to hear her confirm that being 100% Scrum is not being 100% Agile. I totally agree with that.

Thursday 5 June 2008

Use more than 4GB RAM with Win32

If you need to allocate more than 2GB RAM, some of it will probably be used for caching undo levels, caching data retrieved via the network, calculation results etc. The Windows API actually has a feature that may help you use as much RAM as possible, even more RAM than a Win32 process can address, in case you're running a 64-bit operating system. The trick is to use the CreateFile function with the flags FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE. Adding FILE_FLAG_RANDOM_ACCESS can also be beneficial.

From the Win32 documentation: "Specifying the FILE_ATTRIBUTE_TEMPORARY attribute causes file systems to avoid writing data back to mass storage if sufficient cache memory is available, because an application deletes a temporary file after a handle is closed. In that case, the system can entirely avoid writing the data. Although it doesn't directly control data caching in the same way as the previously mentioned flags, the FILE_ATTRIBUTE_TEMPORARY attribute does tell the system to hold as much as possible in the system cache without writing and therefore may be of concern for certain applications."

The FILE_FLAG_DELETE_ON_CLOSE flag ensures that the file does not remain on the hard disk when the application has stopped or has been aborted. The Win32 file APIs support large files, so there is basically no limit to the amount of data stored this way.

Compared to AWE, this method works on all Win32 platforms, doesn't use more RAM than what makes overall sense for the PC's current use and doesn't require the application to run with "Lock Pages in Memory" privilege.
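A minimal sketch of the trick, assuming Delphi or Free Pascal on Windows (it uses the Windows unit and will not compile elsewhere). CreateCacheFile and the file name 'scratch.tmp' are hypothetical; the flags are the ones named above. The program writes four bytes, seeks back and reads them again; a real cache would keep the handle open and use SetFilePointer with the high 32-bit part to address data beyond 4GB.

```pascal
program TempCacheDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  Windows, SysUtils;

// Opens a scratch file that Windows tries to keep in the file cache
// and deletes automatically when the last handle to it is closed
function CreateCacheFile(const FileName: string): THandle;
begin
  Result := CreateFile(PChar(FileName),
    GENERIC_READ or GENERIC_WRITE,
    0,               // no sharing
    nil,             // default security
    CREATE_ALWAYS,
    FILE_ATTRIBUTE_TEMPORARY or FILE_FLAG_DELETE_ON_CLOSE or
      FILE_FLAG_RANDOM_ACCESS,
    0);
  if Result = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
end;

var
  H: THandle;
  Data, Back: array[0..3] of Byte;
  Count: DWORD;
begin
  H := CreateCacheFile('scratch.tmp');   // hypothetical file name
  try
    Data[0] := 1; Data[1] := 2; Data[2] := 3; Data[3] := 4;
    if not WriteFile(H, Data, SizeOf(Data), Count, nil) then
      RaiseLastOSError;
    // Move back to the start and read the bytes again
    SetFilePointer(H, 0, nil, FILE_BEGIN);
    if not ReadFile(H, Back, SizeOf(Back), Count, nil) then
      RaiseLastOSError;
    WriteLn('Read back ', Count, ' bytes, last byte = ', Back[3]);
  finally
    CloseHandle(H);   // FILE_FLAG_DELETE_ON_CLOSE removes the file here
  end;
end.
```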

Monday 2 June 2008

Multi-core CPUs are the result of many years of parallelization

Many blogs are currently discussing whether multi-core CPUs are really necessary, or if other measures could be possible.

They seem to forget that parallelization has been going on for many years. It's not just the separation of the GPU from the CPU, and the math coprocessor, but we got DMA (unloading memory transfers from the CPU so that it can do other things), more bits per operation (4 bit, 8 bit, 16 bit, 32 bit, 64 bit), hyperthreading, parallelization of access to RAM, CPU instruction pipelines with instruction reordering in order to make all parts of the CPU work at the same time, etc. Even hard disks have become parallelized, featuring tagged command queueing that makes parallel operations faster. And every time you parallelize something, it usually means more transistors or more latency. Latency kills performance, which means that the solutions that get implemented are usually performance trade-offs.

Many years ago, an integer multiplication took 76 clock cycles. Yes, it did. I don't know how fast CPUs are today, but my guess is that it doesn't take more than 1 clock cycle now. Doing a floating point division was at one time very fast - except that you needed to move the numbers to the math coprocessor before it could execute the division. Increased speed but increased latency.

When you compare performance between 1998 and 2008, you will notice that parallelizable operations like graphics, sound and processing huge amounts of data have become a lot faster. If the GPU can offload the CPU, the speed increase is huge. However, some things have not improved as much. It still takes 10ms for a hard disk to move the head, and if you have a 2008 dual-CPU machine where both CPUs write to the same RAM area, performance is usually worse than on a 1998 single-CPU machine.

Most of the "easy" optimizations that did not require cooperation from programmers are now fully exploited, and we now need the cooperation of programmers to reach the next levels of performance. There are three options: not exploiting parallelism, doing it yourself, or using a platform that delivers parallelism to you without great effort. Java is trying to do its part, and so will many other frameworks. But big improvements require programmers who understand parallelism and can create it.

Is this significantly different from 1998? No. Good programmers have always created faster applications than less good programmers.

Friday 30 May 2008

64MB RAM, 200MHz, NT4 outperforms XP

Why is a standalone 10Mbps network with 10-20 200MHz PCs, running NT4 on 64MB RAM, able to outperform much larger networks with state-of-the-art Windows XP machines? Well, it certainly outperforms the systems of many larger organizations if you want to run a normal database application with a good server.

Thinking about it, it does make sense: modern desktop CPUs are not the bottleneck any more, hard disk seek time has not improved significantly, and without other applications and network traffic, a 10Mbps LAN is faster and more stable than many WANs. It certainly has lower latency, because of its limited geographic extent. Also, Windows XP (and Vista?) haven't introduced anything really revolutionary with regard to speed, so the bottleneck in many systems is really the latency on the WAN.
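A rough back-of-the-envelope estimate shows why. The numbers below are assumptions chosen for illustration, not measurements. Assume a form that needs N = 200 sequential round trips to the database server, a round-trip time of about 0.5 ms on the small LAN and about 40 ms on a WAN, and ignore server processing time:

```latex
T_{\text{wait}} \approx N \cdot t_{\text{RTT}}, \qquad
T_{\text{LAN}} \approx 200 \times 0.5\,\text{ms} = 100\,\text{ms}, \qquad
T_{\text{WAN}} \approx 200 \times 40\,\text{ms} = 8000\,\text{ms} = 8\,\text{s}
```

The client's CPU speed does not appear anywhere in this estimate, so a faster desktop PC cannot shorten this waiting time.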

Conclusion: For some GUI applications, performance was not improved by the improvements in desktop PC performance during the last 10 years, because network latency is the most important performance bottleneck.

Friday 16 May 2008

GNU gettext for Delphi 2008 available

If you're experimenting with beta versions of Delphi 2008, there is now a version of GNU Gettext for Delphi available that works with the new UnicodeString string type, and supports Unicode everywhere.

It won't be released as part of the official package until I get my hands on Delphi 2008 myself, but it has been designed to use UnicodeString according to the information made public by CodeGear.

In order to use it, use the latest official version, but replace gnugettext.pas with this version directly from the source code repository.

Tuesday 13 May 2008

The factor 2 principle

All ideas that intend to improve something less than 2 times are unambitious.

Monday 5 May 2008

Goodbye UTF-16

Google's data seem to indicate that UTF-8 is about to take over the WWW. I guess most Linux distributions have already made the switch to UTF-8, making us-ascii, iso-8859-1, UCS-2, UTF-16 and others seem like technology of the past. I wonder who is still using them? I do. I'm a Windows programmer. :-(

Tuesday 29 April 2008

Localizing "file(s)" using ngettext

We all know the problem: We have a list of files, and we want to have a label that indicates the number of files.

In English, it's quite straightforward:

Label1.Caption:=IntToStr(FileCount)+' file(s)';

The usual internationalized version looks like this:

resourcestring FileCountStr='%d file(s)';

Label1.Caption:=Format(FileCountStr,[FileCount]);

Looks easy, right? It's not. In Polish, there are three plural forms:

1 plik
2 pliki
5 plików

Even French is different with regard to plural forms. In English, we use the plural form for zero (0 files, 1 file, 2 files), whereas the French use the singular (0 fichier, 1 fichier, 2 fichiers).

With GNU gettext for Delphi, the solution is quite straightforward:

Label1.Caption:=Format(ngettext('%d file','%d files',FileCount),[FileCount]);

ngettext() automatically selects the right plural form for the local language, whether that language has 1, 2, 3 or 4 plural forms, and the translation tools (poedit and kbabel) provide good features that let the translator specify each plural form.
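Each translation (.po) file declares its plural rule in a Plural-Forms header, and ngettext() evaluates that rule to pick a form index. The Delphi sketch below hand-codes the rules listed in the GNU gettext manual for English, French and Polish, so you can see which form each number gets. The function names are hypothetical, and this is an illustration rather than a copy of gnugettext.pas's own implementation.

```pascal
program PluralFormsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Each function returns the index of the plural form to use for N,
// hand-coded from the Plural-Forms expressions in the GNU gettext manual.

// English: nplurals=2; plural=(n != 1);
function EnglishPlural(N: Integer): Integer;
begin
  if N <> 1 then Result := 1 else Result := 0;
end;

// French: nplurals=2; plural=(n > 1);  so zero uses the singular
function FrenchPlural(N: Integer): Integer;
begin
  if N > 1 then Result := 1 else Result := 0;
end;

// Polish: nplurals=3; plural=(n==1 ? 0 :
//   n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
function PolishPlural(N: Integer): Integer;
begin
  if N = 1 then
    Result := 0
  else if (N mod 10 >= 2) and (N mod 10 <= 4) and
          ((N mod 100 < 10) or (N mod 100 >= 20)) then
    Result := 1
  else
    Result := 2;
end;

const
  PolishForms: array[0..2] of string = ('plik', 'pliki', 'plików');
  Samples: array[0..6] of Integer = (0, 1, 2, 5, 12, 22, 25);
var
  I: Integer;
begin
  // Print each sample count with the Polish word form its rule selects
  for I := Low(Samples) to High(Samples) do
    WriteLn(Samples[I], ' ', PolishForms[PolishPlural(Samples[I])]);
end.
```

Note that 12 takes the third form, while 22 takes the second. A simple "ends in 2, 3 or 4" rule would get one of them wrong. This is exactly the kind of detail ngettext() hides from the programmer.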

Saturday 26 April 2008

Agile is just a step in the right direction

Whenever I hear the term waterfall model, I start wondering: which organizations would actually do that? Who has ever done programming without changing the specs along the way? Maybe the contracts or formal specs weren't changed during the project, but I have yet to see a big system that was actually programmed exactly as it was originally specified.

My first experience with XP (Extreme Programming) was that somebody had put words to how things were actually done, added a few more techniques, and written a book. I really enjoyed that. Then came Agile as a term, because Scrum, RUP etc. were all trying to fix the same problem, and then the Agile Manifesto. Brilliant.

But today it seems that something is going wrong. I see more and more Scrum fanatics who are actually discussing "is 14 days or 21 days the best iteration length?" That's actually a violation of the Agile Manifesto: you need to respond to change, and focus on people and interactions instead of on a process.

Looking back at the last many years, I can only conclude that it's not a question of choosing a method for developing software. It's about being good at understanding and managing software projects.

I don't think that "Agile" is the last buzzword in software development. To some people it has already come to mean "less predictability", whereas to others it means "iterations". Others immediately think "Scrum". To me, Agile means better management. One day, someone will write a new book with a new title, and that title will contain a new buzzword.

Thursday 24 April 2008

Delphi usage increases

I don't trust statistics that I didn't forge myself, but I found this statistics page interesting.

Wednesday 23 April 2008

Bloat grows more than PC speed

According to Infoworld, Microsoft Office is getting more bloated.

Bloat is not a problem by itself - everything that is written in a high level language is basically bloat. The amount of bloat is determined not only by the programmer, but also by the decision process that leads to the next product release. How important is performance in the sales process? Does the organization provide an environment that makes high-performance solutions possible?

If you want to create software that has good performance, start by asking your boss if he wants the application to perform well. If the answer is yes, the next question will be "how well?". Be prepared to answer questions about costs and benefits.

If it doesn't make economic sense to remove bloat, you need to spend your time on something more valuable. Fortunately for me, I work in a company that considers the user experience to be very important :-)

Thursday 3 April 2008

SYLK - storing spreadsheets in your source code repository

How do you collaborate with other programmers on a spreadsheet, if that spreadsheet needs to be stored in your source code version control repository? A good choice is the SYLK file format. It is easy to read in a text editor, is compact, makes it easy to merge multiple edits or branches, and supports formulas (tested with OpenOffice.org 2.4).
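To show why SYLK merges well, here is a minimal Delphi sketch that writes a three-row sheet with one formula. It assumes the commonly documented SYLK record layout: an ID;P header, one C record per cell (Y = row, X = column, K = value, E = formula in R1C1 notation), and a closing E record. The helper names and the file name fruit.slk are hypothetical. Values containing semicolons are not escaped, and I have not verified the output in a particular spreadsheet program.

```pascal
program SylkDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// One text cell per line: Y = row, X = column, K = value (quoted text).
function SylkText(Row, Col: Integer; const Value: string): string;
begin
  Result := Format('C;Y%d;X%d;K"%s"', [Row, Col, Value]);
end;

// One numeric cell per line.
function SylkNumber(Row, Col, Value: Integer): string;
begin
  Result := Format('C;Y%d;X%d;K%d', [Row, Col, Value]);
end;

// A formula cell: K holds the last computed value, E the formula (R1C1 style).
function SylkFormula(Row, Col, CachedValue: Integer; const Expr: string): string;
begin
  Result := Format('C;Y%d;X%d;K%d;E%s', [Row, Col, CachedValue, Expr]);
end;

var
  F: TextFile;
begin
  AssignFile(F, 'fruit.slk');
  Rewrite(F);
  try
    WriteLn(F, 'ID;P');                             // header record
    WriteLn(F, SylkText(1, 1, 'Apples'));
    WriteLn(F, SylkNumber(1, 2, 3));
    WriteLn(F, SylkText(2, 1, 'Pears'));
    WriteLn(F, SylkNumber(2, 2, 4));
    WriteLn(F, SylkText(3, 1, 'Total'));
    WriteLn(F, SylkFormula(3, 2, 7, 'R1C2+R2C2'));  // row 1 col 2 + row 2 col 2
    WriteLn(F, 'E');                                // end-of-file record
  finally
    CloseFile(F);
  end;
end.
```

Because every cell is one line of text, changing one cell shows up as a one-line diff, and two people editing different cells rarely produce a merge conflict.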

Wednesday 26 March 2008

Don't use TStringList for machine-readable text

TStringList is one of the most used classes in Delphi. It is very convenient for storing strings for the user (TMemo.Lines), storing parameters for components (TIBDatabase.Params), objects and many other items.

However, there are several problems with it. It's slow when sorting data (it uses the Win32 API to compare strings), but the dangerous part is that sorting and indexing are localized. This means that the following code fails on my computer, but works on an American PC:

sl:=TStringList.Create;
try
  sl.Sorted:=True;
  sl.Add ('AA');
  sl.Add ('AB');
  Assert (sl.Strings[0]='AA'); // fails with Danish regional settings
finally
  sl.Free;
end;

The reason is simple. This is the Danish alphabet:

ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ

By tradition, the last letter Å can also be written AA, and you can see how the two spellings are mixed on the homepage of the city of Århus. The correct sort order in Danish is therefore:

Aachen
Aalto
Berlin
Copenhagen
Dresden
Essen
Frederikshavn
Aabenraa
Aalborg
Aarhus

In the first two words, AA means A and then A. In the last three words, AA means Å, which is the last letter in the alphabet. However, Windows doesn't know when AA means Å and when it means A A, so it always assumes that AA means Å, and always puts AA last.

Let's assume that you want to use a TStringList to save some kinds of codes in a specific order, like ATC codes. The first codes are:

A01AA01 Sodium fluoride
A01AA02 Sodium monofluorophosphate
A01AA03 Olaflur
A01AA04 Stannous fluoride
A01AA30 Combinations
A01AA51 Sodium fluoride, combinations
A01AB02 Hydrogen peroxide
A01AB03 Chlorhexidine
A01AB04 Amphotericin B
A01AB05 Polynoxylin

With Danish regional settings, this is how TStringList (and Windows) sorts a selection of ATC codes:

A01AB04 Amphotericin B
A01AC02 Dexamethasone
A01AA30 Combinations
A02AB03 Aluminium phosphate
A02BA05 Niperotidine
A02AA05 Magnesium silicate

If you want to avoid that, then don't use TStringList.
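If you need a locale-independent order, one option is to compare strings by character code. The Delphi sketch below uses a hypothetical OrdinalSort procedure, a plain insertion sort, built on SysUtils.CompareStr. CompareStr compares character codes and ignores the regional settings, so the ATC codes above come out in the same order on every PC.

```pascal
program OrdinalSortDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Insertion sort that compares strings by character code (CompareStr),
// so the result does not depend on the Windows regional settings.
procedure OrdinalSort(var Items: array of string);
var
  I, J: Integer;
  Tmp: string;
begin
  for I := 1 to High(Items) do
  begin
    Tmp := Items[I];
    J := I - 1;
    // Shift larger items one position to the right
    while (J >= 0) and (CompareStr(Items[J], Tmp) > 0) do
    begin
      Items[J + 1] := Items[J];
      Dec(J);
    end;
    Items[J + 1] := Tmp;
  end;
end;

var
  // The codes in the Danish sort order shown above
  Codes: array[0..5] of string = ('A01AB04', 'A01AC02', 'A01AA30',
                                  'A02AB03', 'A02BA05', 'A02AA05');
  I: Integer;
begin
  OrdinalSort(Codes);
  for I := Low(Codes) to High(Codes) do
    WriteLn(Codes[I]);
end.
```

Insertion sort is quadratic, which is fine for short lists. For long lists you would use a faster algorithm with the same CompareStr comparison.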

Tuesday 25 March 2008

Multithreading in Java 7 - oh my god

I just saw this one about the new features in Java 7:

http://www.ibm.com/developerworks/java/library/j-jtp03048.html

First, the MergeSort example doesn't seem to compile. Correct me if I'm wrong; I didn't try it. Secondly, they use MergeSort as an example of how to exploit multiple CPUs for sorting. Java 7 has the nice feature that it can now decide at runtime how many threads should be used to solve a particular problem (see the coInvoke part).

However, there is this tricky constant, SEQUENTIAL_THRESHOLD, which is used to decide whether to enforce sequential processing or not. How do you set this value? Well, you set it at design time, even though the example was meant to show how Java adapts at runtime...

The next thing is that the whole array is passed as a parameter. No matter what programming language you use, this is a bad design. If Java doesn't copy the memory, you may have 2 CPUs looking at the same RAM area. If Java has a runtime optimization that detects that 2 CPUs are looking at the same area and decides to copy the data, it will copy too much data...

I'm not sure this example would perform better on a 4-CPU machine than on a single-CPU machine with the same CPUs...

The basic problem in all this is that it is extremely hard to find real-world examples of parallelized algorithms that can be optimized for any kind of parallel hardware. Good multithreading must be done at the functionality level, not at the algorithm level.

Also, every time we add multithreading to code, we make it more complex. In other words, it comes at a cost. I predict that some of the future performance gains won't come from making algorithms more threaded, but from changing data structures, reducing memory footprint and simple optimizations. As the price of more performance increases, effort will be spent where the most speed can be gained at the lowest price.

Just imagine how fast Commodore 64 BASIC would run on a modern CPU... and how slow Vista is.

Monday 10 March 2008

Never modify source code in weekends

I just released some code last Saturday. What I didn't notice was that TortoiseSVN inserted a localized date into the source code in a $LastChangedDate:$ text, even though I deliberately use it with an English user interface.

In Danish, the weekday names for Saturday (lørdag) and Sunday (søndag) contain non-ASCII characters, so the source code file became a non-ASCII file, which basically broke it for some users. Today I rereleased the files, and because it's Monday (mandag), the problem is gone :-)

I guess that was another lesson on how not to localize.

Saturday 8 March 2008

Delphi - the green choice for the environment

The internet is starting to use a significant part of the world's electrical power, and increasingly complex algorithms are driving our economy. Some blogs have even started to discuss software engineering and global warming.

As software engineers, our choices have an impact on energy usage. Delphi/Win32 still uses 8-bit character sets, which is faster, uses less hardware and is therefore a greener choice than UTF-16 based platforms like Java or .NET.

How important is this? It's not important at all. An improvement in a specific part of a program is usually irrelevant unless it improves that part at least 10 times. What about rewriting your software for low-energy devices? Also not a good choice, because the real environmental problem with computers is producing them, so don't make your customers buy new computers and think you saved the world.

If you want to do something for the environment, remove your focus from technology and focus on how to solve end-user problems well. Much energy is wasted elsewhere because of bad software.

Thursday 21 February 2008

Tip of the day about localization (Windows Vista)

The tip of the day is: Don't translate filenames the wrong way.

Sunday 10 February 2008

How to comment source code

Teamwork is very different from having one programmer write code alone. In a team, source code is read so many times that readability is much more important than how long it takes to write the code.

I just ran across this link, where the author says that if everybody comments, "others" get as much out of it as you do yourself. However, the main issue is not how much the individual programmer gets out of it; what matters is team performance.

In order to optimize team performance, every member of the team should have the best possible conditions for editing any part of the source code, and that requires comments.

In my experience, good comments have these characteristics:

* They are automatable (for documentation, localization etc.)
* They don't take significant time to read
* They do not repeat information that you can easily read from the source code (don't write "This is function xxx which takes parameters yyy")
* They include important design decisions, and explain problems that the author spent time on while programming
* They document purposes and examples (for unit tests and some other cases, examples can be omitted)
* They document a group of source code lines, not each individual line
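As an illustration of these points, here is a small hypothetical Delphi function; the name ValueOf and its behaviour are invented for the example. The /// XML documentation comment can be processed by tools (recent Delphi versions use it, for example, for Help Insight) and contains a purpose and an example. The comment inside the body covers the whole group of lines and records a design decision instead of restating the code.

```pascal
program CommentStyleDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

/// <summary>Returns the value part of a 'Name=Value' line, or Fallback
/// if the line contains no '='.</summary>
/// <example>ValueOf('Port=3050', '') returns '3050'.</example>
function ValueOf(const Line, Fallback: string): string;
var
  P: Integer;
begin
  // Split at the first '=' only, so that values may themselves contain '='
  // (for example connection strings). Spaces are deliberately not trimmed,
  // because they may be significant in machine-generated parameter lines.
  P := Pos('=', Line);
  if P = 0 then
    Result := Fallback
  else
    Result := Copy(Line, P + 1, MaxInt);
end;

begin
  WriteLn(ValueOf('Port=3050', ''));
  WriteLn(ValueOf('Connection=Server=db1;Timeout=5', ''));
  WriteLn(ValueOf('NoValueHere', '(none)'));
end.
```

Note what the comments leave out. They don't say "P is the position of the equals sign" or "this function takes two strings", because the code already shows that.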

Thursday 7 February 2008

The flaw in the Agile manifesto

Most programmers are introverts, and that's why the Manifesto for Agile Software Development tells you to think like an extrovert. What about extrovert programmers? They actually need the manifesto reversed...

Wednesday 2 January 2008

Delphi networking on LinkedIn

In case you're a LinkedIn user, please note that Fabrizio Bitti has created a "Powered by Delphi" group.