Compas Pascal: 2007

Monday, 24 December 2007

How to get good ideas

Sometimes, a good idea can save a lot of time in programming. A simple change of the specifications can literally save more than half of the programming hours. But how do you get these good ideas?

I get good ideas when I am fed the right information in a situation where I have plenty of time to think about possibilities.

One way to get this information is from another person, a partner. What kind of person is most suitable depends on the situation. For an introvert, trust is very important, so that the introvert's thoughts are not disturbed by extroverted worrying about the situation.

In order to be a good partner in this, you must be prepared to accept that the things you say are not always treated as truth. Only by uncovering incorrect perceptions of reality can you create a new understanding which can provide business value.

A good partner doesn't say "That's wrong", but tries to explain why she thinks it is wrong, and accepts the idea as good until it has been proven otherwise.

Sometimes, you cannot find a good partner, either because your field of expertise is too advanced, people around you are not prepared to do it, or just because you're not in the mood to talk with other people. In these cases, you must trust your own intuition and develop your idea to the point where you have enough documentation to persuade others easily.

A recent study in Danish companies shows that almost none of the good ideas were created at work. I guess there's room for improvement here.

Merry Christmas!

Saturday, 22 December 2007

Soft hyphen in ISO-8859-1 and Unicode

I just came across this blog post about Unicode and ISO 8859-1 being unclear on how to show a soft hyphen.

The article contains links to other blog posts and documents about this topic.

I will not give a summary of the problem - read the articles if you're interested. However, I have a strong opinion on the topic: The character set standard should not define the application.

Sometimes you want to create an editor that shows the exact contents of a file, so that the user is able to see all bytes in the file precisely. And sometimes the editor has another purpose, like making it simple to create a sales brochure.

In the first case, a soft hyphen should be visible to the user. Think Notepad... the character 0xAD should be clearly visible in Notepad, no matter where you put it. In the second case, a soft hyphen character can be used to implement an application-specific soft hyphen functionality, where the hyphen is only shown when it makes sense according to the application's purpose.
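The two kinds of application can be sketched in a few lines of Python. This is only an illustration of the idea: the function names `exact_view` and `layout_view` and the `[SHY]` marker are made up for this example, and a real layout engine does far more than this.

```python
SOFT_HYPHEN = "\u00ad"  # byte 0xAD in ISO-8859-1


def exact_view(text):
    """Byte-exact editor: make every soft hyphen visible to the user."""
    return text.replace(SOFT_HYPHEN, "[SHY]")


def layout_view(word, width):
    """Brochure-style editor: only show a hyphen where the word must be broken."""
    plain = word.replace(SOFT_HYPHEN, "")
    if len(plain) <= width:
        return [plain]  # the word fits, so no hyphen is shown at all
    pieces = word.split(SOFT_HYPHEN)
    # Try the break points from right to left, keeping as much as possible on line one.
    for i in range(len(pieces) - 1, 0, -1):
        head = "".join(pieces[:i]) + "-"
        if len(head) <= width:
            return [head, "".join(pieces[i:])]
    return [plain]  # no usable break point


word = b"hy\xadphen\xadation".decode("iso-8859-1")
print(exact_view(word))
print(layout_view(word, 20))
print(layout_view(word, 7))
```

Both functions read the same character from the file; only the purpose of the application decides how it is displayed.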

Some of the articles even mention the use of soft hyphens in HTML. That's really out of scope, since HTML already redefines the layout of so much. It seems somebody has forgotten that the primary purpose of HTML is to render things differently.

Friday, 14 December 2007

Bundling the internet connection with software

Mobile phone service providers are now discussing paying the mobile phone companies for data traffic generated by their online services.

What happens if the phone service provider doesn't pay? Is this the first step towards a situation where the feature set of your phone is dictated by your phone service provider?

It has been hard to sell software for a long time, without bundling support, hardware or online services. However, this time we're moving towards a situation where the internet connection itself is bundled with functionality. Imagine that you could no longer use Google if you switched to another ISP...

Thursday, 13 December 2007

Don't use passwords. Use passphrases.

We still use passwords everywhere, and they're usually stored as hash values in the database of the service that we log into. I ran into this story about a guy who looked up an md5 hash value on Google and this way reverse engineered a password. His conclusion is not to use a password that anybody else on this planet may have used.

The reason that this is a problem is that many users use the same passwords in multiple places, so if you know their password in one place, you can probably log into other services using that password. If you store all passwords as hash values, and you lose these hash values to people who may abuse them, it is important that they cannot get the original passwords from them. There are many ways to crack passwords, and lostpassword.com is a good site to know if you want to see how easy it is to crack passwords.

But how fast can md5 hashes be cracked? Let's imagine that we produce all conceivable passwords and generate their md5 hashes, and then use the resulting list as a lookup table, sorted by md5 hash. Let's make a few assumptions:
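To show why a password that somebody else has used is weak, here is a minimal Python sketch of a hash lookup. The list `COMMON_PASSWORDS` is a made-up stand-in for the huge lists that exist in reality, and `crack` is just an illustrative name.

```python
import hashlib
from typing import Optional


def md5_hex(password: str) -> str:
    """MD5 hash of a password, as 32 hexadecimal characters."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical list of passwords that other people are known to use.
COMMON_PASSWORDS = ["123456", "password", "letmein", "qwerty"]

# Precomputed lookup table: hash -> original password.
LOOKUP = {md5_hex(p): p for p in COMMON_PASSWORDS}


def crack(hash_hex: str) -> Optional[str]:
    """Return the original password if its hash is in the table, else None."""
    return LOOKUP.get(hash_hex)


print(crack(md5_hex("letmein")))              # found: somebody else used it
print(crack(md5_hex("roskilde/1997/annie")))  # not found in this small table
```

No cracking in the mathematical sense takes place here. The attacker simply recognizes a hash that has been seen before.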

The password is only using lowercase letters and digits, 36 different characters in total.
It is totally random.

Let's say the password has the length n. The md5 hash, written as hexadecimal text, is 32 bytes, so each lookup item is size=32+n. There will be 36ⁿ records, using (32+n)×36ⁿ bytes of space. How long would it take to find a password for an md5 value? With binary lookup it would use c = log₂(36ⁿ) = n×log₂(36) ≈ n×5.2 lookups. This is the space needed for various values of n, and the crack time assuming that a lookup takes 20ms:

n=5 uses 2GB crack time: about 0.5s
n=6 uses 82GB crack time: about 0.6s
n=7 uses 3TB crack time: about 0.7s
n=8 uses 112TB crack time: about 0.8s
n=9 uses 4163TB crack time: about 0.9s
n=10 uses 1.5×10¹⁷ bytes
n=15 uses 1×10²⁵ bytes
n=20 uses 7×10³² bytes

You can buy 1TB drives today, so these are realistic amounts of storage up to n=10. If you want to use a good password, you should therefore ensure that it's at least 10 characters, and if you want to be well protected in the future as well, go for at least 15 characters.
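The table size and the number of lookups can be recomputed with a short Python sketch. It uses the same assumptions as the text: 36 possible characters, 32 bytes per stored hash and 20ms per lookup. The function names are only illustrative.

```python
import math

ALPHABET_SIZE = 36          # lowercase letters plus digits
HASH_BYTES = 32             # an md5 hash stored as 32 hexadecimal characters
SECONDS_PER_LOOKUP = 0.020  # assumed time for one lookup on disk


def table_bytes(n):
    """Bytes needed to store every n-character password together with its hash."""
    return (HASH_BYTES + n) * ALPHABET_SIZE ** n


def binary_search_lookups(n):
    """Binary search steps needed to find one hash among 36**n sorted records."""
    return math.ceil(n * math.log2(ALPHABET_SIZE))


for n in (5, 6, 7, 8, 9, 10, 15, 20):
    lookups = binary_search_lookups(n)
    seconds = lookups * SECONDS_PER_LOOKUP
    print(f"n={n:2d}  {table_bytes(n):.1e} bytes  {lookups} lookups  {seconds:.2f}s")
```

The storage grows by a factor of 36 for each extra character, while the lookup time only grows by about five extra steps. That is why length matters so much more than anything else.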

As you can see, these are bad passwords:

j4fsk2
this is fun
my dog ate my homework (somebody else probably used that, too)

These are good passwords:

slashdot8fischk (15 characters, spelling errors etc.)
roskilde/1997/annie (19 characters, but who is Annie and why Roskilde?)

It is a good thing if a long password can be typed very fast, so it usually needs to contain some real-life words, but make sure to pick some words that others wouldn't use.

As a programmer, you can help your users make better passwords by providing more space to type the password. Usability research has shown that this actually helps, although I cannot remember the source for that information. Some systems also use the word "passphrase" instead of password in order to encourage users to type more characters.

Wednesday, 12 December 2007

Delphi interfaces and implementation

One of the things that makes Delphi unique is the division of all source code files ("unit"s) into several sections: interface, implementation, initialization and finalization.

Everything that other units need to know is put into the interface section at the top. This significantly reduces the time it takes to understand how to use a file - you don't have to scroll through implementation details. The interface section contains constants, function declarations and types, but no statements. Class members are declared, but methods are not implemented here.

The implementation section contains the implementation of all the items from the interface section. The initialization section contains statements that should be run before the application starts, which are local to this unit, and the finalization section can clean up resources in a similar way.

Delphi's compiler contains many neat features to enable faster compilation. For instance, a change in the implementation section, or a change of a typed const in the interface section, will not force recompilation of other source code files. Only significant changes in the interface section will make other parts of the source code recompile.

You can benefit a lot from these sections, if you manage to keep the interface section small, making sure that it has few lines of code. If a file has one function, and nothing else, in its interface section, it is much easier to use than if you have a huge class type with lots of private and protected members in the interface section. Unfortunately, this also means that the full benefits are not achieved if you do OOP the traditional way. Sometimes, it even makes sense to write a small unit, which has a very simple interface section, but where all implementation is about making calls to another unit that has a very difficult interface section.

Example:

unit SimpleApi;

interface

function CalculateSomething (parameter:string):string;



implementation

uses
  SysUtils, ComplexApi;   // Which other source files are used/linked

function CalculateSomething (parameter:string):string;
var
  Api:TComplexApi;   // "object" is a reserved word, so it cannot be used as a name
begin
  Api:=TComplexApi.Create;
  try
    Result:=Api.CalculateSomething (parameter);
  finally
    FreeAndNil (Api);
  end;
end;


initialization
  // This is where you could write code to initialize
  // something for this unit at application start


finalization
  // This is where you could write code to clean up stuff
  // after the application has stopped running


end.

Friday, 7 December 2007

The teddy bear principle in programming

It's very simple: Put a teddy bear on your desk. When you have a programming problem, explain what you are doing to the teddy bear. Eureka is a likely outcome of this method.

Tuesday, 27 November 2007

CodeRage impressions

Having participated in a few sessions at CodeGear's online CodeRage conference, I can only say that this is a very good way to hold conferences. There are lots of advantages over a traditional conference and the costs are much lower. You can chat with other attendees during a session and ask questions, you can leave the session without disturbing anyone if it gets too boring, and you can do other work if the topic is easy for you. On the downside, you don't get those extra days in a geographically remote location and you don't get the beers afterwards, but I guess we'll see this evolve a lot in the coming years. Unfortunately, there's also the downside that CodeRage was arranged for U.S. time, meaning that it's after business hours in Denmark.

However, I can only recommend signing up and participating in this kind of conference.

Wednesday, 21 November 2007

Date and Time in programming

Sometimes I wonder if non-programmers know how complicated time is. Let's have a summary of the basics:

There are 23, 24 or 25 hours in one day (daylight saving time change days)
An hour has 60 minutes.
A minute has 60 or 61 seconds (leap seconds). UTC also allows a minute with 59 seconds, but that has never been used.
A day has 1380, 1440 or 1500 minutes.
A week has 7 days.
A day has 82800, 86400, 86401 or 90000 seconds.
A month has 28, 29, 30 or 31 days. In some systems, a month is standardized to be 30 days.
A year has 365 or 366 days, but in some systems, it's standardized to be 360 days.
A year has 52 or 53 weeks.
Even though we have an ISO standard for weeks, end users don't agree on the starting weekday for a week.
Some dates don't exist, and for historical dates, the offset between different geographical regions was not about hours, but about days. The Russian October Revolution actually happened in November, according to most European calendars of that time, but it was October in Russia.
This is all about the Christian calendar. There are other calendars out there...

Then there's local time and UTC time:

Local time deviates from UTC time by a number of hours, which can be fractional in some rare cases
For all practical purposes, GMT is the same as UTC, and GMT is not the local time of London (London uses GMT+1 in the summer).
UTC time offset is a function that takes the UTC time stamp and geographical location as parameters, and UTC time offset is often historically different for two different cities in the same country.
A time zone can be specified using a GMT offset, just like time stamps.
A time zone can include several regions with different daylight saving rules.
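The points about offsets can be tried out in Python with fixed offsets. The two offsets below are hard-coded for illustration only (Copenhagen in winter, and India as an example of a fractional offset). A real application would need a time zone database, because the offset depends on both the date and the location.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets chosen for illustration; they ignore daylight saving rules.
COPENHAGEN_WINTER = timezone(timedelta(hours=1), "CET")
INDIA = timezone(timedelta(hours=5, minutes=30), "IST")


def offset_hours(ts):
    """UTC offset of an aware timestamp, in (possibly fractional) hours."""
    return ts.utcoffset() / timedelta(hours=1)


utc_noon = datetime(2007, 11, 21, 12, 0, tzinfo=timezone.utc)
in_copenhagen = utc_noon.astimezone(COPENHAGEN_WINTER)
in_india = utc_noon.astimezone(INDIA)

print(in_copenhagen.isoformat(), offset_hours(in_copenhagen))
print(in_india.isoformat(), offset_hours(in_india))
```

All three values describe the same moment, so they compare as equal even though their clock readings differ.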

This wouldn't be so complicated if we didn't have to make calculations based on this. Timestamps are usually stored these ways:

A floating point number indicating the number of days since a specific date at midnight
An integer or floating point number indicating the number of seconds since a specific date at midnight
Year, Month, Day, Hour, Minutes, Seconds as separate values

Variations of these may occur, for instance, a time stamp may be an integer number of milliseconds or even microseconds, instead of seconds, but it's still the same idea.
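As a sketch of how the same moment looks in the three storage forms, here is a small Python example. The day count uses 30 December 1899 as day zero, like Delphi's TDateTime, and the second count uses 1 January 1970 like Unix time. The function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

DELPHI_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)  # day 0 of TDateTime
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)      # second 0 of Unix time


def to_days(ts):
    """Floating point number of days since the Delphi epoch."""
    return (ts - DELPHI_EPOCH) / timedelta(days=1)


def to_seconds(ts):
    """Integer number of seconds since the Unix epoch."""
    return (ts - UNIX_EPOCH) // timedelta(seconds=1)


def to_fields(ts):
    """Year, month, day, hour, minutes and seconds as separate values."""
    return (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)


ts = datetime(2007, 11, 21, 12, 0, 0, tzinfo=timezone.utc)
print(to_days(ts), to_seconds(ts), to_fields(ts))
```

Note that none of the three forms says anything about time zone or daylight saving. That information has to be stored or agreed upon separately.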

UTC time offsets are usually specified this way:

Number of hours difference between the local time and UTC, at the time of the timestamp
Geographic location (like 'Europe/Copenhagen')
Time zone (like 'CET')

Daylight saving is usually handled these ways:

The internal clock of the PC uses local time, and changes. If you use virtualization or dual-boot, you may risk that it changes twice, giving incorrect time.
The internal clock of the PC uses UTC time, and does not change.
Daylight saving time is usually handled using locally stored information, which may get outdated, so that the computer actually miscalculates the time by one hour.

Leap seconds are usually handled these ways:

They're implemented nicely, and the software needs to know about them.
They're not implemented, so the software doesn't need to know about them, but instead, the clock is adjusted, and the software needs to be capable of handling a clock that doesn't move for 1-2 seconds.

Clock precision:

Most PCs today have some kind of clock synchronization over the internet, which yields a sub-second precision. However, don't count on your clock to be 1ms precise.
PCs often have their clocks adjusted, so you need to make sure that your software can survive a clock that moves backwards.
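One common way to survive clock adjustments when measuring durations is to avoid the wall clock entirely. A minimal Python sketch, where `measure` is just an illustrative helper name:

```python
import time


def measure(func):
    """Run func and return the elapsed time in seconds.

    time.monotonic() never goes backwards, even if the wall clock is adjusted
    while func is running, so the result cannot become negative.
    """
    start = time.monotonic()
    func()
    return time.monotonic() - start


elapsed = measure(lambda: sum(range(100000)))
print(f"took {elapsed:.6f}s")
```

This only helps for durations. If you store time stamps for later use, you still have to decide what to do when two of them arrive in the wrong order.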

Now, how do we calculate age? If you were born on February 29th 1980, and an election for parliament is held on February 28th 1998, are you allowed to vote? Probably not. What if something has to be done on a day that may not be later than your birthday? Then February 28th would be the last day. So you cannot use one GetBirthDay(BirthDate,Age) function for these two cases, since the two problems result in two different dates.
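The two cases can be written as two separate functions. This Python sketch assumes the rules described above: the age is reached on March 1st in years without a February 29th, while a deadline falls back to February 28th. Actual legal rules vary, so treat both rules and both function names as assumptions.

```python
from datetime import date


def first_day_with_age(birth, age):
    """First day on which the person has reached the given age."""
    try:
        return birth.replace(year=birth.year + age)
    except ValueError:  # born on February 29th, target year is not a leap year
        return date(birth.year + age, 3, 1)


def last_day_not_after_birthday(birth, age):
    """Last possible day for something that must happen no later than the birthday."""
    try:
        return birth.replace(year=birth.year + age)
    except ValueError:  # born on February 29th, target year is not a leap year
        return date(birth.year + age, 2, 28)


birth = date(1980, 2, 29)
election = date(1998, 2, 28)
print(first_day_with_age(birth, 18))           # 1998-03-01
print(election >= first_day_with_age(birth, 18))  # False: not allowed to vote
print(last_day_not_after_birthday(birth, 18))  # 1998-02-28
```

For everybody not born on February 29th, the two functions return the same date, which is exactly why this kind of bug survives testing.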

What about statistics? Here you have a lot of other problems:

Total numbers per month don't make sense for February, which changes its length every 4 years.
Numbers per day for a month usually don't make sense either, because the number of weekend days in a month varies, and numbers often depend on weekdays.
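A small Python sketch shows how much the basis for such statistics varies from month to month. The helper names are made up, and counting Saturday and Sunday as the weekend is an assumption that does not hold everywhere.

```python
import calendar
from datetime import date


def days_in_month(year, month):
    """Number of days in the given month."""
    return calendar.monthrange(year, month)[1]


def weekend_days(year, month):
    """Number of Saturdays and Sundays in the given month."""
    return sum(
        1
        for day in range(1, days_in_month(year, month) + 1)
        if date(year, month, day).weekday() >= 5  # 5 = Saturday, 6 = Sunday
    )


def average_per_working_day(total, year, month):
    """Monthly total divided by the number of Monday-to-Friday days."""
    return total / (days_in_month(year, month) - weekend_days(year, month))


for year, month in ((2007, 2), (2008, 2), (2007, 11), (2007, 12)):
    print(year, month, days_in_month(year, month), weekend_days(year, month))
```

Dividing by working days gives numbers that are easier to compare, but public holidays are still not accounted for.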

Friday, 16 November 2007

How to get girls into programming

The Sun tools team have blogged about the perils of abstraction. They say something like "we need to stop thinking abstract everything".

I don't know where they got this from - not all programmers try to abstract everything. Some programmers hate abstraction, and love the detail, and no, they are not unintelligent. In fact, some of these guys and girls can be brilliant programmers, creating much more user friendly applications that users love.

As every psychologist will know, humans' brains are not wired the same way. We have strong preferences for ways of thinking, and the same information is not handled the same way in different brains. If you could have two identical people with different brain wirings but the same knowledge, and you put them into exactly the same situation, they would extract different knowledge from that situation.

The masterminds behind software architecture often favor abstract thinking over details. They are good at spotting abstract information, creating abstract knowledge from experiences etc., but they usually don't put much value into minor details, like "it looks ugly" or "that's not what the customer said". If you put 5 abstract-thinking people together in a team, you will get a result that is abstract and possibly horrible.

If you want a well designed product, architecture, specs etc., you need to involve people with different brain skills. Psychologists say that our sexes have different brain skills (T/F) that relate to exactly this problem.

I believe the biggest problem in IT is the lack of product quality, and not the lack of girls. But I do believe that these two problems are closely related, and solved using the same management techniques.

Thursday, 15 November 2007

Flash RAM instead of harddisks

Flash drives have become larger and less expensive, and it doesn't take a lot of experimenting to find out that a laptop can become faster, quieter, more robust and have a longer battery life if you replace your harddisk with flash RAM. And then there's the fact that good quality flash RAM systems easily outlive even very expensive harddisks.

What do flash drives mean for software developers? Here are a couple of consequences:

When multiple threads compete for disk access, responsiveness will benefit greatly. This means that background threads that access disks will be more likely.
Disk space becomes more expensive for a while, favoring apps that don't waste space too much.
It becomes less necessary to prefetch small amounts of data from the local disk. For some applications, this can reduce RAM usage.
Reduced seek time means that different file formats may become optimal. This includes different ways of indexing, but it may also mean less redundancy in file formats.

Because of the huge benefits of flash RAM, less disk space will be considered acceptable, and often, this makes it realistic to have more RAM in the PC, than there is flash RAM. This makes it obvious to cache everything on the flash - and it can be cached by the OS file system or by the application.

Friday, 9 November 2007

The price of using GUIDs in databases

There has been some discussion about the use of GUIDs lately. A GUID is a 128-bit integer that is picked randomly. The size is obviously a good thing if your database needs more than 2⁶⁴=18×10¹⁸ records, but the main point is that, because it is 128 bit, you can be quite sure that this random number has not already been used somewhere else. The difference between an autoincrementing 128 bit integer and a GUID is that GUID values are always picked randomly.

It makes sense to apply GUIDs when:

No specific order is required
128 bit is not considered a waste of space
A very small chance of not succeeding to pick a unique number is ok
Values cannot be produced in one place, or having no specific order is a feature

Microsoft recommends using GUIDs as primary key because it enables replication between different databases. When you do that, the chance of having conflicts is very small - for instance, two databases with 1 billion records each can be merged easily, and the chance of a primary key conflict is only 10⁹×(10⁹/10³⁸)=10⁻²⁰.
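The estimate can be checked with a few lines of Python. The sketch below repeats the calculation with 10³⁸ as in the text, and also with 2¹²² possible values, since a version 4 GUID as produced by Python's `uuid.uuid4()` has 122 random bits. The function name is only illustrative.

```python
import uuid

ROUNDED_SPACE = 10 ** 38     # the rounded value of 2**128 used in the estimate
RANDOM_V4_SPACE = 2 ** 122   # number of different version 4 GUIDs


def expected_conflicts(records_a, records_b, key_space):
    """Approximate chance that two sets of random keys share at least one value."""
    return records_a * records_b / key_space


billion = 10 ** 9
print(expected_conflicts(billion, billion, ROUNDED_SPACE))
print(expected_conflicts(billion, billion, RANDOM_V4_SPACE))
print(uuid.uuid4())  # one freshly picked random GUID
```

With the smaller key space the chance is somewhat higher, but it is still far below anything you would need to worry about in practice.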

However, it comes at a price. There is a big chance that two records that are added shortly after each other are related. For instance, if you want to save an invoice, there may be 5 records that describe items on that invoice, which are added as part of the same transaction. If a database server uses autoincrementing integer values as primary key, and fully or partially physically sorts records by this primary key, these 5 records will probably go into 1 or 2 places on the harddisk. If GUIDs were used, they would be stored in 5 different places on the harddisk. This is one of the reasons why GUID-based databases are usually on servers that have more RAM than they have data - they need to cache everything.

Another price comes when debugging. You need more IQ to debug code than to write code, so it is important that you optimize for debugging. It must be easy to see that the data stored in the database is correct. GUIDs are not always the easiest key to read, especially not in developer databases, which tend to have very little data, and therefore very small numbers in autoincrementing integer fields.
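The effect can be simulated in Python. This is a rough sketch under simple assumptions: the table is physically sorted by primary key, 100 rows share one page on disk, and random 128-bit integers stand in for GUIDs. The names and numbers are made up for the example.

```python
import bisect
import random

ROWS_PER_PAGE = 100  # assumed number of rows stored together on disk


def pages_touched(existing_keys, new_keys):
    """Number of distinct pages the new keys land on in a table sorted by key."""
    existing = sorted(existing_keys)
    return len({bisect.bisect_left(existing, key) // ROWS_PER_PAGE for key in new_keys})


rng = random.Random(2007)  # fixed seed, so that the run is repeatable

# Autoincrementing keys: 100,000 existing rows, 5 invoice lines added at the end.
sequential_pages = pages_touched(range(1, 100001), range(100001, 100006))

# Random 128-bit keys: the 5 invoice lines can land anywhere in the table.
random_existing = [rng.getrandbits(128) for _ in range(100000)]
random_new = [rng.getrandbits(128) for _ in range(5)]
random_pages = pages_touched(random_existing, random_new)

print("autoincrement:", sequential_pages, "page(s)")
print("random keys:  ", random_pages, "page(s)")
```

With 1,000 pages in the table, it is very unlikely that two of the five random keys end up on the same page.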

Friday, 2 November 2007

Best Practice in Software Development

IBM has a nice page on Best Practice in software development. It's amazing what such a page doesn't list. For instance, UML is the only method listed for design, even though there are alternatives and UML has known caveats.

It also mentions "Keep it simple" and "Information hiding" as some of the most important principles. I totally disagree. I consider "Make complex things easy to use" as the most important principle. It is ok for things to be complex, and it is ok not to hide information, but it is unforgivable to create something that is too complicated for others to use. A software developer should spend most of his/her time on making complex things easier to use for others.

Best Practice methods require preconditions and they are absent, too. There are different kinds of software development projects, different kinds of project teams, and they require different methods. There's a huge difference between developing control software for a moon rocket, developing search algorithms or creating user interfaces for database applications. Unfortunately, it seems that most attempts to define Best Practice forget about preconditions.

Saturday, 27 October 2007

What Delphi needs

This is my current wish list for Delphi

Platforms:
* Unicode everywhere, using utf-8 or utf-16, but not using the current widestring implementation
* SilverLight, Flash or something like it for thin-client apps using client-side and server-side code, using asp.net as server platform

Language features:
* More focus on units, less focus on classes
* Automatic but deterministic destruction and deallocation of objects
* Compile-time option that makes cyclic unit dependencies illegal
* GNU gettext for Delphi as internationalization system
* Compiler should assist the programmer in ensuring full multithreading capability of parts of code, by looking at dependencies, and by having language constructs that indicate multithreading capabilities in source code that looks as if it doesn't have it.

IDE features:
* Project-specific packages with relative paths, enabling easy branching of packages.

Wednesday, 24 October 2007

Back from today's DAPUG meeting

I've just arrived back from today's DAPUG meeting (the Danish CodeGear user group). It was very good - focus was on news from various conferences, QA/QC and various development techniques. It is clear that CodeGear is gaining momentum again and regaining their competitive edge, and Delphi 2007 has very positive feedback from those who have upgraded. Also, it was obvious that we had some topics on which we can spend some more time, so I hope that we will see another DAPUG meeting in January.

Sunday, 21 October 2007

Microsoft Tech Fest in Denmark is also for Delphi businesses

Microsoft has traditionally bought companies that use Microsoft technology, but according to this interview, Ballmer said "We will do some buying of companies that are built around open-source products". So I guess they're not restricted in any way to Microsoft development tools. If you have an innovative company with buyout as possible exit strategy, the Microsoft Tech Fest may be for you.

Saturday, 20 October 2007

Multi core CPUs - what does it mean to Delphi?

It seems as if the entire IT industry agrees that more performance in PCs will be achieved by using more cores in CPUs. Is it true, and what are the implications for Delphi programmers?

Let's start with the assumption about multiple cores. In order to match CPU I/O count performance in a modern PC, you would have to make more than 50 harddisks work in parallel. RAM also has its performance limits, especially as core count grows, so multiple cores are really more about CPU-bound calculations than about anything else. The harddisk problem can be solved by using network attached storage with arrays of harddisks, and the RAM limits can be solved by letting each CPU have its own RAM (like NUMA or other architectures). These solutions are obviously good for servers, which have many concurrent requests, where each request can be served by one thread. This doesn't have much influence on how we write source code, and Delphi does the job very well.

If you want greater performance on a desktop PC, we're usually talking games and simulations or data handling software (database applications). I'm not much into games and simulations and will focus on client/server database applications. The typical performance problems with database applications are about data retrieval over the network, sorting, lookups and filtering. If you have a performance problem in such an application, you will usually not want it to be 2 times faster, but 10 times faster. Given the architectural problems in a desktop PC, this is not possible by using multithreading.

What is needed is to minimize the time from a user action until something happens. You can do this by caching data, doing things in the background, and by preparing data for user actions that you expect to happen. Multithreading is a very good tool for doing this, so enabling multithreading in applications is something we need - but it's not because the CPUs are going to be multi-core. There are many ways that CodeGear can make Delphi support these techniques, either using technologies like MIDAS, or by modifying the language slightly. But there is no reason to fundamentally change the programming language.
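The idea of preparing data in the background can be sketched in a few lines. The example is in Python rather than Delphi, and everything in it is hypothetical: `PrefetchCache` is a made-up class, and `slow_customer_lookup` stands in for a slow request to a database server. The class is meant to be called from one thread only, typically the user interface thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class PrefetchCache:
    """Loads values in background threads and remembers them."""

    def __init__(self, loader, workers=2):
        self._loader = loader
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = {}

    def prefetch(self, key):
        """Start loading a value that the user is expected to ask for soon."""
        if key not in self._futures:
            self._futures[key] = self._pool.submit(self._loader, key)

    def get(self, key):
        """Return the value, waiting only if it has not finished loading yet."""
        self.prefetch(key)
        return self._futures[key].result()


def slow_customer_lookup(customer_id):
    time.sleep(0.05)  # stands in for a slow network request
    return {"id": customer_id, "name": f"Customer {customer_id}"}


cache = PrefetchCache(slow_customer_lookup)
cache.prefetch(42)        # the user hovers over customer 42 in a list
customer = cache.get(42)  # the user clicks; the data is probably ready
print(customer)
```

The gain does not come from using several cores. It comes from starting the slow work before the user asks for the result.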

Friday, 19 October 2007

CMMI, Six Sigma, Agile, ...

When programmers are told to spend time on various programming techniques and software development management techniques, it removes focus from the actual programming. Managers tend to impose too many techniques and methods, and programmers try not to spend too much time on them.

One of the problems is that the brain has a limited capacity. If you're doing something difficult, it doesn't help you if somebody tells you to do it in a way that's even more difficult. You may think you have to compromise: Do you want to use brainpower on programming techniques or on programming?

The answer is different: You want to spend your team's brains on solving the customer's problems, and reduce the amount of brainpower needed to do programming and programming techniques.

Wednesday, 17 October 2007

How to become a manager

I meet a lot of programmers who want to become managers. However, their motive is not good - they want to become managers because they have specific ideas they want to do in programming.

If you want to become a good manager, then you first need to make sure that you are genuinely interested in administrative work. Your value will be measured by the work of other programmers, so your own personal programming skills are really not that important, as long as you are able to keep the team capable and focused on what other parts of your company think is valuable.

Monday, 15 October 2007

Which character set?

There are many reasons to choose a particular character set. Some believe that choosing widestring everywhere solves all problems. It doesn't. Widestring is slow because it needs to be Microsoft BSTR compatible, and it's also complicated. A widestring can contain UCS-2 encoding and UTF-16 encoding, and with UTF-16, you can have 4 bytes per character.

One of the problems is the Unicode standard. It allows a character to be built in more than one way. There is no one-to-one match between the binary representation and the look of a character. Therefore, if you want to do Unicode, text handling becomes complicated, no matter what you do.
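A short Python sketch shows the problem with the letter é, which can be stored either as one code point or as the letter e followed by a combining accent. Normalization, here with the standard `unicodedata` module, is one common way to make such texts comparable.

```python
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent

print(composed == decomposed)          # the binary representations differ
print(len(composed), len(decomposed))  # 1 and 2 code points

# After normalization to the composed form (NFC), the two strings are equal.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_after_nfc)
```

Both strings look identical on screen, so comparing, searching and sorting text all need to take this into account.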

A good way to choose a character set is by performance and compatibility. How much space does it use, and is it compatible with other software systems? UTF-8 uses very little space if most characters are ASCII characters. If you add some ISO-8859-1 characters, like in many West European languages, it's still very compact. It only gets less optimal than other character sets if you want to encode Chinese, Japanese and other languages that don't use ASCII characters a lot.

UTF-8 is also very compatible with other systems. It's a de facto standard for XML files, it only exists in one version (unlike UTF-16, which exists as UTF-16BE and UTF-16LE), and its original design encodes 31 bit, much more than UTF-16's roughly 20 bit (current standards limit both to the same Unicode range, up to U+10FFFF). UTF-8 is also compatible with zip filename encoding (unlike UTF-16 and UCS-2, which are not), and UTF-8 texts can be handled by many applications that were not originally designed to do so.
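The space usage is easy to measure. This Python sketch encodes three sample texts, chosen only for illustration, as UTF-8 and as UTF-16 without a byte order mark.

```python
SAMPLES = {
    "English": "Hello, world",
    "Danish": "rødgrød med fløde",
    "Chinese": "你好世界",
}


def sizes(text):
    """Return the number of bytes used in UTF-8 and in UTF-16 (little endian)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for language, text in SAMPLES.items():
    utf8_bytes, utf16_bytes = sizes(text)
    print(f"{language}: {len(text)} characters, "
          f"{utf8_bytes} bytes in UTF-8, {utf16_bytes} bytes in UTF-16")
```

For the English and Danish samples UTF-8 is clearly smaller, while the Chinese sample takes 3 bytes per character in UTF-8 against 2 in UTF-16.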

Linux already installs with UTF-8 as default, for most distributions and locales. This makes it possible to zip files in Moscow, send them to Copenhagen, unzip them, and all filenames are preserved. This doesn't work on Windows.

Delphi, being a Windows tool, uses Windows 8-bit and 16-bit character sets by default, in ansistrings and widestrings. There's also a utf8string, but it's actually the same as an ansistring. You can convert from widestring to utf8 and back using utf8encode() and utf8decode().

If you store and transmit unicode information using utf-8, most of you will experience a reduction in space usage and a reduction in transmission time.

One of the very nice features of utf-8 is the ability to be autodetected. Utf-8 does not allow all possible bit combinations, and the bit combinations that are being used, are usually extremely unlikely in other 8-bit encodings. For 99% of all applications, it is safe to apply autodetection to utf-8.
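Autodetection can be as simple as trying UTF-8 first. In this Python sketch, the fallback to ISO-8859-1 is an assumption made for the example. A real application would fall back to whatever 8-bit character set its old files use.

```python
def decode_autodetect(data, fallback="iso-8859-1"):
    """Decode bytes as UTF-8 if they are valid UTF-8, else use the fallback."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


as_utf8 = "bl\u00e5b\u00e6r".encode("utf-8")         # "blåbær" stored as UTF-8
as_latin1 = "bl\u00e5b\u00e6r".encode("iso-8859-1")  # the same text in ISO-8859-1

print(decode_autodetect(as_utf8))
print(decode_autodetect(as_latin1))
```

The ISO-8859-1 bytes for å and æ are not followed by the continuation bytes that UTF-8 requires, so the first attempt fails and the fallback is used. Pure ASCII text decodes identically both ways.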

Saturday, 13 October 2007

Why Delphi?

I have seen many explanations on why to use a specific software development tool, and I was actually planning to write a post on why we have chosen Delphi as our main software development tool. There are many good reasons to list, however, after attending a number of conferences, and hearing about several companies who switched tools years into the project, from C++ to .net, from .net to Java and others, I need to write it in a different way.

The choice of tool depends on the problem that you try to solve. You need to ask yourself what the problem is. If you're doing commercial software development, the problem is usually related to money. You need to ask yourself: Which tool provides most value to the company, in the short run, medium term and in the long run?

That question seems easy, but it isn't. If this question is answered without reading the business plan or involving business strategies, you made a mistake. If the customers weren't involved in providing information to make the decision, you made a mistake. And once you have made a decision, make sure to test it using a Devil's advocate who is not a software developer. It doesn't need to take a lot of time, but it needs to be done right.

I have one important tip: before evaluating how a tool matches your problems, analyze the tool's features, and sort them into the categories: Critical for solving this problem / nice to have / don't use. The last one is important: it should contain all the features that are in conflict with your business plan, for instance by locking into something you don't want to be locked into, or by storing vital business information in bad ways.

Wednesday, 10 October 2007

Nohau QA/QC seminar review

I just came back from Nohau's seminar about QA/QC, and it was a good experience. It's nice to attend a seminar that's free and not just a presentation of software, but also tries to raise the general knowledge level of the audience. If you haven't introduced QA/QC measures in your organization yet, you should seriously consider doing that.

I noticed that most speakers were asserting that good quality requires the use of agile software development techniques. It's nice to see the software development business maturing.