Sometimes, a good idea can save a lot of time in programming. A simple change of the specifications can literally save more than half of the programming hours. But how do you get these good ideas?
I get good ideas when I am fed the right information in a situation where I have plenty of time to think about possibilities.
One way to get this information is from another person, a partner. What kind of person is most suitable depends on the situation. For an introvert, trust is very important, so that the introvert's thoughts are not disturbed by extroverted worrying about the situation.
In order to be a good partner in this, you must be prepared to accept that the things you say are not always treated as truth. Only by uncovering incorrect perceptions of reality can you create a new understanding that provides business value.
A good partner doesn't say "That's wrong", but tries to explain why she thinks it is wrong, and treats the idea as good until it has been proven otherwise.
Sometimes you cannot find a good partner, either because your field of expertise is too advanced, because people around you are not prepared to do it, or just because you're not in the mood to talk with other people. In these cases, you must trust your own intuition and develop your idea to the point where you have enough documentation to persuade others easily.
A recent study in Danish companies shows that almost all good ideas were not created at work. I guess there's room for improvement here.
Merry Christmas!
Monday, 24 December 2007
Saturday, 22 December 2007
Soft hyphen in ISO-8859-1 and Unicode
I just came across this blog post about Unicode and ISO 8859-1 being unclear on how to show a soft hyphen.
The article contains links to other blog posts and documents about this topic.
I will not give a summary of the problem - read the articles if you're interested. However, I have a strong opinion on the topic: the character set standard should not define the application.
Sometimes you want to create an editor that shows the contents of a file exactly, so that the user is able to see every byte in the file. And sometimes the editor has another purpose, like making it simple to create a sales brochure.
In the first case, a soft hyphen should be visible to the user. Think Notepad... the character 0xAD should be clearly visible in Notepad, no matter where you put it. In the second case, a soft hyphen character can be used to implement application-specific soft hyphen functionality, where the hyphen is only shown when it makes sense according to the application's purpose.
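For the brochure-style case, a minimal Delphi sketch could simply strip the soft hyphens from the text before measuring or displaying it, and let the application decide where to show a visible hyphen (the function name is my own invention):
uses
  SysUtils; // StringReplace and rfReplaceAll live here
function StripSoftHyphens (const s:string):string;
begin
  // #$AD is the soft hyphen in ISO-8859-1; remove it before measuring/display
  Result:=StringReplace (s, #$AD, '', [rfReplaceAll]);
end;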
Some of the articles even mention the use of soft hyphens in HTML. That's really out of scope, since HTML already redefines so much of the layout. It seems somebody has forgotten that the primary purpose of HTML is to render things differently.
Friday, 14 December 2007
Bundling the internet connection with software
Providers of mobile phone services are now discussing paying the mobile phone companies for the data traffic generated by their online services.
What happens if the service provider doesn't pay? Is this the first step towards a situation where the feature set of your phone is dictated by your phone service provider?
For a long time it has been hard to sell software without bundling support, hardware or online services. However, this time we're moving towards a situation where the internet connection itself is bundled with functionality. Imagine that you could no longer use Google if you switched to another ISP...
Thursday, 13 December 2007
Don't use passwords. Use passphrases.
We still use passwords everywhere, and they're usually stored as hash values in the database of the service that we log into. I ran into this story about a guy who looked up an md5 hash value on Google and in this way reverse engineered a password. His conclusion is not to use a password that anybody else on this planet may have used.
The reason this is a problem is that many users use the same password in multiple places, so if you know their password in one place, you can probably log into other services with it. If you store all passwords as hash values, and these hash values fall into the hands of people who may abuse them, it is important that the original passwords cannot be recovered from them. There are many ways to crack passwords, and lostpassword.com is a good site to know if you want to see how easy it is.
But how fast can md5 hashes be cracked? Imagine that we produce every possible password, generate their md5 hashes, and then use the resulting list as a lookup table sorted by md5 hash. Let's make a few assumptions:
- The password only uses lowercase letters and digits, 36 different characters in total.
- It is totally random.
- n=5: table size 2 GB, crack time 100 ms
- n=6: table size 82 GB, crack time 120 ms
- n=7: table size 3 TB, crack time 140 ms
- n=8: table size 112 TB, crack time 160 ms
- n=9: table size 4163 TB, crack time 180 ms
- n=10: table size 1×10^17 bytes
- n=15: table size 1×10^25 bytes
- n=20: table size 1×10^32 bytes
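The table sizes can be roughly reproduced with a back-of-the-envelope calculation: 36^n possible passwords, each stored as a 16-byte md5 hash plus the password itself. This sketch ignores index overhead, which is why it lands somewhat below the figures above:
program LookupTableSize;
{$APPTYPE CONSOLE}
uses
  SysUtils, Math;
var
  n:integer;
  Entries, Bytes:extended;
begin
  for n:=5 to 20 do
  begin
    Entries:=Power (36, n);   // 36 possible characters, length n, fully random
    Bytes:=Entries*(16+n);    // 16-byte md5 hash plus the password itself
    Writeln (Format ('n=%2d: %.1e bytes', [n, Bytes]));
  end;
end.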
As you can see, these are bad passwords:
- j4fsk2
- this is fun
- my dog ate my homework (somebody else probably used that, too)
- slashdot8fischk (15 characters, spelling errors etc.)
- roskilde/1997/annie (25 characters, but who is Annie and why Roskilde?)
As a programmer, you can help your users make better passwords by providing more space to type the password. Usability research has shown that this actually helps, although I cannot remember the source for that information. Some systems also use the word "passphrase" instead of "password" in order to encourage users to type more characters.
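In a Delphi VCL application, that can be as simple as configuring the edit control to accept a long phrase rather than a short word (a minimal sketch, to be placed in a unit that uses StdCtrls; the helper name is my own):
uses
  StdCtrls;
procedure MakePassphraseField (Edit:TEdit);
begin
  Edit.PasswordChar:='*'; // still hide what is typed
  Edit.MaxLength:=0;      // 0 means no length limit
  Edit.Width:=400;        // room for a whole sentence, not just a word
end;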
Wednesday, 12 December 2007
Delphi interfaces and implementation
One of the things that makes Delphi unique is the division of all source code files ("unit"s) into several sections: interface, implementation, initialization and finalization.
Everything that other units need to know is put into the interface section at the top. This significantly reduces the time it takes to understand how to use a file - you don't have to scroll through implementation details. The interface section contains consts, function declarations and types, but no statements. Class members are declared, but methods are not implemented here.
The implementation section contains the implementation of all the items from the interface section. The initialization section contains statements that should be run before the application starts, which are local to this unit, and the finalization section can clean up resources in a similar way.
Delphi's compiler contains many neat features to enable faster compilation. For instance, a change in the implementation section, or a change of a typed const in the interface section, will not force recompilation of other source code files. Only significant changes in the interface section will make other parts of the source code recompile.
You can benefit a lot from these sections if you manage to keep the interface section small, making sure that it has few lines of code. If a file has one function, and nothing else, in its interface section, it is much easier to use than if you have a huge class type with lots of private and protected members in the interface section. Unfortunately, this also means that the full benefits are not achieved if you do OOP the traditional way. Sometimes it even makes sense to write a small unit with a very simple interface section, where the entire implementation just makes calls to another unit that has a very complicated interface section.
Example:
unit SimpleApi;

interface

function CalculateSomething (parameter:string):string;

implementation

uses
  SysUtils, ComplexApi; // Which other source files are used/linked

function CalculateSomething (parameter:string):string;
var
  Obj:TComplexApi; // "object" is a reserved word in Delphi, so use another name
begin
  Obj:=TComplexApi.Create;
  try
    Result:=Obj.CalculateSomething (parameter);
  finally
    FreeAndNil (Obj);
  end;
end;

initialization
  // This is where you could write code to initialize
  // something for this unit at application start
finalization
  // This is where you could write code to clean up stuff
  // after the application has stopped running
end.
Friday, 7 December 2007
The teddy bear principle in programming
It's very simple: Put a teddy bear on your desk. When you have a programming problem, explain what you are doing to the teddy bear. Eureka is a likely outcome of this method.
Tuesday, 27 November 2007
CodeRage impressions
Having participated in a few sessions at CodeGear's online CodeRage conference, I can only say that this is a very good way to run conferences. There are lots of advantages over a traditional conference, and the costs are much lower. You can chat with other attendees during a session and ask questions, you can leave a session without disturbing anyone if it gets too boring, and you can do other work if the topic is easy for you. On the downside, you don't get those extra days in a geographically remote location and you don't get the beers afterwards, but I guess we'll see this evolve a lot in the coming years. Unfortunately, there's also the downside that CodeRage was scheduled for U.S. time, which means it runs after business hours in Denmark.
However, I can only recommend signing up and participating in this kind of conference.
Wednesday, 21 November 2007
Date and Time in programming
Sometimes I wonder if non-programmers know how complicated time is. Let's have a recap of the basics:
- A day has 23, 24 or 25 hours (23 or 25 on daylight saving time change days).
- An hour has 60 minutes.
- A minute has 59, 60 or 61 seconds (leap seconds).
- A day has 1380, 1440 or 1500 minutes.
- A week has 7 days.
- A day has 82800, 86399, 86400, 86401 or 90000 seconds.
- A month has 28, 29, 30 or 31 days. In some systems, a month is standardized to be 30 days.
- A year has 365 or 366 days, but in some systems, it's standardized to be 360 days.
- A year has 52 or 53 weeks.
- Even though we have an ISO standard for weeks, end users don't agree on the starting weekday for a week.
- Some dates don't exist, and for historical dates, the offset between different geographical regions was not a matter of hours, but of days. The Russian October Revolution actually happened in November according to most European calendars of that time, but it was October in Russia.
- This is all about the Christian calendar. There are other calendars out there...
- Local time deviates from UTC time by a number of hours, which can be fractional in some rare cases.
- For all practical purposes, GMT is the same as UTC, and GMT is not the local time of London (London uses GMT+1 in the summer).
- The UTC time offset is a function that takes the UTC time stamp and the geographical location as parameters, and the offset has often differed historically between two cities in the same country.
- A time zone can be specified using a GMT offset, just like time stamps.
- A time zone can include several regions with different daylight saving rules.
Timestamps are usually represented in one of these ways:
- A floating point number indicating the number of days since a specific date at midnight
- An integer or floating point number indicating the number of seconds since a specific date at midnight
- Year, Month, Day, Hour, Minutes, Seconds as separate values
UTC time offsets are usually specified this way:
- Number of hours difference between the local time and UTC, at the time of the timestamp
- Geographic location (like 'Europe/Copenhagen')
- Time zone (like 'CET')
The PC's clock adds its own complications:
- The internal clock of the PC uses local time, and changes at daylight saving time switches. If you use virtualization or dual-boot, you risk that it changes twice, giving an incorrect time.
- The internal clock of the PC uses UTC time, and does not change.
- Daylight saving time is usually handled using locally stored information, which may get outdated, so that the computer actually miscalculates the time by one hour.
- Leap seconds may be implemented properly, in which case the software needs to know about them.
- Or leap seconds are not implemented, so the software doesn't need to know about them; instead the clock is adjusted, and the software needs to be able to handle a clock that doesn't move for 1-2 seconds.
- Most PCs today have some kind of clock synchronization over the internet, which yields a sub-second precision. However, don't count on your clock to be 1ms precise.
- PCs often have their clocks adjusted, so you need to make sure that your software can survive a clock that moves backwards.
What about statistics? Here you have a lot of other problems:
- Total numbers per month don't make sense for February, which changes its length in leap years.
- Numbers per day for a given month usually don't make sense either, because the number of weekend days in a month varies, and numbers often depend on the weekday.
- Comparing. If you have two timestamps from different sources, you need to define many things to be able to say timestamp1=timestamp2.
- Round-off errors often mean that timestamp1+timeinterval-timeinterval<>timestamp1. When you have deadlines, it can become very tricky to decide whether a deadline has been reached or not (see the sketch after this list).
- Uniqueness of timestamps: some programmers want to use timestamps as a primary key. Some database systems even support that, but what happens when you transport these timestamps to other parts of your software - will they still be unique?
- Many people don't understand the difference between GMT offsets for time zones and time stamps. Example: In Denmark, which is located in the GMT+1 time zone, we use GMT+2 time stamps during the summer.
- Terms like "CET" have many meanings, depending on who you ask. It can be the time in Germany in winter, it can be the current time in Germany, and it can describe the time zone which includes France and Germany, which did not have the same daylight saving time rules 30 years ago. In case of the "CET time zone", a historical timestamp may be useless without knowing if it applies to a location in France or Germany.
In Windows, the standard functions for converting between local time and UTC time do not handle historical timestamps well, but the documentation isn't good at explaining that. On Windows, you should always treat days as 24 hours and 86400 seconds, and always use local time, unless you really know what you're doing.
Here are some examples of bad time related functionality:
- DateTime.IsDaylightSavingTime Method - The text says "Indicates whether this instance of DateTime is within the Daylight Saving Time range for the current time zone.", but what if there are two different daylight saving time ranges for the time zone? The problem is that the documentation equates a time zone with a single daylight saving time rule set.
- GetDynamicTimeZoneInformation - The information returned by this function can be invalid immediately after it returns. This function basically doesn't make sense; it should have taken a time stamp as a parameter.
- TimeZone.GetUtcOffset Method - It calculates the UTC offset from the local time. However, once a year the same local time repeats itself for one hour, with two different UTC offsets, so this method doesn't make sense.
Linux works very differently - it counts seconds everywhere and uses geographic locations for GMT offset calculations. This works extremely well, and Linux only converts to year/month/day representations when interacting with the user. However, even though Linux can support leap seconds, most apps probably won't work well if you enable them.
This blog post doesn't cover all the kinds of trouble we programmers face, there's much more. My advice is to try to keep things simple and prepare for the worst.
Friday, 16 November 2007
How to get girls into programming
The Sun tools team have blogged about the perils of abstraction. They say something like "we need to stop trying to abstract everything".
I don't know where they got this from - not all programmers try to abstract everything. Some programmers hate abstraction and love the detail, and no, they are not unintelligent. In fact, some of these guys and girls can be brilliant programmers, creating much more user-friendly applications that users love.
As every psychologist will know, human brains are not wired the same way. We have strong preferences for ways of thinking, and the same information is not handled the same way in different brains. If you could take two people with the same knowledge but different brain wiring and put them into exactly the same situation, they would extract different knowledge from that situation.
The masterminds behind software architecture often favor abstract thinking over details. They are good at spotting abstract information, creating abstract knowledge from experiences etc., but they usually don't put much value into minor details, like "it looks ugly" or "that's not what the customer said". If you put 5 abstract-thinking people together in a team, you will get a result that is abstract and possibly horrible.
If you want a well designed product, architecture, specs etc., you need to involve people with different brain skills. Psychologists say that the sexes differ in exactly these thinking preferences (T/F), which relates directly to this problem.
I believe the biggest problem in IT is the lack of product quality, not the lack of girls. But I do believe that these two problems are closely related, and that they can be solved using the same management techniques.
Thursday, 15 November 2007
Flash RAM instead of harddisks
Flash drives have become larger and less expensive, and it doesn't take a lot of experimenting to find out that a laptop can become faster, quieter, more robust and get longer battery life if you replace your harddisk with flash RAM. And then there's the fact that good quality flash RAM systems easily outlive even very expensive harddisks.
What do flash drives mean for software developers? Here are a couple of consequences:
- When multiple threads compete for disk access, responsiveness will benefit greatly. This means that background threads that access the disk become more attractive.
- Disk space becomes more expensive for a while, favoring apps that don't waste too much space.
- It becomes less necessary to prefetch small amounts of data from the local disk. For some applications, this can reduce RAM usage.
- Reduced seek time means that different file formats may become optimal. This includes different ways of indexing, but it may also mean less redundancy in file formats.
Friday, 9 November 2007
The price of using GUIDs in databases
There has been some discussion about the use of GUIDs lately. A GUID is a 128-bit integer that is picked randomly. That is obviously a good thing if your database needs more than 2^64 = 18×10^18 records, but because it is 128 bits, you can also be quite sure that this random number has not already been used somewhere else. The difference between an autoincrementing 128-bit integer and a GUID is that GUID values are always picked randomly.
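On Windows, Delphi can generate such values with CreateGUID. A minimal sketch showing how consecutive keys come out scattered over the whole 128-bit range:
program GuidDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  ID:TGUID;
  i:integer;
begin
  for i:=1 to 5 do
  begin
    CreateGUID (ID);              // each value is picked independently at random
    Writeln (GUIDToString (ID));  // there is no ordering between consecutive calls
  end;
end.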
It makes sense to apply GUIDs when:
- No specific order is required
- 128 bit is not considered a waste of space
- A very small chance of failing to pick a unique number is OK
- Values cannot be produced in one place, or having no specific order is a feature
However, it comes at a price. There is a good chance that two records that are added shortly after each other are related. For instance, if you want to save an invoice, there may be 5 records that describe items on that invoice, which are added as part of the same transaction. If a database server uses autoincrementing integer values as the primary key, and fully or partially sorts records physically by this primary key, these 5 records will probably go into 1 or 2 places on the harddisk. If GUIDs were used, they would be stored in 5 different places on the harddisk. This is one of the reasons why GUID-based databases are usually on servers that have more RAM than they have data - they need to cache everything.
Another price comes when debugging. You need more IQ to debug code than to write code, so it is important that you optimize for debugging. It must be easy to see that the data stored in the database is correct. GUIDs are not always the easiest key to read, especially not in developer databases, which tend to have very little data, and therefore very small numbers in autoincrementing integer fields.
Friday, 2 November 2007
Best Practice in Software Development
IBM has a nice page on Best Practice in software development. It's amazing what such a page doesn't list. For instance, UML is the only method listed for design, even though there are alternatives and UML has known caveats.
It also mentions "Keep it simple" and "Information hiding" as some of the most important principles. I totally disagree. I consider "Make complex things easy to use" to be the most important principle. It is OK for things to be complex, and it is OK not to hide information, but it is unforgivable to create something that is too complicated for others to use. A software developer should spend most of his/her time on making complex things easier for others to use.
Best Practice methods require preconditions, and those are absent, too. There are different kinds of software development projects and different kinds of project teams, and they require different methods. There's a huge difference between developing control software for a moon rocket, developing search algorithms and creating user interfaces for database applications. Unfortunately, it seems that most attempts to define Best Practice forget about preconditions.
Saturday, 27 October 2007
What Delphi needs
This is my current wish list for Delphi:
Platforms:
* Unicode everywhere, using utf-8 or utf-16, but not using the current widestring implementation
* SilverLight, Flash or something like it for thin-client apps using client-side and server-side code, using asp.net as server platform
Language features:
* More focus on units, less focus on classes
* Automatic but deterministic destruction and deallocation of objects
* Compile-time option that makes cyclic unit dependencies illegal
* GNU gettext for Delphi as internationalization system
* Compiler assistance for ensuring that parts of the code are safe for multithreading, by looking at dependencies, and by offering language constructs that mark multithreading capability in source code that doesn't otherwise look like it has it.
IDE features:
* Project-specific packages with relative paths, enabling easy branching of packages.
Wednesday, 24 October 2007
Back from today's DAPUG meeting
I've just arrived back from today's DAPUG meeting (the Danish CodeGear user group). It was very good - the focus was on news from various conferences, QA/QC and various development techniques. It is clear that CodeGear is gaining momentum again and regaining its competitive edge, and Delphi 2007 has received very positive feedback from those who have upgraded. Also, it was obvious that there are topics on which we could spend some more time, so I hope that we will see another DAPUG meeting in January.
Sunday, 21 October 2007
Microsoft Tech Fest in Denmark is also for Delphi businesses
Microsoft has traditionally bought companies that use Microsoft technology, but according to this interview, Ballmer said "We will do some buying of companies that are built around open-source products". So I guess they're not restricted in any way to Microsoft development tools. If you have an innovative company with buyout as possible exit strategy, the Microsoft Tech Fest may be for you.
Saturday, 20 October 2007
Multi core CPUs - what does it mean to Delphi?
It seems as if the entire IT industry agrees, that more performance in PCs will be achieved by using more cores in CPUs. Is it true, and what are the implications for Delphi programmers?
Let's start with the assumption about multiple cores. To match the I/O performance of the CPU in a modern PC, you would have to make more than 50 harddisks work in parallel. RAM also has its performance limits, especially as the core count grows, so multiple cores are really more about CPU-bound calculations than about anything else. The harddisk problem can be solved by using network attached storage with arrays of harddisks, and the RAM limits can be solved by letting each CPU have its own RAM (as in NUMA and other architectures). These solutions are obviously good for servers, which have many concurrent requests, where each request can be served by one thread. This doesn't have much influence on how we write source code, and Delphi does the job very well.
If you want greater performance on a desktop PC, we're usually talking games and simulations or data handling software (database applications). I'm not much into games and simulations and will focus on client/server database applications. The typical performance problems with database applications are about data retrieval over the network, sorting, lookups and filtering. If you have a performance problem in such an application, you will usually not want it to be 2 times faster, but 10 times faster. Given the architectural problems in a desktop PC, this is not possible by using multithreading.
What is needed is to minimize the time from a user action until something happens. You can do this by caching data, doing things in the background, and by preparing data for user actions that you expect to happen. Multithreading is a very good tool for doing this, so enabling multithreading in applications is something we need - but it's not because the CPUs are going to be multi-core. There are many ways that CodeGear can make Delphi support these techniques, either using technologies like MIDAS, or by modifying the language slightly. But there is no reason to fundamentally change the programming language.
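A minimal sketch of the background-work idea using a plain TThread: the slow work (simulated here with Sleep) happens in a background thread, and the result is handed to the GUI thread with Synchronize. The class name and what DeliverData should update are my own assumptions:
uses
  Classes, SysUtils;
type
  TPrefetchThread = class(TThread)
  private
    FData:TStringList;
    procedure DeliverData;
  protected
    procedure Execute; override;
  end;
procedure TPrefetchThread.Execute;
begin
  // In a real application this would be the query or file read that the user
  // is expected to need next
  FData:=TStringList.Create;
  Sleep (2000);
  FData.Add ('prefetched row');
  if not Terminated then
    Synchronize (DeliverData); // switch to the GUI thread before touching the VCL
end;
procedure TPrefetchThread.DeliverData;
begin
  // Runs in the main thread: put FData into a grid, a listbox or a cache here
end;
Starting such a thread with TPrefetchThread.Create(False) right after the user opens a screen means the data is often already there when the user asks for it.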
Friday, 19 October 2007
CMMI, Six Sigma, Agile, ...
When programmers are told to spend time on various programming techniques and software development management techniques, it removes focus from the actual programming. Managers tend to impose too many techniques and methods, and programmers try not to spend too much time on them.
One of the problems is that the brain has a limited capacity. If you're doing something difficult, it doesn't help you if somebody tells you to do it in a way that's even more difficult. You may think you have to compromise: do you want to use brainpower on programming techniques or on programming?
The answer is different: You want to spend your team's brains on solving the customer's problems, and reduce the amount of brainpower needed to do programming and programming techniques.
Wednesday, 17 October 2007
How to become a manager
I meet a lot of programmers who want to become managers. However, their motive is not good - they want to become managers because they have specific ideas they want to carry out in programming.
If you want to become a good manager, you first need to make sure that you are genuinely interested in administrative work. Your value will be measured by the work of other programmers, so your own personal programming skills are really not that important, as long as you are able to keep the team capable and focused on what other parts of your company think is valuable.
Monday, 15 October 2007
Which character set?
There are many reasons to choose a particular character set. Some believe that choosing widestring everywhere solves all problems. It doesn't. Widestring is slow because it needs to be Microsoft BSTR compatible, and it's also complicated. A widestring can contain UCS-2 or UTF-16 encoded text, and with UTF-16, a single character can take 4 bytes.
One of the problems is the Unicode standard. It allows a character to be built in more than one way. There is no one-to-one match between the binary representation and the look of a character. Therefore, if you want to do Unicode, text handling becomes complicated, no matter what you do.
A good way to choose a character set is by performance and compatibility: how much space does it use, and is it compatible with other software systems? UTF-8 uses very little space if most characters are ASCII characters. If you add some ISO-8859-1 characters, as in many Western European languages, it's still very compact. It only gets less optimal than other character sets if you want to encode Chinese, Japanese and other languages that don't use ASCII characters a lot.
UTF-8 is also very compatible with other systems. It's a de facto standard for XML files, it only exists in one version (unlike UTF-16, which exists as UTF-16BE and UTF-16LE), and it can encode 31 bits, much more than UTF-16's roughly 20 bits. UTF-8 is also compatible with zip filename encoding (unlike UTF-16 and UCS-2, which are not), and UTF-8 text can often be handled by applications that were not originally designed for it.
Linux already installs with UTF-8 as default, for most distributions and locales. This makes it possible to zip files in Moscow, send them to Copenhagen, unzip them, and all filenames are preserved. This doesn't work on Windows.
Delphi, being a Windows tool, uses Windows 8-bit and 16-bit character sets by default, in ansistrings and widestrings. There's also a utf8string type, but it's actually the same as an ansistring. You can convert from widestring to UTF-8 and back using utf8encode() and utf8decode().
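A minimal sketch of that round trip (the Cyrillic sample text is written with explicit #$ character codes so the encoding of the source file doesn't matter):
program Utf8Demo;
{$APPTYPE CONSOLE}
var
  Wide:widestring;
  Utf8:utf8string;
begin
  Wide:=#$041C#$043E#$0441#$043A#$0432#$0430;       // "Moskva" in Cyrillic letters
  Utf8:=UTF8Encode (Wide);                          // widestring -> UTF-8 bytes
  Writeln ('UTF-16 code units: ', Length (Wide));   // 6
  Writeln ('UTF-8 bytes:       ', Length (Utf8));   // 12, two bytes per letter
  Wide:=UTF8Decode (Utf8);                          // and back again, losslessly
end.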
If you store and transmit Unicode information using UTF-8, most of you will experience a reduction in space usage and transmission time.
One of the very nice features of UTF-8 is that it can be autodetected. UTF-8 does not allow all possible byte combinations, and the combinations that are used are usually extremely unlikely in other 8-bit encodings. For 99% of all applications, it is safe to apply autodetection to UTF-8.
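A minimal sketch of such an autodetection check - it only verifies that the byte structure is legal UTF-8 (lead bytes followed by the right number of continuation bytes); a stricter detector would also reject overlong sequences:
function LooksLikeUtf8 (const s:ansistring):boolean;
var
  i, Follow:integer;
  b:byte;
begin
  Result:=false;
  i:=1;
  while i<=Length (s) do
  begin
    b:=Ord (s[i]);
    if b<$80 then
      Follow:=0                // plain ASCII byte
    else if (b and $E0)=$C0 then
      Follow:=1                // lead byte of a 2-byte sequence
    else if (b and $F0)=$E0 then
      Follow:=2                // lead byte of a 3-byte sequence
    else if (b and $F8)=$F0 then
      Follow:=3                // lead byte of a 4-byte sequence
    else
      exit;                    // not a legal UTF-8 lead byte
    while Follow>0 do
    begin
      inc (i);
      if (i>Length (s)) or ((Ord (s[i]) and $C0)<>$80) then
        exit;                  // missing or malformed continuation byte
      dec (Follow);
    end;
    inc (i);
  end;
  Result:=true;                // structurally valid (note that plain ASCII also passes)
end;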
Saturday, 13 October 2007
Why Delphi?
I have seen many explanations of why to use a specific software development tool, and I was actually planning to write a post on why we have chosen Delphi as our main software development tool. There are many good reasons to list. However, after attending a number of conferences and hearing about several companies that switched tools years into a project - from C++ to .net, from .net to Java, and others - I need to write it in a different way.
The choice of tool depends on the problem that you are trying to solve. You need to ask yourself what the problem is. If you're doing commercial software development, the problem is usually related to money, so the question becomes: which tool provides the most value to the company in the short run, the medium term and the long run?
That question seems easy, but it isn't. If it is answered without reading the business plan or involving business strategies, you made a mistake. If the customers weren't involved in providing information for the decision, you made a mistake. And once you have made a decision, make sure to test it using a devil's advocate who is not a software developer. It doesn't need to take a lot of time, but it needs to be done right.
I have one important tip: before evaluating how a tool matches your problems, analyze the tool's features and sort them into three categories: critical for solving this problem, nice to have, and don't use. The last one is important: it should contain all the features that conflict with your business plan, for instance by locking you into something you don't want to be locked into, or by storing vital business information in bad ways.
Wednesday, 10 October 2007
Nohau QA/QC seminar review
I just came back from Nohau's seminar about QA/QC, and it was a good experience. It's nice to attend a seminar that's free, that is not just a presentation of software, and that also tries to raise the general knowledge level of the audience. If you haven't introduced QA/QC measures in your organization yet, you should seriously consider doing so.
I noticed that most speakers asserted that good quality requires the use of agile software development techniques. It's nice to see the software development business maturing.