Wednesday 21 November 2007

Date and Time in programming

Sometimes I wonder if non-programmers know how complicated time is. Here's a summary of the basics:
  • There are 23, 24 or 25 hours in one day (daylight saving time change days)
  • An hour has 60 minutes.
  • A minute has 60, 61 or 62 seconds (leap seconds).
  • A day has 1380, 1440 or 1500 minutes.
  • A week has 7 days.
  • A day has 82800, 86400, 86401, 86402 or 90000 seconds.
  • A month has 28, 29, 30 or 31 days. In some systems, a month is standardized to be 30 days.
  • A year has 365 or 366 days, but in some systems, it's standardized to be 360 days.
  • A year has 52 or 53 weeks.
  • Even though we have an ISO standard for weeks, end users don't agree on the starting weekday for a week.
  • Some dates don't exist, and for historical dates, the offset between different geographical regions was not about hours, but about days. The Russian October Revolution actually happened in November according to most European calendars of that time, but it was October in Russia.
  • This is all about the Christian calendar. There are other calendars out there...
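
A few of the numbers above can be checked with a minimal Python sketch. The helper names are hypothetical, and the sketch only covers the Gregorian calendar and ISO week numbering, not daylight saving or leap seconds:

```python
import calendar
from datetime import date

def days_in_month(year: int, month: int) -> int:
    # monthrange returns (weekday of the 1st, number of days in the month)
    return calendar.monthrange(year, month)[1]

def days_in_year(year: int) -> int:
    return 366 if calendar.isleap(year) else 365

def iso_weeks_in_year(year: int) -> int:
    # 28 December always falls in the last ISO week of its year
    return date(year, 12, 28).isocalendar()[1]

print(days_in_month(2007, 2), days_in_month(2008, 2))    # 28 29
print(days_in_year(1900), days_in_year(2000))            # 365 366
print(iso_weeks_in_year(2007), iso_weeks_in_year(2009))  # 52 53
```
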
Then there's local time and UTC time:
  • Local time deviates from UTC time by a number of hours, which can be fractional in some rare cases
  • For all practical purposes, GMT is the same as UTC, and GMT is not the local time of London (London uses GMT+1 in the summer).
  • UTC time offset is a function that takes the UTC time stamp and geographical location as parameters, and UTC time offset is often historically different for two different cities in the same country.
  • A time zone can be specified using a GMT offset, just like time stamps.
  • A time zone can include several regions with different daylight saving rules.
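
As a rough illustration of offsets attached to individual time stamps, here is a sketch using only fixed offsets. The constant names and the render helper are made up for this example, and no daylight saving rules are modelled:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets that apply to single time stamps (illustrative names, not zone definitions)
UTC_PLUS_2 = timezone(timedelta(hours=2))                 # e.g. Danish summer time
UTC_PLUS_5_30 = timezone(timedelta(hours=5, minutes=30))  # a fractional offset, as used in India

instant = datetime(2007, 7, 1, 12, 0, tzinfo=timezone.utc)

def render(moment: datetime, offset: timezone) -> str:
    # The same instant, written with a different offset
    return moment.astimezone(offset).isoformat()

print(render(instant, UTC_PLUS_2))     # 2007-07-01T14:00:00+02:00
print(render(instant, UTC_PLUS_5_30))  # 2007-07-01T17:30:00+05:30
```

The rendered strings differ, but comparing the aware values still treats them as the same instant.
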
This wouldn't be so complicated if we didn't have to make calculations based on this. Timestamps are usually stored these ways:
  • A floating point number indicating the number of days since a specific date at midnight
  • An integer or floating point number indicating the number of seconds since a specific date at midnight
  • Year, Month, Day, Hour, Minutes, Seconds as separate values
Variations of these may occur, for instance, a time stamp may be an integer number of milliseconds or even microseconds, instead of seconds, but it's still the same idea.
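
The three storage forms can be sketched in Python as follows. The function names are hypothetical, and the day-count epoch of 30 December 1899 is an assumption borrowed from Delphi's TDateTime; other systems use other epochs:

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumed day-count epoch: 30 December 1899, as used by Delphi's TDateTime
DAY_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)

def to_unix_seconds(ts: datetime) -> int:
    # Whole seconds since the Unix epoch
    return (ts - UNIX_EPOCH) // timedelta(seconds=1)

def to_day_number(ts: datetime) -> float:
    # Days since the day-count epoch; the fraction is the time of day
    return (ts - DAY_EPOCH) / timedelta(days=1)

def to_fields(ts: datetime) -> tuple:
    # Year, month, day, hour, minute, second as separate values
    return (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)

ts = datetime(2007, 11, 21, 12, 0, 0, tzinfo=timezone.utc)
print(to_unix_seconds(ts), to_day_number(ts), to_fields(ts))
```
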

UTC time offsets are usually specified this way:
  • Number of hours difference between the local time and UTC, at the time of the timestamp
  • Geographic location (like 'Europe/Copenhagen')
  • Time zone (like 'CET')
Daylight saving is usually handled these ways:
  • The internal clock of the PC uses local time, and changes. If you use virtualization or dual-boot, you risk that it is changed twice, giving an incorrect time.
  • The internal clock of the PC uses UTC time, and does not change.
  • Daylight saving time is usually handled using locally stored information, which may get outdated, so that the computer actually miscalculates the time by one hour.
Leap seconds are usually handled these ways:
  • They're implemented nicely, and the software needs to know about them.
  • They're not implemented, so the software doesn't need to know about them, but instead, the clock is adjusted, and the software needs to be capable of handling a clock that doesn't move for 1-2 seconds.
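
The second case can be seen in Python's standard library, which counts every day as 86400 seconds. This sketch uses the leap second inserted at the end of 31 December 2016; the helper name is hypothetical:

```python
import calendar
from datetime import datetime

# A leap second (23:59:60 UTC) was inserted at the end of 31 December 2016.
before = calendar.timegm((2016, 12, 31, 23, 59, 59))
after = calendar.timegm((2017, 1, 1, 0, 0, 0))
print(after - before)  # 1, although two real seconds elapsed

def datetime_accepts_leap_second() -> bool:
    # datetime only allows seconds in the range 0..59
    try:
        datetime(2016, 12, 31, 23, 59, 60)
        return True
    except ValueError:
        return False

print(datetime_accepts_leap_second())  # False
```
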
Clock precision:
  • Most PCs today have some kind of clock synchronization over the internet, which yields a sub-second precision. However, don't count on your clock to be 1ms precise.
  • PCs often have their clocks adjusted, so you need to make sure that your software can survive a clock that moves backwards.
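
One common way to survive clock adjustments when measuring durations is to use a monotonic clock instead of the wall clock. A minimal Python sketch, where the measure helper is a hypothetical name:

```python
import time

def measure(work) -> float:
    # time.monotonic() never goes backwards, even if the system clock is adjusted
    start = time.monotonic()
    work()
    return time.monotonic() - start

elapsed = measure(lambda: time.sleep(0.01))
print(f"work took {elapsed:.3f} s")
```

This only helps for intervals within one running process; a monotonic reading has no meaning as a calendar time stamp.
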
Now, how do we calculate age? If you were born on February 29th 1980, and an election for parliament is held on February 28th 1998, are you allowed to vote? Probably not. What if something has to be done on a day that may not be later than your birthday? Then February 28th would be the last day. So you cannot use a single GetBirthDay(BirthDate,Age) function for these two cases, since the two problems result in two different dates.
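
The two cases could be sketched as two separate functions. Both names are hypothetical, and the rule of rolling February 29th forward to March 1st is an assumption; the legal rule differs between jurisdictions:

```python
import calendar
from datetime import date

def date_age_is_reached(birth: date, age: int) -> date:
    # Assumption: a 29 February birthday counts as 1 March in non-leap years
    year = birth.year + age
    if (birth.month, birth.day) == (2, 29) and not calendar.isleap(year):
        return date(year, 3, 1)
    return date(year, birth.month, birth.day)

def last_day_not_after_birthday(birth: date, age: int) -> date:
    # Deadline case: never later than the birthday, so 29 February rolls back
    year = birth.year + age
    if (birth.month, birth.day) == (2, 29) and not calendar.isleap(year):
        return date(year, 2, 28)
    return date(year, birth.month, birth.day)

birth = date(1980, 2, 29)
election = date(1998, 2, 28)
print(election >= date_age_is_reached(birth, 18))  # False: not yet 18
print(last_day_not_after_birthday(birth, 18))      # 1998-02-28
```
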

What about statistics? Here you have a lot of other problems:
  • Total numbers per month don't make sense for February, which changes its length every 4 years.
  • Numbers per day, for a month usually don't make sense either, because the number of weekend days in a month is varying, and numbers often depend on weekdays.
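
To see how much the weekday mix varies, here is a small sketch that counts weekend days per month. The function name is made up, and treating Saturday and Sunday as the weekend is an assumption that does not hold everywhere:

```python
import calendar

def month_profile(year: int, month: int) -> tuple:
    # Returns (days in month, weekend days), with Saturday=5 and Sunday=6
    _, length = calendar.monthrange(year, month)
    weekend = sum(
        1 for day in range(1, length + 1)
        if calendar.weekday(year, month, day) >= 5
    )
    return length, weekend

print(month_profile(2007, 9))   # (30, 10)
print(month_profile(2007, 11))  # (30, 8)
```

Two months of the same length can differ by two working days, which is enough to distort a per-day average.
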
Other programming problems
  • Comparing. If you have two timestamps from different sources, you need to define many things to be able to say timestamp1=timestamp2.
  • Round-off errors often mean that timestamp1+timeinterval-timeinterval<>timestamp1. When you have deadlines, it can become very tricky to decide whether a deadline has been reached or not.
  • Uniqueness of timestamps: Some programmers want to use timestamps as primary key. Some database systems even support that, but what happens when you transport these timestamps to other parts of your software, will they still be unique?
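
The round-off problem is easy to reproduce with floating point day counts. The sketch below uses made-up values (0.1 and 0.2 days) and compares them with Python's datetime and timedelta, which count whole microseconds internally:

```python
from datetime import datetime, timedelta

# Floating point days: add an interval, remove it again, and the value has changed
t_float = 0.1    # hypothetical time stamp in days (02:24)
interval = 0.2   # hypothetical interval in days (4 h 48 min)
float_round_trip_ok = (t_float + interval - interval) == t_float

# Integer based arithmetic: the same values survive the round trip
t_exact = datetime(2007, 11, 21, 2, 24)
step = timedelta(hours=4, minutes=48)
exact_round_trip_ok = (t_exact + step - step) == t_exact

print(float_round_trip_ok, exact_round_trip_ok)  # False True
```
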
The worst problem is the specification of timezones and daylight saving time:
  • Many people don't understand the difference between GMT offsets for time zones and time stamps. Example: In Denmark, which is located in the GMT+1 time zone, we use GMT+2 time stamps during the summer.
  • Terms like "CET" have many meanings, depending on who you ask. It can be the time in Germany in winter, it can be the current time in Germany, and it can describe the time zone which includes France and Germany, which did not have the same daylight saving time rules 30 years ago. In case of the "CET time zone", a historical timestamp may be useless without knowing if it applies to a location in France or Germany.
Is that it? No. Here's the absolutely biggest problem: When programmers give names to variables and functions, they don't precisely describe what they do. In Delphi, there is a constant named SysUtils.MinsPerDay. It is defined as MinsPerDay=24*60. Does that make sense? For some programs, yes. For others, definitely not.
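
To make the MinsPerDay point concrete, here is a sketch of a local day in Denmark that is longer than 24*60 minutes. The fixed offsets CEST and CET are written by hand for this one date and are an assumption of the example, not a general rule set:

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2), "CEST")  # assumed offset before the change
CET = timezone(timedelta(hours=1), "CET")    # assumed offset after the change

# Denmark left daylight saving time on 28 October 2007.
day_start = datetime(2007, 10, 28, 0, 0, tzinfo=CEST)
day_end = datetime(2007, 10, 29, 0, 0, tzinfo=CET)

NOMINAL_MINS_PER_DAY = 24 * 60
minutes_in_day = (day_end - day_start) // timedelta(minutes=1)
print(minutes_in_day, NOMINAL_MINS_PER_DAY)  # 1500 1440
```

A name like NOMINAL_MINS_PER_DAY at least tells the reader that the constant is a convention and not a measurement.
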

In Windows, the standard functions to convert between local time and UTC time do not handle historical timestamps well, but documentation isn't good at explaining that. On Windows, you should always make days 24 hours and 86400 seconds, and always use local time, unless you really know what you're doing.

Here are some examples of bad time related functionality:
  • DateTime.IsDaylightSavingTime Method - The text says "Indicates whether this instance of DateTime is within the Daylight Saving Time range for the current time zone.", but what if there are two different daylight saving time ranges for the time zone? The problem is that the documentation equates a time zone with a daylight saving time rule set.
  • GetDynamicTimeZoneInformation - The information returned by this function can be invalid just after it returns. This function basically doesn't make sense; it should have taken a time stamp as a parameter.
  • TimeZone.GetUtcOffset Method - It calculates the UTC offset from the local time. However, once a year, the same local time repeats itself for one hour, with two different UTC offsets. So this method doesn't make sense.
There are many more examples out there.
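
The repeated hour mentioned for GetUtcOffset can be sketched without any time zone database. The two fixed offsets below are hand-written assumptions for Denmark on 28 October 2007:

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # assumed summer offset
CET = timezone(timedelta(hours=1))   # assumed winter offset

# Two instants, one hour apart, on the night the clocks were set back
first = datetime(2007, 10, 28, 0, 30, tzinfo=timezone.utc).astimezone(CEST)
second = datetime(2007, 10, 28, 1, 30, tzinfo=timezone.utc).astimezone(CET)

# Strip the offset to get what a wall clock would have shown
same_wall_clock = first.replace(tzinfo=None) == second.replace(tzinfo=None)
gap = second - first

print(first.isoformat(), second.isoformat())
print(same_wall_clock, gap)  # True 1:00:00
```

Both instants read 02:30 on the wall clock, so a function that only receives the local time cannot know which offset to return.
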

Linux works very differently - it counts seconds everywhere and uses geographic locations for GMT offset calculations. This works extremely well, and Linux only converts to year/month/day representations when interacting with the user. However, even though Linux can support leap seconds, most apps probably won't work well if you enable them.

This blog post doesn't cover all the kinds of trouble we programmers face, there's much more. My advice is to try to keep things simple and prepare for the worst.

7 comments:

HeartWare said...

Personally, I am annoyed at Microsoft's inability to handle DST properly.

Twice a year, my synchronizers run amok because all files' time stamps suddenly change by one hour. How come a file created at 10:00 on May 3rd suddenly was created at 11:00 (or is it 09:00) on May 3rd? That annoys the **** out of me...

I have been forced to make specialized code when comparing time stamps on files (to see if they should be updated) and ignore the difference if the difference is exactly one hour.

Files' time stamps should NOT change just because we enter/leave DST. The time the file was created/modified hasn't changed, so why has the time stamp?

Anonymous said...

Wasn't it nice to be able to use the hour and minute of a file to show the major and minor version numbers... Even Borland used to do it. Certainly it doesn't work any more at least not reliably due to fat32 vs ntfs and most of all XP and vista vs older win32 versions.

Anonymous said...

This is GREAT. I plan on sending this link to several of my customers so they understand how hard it is to do "easy" things. Thanks.

Anonymous said...

I have read this with great interest. Thank.

Stuart Kelly said...

Great article.

Anonymous said...

Nice article. However, there are some (minor) mistakes, and there is a (much too) long list of missing "features" in the wonderful world of time handling, as you noted at the end.

Some of the strangest I know of:
— There is no such thing as a 62-second minute (it was an error made while developing POSIX, back in 1986); on the other hand, we could see 59-second minutes, if Earth continues to speed up its rotation as it does...

— There have been half-hour DST, so in addition to 23-, 24- and 25-hour, we can have 23.5- and 24.5-hour days, etc.

— in the industrial world (factories), we usually stick to 7-day weeks. But sometimes we also want to align to year boundaries; that creates weird "incomplete" weeks, numbered 00 or 53.

— GMT (Greenwich mean time) is really UT1+12, since astronomers (who designed it) count day duration from noon (when they themselves sleep), not from midnight

— More and more often, timestamps are stored as both an indication of elapsed duration (as you describe) and some indication of a timezone; at least that is what is now recommended by the best standards in the area; what you are describing is called a floating timestamp; the indication of the reference timezone might be implied, for example "Unix time" implies UTC.

— 'Europe/Copenhagen' (a zoneinfo or Olson's database identifier) actually carries more information than just the geographic location; for example, it records that between 1894 and 1942, the clocks there were 50 min 20 sec ahead of GMT.

— API GetDynamicTimeZoneInformation with the year indication is called GetTimeZoneInformationForYear (and requires Vista SP1)

And finally, I wholeheartedly agree with HeartWare about the "fix" Microsoft put in the NTFS file driver, *ing every timestamp twice a year, just because they wanted to align the observed behaviour with what happens on FAT volumes... The gory details are in http://support.microsoft.com/kb/158588, which you should combine with the fact that most other libraries out there (including MS's) do not follow the fix (which prevents compatibility with Windows 9x)

Lars D said...

Antoine, thank you for a very nice post. I never spent time on date/time storage on NTFS volumes, so your information on that is new to me.

However, I disagree that there is no such thing as a 62 second minute. The reason is that some time protocols support it. In other words, even though the earth's rotation has never caused a minute to be 62 seconds, your application should support it if you want to claim compatibility with such a protocol.

Every time I design date and time into something, I create a model for how it will work. The model must specify whether leap hours, leap seconds etc. are supported, how change of PC time is supported, what happens when the client and the server's timestamps don't match etc. The model often ends up being very simple, like counting days with a floating point value or counting seconds the way a typical Linux distribution does it.