It makes sense to apply GUIDs when:
- No specific order is required
- 128 bit is not considered a waste of space
- A very small chance of not succeeding to pick a unique number is ok
- Values cannot be produced in one place, or having no specific order is a feature
However, it comes at a price. There is a big chance, that two records, that are added shortly after each other, are related. For instance, if you want to save an invoice, there may be 5 records that describe items on that invoice, which are added as part of the same transaction. If a database server uses autoincrementing integer values as primary key, and fully or partially physically sorts records by this primary key, these 5 records will probably go into 1 or 2 places on the harddisk. If GUIDs were used, they would be stored in 5 different places on the harddisk. This is one of the reasons why GUID-based databases are usually on servers that have more RAM than they have data - they need to cache everything.
Another price comes when debugging. You need more IQ to debug code than to write code, so it is important that you optimize for debugging. It must be easy to see, that the data stored in the database is correct. GUIDs are not always the easiest key to read, especially not in developer databases, that tend to have very little data, and therefore very small numbers in autoincrementing integer fields.
2 comments:
Physical arrangement on the disk depends on your clustered index, which doesn't have to be on the primary key. Cluster it by createdate, and records inserted close together in time will be close on the disk, no matter what your primary key is.
This isn't necessarily what you want...if you have a lot of inserts, it can be better to cluster by something else so you don't get a "hotspot." In fact, in that situation a cluster on a guid might even be a good idea.
Generally tho, if there is any column for which you typically extract a range of data, use that for your clustered index.
If you don't cluster by GUIDs, but want to look up records by GUIDs, then the database will need to make a separate index for the GUIDs, and in that index, the GUID values will be sorted by GUID values.
If you want to look up 5 different GUID values in a secondary index, the harddisk will need to find these in 5 different places. So even if the actual records are located together, you didn't solve the problem, you just added extra lookups and made everything slower.
Post a Comment