02.21.08

Use your database, don’t abuse it

Posted in Technology, MySQL at 11:49 pm by Stoner

Jay Pipes blogged about this earlier this month. I read his post, nodded a knowing nod and finished my cup of coffee. It wasn’t until yesterday when it really struck me. I was looking through the database schema of a popular forum software package (won’t say which) when I came across this column definition in a central table (by ‘central’ I mean one that typically holds a lot of rows):

ip varchar(100),

One hundred characters to store a at-most 15-character IPv4 string? I know what you’re about to say, “But, it’s a varchar so you’re not really using 100 bytes.” Well, you *might* not be. Jay’s discussion on how MySQL uses temporary tables illustrates how you can use all 100 bytes in memory - not a good thing.

So, the minimum number of bytes you will be using is 7 if each octet is a single digit (don’t forget the ‘.’ separators.) At most, you use 15 bytes. If you store the IP as an unsigned integer, you use, at most, 4 bytes per row. Those bytes really start to add up, especially when pulling the data into memory to satisfy queries.

But wait…what if it stores the fully qualified domain name? I didn’t dive into the PHP to see if it stores IP addresses or hostname or both in this column. Is 100 characters enough? Probably. Some of the hostnames from the major ISPs are decently long but I don’t think I’ve seen one that’s 100 characters. I did set up a site for a client that was 61 characters long (thank goodness for bookmarks.)

This begs the question: is it better to store the IP address or hostname…or both? Resolving an IP to hostname or vice versa is an “expensive” process, whether you do it in your code at the time of storage or as a batch process later on. This is a question for another day.

Where does all this lead? You need to clearly define the data you want to store, then be frugal with the data type used to store it.