06.26.08
Posted in Technology, Linux, MySQL at 4:07 pm by Stoner
A bit of advice to anyone wanting to write an article on MySQL that includes setting up users: familiarize yourself with the concept of Least Privileges. That is, only grant those privileges absolutely necessary to do a job and nothing more. I just finished reading an article on how to set up RSyslog to log to a MySQL database. Halfway through the article is a listing showing the grant statement. I’ll share just the fun part:
grant ALL ON Syslog.* …
My first reaction when I see a “grant all” is to ask: why? Why does an application need every database privilege? Well, I finished the article, then went to the RSyslog web site and spent all of 4 minutes researching why the app needs so much privilege. As it turns out, it doesn’t. Right there in blank and white:
“It is sufficient to grant it INSERT privileges to the systemevents table, only.”
Whew. Okay. The app doesn’t need all those privileges. When I checked the author’s credentials, he presents himself as someone who manages firewalls, antispam and antivirus systems - someone you would hope would be security-conscious and aware of privileges. Apparently not - or at least, not when it comes to databases. I couldn’t find an email address to write to the author nor does the web site have a form to post feedback on articles so I can’t point this out nor the fact that he completely left out how to configure RSyslog to connect to MySQL.
<sigh>
Permalink
04.17.08
Posted in Technology, MySQL at 4:31 pm by Stoner
I was curious about some of the databases I get to work with. “How do my clients store their data? What data types are most prevalent?” Well, a few keystrokes later, I had my answers:
SELECT c.data_type, count(c.data_type) AS frequency
FROM information_schema.columns AS c
INNER JOIN information_schema.tables AS t
ON c.table_schema = t.table_schema AND
c.table_name = t.table_name
WHERE c.table_schema NOT IN ('information_schema','mysql') AND
t.table_type = 'base table'
GROUP BY data_type;
Which gave me a nice “data type distribution” table:
| data_type |
frequency |
| blob |
7 |
| char |
611 |
| date |
85 |
| datetime |
125 |
| decimal |
133 |
| double unsigned |
1 |
| enum |
677 |
| float |
5 |
| int |
2334 |
| mediumblob |
21 |
| mediumint |
3 |
| set |
9 |
| smallint |
7 |
| text |
57 |
| time |
551 |
| timestamp |
20 |
| tinyint |
6 |
| varchar |
946 |
Not content with just that, I whipped up another statement to show me the top 10 tables that used a particular data type (replace the WHATEVER with the data type you’re interested in) :
SELECT c.table_schema, c.table_name, count(c.data_type) AS count
FROM information_schema.columns c
INNER JOIN information_schema.tables AS t
ON c.table_schema = t.table_schema AND
c.table_name = t.table_name
WHERE c.table_schema NOT IN ('information_schema','mysql') AND
t.table_type = 'base table' AND
c.data_type = 'WHATEVER'
GROUP BY c.table_schema, c.table_name
ORDER BY count DESC
LIMIT 10;
Which showed me:
| table_schema |
table_name |
count |
| database_1 |
object_payment |
14 |
| database_2 |
object_payment |
12 |
| database_3 |
object_payment |
11 |
| database_2 |
object_space |
9 |
| database_1 |
object_space |
9 |
| database_1 |
aggregation_table_1 |
8 |
| database_2 |
aggregation_table_1 |
8 |
| database_2 |
aggregation_table_3 |
7 |
| database_1 |
aggregation_table_4 |
7 |
| database_1 |
object_user |
7 |
Finally, I wrote them up into stored procedure form (and included a 3rd that gives the data type distribution for specified databases) and committed them to my project on Sourceforge.
Permalink
03.18.08
Posted in Technology, MySQL at 8:48 pm by Stoner
I downloaded Maatkit 1877 and installed it on a data archive server, then did a parallel dump of the data. I also did a regular dump of the same data using mysqldump. This isn’t a benchmark or even a real comparison, just me playing with Maatkit to get a feel for it and see how it can make my life easier. First, some stats about the database:
mysql> select table_name, table_rows from information_schema.tables where table_schema = 'somedb';
+------------+------------+
| table_name | table_rows |
+------------+------------+
| table1 | 24758808 |
| table2 | 33713412 |
| table3 | 221633443 |
| table4 | 56869734 |
| table5 | 14017081 |
| table6 | 77559278 |
| table7 | 48453337 |
+--------+------------+
mysql> call database_size_profile('somedb')\G
*************************** 1. row ***************************
myisam_index_size: 15603748864
myisam_data_size: 14895580924
innodb_index_size: NULL
innodb_data_size: NULL
total_index_size: 15603748864
total_data_size: 14895580924
database_size: 30499329788
1 row in set (0.00 sec)
The dump using maatkit
$ mk-parallel-dump --tab --databases somedb --no-gzip
default: 7 tables, 7 chunks, 7 successes, 0 failures, 1239.59 wall-clock time, 3558.11 dump time
$ ls -l default/somedb/
-rw-rw-rw- 1286 Mar 18 18:25 table1.000000.sql
-rw-rw-rw- 847081157 Mar 18 18:28 table1.000000.txt
-rw-rw-rw- 1274 Mar 18 18:25 table2.000000.sql
-rw-rw-rw- 1131602870 Mar 18 18:29 table2.000000.txt
-rw-rw-rw- 1304 Mar 18 18:25 table3.000000.sql
-rw-rw-rw- 8508494974 Mar 18 18:45 table3.000000.txt
-rw-rw-rw- 1318 Mar 18 18:25 table4.000000.sql
-rw-rw-rw- 2327666926 Mar 18 18:32 table4.000000.txt
-rw-rw-rw- 1213 Mar 18 18:28 table5.000000.sql
-rw-rw-rw- 441408946 Mar 18 18:29 table5.000000.txt
-rw-rw-rw- 1431 Mar 18 18:29 table6.000000.sql
-rw-rw-rw- 4001690723 Mar 18 18:45 table6.000000.txt
-rw-rw-rw- 1339 Mar 18 18:29 table7.000000.sql
-rw-rw-rw- 1625718296 Mar 18 18:35 table7.000000.txt
$ du -sh default/
18G default/
The dump using mysqldump.
$ mysqldump --user=joe -p --routines --trigger --quick --complete-insert --master-data=2 somedb > somedb.18mar08.sql
Elapsed time to dump: 2676
$ ls -l
-rw-r--r-- 21905573763 Mar 18 19:49 somedb.18mar08.sql
The server has two dual-core athlons and 8 gigs of ram, reading from a fiber-attached SAN and writing to local SCSI drives (RAID-1.) I haven’t performed a restore yet. That’s next but not tonight. I have a code push to deal with first.
Permalink
03.07.08
Posted in Technology, MySQL at 8:52 pm by Stoner
I learned something new about myisamchk. When doing a recovery of the data file, it creates not one but two temporary files. I knew about the TMD it creates in $datadir when rebuilding the MYD file but it creates a second temporary file using an old and rather annoying technique (which involves creating a file using the standard C ‘open()’ call, then deleting the file without closing the file descriptor.) I discovered this when myisamchk told me “Disk is full writing ‘/tmp/STQ3p9UF’ (Errcode: 28).” I looked in /tmp and didn’t see that file. So I did a
# lsof | grep delete
and low-and-behold, there it was. You know, as a sys admin, that really annoys me. I don’t care that it creates temporary files to do its thing. In fact, I expect it, but I at least want to know about it so I can plan accordingly. My /tmp partition only had 2.5 gigs of space and the table’s MYD was over 4 gigs in size. If I had known about this extra temp file, I would have used –tmpdir=/some/place/else before I started and not had myisamchk crap out on me in the first place.
Permalink
02.26.08
Posted in Technology, Linux, MySQL at 5:49 pm by Stoner
At $WORK we typically use Redhat Cluster to make MySQL highly available. We have a number of 2-node clusters (active/passive) and it works nicely, for the most part. One thing I find very annoying is RHCS has no notification framework to notify you when a service is relocated to another node (because of failure or otherwise.)
In a perfect world, the only reason MySQL would pass between nodes is for server maintenance, under the control of a human. In reality, crap happens. Redhat Cluster manages all this, but I still want to know that a failover happened (is hardware going bad? did a software bug cause a failure? etc.)
Most monitoring tools weren’t designed to report on a clustered service. I can add a check that connects to MySQL using the cluster-managed IP - but that only tells me if MySQL has failed completely. The failover between nodes generally takes under a minute so it’s easy for that type of monitoring check to miss a failover. Parsing syslog output is less than optimal because syslog is unstructured text. Blech. Process and disk checks can’t be used on each node because they’ll only exist on one node, the active node.
My current solution is a bit of a kludge but it works. We configure MySQL to use standard names for binary logs and error logs (they never change, regardless of which node the database is running on.) The only file name that isn’t standardized is the pid file, which defaults to the hostname of the server running MySQL. So, with that bit of info, I wrote a script that connects to MySQL using the cluster-managed IP and issues the query “show variables like ‘pid_file’” and parses the result. It then compares the hostname against a stored value in a file, which contains the hostname of the current active node. If the hostname changed, then MySQL has changed nodes (whether due to failure or manual relocation.) This changing of hostnames is considered an error condition and the monitoring system sends alerts appropriately. The NOC staff consult procedures to determine the level of failure and call people if necessary. Clearing the error requires editing the file and setting the current hostname as the active node.
What would be nice would be some informational functions in MySQL to give a bit more data about the server itself, things like hostname, IP address MySQL is bound to, port, etc. Then I could use “select hostname()” or similar instead of relying on a file name. Before someone says “you connected to the database so you know all that already,” not all of this information is known to a database connection, particularly if your application uses a database connection pool or the connection goes through a proxy, masking the real database.
Permalink
02.21.08
Posted in Technology, MySQL at 11:49 pm by Stoner
Jay Pipes blogged about this earlier this month. I read his post, nodded a knowing nod and finished my cup of coffee. It wasn’t until yesterday when it really struck me. I was looking through the database schema of a popular forum software package (won’t say which) when I came across this column definition in a central table (by ‘central’ I mean one that typically holds a lot of rows):
ip varchar(100),
One hundred characters to store a at-most 15-character IPv4 string? I know what you’re about to say, “But, it’s a varchar so you’re not really using 100 bytes.” Well, you *might* not be. Jay’s discussion on how MySQL uses temporary tables illustrates how you can use all 100 bytes in memory - not a good thing.
So, the minimum number of bytes you will be using is 7 if each octet is a single digit (don’t forget the ‘.’ separators.) At most, you use 15 bytes. If you store the IP as an unsigned integer, you use, at most, 4 bytes per row. Those bytes really start to add up, especially when pulling the data into memory to satisfy queries.
But wait…what if it stores the fully qualified domain name? I didn’t dive into the PHP to see if it stores IP addresses or hostname or both in this column. Is 100 characters enough? Probably. Some of the hostnames from the major ISPs are decently long but I don’t think I’ve seen one that’s 100 characters. I did set up a site for a client that was 61 characters long (thank goodness for bookmarks.)
This begs the question: is it better to store the IP address or hostname…or both? Resolving an IP to hostname or vice versa is an “expensive” process, whether you do it in your code at the time of storage or as a batch process later on. This is a question for another day.
Where does all this lead? You need to clearly define the data you want to store, then be frugal with the data type used to store it.
Permalink
01.29.08
Posted in Technology, MySQL at 9:35 pm by Stoner
Finally got around to creating a project on Sourceforge: MyHelper. Right now, it’s just some stored procedures and functions for determining data and index sizes for databases using the MyISAM and InnoDB engines. Each one is individually licensed (using the BSD license.) I have 2 more stored procedures to add (return a list of tables with no primary key and a list of tables with the primary key) but I haven’t committed my changes to Subversion yet because I’m still testing them. I’ll be adding more code as I refine the stuff I’ve written over the years (their current state is highly environment-specific so I need to “genericise” them - if that’s a word.)
A word about licensing. I’m opting to license each script/procedure/function individually. This is because the project, as a whole, is made up of individual pieces that may or may not have a dependency on each other and the project is not a “software package” that you install and use. Each script, stored procedure or function may stand on its own and you have the option to use what you want. It’s an ala carte repository of stuff. Use what you want, leave behind what you don’t need. Putting a license on the whole package didn’t make sense given this distribution model so each has its own license.
So, if you develop with MySQL or manage MySQL databases, check ‘em out and let me know what you think.
Permalink
08.17.07
Posted in Technology, MySQL at 2:32 pm by Stoner
MySQL has a unique feature whereby you can select the storage engine for each table. If you don’t know what a storage engine is, it is responsible for managing the physical data on disk - from data files and indexes to the memory buffers used to access the data and the locking needed to prevent multiple people from clobbering each other’s changes. This is both a boon and a bane for Systems Administrators and DBAs.
It’s a great feature because developers, Sys Admins and DBAs can select the engine with the properties they require. For example, the InnoDB engine supports ACID-compliant transactions, foreign keys and row-level locking. The MyISAM engine, while being lightning fast on reads and supporting concurrent inserts, does not have ACID-compliant transactions or foreign keys and implements table-level locking. Other engines are available, such as memory-based tables, clustered tables, federated tables (reference a table on server B from server A) and more. This level of flexibility can’t be found in any other database server.
This flexibility comes with a stiff price, though. If you mix engines in a single database then you must be aware of several limitations:
- Foreign keys support in the InnoDB engine does not extend to other engines, meaning you can’t have a InnoDB-based table define a foreign key on a MyISAM-based table.
- Likewise, transaction support in the InnoDB engine does not extend to tables defined using another engine.
- Some buffers are engine-specific so you may need more memory than if you used a single engine for all tables.
Permalink
07.13.07
Posted in Technology, Life, MySQL at 12:04 pm by Stoner
Ok…I’ve been crazy busy with job and home stuff. Doesn’t mean I stop thinking about stuff.
- Do we really need another text editor/Personal Information Manager/MP3 player? I monitor Gnomefiles and Portable Freeware RSS feeds and I’m constantly seeing these types of apps being created, promising to be a better than anything out there with whiz-bang features not seen anywhere else. Got news for you, you aint showing us nuttin’ new.
- The First Bank of Delaware needs to DIAF for offering loans at 99.25% interest. That’s right, 99.25%! It’s done through their CashCall marketing arm. Yes. The one offered up by has-been Gary Coleman. They’re preying on people who can’t manage their money.
- MySQL-Proxy is damn cool.
- MySQL-Proxy combined with memcached rocks da bomb!
- Managing hundreds of servers with Microsoft Office documents and Microsoft Sharepoint portal totally sucks nuts. I’d rather slash my wrists with a dull blade, then pour salt in the wounds.
Permalink
06.20.07
Posted in Technology, Life, MySQL at 11:09 pm by Stoner
Been a while since I wrote anything…not for lack of anything to write, just real busy. I’m finding myself having to force myself away from the computer to maintain my sanity.
On the home front, I’m going to be installing a ceiling fan in the only bedroom that doesn’t have one. I had to go buy a metal junction box because all the boxes in this house are the blue plastic ones. That’s fine and dandy but I’m not going to hang a ceiling fan from one. Don’t know if I’ll need it but I also bought a brace so I can mount the box to a rafter if it isn’t close enough to one already. After that, I’m going to start ripping down the ceiling in the basement in preparation for remodeling it. It’ll be a nice mess.
Then I’m off to California for a week - business related. I fly out Sunday morning at 7:30 and fly back Saturday.
Attended a MySQL webinar (man, do I hate that word) on replication and scaling out MySQL. I’m very familiar with replication but wanted to see what the MySQL folks are up to. Also got approval (and advance payment!!!) for MySQL training in Washington, D.C. in July. I don’t have to fork over a couple grand and submit an expense report after - which is nice.
Someone implemented an idea I had been working on - namely backing up MySQL binary logs on a MySQL replication master server. I had a shell of such a script written in Perl - this guy did it in bash. Six of one/half a dozen of the other - I can just reuse his code and not reinvent the wheel. Open Source is beautiful that way.
Found two scripts for making practical sense of the MySQL “show status” command. Problem is, they generate flat, unparsable text and don’t retain the info for later trending. Pity. I’ve started work on that project so hopefully I can start to get a better understanding of how our MySQL databases are performing over time.
Permalink