04.17
I was curious about some of the databases I get to work with. “How do my clients store their data? What data types are most prevalent?” Well, a few keystrokes later, I had my answers:
SELECT c.data_type, count(c.data_type) AS frequency
FROM information_schema.columns AS c
INNER JOIN information_schema.tables AS t
ON c.table_schema = t.table_schema AND
c.table_name = t.table_name
WHERE c.table_schema NOT IN ('information_schema','mysql') AND
t.table_type = 'base table'
GROUP BY data_type;
Which gave me a nice “data type distribution” table:
| data_type | frequency |
|---|---|
| blob | 7 |
| char | 611 |
| date | 85 |
| datetime | 125 |
| decimal | 133 |
| double unsigned | 1 |
| enum | 677 |
| float | 5 |
| int | 2334 |
| mediumblob | 21 |
| mediumint | 3 |
| set | 9 |
| smallint | 7 |
| text | 57 |
| time | 551 |
| timestamp | 20 |
| tinyint | 6 |
| varchar | 946 |
Not content with just that, I whipped up another statement to show me the top 10 tables that used a particular data type (replace the WHATEVER with the data type you’re interested in) :
SELECT c.table_schema, c.table_name, count(c.data_type) AS count
FROM information_schema.columns c
INNER JOIN information_schema.tables AS t
ON c.table_schema = t.table_schema AND
c.table_name = t.table_name
WHERE c.table_schema NOT IN ('information_schema','mysql') AND
t.table_type = 'base table' AND
c.data_type = 'WHATEVER'
GROUP BY c.table_schema, c.table_name
ORDER BY count DESC
LIMIT 10;
Which showed me:
| table_schema | table_name | count |
|---|---|---|
| database_1 | object_payment | 14 |
| database_2 | object_payment | 12 |
| database_3 | object_payment | 11 |
| database_2 | object_space | 9 |
| database_1 | object_space | 9 |
| database_1 | aggregation_table_1 | 8 |
| database_2 | aggregation_table_1 | 8 |
| database_2 | aggregation_table_3 | 7 |
| database_1 | aggregation_table_4 | 7 |
| database_1 | object_user | 7 |
Finally, I wrote them up into stored procedure form (and included a 3rd that gives the data type distribution for specified databases) and committed them to my project on Sourceforge.
In the first query, you have a typo – at the end of the WHERE, it should be t.table_type, not c.table_type.
That’s what I get for trying to do too many things at the same time. Thanks for pointing it out.